Convolved substructure: analytically decorrelating jet substructure observables

A number of recent applications of jet substructure, in particular searches for light new particles, require substructure observables that are decorrelated with the jet mass. In this paper we introduce the Convolved SubStructure (CSS) approach, which uses a theoretical understanding of the observable to decorrelate the complete shape of its distribution. This decorrelation is performed by convolution with a shape function whose parameters and mass dependence are derived analytically. We consider in detail the case of the D2 observable and perform an illustrative case study using a search for a light hadronically decaying Z′. We find that the CSS approach completely decorrelates the D2 observable over a wide range of masses. Our approach highlights the importance of improving the theoretical understanding of jet substructure observables to exploit increasingly subtle features for performance.


Introduction
Jet substructure now plays a central role at the Large Hadron Collider (LHC), where it has provided a new set of powerful tools to search for physics beyond the Standard Model. For example, jet substructure tools have been used to tag highly Lorentz-boosted Standard Model bosons (W/Z/H), significantly improving searches for new high-mass states (see e.g. [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]). With an ever-improving understanding of jet substructure observables, these tools have now also been used to search for low-mass resonances by directly studying the mass distribution of the tagged jets themselves. This has been applied both to the Standard Model search for $H \to b\bar{b}$ [19,20] and to searches for new light $Z'$ bosons, deriving bounds in a previously unprobed region of parameter space [21][22][23].¹ These searches represent an impressive advance in the sophistication of jet substructure techniques.
Unlike high-mass resonance searches, these low-mass searches use the mass of the jet itself. It is therefore important that the jet substructure observable used for tagging be independent of the jet mass; otherwise, the cut on the tagging observable can significantly distort the jet mass spectrum, making it difficult to search for resonances. This was first highlighted in [25], where a procedure termed DDT was introduced to decorrelate the observable from the jet mass and $p_T$. More precisely, the DDT decorrelates the first moment of the observable. Due to the importance of this problem, several other groups have applied machine learning to develop tagging observables that are decorrelated from the jet mass and $p_T$ [26,27].
In this paper, we show how an understanding of substructure observables can be used to completely decorrelate them from the jet mass. In particular, we show that the standard way of incorporating non-perturbative hadronization effects, namely convolution with a model shape function, motivates a simple way of performing the decorrelation: convolution with a function that maps the distribution at any mass to the distribution at a reference mass. We call this approach to decorrelation Convolved SubStructure (CSS).⁴ The CSS approach naturally preserves the domain and normalization of the tagging observable, and decorrelates the complete shape of the observable, not just its first moment. The philosophy of our approach is slightly different from that of [26,27]: it attempts to decorrelate a given standard observable, such as $D_2$ [41,48] or $N_2$ [65], using a theoretical understanding of that particular observable, and in this sense it is similar in spirit to the original DDT [25]. Indeed, we will show that the first moment of our approach reproduces the DDT, so the CSS approach should be thought of as a systematic generalization of the DDT beyond the first moment.
A schematic depiction of our approach is shown in figure 1. At a given mass, the distribution predicted by the factorization theorem for an observable such as $D_2$ is given as a convolution of the perturbative distribution with a non-perturbative shape function [66][67][68][69] $F_{\rm NP}(\epsilon; m)$, which encodes the effects of hadronization (here and throughout the text, $\epsilon$ denotes a dimensionless convolution variable, and $m$ denotes the mass). Both the perturbative distribution and the non-perturbative shape function depend on the jet mass, and therefore both introduce correlations between the observable and the jet mass. However, with an understanding of these different functions, we can map the distribution at a given mass to a reference mass if we know both the non-perturbative shape function, $F_{\rm NP}(\epsilon; m)$, and the mapping between the perturbative distributions, $F_{\rm P}(\epsilon; m_1, m_2)$, which is a perturbatively calculable function. The end result is that we can derive a function, $F_{\rm CSS}(\epsilon; m_1, m_2)$, which completely decorrelates the observable by mapping it to a reference mass point.⁵ This defines the CSS decorrelated $D_2$ observable:

$$\frac{d\sigma^{\rm CSS}}{dD_2}(D_2; m_1) = \int d\epsilon\, F_{\rm CSS}(\epsilon; m_1, m_2)\, \frac{d\sigma}{dD_2}(D_2 - \epsilon; m_1). \qquad (1.1)$$

Here $\epsilon$ is a dimensionless convolution variable, and $m_1$ and $m_2$ denote the masses between which the function maps. The exact function can be determined through an understanding of both the perturbative and non-perturbative aspects of the distribution, namely

$$F_{\rm CSS}(\epsilon; m_1, m_2) = F_{\rm NP}(\epsilon; m_2) \otimes F_{\rm P}(\epsilon; m_1, m_2) \otimes F_{\rm NP}^{-1}(\epsilon; m_1), \qquad (1.2)$$

where ⊗ denotes convolution in $\epsilon$ and the inverse is taken in the convolutional sense. This combination of mappings is shown schematically in figure 1. While it is of course trivial that such a function exists, the simple structure of the observable enables us to provide a simple analytic form for $F_{\rm CSS}$, allowing for a fast numerical implementation as well as an understanding of how it scales with $m_1$ and $m_2$. Furthermore, $F_{\rm CSS}$ can be systematically improved starting from this initial function, using an expansion in orthogonal polynomials, as developed in [69].⁶

An outline of this paper is as follows. In section 2 we discuss the sources of correlation between a two-prong substructure observable such as $D_2$ and the jet mass, treating both the perturbative and non-perturbative aspects of this correlation, and we show that in both cases they can be modeled using shape functions. We also analytically derive the mass scaling of the shape function parameters. In section 3 we discuss how we can use this understanding to decorrelate jet substructure observables using shape functions, introduce the CSS approach, and illustrate concretely how the decorrelation can be done in practice. In section 4 we perform a brief study illustrating the effectiveness of the decorrelation procedure for $Z' \to q\bar{q}$. We conclude in section 5.

Figure 1. The evolution of a two-prong observable, taken here to be $D_2$, with the jet mass is governed by the corresponding evolution of its perturbative and non-perturbative components. Here $F_{\rm NP}(\epsilon; m)$ encodes the effects of hadronization, while $F_{\rm P}(\epsilon; m_1, m_2)$ is a perturbatively calculable function describing the mapping between the perturbative distributions at the masses $m_1$ and $m_2$. (They are technically defined as convolutions in $\epsilon$, as described in the text; this has been suppressed in the figure.) By combining these mappings we can completely decorrelate the observable by mapping it to a reference mass value.

³ By grooming we mean the modified mass drop (mMDT) [38,39] or soft drop [37] groomers, which are equivalent for β = 0.
⁴ We note that CSS is also the common abbreviation for the pioneers of factorization, namely Collins, Soper and Sterman [60][61][62][63][64]. We find this fitting, since our approach is based on a factorized understanding of the observable.
⁵ Technically, we map the graph (the set of points $\{(x, f(x)) : x \in D\}$) of the observable to the graph at a reference mass point.
⁶ The perturbative distribution can of course be calculated, while the non-perturbative contribution must currently be modeled. However, due to the structure of the factorization theorem for the tagging observable, one can confidently predict the scaling of the non-perturbative corrections with the jet mass (that is, their contributions to the moments of the distribution) in a systematically improvable manner, thus fixing the functional form of the jet mass dependence in the expansion of the shape function in orthogonal polynomials (specifically, generalized Laguerre polynomials).

JHEP05(2018)002
Correlation with mass for jet substructure observables

In this section we discuss the sources of correlation between a two-prong observable, such as $D_2$, and the jet mass, to illustrate how these correlations arise. (For brevity, we will not always explicitly say groomed jet mass, although we always work with groomed observables.) In section 2.1 we discuss the dependence of non-perturbative physics on the jet mass, and introduce the modeling of hadronization effects using shape functions. In section 2.2 we discuss perturbative sources of correlation, and show that they can also be well captured by a simple shape function.
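In this paper we consider the concrete example of the $D_2$ observable, for which a factorization formula is known [46,47]. This allows us to make precise statements about the perturbative and non-perturbative behavior of the observable. The $D_2$ observable is defined in terms of the energy correlation functions [70] $e_n^{(\beta)}$ as

$$e_2^{(\beta)} = \sum_{i<j\in J} z_i z_j R_{ij}^{\beta}, \qquad (2.1)$$

$$e_3^{(\beta)} = \sum_{i<j<k\in J} z_i z_j z_k\, R_{ij}^{\beta} R_{ik}^{\beta} R_{jk}^{\beta}, \qquad (2.2)$$

$$D_2^{(\beta)} = \frac{e_3^{(\beta)}}{\big(e_2^{(\beta)}\big)^3}. \qquad (2.3)$$

Here $z_i = p_{T,i}/\sum_{j\in J} p_{T,j}$ is the transverse-momentum fraction of particle i in the jet J, $R_{ij}$ is the distance between particles i and j in the pseudorapidity-azimuth plane, and β > 0 is an angular weighting parameter, typically chosen as β = 1 or β = 2. For notational simplicity we will often drop the angular exponent and write the observable simply as $D_2$. For a jet with two-prong substructure we have $D_2 \ll 1$, while for a more standard QCD jet without resolved substructure $D_2 \sim 1$.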
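The energy correlation functions and $D_2$ defined above can be evaluated directly from a list of jet constituents. The following is a minimal, brute-force sketch in plain Python, assuming constituents given as hypothetical (p_T, η, φ) tuples and momentum fractions normalized to the summed constituent p_T; it is O(n³) and is not the FastJet EnergyCorrelator contrib used in section 4. The two toy jets are made up to illustrate the limits $D_2 \ll 1$ and $D_2 \sim 1$.

```python
import itertools
import math

def delta_r(p, q):
    """Distance in the (eta, phi) plane; the azimuthal difference is wrapped into [-pi, pi)."""
    deta = p[1] - q[1]
    dphi = (p[2] - q[2] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(deta, dphi)

def d2(particles, beta=2.0):
    """D_2^(beta) = e_3^(beta) / (e_2^(beta))**3 for a list of (pt, eta, phi) tuples."""
    pt_total = sum(p[0] for p in particles)
    z = [p[0] / pt_total for p in particles]           # momentum fractions z_i
    n = len(particles)
    rb = [[delta_r(particles[i], particles[j]) ** beta for j in range(n)]
          for i in range(n)]                            # R_ij**beta, computed once
    e2 = sum(z[i] * z[j] * rb[i][j]
             for i, j in itertools.combinations(range(n), 2))
    e3 = sum(z[i] * z[j] * z[k] * rb[i][j] * rb[i][k] * rb[j][k]
             for i, j, k in itertools.combinations(range(n), 3))
    return e3 / e2 ** 3

# Two hard prongs plus a soft emission close to one of them: D_2 << 1.
two_prong = [(1.0, 0.0, 0.0), (1.0, 0.4, 0.0), (0.01, 0.02, 0.0)]
# Three equal-pt particles on an equilateral triangle: D_2 = 1 for any beta.
three_prong = [(1.0, 0.0, 0.0), (1.0, 0.4, 0.0), (1.0, 0.2, 0.2 * math.sqrt(3.0))]
print(d2(two_prong), d2(three_prong))
```

Because $e_3^{(\beta)}$ and $(e_2^{(\beta)})^3$ both scale as $R^{3\beta}$ and as the third power of the momentum fractions, the ratio is unchanged by a uniform rescaling of the angles or of the transverse momenta, which is a quick sanity check on any implementation.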

Non-perturbative effects
Jet substructure observables are sensitive to low scales within a jet, and are therefore naturally susceptible to non-perturbative effects. Non-perturbative contributions can arise both from the underlying event (UE), as well as from the standard hadronization process within the jet. In [46], it was shown that due to the grooming procedure, non-perturbative effects from the underlying event are negligible. We will therefore neglect them in what follows.
Using the factorization formula for the $D_2$ observable derived in [46,47], it can be shown that the dominant non-perturbative effects from hadronization are captured by the collinear-soft function of eq. (2.4).

Here the $Y_i$ are products of Wilson lines along the subjet directions, and $T$ and $\bar{T}$ denote time and anti-time ordering, respectively. The measurement function and soft drop constraints are implemented by the energy-flow operators $\hat{\mathcal{E}}_3$ and $\Theta_{\rm SD}$, whose exact form is not relevant for the current discussion. These operators can be written in terms of the energy-momentum tensor [71][72][73][74]. Importantly, due to the application of the grooming algorithm, the collinear-soft function, and hence the non-perturbative hadronization corrections, depend only on the color structure of the jet itself and not on the color structure of the global event, making them a property of the observable. While the collinear-soft function in eq. (2.4) can be calculated perturbatively, it cannot currently be calculated non-perturbatively. Instead, a functional parametrization of the non-perturbative matrix element, referred to as a shape function, $F_{\rm NP}$, is used [66][67][68][69]. Shape functions have been used in a variety of contexts in jet physics [41,45,46,75,76,77]. For the particular case of $D_2$, this allows the non-perturbative $D_2$ distribution to be written as a convolution of the perturbative distribution with the shape function, eq. (2.5). The scalings entering this expression are determined by the scalings of the collinear-soft function in eq. (2.4), and were derived in [46,47]. We take our model shape function to have the simple functional form of eq. (2.6).⁷ This function has a first moment $\Omega_D \sim \Lambda_{\rm QCD}$ and is normalized to unity. We may think of it as the first term in an orthogonal expansion that specifies the non-perturbative corrections to all moments of the distribution, truncated so as to fix only the first moment. Here α is a parameter that specifies the functional form; we choose α such that the function vanishes as x → 0. We find that α = 2-3 provides a good description of the non-perturbative correction.
Since the dominant effect is a shift of the first moment, which is fixed, the result depends on α only at small values of $D_2$. The physical interpretation of this function is that it smears the energies within the jet at the scale $\Lambda_{\rm QCD}$. In certain cases, universal properties of the first moment of shape functions can be proven [78,79]. These moments, as well as higher moments, have been extracted from event shape data, for example from the thrust event shape [80].
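The explicit form of eq. (2.6) is not reproduced in this excerpt. As an assumption, the sketch below uses a gamma-distribution form that has the properties stated in the text: unit norm, first moment $\Omega_D$, and power-law behavior $x^{\alpha-1}$ as x → 0, so that it vanishes there for α > 1. Its Laplace transform is $(1 + s\Omega_D/\alpha)^{-\alpha}$, a rational function raised to the power α. The function names and the value of Ω are illustrative; the check confirms numerically that the norm and the mean do not depend on α.

```python
import math

def shape_function(x, alpha, omega):
    """Hypothetical gamma-type shape function: unit norm, first moment omega,
    and F ~ x**(alpha - 1) as x -> 0 (so F(0) = 0 for alpha > 1)."""
    if x <= 0.0:
        return 0.0
    theta = omega / alpha   # scale chosen so that the mean alpha * theta equals omega
    return x ** (alpha - 1.0) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)

def moment(f, n, upper, steps=20_000):
    """n-th moment of f on [0, upper] using the midpoint rule."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** n * f((i + 0.5) * h) for i in range(steps))

omega = 0.1  # illustrative value, in units of the shape-function argument
for alpha in (2.0, 2.4, 3.0):
    f = lambda x, a=alpha: shape_function(x, a, omega)
    print(alpha, moment(f, 0, 2.0), moment(f, 1, 2.0))  # norm ~ 1, mean ~ omega
```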
Ref. [46] studied the non-perturbative shape parameter $\Omega_D$ and found that:
• $\Omega_D$ is independent of the quark or gluon nature of the jet.
• The scaling predicted by eq. (2.4), namely that the non-perturbative shift in the distribution is inversely proportional to the mass, is well respected in parton shower Monte Carlo simulations.

Figure 2. The shift of the $D_2$ distribution due to hadronization. (a) The perturbative and hadronized distributions as found in Pythia, and as modeled using the non-perturbative shape function described in the text. (b) The non-perturbative shift, $\Delta_D^{\rm NP}$, as a function of the groomed mass, which introduces a source of correlation of the $D_2$ distribution with the groomed jet mass.
In figure 2 we show the effects of hadronization on the $D_2$ observable as found in Pythia, and as modeled using the shape function of eq. (2.6). The simple shape function reproduces the effects of hadronization quite well.
Although it is conventional to work with a shape function parameter of mass dimension one, such as $\Omega_D$, for our purposes it will be convenient to introduce the dimensionless shift in the first moment of the $D_2$ distribution, which we denote $\Delta_D^{\rm NP}$. For the non-perturbative hadronization corrections, $\Delta_D^{\rm NP}$ is related to $\Omega_D$ by eq. (2.7). When using the dimensionless variable, we use a shape function with the same functional form as in eq. (2.6), but we drop the tilde to emphasize that the dimension of the argument has changed. The dependence of $\Delta_D^{\rm NP}$ on the groomed mass, as extracted from Pythia, is shown in figure 2b, together with a fit for the non-perturbative parameter $\Omega_D$. To extract this scaling, we fit the shift parameter in the tail region of the distribution, where we expect a simple shift of the distribution to be valid. The uncertainties represent a conservative estimate, since the precise region in which the fit should be performed is not always clear. The strong dependence on the jet mass is clearly visible, and introduces a non-perturbative correlation between the $D_2$ distribution and the jet mass. It is also important to note that the shift $\Delta_D^{\rm NP}$ depends only on $m_J$, and not on $p_T$, as can be derived from the factorization formula [46,47]. This simplification holds only for groomed distributions.
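Convolution with a unit-normalized shape function shifts the first moment of a distribution by exactly the first moment of the shape function, and successive convolutions add their first moments; this is what allows a single dimensionless number such as $\Delta_D^{\rm NP}$ to summarize the hadronization shift. A minimal numerical check, assuming the same hypothetical gamma-type kernel as a stand-in for eq. (2.6) and a made-up toy distribution $x e^{-x}$ in place of a real $D_2$ prediction:

```python
import math
import numpy as np

DX = 1e-3
GRID = np.arange(0.0, 30.0, DX)     # toy D_2-like axis
KGRID = GRID[GRID < 2.0]            # the narrow kernel only needs a short axis

def shape_function(x, alpha, delta):
    """Hypothetical gamma-type kernel with unit norm and first moment delta."""
    theta = delta / alpha
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = (x[pos] ** (alpha - 1.0) * np.exp(-x[pos] / theta)
                / (math.gamma(alpha) * theta ** alpha))
    return out / (out.sum() * DX)   # renormalize on the discrete grid

def smear(density, kernel):
    """Discretized integral d(eps) kernel(eps) * density(x - eps), kept on GRID."""
    return np.convolve(density, kernel)[: GRID.size] * DX

def mean(density, grid=GRID):
    return float(np.sum(grid * density) * DX)

toy = GRID * np.exp(-GRID)           # illustrative "perturbative" distribution (mean 2)
k_np = shape_function(KGRID, alpha=2.4, delta=0.10)
k_2 = shape_function(KGRID, alpha=3.0, delta=0.05)

shift_one = mean(smear(toy, k_np)) - mean(toy)
shift_two = mean(smear(smear(toy, k_np), k_2)) - mean(toy)
print(shift_one, mean(k_np, KGRID))                      # both close to 0.10
print(shift_two, mean(k_np, KGRID) + mean(k_2, KGRID))   # both close to 0.15
```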
Inverting the logic of this section: if we are able to transform between the perturbative and non-perturbative distributions by convolution with a simple function, we can also perform the deconvolution to obtain the perturbative distribution. Doing so would remove the correlation of the $D_2$ distribution with the jet mass arising from hadronization corrections. However, to completely decorrelate the distribution, we also need to understand how to decorrelate the perturbative distributions, which also depend on the jet mass. This is addressed in section 2.2. In section 3 we then give a numerically simple way of performing the decorrelation via convolution.

Perturbative effects
In addition to the dependence of the hadronization corrections on the jet mass, the perturbative $D_2$ distribution also depends on the jet mass, which introduces a further correlation between the $D_2$ distribution and the jet mass. Unlike the hadronization correction, for which only the scaling with the jet mass is calculable, the perturbative distribution can be calculated to a given accuracy, so its complete dependence on the jet mass can be understood. In figure 3a we show the perturbative $D_2$ distribution at next-to-leading logarithmic accuracy, matched to leading-order 1 → 3 splitting functions in the large-$D_2$ region to reproduce the correct endpoint behavior. In the figures this accuracy is referred to as NLL+LO; see [47] for a more detailed discussion of the order counting. Here the $H \to gg$ process was used to produce gluon jets. There is a mild, but non-negligible, dependence on the jet mass in the peak region. A more quantitative measure, $\Delta_D^{\rm P}$, the shift in the mean relative to the distribution at m = 35 GeV, is shown in figure 4a. This effect is small for the groomed $D_2$, which has a fixed endpoint at $1/(2z_{\rm cut})$, independent of the jet mass; it is ultimately this fact that makes the distribution so stable. For the ungroomed $D_2$ distribution, by contrast, the endpoint depends strongly on the jet mass, so that the distribution has a much more complicated dependence on the jet mass. Following the logic of the previous section, if we understand the form of the correlation between the $D_2$ distribution and the jet mass, we can also remove it. Motivated by the use of a shape function for the non-perturbative contribution, we can also attempt to decorrelate the perturbative component of the distribution by convolving with a function that takes the perturbative distributions to a reference mass value.
Since the mean of the $D_2$ distribution increases with decreasing mass, to decorrelate by convolution with a simple shape function we always use the lowest mass value of interest as the reference mass. Namely, we write

$$\frac{d\sigma_{\rm P}}{dD_2}(D_2; m_2) = \int d\epsilon\, F_{\rm P}(\epsilon; m_1, m_2)\, \frac{d\sigma_{\rm P}}{dD_2}(D_2 - \epsilon; m_1), \qquad m_2 < m_1.$$

Here we have made explicit the mass dependence of the functions, which is separated from the argument of the function by a semicolon. That such a (possibly singular) function exists is trivial: it can be determined by division in Laplace or Fourier space (i.e. by deconvolution). Furthermore, this function is, in principle, exactly calculable from the factorization theorem, given predictions for the perturbative $D_2$ distribution at any given accuracy and at any jet mass. However, a reasonable prediction for the $D_2$ distribution requires a matched calculation, so the results for the distribution are necessarily numerical rather than analytic, which makes it difficult to understand the deconvolution analytically. We would therefore like to find a simple function that provides a good approximation to the exact result. Although we cannot analytically predict the exact shape function in a practical way, we can use our analytic NLL+LO result to compute moments of the perturbative distribution. We expect the dominant effect of the correlation between the $D_2$ observable and the jet mass to be a shift of the first moment, as can be seen from figure 3a; this shift, $\Delta_D^{\rm P}$, is shown in figure 4a. The shift in the first moment of the distribution arises from the renormalization group evolution of the functions appearing in the factorization theorem of refs. [46,47]. We can therefore write the shift in the first moment at first perturbative order as in eq. (2.10), where $\gamma_D$ is a constant that we extract from our calculation of the distribution at two mass points. The prediction from this functional form, shown as the dashed line in figure 4a, provides an excellent description of the numerical results at many other values of the jet mass, confirming the perturbative evolution of the first moment.
To perform the perturbative decorrelation, we use the functional form of eq. (2.6) as the base decorrelation function. Since we can analytically predict the shift $\Delta_D^{\rm P}$, we can use this function to decorrelate the mean exactly. Moreover, by tuning the exponent α with the mean held fixed, we can attempt to decorrelate the complete shape of the distribution. The value of α can be extracted by decorrelating the log-mean of the distribution, $\langle \ln D_2 \rangle$, which can be computed analytically from our NLL+LO calculation. The evolution of the log-mean with mass is shown in figure 4, both without decorrelation and after decorrelation using the function of eq. (2.6) for several values of α. We find good decorrelation of the log-mean for α in the range α = 2-3. Furthermore, the result is quite insensitive to the exact value of α, which shows that the correlation is dominated by a shift in the mean. The decorrelation of the full distribution for α = 2.4 is shown in figure 3b; compared with figure 3a, the full shape of the distribution is well decorrelated. This shows that the dependence of the $D_2$ observable on the mass is in fact remarkably simple: it is driven by a shift in the first moment, captured by eq. (2.10), while the deviations from a pure shift near the endpoints are captured by the simple class of functions in eq. (2.6).
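Because the kernel's first moment is fixed to Δ for every α, the mean alone cannot determine α, whereas the log-mean can. The sketch below takes the log-mean to be $\langle \ln x \rangle$ and reuses the hypothetical gamma-type kernel and a made-up toy distribution as stand-ins for eq. (2.6) and the NLL+LO result; it only shows that all values of α give the same shift of the mean but slightly different log-means.

```python
import math
import numpy as np

DX = 1e-3
GRID = np.arange(0.0, 30.0, DX)
KGRID = GRID[GRID < 3.0]

def shape_function(x, alpha, delta):
    """Hypothetical gamma-type kernel with unit norm and first moment delta."""
    theta = delta / alpha
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = (x[pos] ** (alpha - 1.0) * np.exp(-x[pos] / theta)
                / (math.gamma(alpha) * theta ** alpha))
    return out / (out.sum() * DX)

def smear(density, kernel):
    """Discretized convolution of a density with a kernel, kept on GRID."""
    return np.convolve(density, kernel)[: GRID.size] * DX

def mean(density):
    return float(np.sum(GRID * density) * DX)

def log_mean(density):
    """<ln x> of a density on GRID; the x = 0 point carries no weight here."""
    pos = GRID > 0
    return float(np.sum(np.log(GRID[pos]) * density[pos]) * DX)

toy = GRID * np.exp(-GRID)            # illustrative distribution, not a D_2 prediction
delta = 0.3
shifts, log_means = {}, {}
for alpha in (2.0, 2.4, 3.0):
    smeared = smear(toy, shape_function(KGRID, alpha, delta))
    shifts[alpha] = mean(smeared) - mean(toy)
    log_means[alpha] = log_mean(smeared)
print(shifts)       # the same shift of the mean for every alpha
print(log_means)    # but a different log-mean
```

At fixed mean, a larger α gives a narrower kernel, and since the logarithm is concave the log-mean increases with α; matching this quantity across masses is what singles out a value of α.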
We conclude this section by emphasizing that this analysis could be improved by iteratively building up a shape function starting from the base function of eq. (2.6) using an expansion in orthogonal functions, as has been done in [69], requiring all moments to be decorrelated exactly. However, for our purposes we will find that the simple function of eq. (2.6) works extremely well, as will be illustrated in our case study in section 4.

Convolved substructure
Motivated by the above observation that both the perturbative and non-perturbative components of the distribution can be decorrelated using simple shape functions, we propose to use shape functions as an efficient way to completely decorrelate two-prong substructure observables by mapping them to a reference mass. We call this the Convolved Substructure, or CSS, procedure. Since shape functions for hadronization typically shift the distribution to larger values, for the $D_2$ observable we also choose the reference mass to be the lowest mass of interest, ensuring that the shift in the mean required for decorrelation is positive.
We define the CSS decorrelated $D_2$ observable by

$$\frac{d\sigma^{\rm CSS}}{dD_2}(D_2; m_1) = \int d\epsilon\, F_{\rm CSS}(\epsilon; m_1, m_2)\, \frac{d\sigma}{dD_2}(D_2 - \epsilon; m_1). \qquad (3.1)$$

Here $F_{\rm CSS}$ is an as-yet unspecified function with unit norm. While we have used the specific example of $D_2$, this approach should apply much more generally; however, we expect that analytic scalings for $F_{\rm CSS}$ can be derived only for IRC safe observables with sufficiently favorable factorization properties. Within this subset of observables, we believe that this represents a completely general and efficient way of performing the decorrelation. Unlike previously proposed analytic approaches, it aims to decorrelate all moments of the distribution, and it naturally preserves the domain and norm of the distribution. Furthermore, motivated by the success in describing non-perturbative corrections using a simple basis of functions [69], we will show that a simple analytic form of $F_{\rm CSS}$ can be chosen as the initial approximation, with further improvements added systematically if needed. It is also interesting that this approach includes the standard DDT, a shift of the first moment, as a special case. Performing a Taylor expansion for a small shift, we have

$$\frac{d\sigma^{\rm CSS}}{dD_2}(D_2; m_1) \simeq \frac{d\sigma}{dD_2}(D_2; m_1) - \Delta_D\, \frac{\partial}{\partial D_2}\frac{d\sigma}{dD_2}(D_2; m_1) \simeq \frac{d\sigma}{dD_2}(D_2 - \Delta_D; m_1), \qquad (3.2)$$

where $\Delta_D$ is the first moment of $F_{\rm CSS}$,

$$\Delta_D = \int d\epsilon\, \epsilon\, F_{\rm CSS}(\epsilon; m_1, m_2). \qquad (3.3)$$

This reproduces (a constrained form of) the DDT, which decorrelates the first moment. We note that while the DDT procedure was originally introduced as a shift that decorrelates the first moment of the distribution, it has since been generalized to decorrelate, for example, the background efficiency at a given cut. Nevertheless, it can still decorrelate only a single chosen moment of the distribution. We will re-emphasize this point in our numerical comparisons in section 4. Note that when shape functions are used to incorporate non-perturbative effects, the linear shift applies in a particular region of the distribution, but the full shape function is needed at small values. We will see in section 3.1 that the same is true for decorrelation: the full convolution reduces to a linear shift throughout most of the distribution, and the full non-linear nature of the function becomes relevant only near the endpoints of the distribution.

The exact function $F_{\rm CSS}$ that shifts from the mass $m_1$ to a reference mass $m_2$, with $m_2 < m_1$, can be written as

$$F_{\rm CSS}(\epsilon; m_1, m_2) = F_{\rm NP}(\epsilon; m_2) \otimes F_{\rm P}(\epsilon; m_1, m_2) \otimes F_{\rm NP}^{-1}(\epsilon; m_1), \qquad (3.4)$$

as was illustrated in figure 1. Here ⊗ denotes convolution in the variable $\epsilon$, and the inverse is an inverse in the convolutional sense (i.e. a deconvolution). Instead of performing the decorrelation in this form, we simplify our discussion and use a single effective function. This can certainly be improved; however, we will find that a single function already gives an excellent decorrelation. We use the decorrelation function of the previous section, namely the function of eq. (3.5).⁸ With this parametrization, the first moment is $\Delta_D$ for all values of α, while α specifies the power-law behavior as x → 0. When considering a full example at the LHC, we will find that a value of α slightly larger than two gives an excellent fit. Since first moments of unit-normalized functions add under convolution, and a convolutional inverse contributes with the opposite sign, taking the first moment of eq. (3.4) gives

$$\Delta_D(m_1, m_2) = \Delta_D^{\rm NP}(m_2) + \Delta_D^{\rm P}(m_1, m_2) - \Delta_D^{\rm NP}(m_1). \qquad (3.6)$$

Again, we assume that the reference mass $m_2$ to which the distributions are shifted satisfies $m_2 < m_1$. In sections 2.1 and 2.2 we used the factorization formula for the $D_2$ observable derived in [46,47] to predict the mass dependence of both the perturbative, $\Delta_D^{\rm P}$, and the non-perturbative, $\Delta_D^{\rm NP}$, moments appearing in eq. (3.6). In principle, the exact values of the moments can be extracted for given processes and observables by studying the distributions with and without hadronization, as was done above.
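The claim that the full convolution reduces to a linear, DDT-like shift through most of the distribution, with non-linear effects confined to the endpoints, can be checked on a toy example. In this sketch the kernel is again the hypothetical gamma form (a stand-in for eq. (3.5)) with a small first moment $\Delta_D = 0.05$, and the distribution is the made-up density $x e^{-x}$. Away from x = 0 the difference between convolving and simply shifting by $\Delta_D$ is of second order, set by the kernel variance, as in the expansion of eq. (3.2); near the x → 0 endpoint it is much larger.

```python
import math
import numpy as np

DX = 1e-3
GRID = np.arange(0.0, 15.0, DX)
KGRID = GRID[GRID < 0.5]

def shape_function(x, alpha, delta):
    """Hypothetical gamma-type kernel with unit norm and first moment delta."""
    theta = delta / alpha
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = (x[pos] ** (alpha - 1.0) * np.exp(-x[pos] / theta)
                / (math.gamma(alpha) * theta ** alpha))
    return out / (out.sum() * DX)

def toy(x):
    """Illustrative density x*exp(-x), set to zero for x <= 0."""
    return np.where(x > 0, x * np.exp(-x), 0.0)

kernel = shape_function(KGRID, alpha=2.4, delta=0.05)
delta_num = float(np.sum(KGRID * kernel) * DX)              # first moment on the grid

smeared = np.convolve(toy(GRID), kernel)[: GRID.size] * DX  # full convolution (CSS-like)
shifted = toy(GRID - delta_num)                              # pure linear shift (DDT-like)

bulk = (GRID > 1.0) & (GRID < 10.0)
edge = GRID < 0.2
bulk_err = float(np.max(np.abs(smeared - shifted)[bulk]))
edge_err = float(np.max(np.abs(smeared - shifted)[edge]))
print(bulk_err, edge_err)   # small in the bulk, much larger near the x -> 0 endpoint
```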
The decorrelation obtained by applying this procedure to our NLL+LO calculation is shown in figure 5, which shows the perturbative and non-perturbative distributions as well as the final CSS curve, and can be viewed as an analytic realization of the strategy outlined in figure 1. Good, but not perfect, decorrelation is observed; we will see in section 4 that the decorrelation procedure works even better in Pythia than for the analytic example shown here.⁹

For ease of applicability, it is more convenient to give a formula for $\Delta_D(m_1, m_2)$ with two constants that can be extracted directly by fitting the decorrelation at several points, as will be demonstrated in a practical example in section 4. Using our understanding of the functional dependence on the jet mass of both the perturbative and non-perturbative contributions to the moment, discussed in sections 2.1 and 2.2, the general form of the moment for the CSS approach is given by eq. (3.7), whose second line is an approximation that is good for most numerical purposes. Again, we emphasize that the reference mass $m_2$ is taken to satisfy $m_2 < m_1$, so that this shift is positive. Here $c_{\rm NP}$, $c_{\rm P}$ and $\tilde{c}_{\rm P}$ are constants that can be fit numerically, and describe the non-perturbative and perturbative scalings, respectively. Although it may appear unnatural, the coefficients $c_{\rm NP}$ and $\tilde{c}_{\rm P}$ have different mass dimensions, since $c_{\rm NP}$ is associated with a power-law variation, while $\tilde{c}_{\rm P}$ is associated with a logarithmic variation. From a practical perspective, the CSS decorrelation function can be constructed by fixing the value of α appearing in eq. (3.5) using a single value of the mass. For $D_2$, we find that values of α ∈ [2, 3] work well, with no strong preference for a given value. Using several values of the mass, one can then fit for $c_{\rm NP}$ and $\tilde{c}_{\rm P}$ to obtain a smooth function that describes the evolution of the moment of the shape function. Knowing the analytic scaling of the function is therefore important: it allows the shape to be fixed using dedicated Monte Carlo at a few specific mass points, rather than requiring Monte Carlo at every value of the mass. We will illustrate this for a case study of $Z' \to q\bar{q}$ in section 4, where we find that this gives a remarkably good (almost perfect) decorrelation of the $D_2$ observable.

⁸ That the final convolution in eq. (3.4) can be approximated by a single function of the same form can be understood by looking at the functional form in Laplace space, where these functions take the form of rational functions raised to the power α, using the first term in the expansion for $F_{\rm CSS}$. Due to the inverse convolution appearing in eq. (3.4), the Laplace transform of the convolution of the three functions has the same polynomial degree as the Laplace transform of a single such function.
⁹ There is also a tradeoff between exactly reproducing the mean and accurately capturing other aspects of the shape.
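The explicit form of eq. (3.7) is not reproduced in this excerpt. As a hedged stand-in, the sketch below assumes only the scalings described in the text: a non-perturbative contribution inversely proportional to the mass, with a coefficient of mass dimension one, and a perturbative contribution that varies logarithmically, with a dimensionless coefficient. Under that assumption, $\Delta_D(m_1, m_2) \approx c_{\rm NP}(1/m_2 - 1/m_1) + \tilde{c}_{\rm P}\ln(m_1/m_2)$ is linear in the two constants, so fitting shifts extracted at a few mass points is an ordinary least-squares problem. All function names and numbers are hypothetical, and the synthetic shifts are not taken from figure 8.

```python
import numpy as np

def design_matrix(m1, m_ref):
    """Columns of the ASSUMED model Delta_D = c_np*(1/m_ref - 1/m1) + c_p_tilde*log(m1/m_ref)."""
    m1 = np.atleast_1d(np.asarray(m1, dtype=float))
    return np.column_stack([1.0 / m_ref - 1.0 / m1, np.log(m1 / m_ref)])

def fit_css_moment(masses, shifts, m_ref):
    """Least-squares estimate of (c_np [GeV], c_p_tilde [dimensionless])."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(masses, m_ref),
                                 np.asarray(shifts, dtype=float), rcond=None)
    return coeffs

def predict_shift(m1, m_ref, coeffs):
    """Interpolated first moment Delta_D(m1, m_ref) at any jet mass m1 >= m_ref."""
    return design_matrix(m1, m_ref) @ coeffs

# Synthetic "fitted" shifts at a few mass points (made-up numbers for illustration).
m_ref = 50.0
masses = np.array([75.0, 100.0, 150.0, 200.0, 250.0])
truth = np.array([2.0, 0.05])
shifts = predict_shift(masses, m_ref, truth)
coeffs = fit_css_moment(masses, shifts, m_ref)
print(coeffs)                               # should match truth up to rounding
print(predict_shift(120.0, m_ref, coeffs))  # prediction at an intermediate mass
```

With real Monte Carlo the per-mass shifts carry uncertainties, in which case a weighted fit would be the natural extension.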

Practical implementation
In practice, the convolution procedure described above needs to be applied jet-by-jet, not at the distribution level. The convolution of two distributions corresponds to the addition of the random variables described by the distributions. Therefore, one way to translate the distribution-level results above into jet-by-jet results is to add to every observed $D_2$ value a random value drawn from the distribution $F_{\rm CSS}(x; \alpha, \Omega_D)$ of eq. (3.5). This is not ideal because (a) the randomness can introduce features in the classification performance for finite statistics, and (b) there are various technical reasons, such as reproducibility, that make injecting randomness unattractive. An alternative that accomplishes the convolution deterministically is to use the (inverse) cumulative distribution function (CDF). Given a continuous random variable X with CDF C(x) = Pr(X < x), C(X) is a new random variable that follows a uniform distribution on [0, 1]. For any other CDF G, one can then form the random variable $G^{-1}(C(X))$, which follows the probability density $g(x) = \partial_y G(y)|_{y=x}$ corresponding to G. Let c(x) be the probability density of $D_2$ for jets in a given mass bin, and let

$$g(x; \alpha, \Delta_D) = \int d\epsilon\, F_{\rm CSS}(\epsilon; \alpha, \Delta_D)\, c(x - \epsilon)$$

be the corresponding CSS-convolved density. We can now define the CDFs $C(x) = \int_0^x c(x')\,dx'$ and $G(x; \alpha, \Delta_D) = \int_0^x g(x'; \alpha, \Delta_D)\,dx'$. The jet-by-jet transformation is then given by

$$D_2^{\rm CSS} = G^{-1}\big(C(D_2); \alpha, \Delta_D\big). \qquad (3.10)$$

This simple mapping allows us to implement the CSS procedure numerically in an efficient manner. An explicit example of the mapping of eq. (3.10), for the $Z' \to q\bar{q}$ example discussed in detail in section 4, is shown in figure 6. This figure demonstrates the construction of the CSS $D_2$ following the procedure of section 3.1. The CDF for each $D_2$ distribution is computed (C for $D_2$ and G for $D_2^{\rm CSS}$), as shown in figure 6a, and the transformation of eq. (3.10) is then shown in figure 6b. While the CSS curves may look mostly linear, there are important non-linear features at high and low $D_2$.
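The deterministic mapping $D_2 \to G^{-1}(C(D_2))$ described above can be sketched with NumPy on binned distributions. This is a minimal illustration under stated assumptions: c(x) is estimated by a histogram of the jets in one mass bin, $F_{\rm CSS}$ is taken to be the same hypothetical gamma form used as a stand-in for eq. (3.5), both CDFs are linearly interpolated between bin edges, and the function name `css_transform` and its parameters are illustrative. Because the map is monotonic, it preserves the ordering of jets in $D_2$ within a mass bin.

```python
import math
import numpy as np

def gamma_kernel(eps, alpha, delta):
    """Hypothetical F_CSS: gamma form with unit norm and first moment delta."""
    theta = delta / alpha
    out = np.zeros_like(eps)
    pos = eps > 0
    out[pos] = (eps[pos] ** (alpha - 1.0) * np.exp(-eps[pos] / theta)
                / (math.gamma(alpha) * theta ** alpha))
    return out

def css_transform(d2, alpha, delta, d2_max=6.0, dx=1e-3):
    """Deterministic jet-by-jet map D2 -> G^{-1}(C(D2)) for the jets of one mass bin.

    Assumes all D2 values lie in [0, d2_max).
    """
    d2 = np.asarray(d2, dtype=float)
    n_bins = int(round(2 * d2_max / dx))           # leave room for the upward shift
    edges = np.arange(n_bins + 1) * dx
    # c(x): binned density of the observed D2 values
    counts, _ = np.histogram(d2, bins=edges)
    c = counts / (counts.sum() * dx)
    # g(x): c convolved with the kernel sampled at eps = 0, dx, 2*dx, ...
    g = np.convolve(c, gamma_kernel(edges[:-1], alpha, delta))[:n_bins] * dx
    # piecewise-linear CDFs on the bin edges, with C(0) = G(0) = 0
    C = np.concatenate([[0.0], np.cumsum(c) * dx])
    G = np.concatenate([[0.0], np.cumsum(g) * dx])
    G /= G[-1]                                     # absorb discretization error
    G += np.arange(G.size) * 1e-12                 # strictly increasing, so G is invertible
    return np.interp(np.interp(d2, edges, C), G, edges)

rng = np.random.default_rng(0)
d2 = rng.gamma(4.0, 0.25, size=100_000)            # toy D2 sample for one mass bin
d2_css = css_transform(d2, alpha=2.4, delta=0.2)
print(d2.mean(), d2_css.mean())                    # means differ by about delta
```

In a real analysis, C would be built from the background jets in each mass bin and $\Delta_D$ taken from the fitted mass dependence rather than chosen by hand.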
The non-linear features at high and low $D_2$ will be discussed in detail in section 4, and play an important role in decorrelating the complete $D_2$ distribution, not just its first moment. Expanding the CSS procedure to first order in the shift, so that only its first moment enters, as discussed around eq. (3.2), gives rise to linear behavior, and the fact that the mapping in figure 6b is mostly linear shows that this is a good approximation. Note that the DDT procedure would result in straight lines in figure 6 with a mass-dependent offset.

A case study: $D_2$ for $Z' \to q\bar{q}$

An important recent application of variable decorrelation is the search for a low-mass hadronic resonance, $Z' \to q\bar{q}$ [21-23], which we therefore use as a case study. The generic quark and gluon background is too large to observe a dijet resonance directly, but when the $Z'$ is produced in association with initial-state radiation, it can be sufficiently boosted for its decay products to be collimated inside a single jet. For our study, both the $Z'$ signal and the generic quark and gluon background are simulated with Pythia 8.183 [81,82]; the former by changing the mass of a Standard Model Z boson and the latter with all hard QCD processes. All stable final-state particles excluding neutrinos and muons are clustered into jets with FastJet 3.1.3 [83] using the anti-$k_t$ algorithm [83,84] with R = 0.8. To ensure that $Z'$ bosons with masses up to 300 GeV are mostly contained inside a single jet, jets are required to have $p_T$ > 1 TeV. Jets are then re-clustered using the Cambridge/Aachen algorithm [85][86][87] and groomed with mMDT/soft drop using $z_{\rm cut}$ = 0.1. From the groomed jet's constituents, the jet mass is calculated along with $D_2$ using the EnergyCorrelator FastJet contrib [83,88]. Throughout this section we use an angular exponent of β = 2 for the $D_2$ observable, but for notational simplicity we suppress the argument.
To perform the CSS decorrelation, we shift all distributions to the reference mass of $m = 50$ GeV, and we consider jets with masses in the range 50 GeV $< m <$ 250 GeV, namely a factor of 5 variation. This is approximately the mass range used in the current LHC searches [21][22][23]. In a realistic application, it may be convenient to shift the distributions in different mass regions to different reference values. For example, for low mass searches, the $Z$ mass provides a natural mass scale where the analysis changes, and therefore it may prove useful to shift jets with mass $m > m_Z$ to the reference mass $m_Z$, and jets with mass $m < m_Z$ to the lower mass limit of the search. In this way, the required decorrelation in each mass window is minimized. However, the goal of this section is simply to illustrate that we can completely decorrelate the $D_2$ distribution over a wide range of jet masses. In figure 7 we show the standard $D_2$ distribution, as well as the decorrelated distributions using the CSS and DDT approaches, for five narrow bins in the groomed jet mass. The DDT is applied by shifting
$$D_2^{\rm DDT} = D_2 - \langle D_2 | m \rangle + \langle D_2 | m = 50~{\rm GeV} \rangle\,,$$
where the averages $\langle x | y \rangle$ (the average of $x$ given $y$) are computed using the QCD background jets. By construction, the average of the resulting DDT distribution is independent of $m$:
$$\langle D_2^{\rm DDT} | m \rangle = \langle D_2 | m = 50~{\rm GeV} \rangle\,.$$
The CSS procedure is applied using the shape function $F_{\rm CSS}$ of eq. (3.5) with $\alpha = 2.4$ and $\Omega_D$ as indicated in the figure. The value of $\alpha$ was fixed at a single value of the mass; fortunately, we find that the results are quite insensitive to the precise choice of $\alpha$. The values of $\Omega_D$ are plotted in figure 8 along with a fit to the analytic form, which we see provides an excellent description. The values of the shift extracted at these five masses can be viewed as fixing the coefficients of the analytic mass dependence in eq. (3.7), which then provides a prediction for every other value of the mass, as would be required experimentally.
Here we see the advantage of knowing the analytic form, namely that one only needs dedicated Monte Carlo samples at several specific mass values. The signal distributions are also shown, to indicate the range of $D_2$ values relevant for discrimination.
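For comparison, the DDT shift can be sketched in a few lines. This minimal Python version is an illustration under assumptions: it estimates the conditional means $\langle D_2|m\rangle$ from background jets in a set of mass bins (the binning and the helper names `fit_ddt_means` and `apply_ddt` are hypothetical), and shifts every jet so that its mass bin acquires the mean of a chosen reference bin.

```python
import numpy as np


def _bin_index(mass, edges):
    """Mass-bin index for each jet; raises if a jet falls outside the binning."""
    idx = np.digitize(np.asarray(mass, dtype=float), edges) - 1
    if np.any((idx < 0) | (idx >= len(edges) - 1)):
        raise ValueError("jet mass outside the binning")
    return idx


def fit_ddt_means(qcd_d2, qcd_mass, edges):
    """Estimate <D2 | mass bin> from background (QCD) jets."""
    qcd_d2 = np.asarray(qcd_d2, dtype=float)
    idx = _bin_index(qcd_mass, edges)
    return np.array([qcd_d2[idx == b].mean() for b in range(len(edges) - 1)])


def apply_ddt(d2, mass, edges, means, ref_bin=0):
    """Shift D2 by <D2|ref bin> - <D2|own bin>; applicable to signal and background."""
    idx = _bin_index(mass, edges)
    return np.asarray(d2, dtype=float) - means[idx] + means[ref_bin]


rng = np.random.default_rng(0)
mass = rng.uniform(50.0, 250.0, size=50_000)
# Toy background whose mean D2 drifts with mass (illustrative numbers only).
d2 = rng.gamma(4.0, 0.25, size=mass.size) + 0.004 * (mass - 50.0)
edges = np.linspace(50.0, 250.0, 6)
means = fit_ddt_means(d2, mass, edges)
d2_ddt = apply_ddt(d2, mass, edges, means, ref_bin=0)
```

Every mass bin then has the same mean $D_2$ as the reference bin, but nothing in the shift constrains the shape of the distribution, which is the limitation visible in figure 7c.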
A number of features of the different decorrelation procedures are clearly evident from these figures. First, we see in figure 7b that the CSS decorrelated observable has essentially no dependence on the jet mass: the complete shape of the distribution is identical over the wide range of masses shown. By contrast, the shape of the DDT version in figure 7c changes with mass even though the mean is fixed. This is particularly true on the left side of the peak. The difference between the methods arises from the non-linear nature of the CSS mapping, noted in the discussion of figure 6. A zoomed-in view of the small $D_2$ region is shown in figure 9, which highlights the difference between the two approaches. Both the CSS and DDT mappings are effectively linear to the right of the $D_2$ peak, where both decorrelate the observable very well, but the non-linear mapping is required to decorrelate the shape of the distribution at small values of $D_2$. It is in this region that the shape of the distribution changes non-trivially with mass, and the difference between distributions at different masses cannot simply be described by a shift. The ability of the CSS approach to correctly reproduce the change in shape of the distribution in this region, which is the most important region for discrimination, is quite remarkable. The differences between the two decorrelated distributions are shown in figure 7d, which also highlights that the differences between the two decorrelation procedures become large at small values of $D_2$.

Figure 9. A comparison of the groomed $D_2^{\rm DDT}$ distribution in (a) and the groomed $D_2^{\rm CSS}$ distribution in (b) at small values. The CSS approach decorrelates the entire shape of the distribution, including at low values of $D_2$, where the shape of the distribution changes non-trivially, and which is the relevant region for discrimination.
We also note that here we have chosen to decorrelate the background (QCD) distributions, and therefore the signal distributions exhibit some dependence on mass.
As a further quantitative comparison between the CSS and DDT approaches, in figure 10 we compare different integrals of the distributions, namely the mean and the probability that $D_2 < 0.4$ (the lower tail fraction), which we denote $\Pr(D_2 < 0.4)$. By construction, the mean of the DDT $D_2$ distribution is independent of mass, as seen in figure 10a. However, the shape does change with mass, as indicated by the lower tail fraction in figure 10b (lower $D_2$ is more signal-like). On the other hand, since the CSS approach decorrelates the complete shape of the distribution, both the mean and the tail fraction are nearly independent of mass. We emphasize that the DDT approach could equally well be applied to flatten $\Pr(D_2 < 0.4)$ (or any other given integral of the distribution). However, it would then not decorrelate the mean. In other words, it can be used to decorrelate only a single moment (or integral) of the distribution at a time, whereas the CSS approach aims to decorrelate all moments.
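A toy example makes this point concrete. The Python sketch below uses two gamma-distributed samples as stand-ins for background $D_2$ distributions in two mass bins (the parameters are illustrative assumptions, not simulated jets); a DDT-style shift matches their means but not their lower tail fractions.

```python
import numpy as np


def tail_fraction(x, cut=0.4):
    """Pr(D2 < cut): fraction of jets below the cut (lower D2 is more signal-like)."""
    return float(np.mean(np.asarray(x) < cut))


rng = np.random.default_rng(2)
# Two toy "mass bins" with different D2 shapes (gamma toys, not simulated jets).
low_mass = rng.gamma(4.0, 0.25, size=200_000)  # mean 1.0
high_mass = rng.gamma(2.0, 0.6, size=200_000)  # mean 1.2, broader and more skewed

# DDT-style shift: subtract the difference in means so that the first moments agree.
high_shifted = high_mass - (high_mass.mean() - low_mass.mean())

for name, x in [("low mass", low_mass), ("high mass, shifted", high_shifted)]:
    print(f"{name:20s} mean={x.mean():.3f}  Pr(D2<0.4)={tail_fraction(x):.3f}  "
          f"Pr(D2<0)={tail_fraction(x, 0.0):.3f}")
```

The means agree by construction, while $\Pr(D_2 < 0.4)$ differs by more than a factor of three between the two toys. The shifted toy also places a few percent of its jets at unphysical negative $D_2$, whereas convolving a non-negative observable with a shape function supported on $x \ge 0$ keeps the domain $D_2 \ge 0$ intact.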
Figure 10. (a) The mean $D_2$ and (b) $\Pr(D_2 < 0.4)$ for QCD jets as a function of mass. By construction the DDT decorrelates a single moment, chosen here to be the first moment, but does not decorrelate higher moments. On the other hand, the CSS procedure is designed to decorrelate the entire shape of the distribution.

Finally, it is important to check that the CSS procedure does not degrade the tagging performance of the $D_2$ observable. This was shown for the DDT approach in [25]. Applying the mapping shown in figure 6b also to $Z'$ events results in the signal distributions already shown in figure 7. Lower values of $D_2$ are more signal-like, so an upper threshold on $D_2$ is an effective two-prong tagger. Figure 11 quantifies the tradeoff between signal and background efficiency with and without the CSS procedure. As desired, the ROC curves change only minimally after applying CSS. This difference could be reduced further by performing the CSS decorrelation in narrower mass windows.
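Why the ROC curves barely move can be illustrated with a general property of cut-based ROC curves. At fixed mass the mapping $G^{-1}(C(\cdot))$ is non-decreasing, and a strictly increasing map of the observable leaves the set of (signal, background) efficiency pairs traced by an upper cut unchanged. The toy Python sketch below checks this numerically; the gamma-distributed toys and the cubic map are illustrative assumptions, not the paper's samples or mapping. In this idealized picture, differences between ROC curves such as those in figure 11 can only arise because the mapping varies with the jet mass inside a finite mass bin, consistent with the remark that narrower mass windows would reduce them.

```python
import numpy as np


def roc_upper_cut(sig, bkg):
    """(signal efficiency, background efficiency) for every upper cut D2 <= t."""
    sig, bkg = np.sort(sig), np.sort(bkg)
    thresholds = np.unique(np.concatenate([sig, bkg]))
    eff_sig = np.searchsorted(sig, thresholds, side="right") / sig.size
    eff_bkg = np.searchsorted(bkg, thresholds, side="right") / bkg.size
    return eff_sig, eff_bkg


def monotone_map(x):
    """Strictly increasing, non-linear stand-in for a fixed-mass CSS-like mapping.
    Uses only multiplications and additions, so equal inputs give equal outputs."""
    return x * x * x + x


rng = np.random.default_rng(3)
sig = rng.gamma(2.0, 0.3, size=2000)  # toy "signal": lower D2 values
bkg = rng.gamma(4.0, 0.5, size=2000)  # toy "QCD background"

roc_before = roc_upper_cut(sig, bkg)
roc_after = roc_upper_cut(monotone_map(sig), monotone_map(bkg))
```

A non-decreasing map with flat regions could merge some cut points and so lose a little information, but it can never improve the ROC curve.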

Conclusions
In this paper, we have shown how a given jet substructure observable, such as $N_2$ or $D_2$, can be decorrelated with the jet mass using an understanding of its perturbative and non-perturbative behavior. Inspired by the use of shape functions for modeling non-perturbative effects, we introduced the Convolved SubStructure (CSS) approach, which uses a shape function, convolved with the substructure observable's distribution, to map it to a reference mass. The shape function incorporates effects due to both perturbative and non-perturbative physics, and we used a recently derived factorization formula to analytically derive the mass dependence of both these contributions. Unlike previous approaches with similar philosophies, the CSS approach completely decorrelates the entire shape of the distribution. Furthermore, it is systematically improvable by expanding the shape function in a basis of orthogonal functions [69], and it makes maximal use of the theoretical understanding of the observable.
We have shown in detail how the CSS approach can be practically implemented in an extremely simple manner, and studied its behavior for the example of a light $Z' \to q\bar q$ search using the $D_2$ observable. We found that using a simple two-parameter shape function we were able to obtain an excellent decorrelation of the entire $D_2$ distribution over a wide range of mass values. The shape function parameter defining the shift of the first moment of the distribution has a functional dependence on the jet mass that can be understood from first principles, and is fixed by demanding that the first moment of each mapped distribution is the same as that of the reference mass distribution. Higher moments can be handled similarly, but since we require the shape function to maintain the domain and norm of the distribution, we find that already the decorrelation of the first moment effectively decorrelates the whole spectrum. Furthermore, the discrimination power of the CSS observable was not significantly degraded. In real applications, the tradeoff between discrimination power and decorrelation must be evaluated, and it may be practical to perform the decorrelation in mass windows. One important aspect that we did not study in this paper is whether an identical mapping applies at detector level. Even if it does not, our approach is general, and another simple functional form that performs the decorrelation could be found. It will also be interesting to apply the CSS approach to other observables, such as $N_2$, for which the DDT approach has been applied successfully [21,23]. Again, a slightly modified convolution function may be required, depending on the behavior of the observable. We therefore hope that the CSS approach can be used to decorrelate a variety of substructure observables, improving the reach and performance of searches for low mass particles at the LHC.

Figure 11. A scan in an upper cut on $D_2$ traces out a Receiver Operating Characteristic (ROC) curve quantifying the tradeoff between $Z'$ (signal) efficiency and QCD (background) efficiency for various groomed jet mass bins, shown in a linear plot in (a) and a log plot in (b). The CSS procedure is not found to significantly degrade the discrimination power of the observable.