The Lund jet plane

Lund diagrams, a theoretical representation of the phase space within jets, have long been used in discussing parton showers and resummations. We point out that they can be created for individual jets through repeated Cambridge/Aachen declustering, providing a powerful visual representation of the radiation within any given jet. Concentrating here on the primary Lund plane, we outline some of its analytical properties, highlight its scope for constraining Monte Carlo simulations and comment on its relation with existing observables such as the zg variable and the iterated soft-drop multiplicity. We then examine its use for boosted electroweak boson tagging at high momenta. It provides good performance when used as an input to machine learning. Much of this performance can be reproduced also within a transparent log-likelihood method, whose underlying assumption is that different regions of the primary Lund plane are largely decorrelated. This suggests a potential for unique insight and experimental validation of the features being used by machine-learning approaches.


Introduction
Jets, the collimated bunches of hadrons that result from the fragmentation of energetic quarks and gluons, are among the most fascinating objects that are used at colliders. The study of the internal structure of jets has become a prominent area of research at CERN's Large Hadron Collider, both theoretically [1] and experimentally [2]. This is a reflection of its power to probe the Higgs sector of the Standard Model (SM) and to search for physics beyond the Standard Model (BSM), but also of the considerable scope for learning more about the quantum chromodynamics (QCD) associated with the development of jets.
In addition to the study of specific manually crafted observables, several groups have highlighted the power of machine-learning (ML) approaches to exploit jet-substructure information, using a variety of ML architectures. As inputs they have mainly considered discretised images of the particles inside a jet [34][35][36][37], clustering histories from sequential recombination jet algorithms [38][39][40], or a basis of substructure observables [41][42][43]. The performance they obtain for signal versus background discrimination is often substantially better than that of manually constructed observables. This good performance comes, however, at the price of limited clarity as to which jet-substructure features are actually being exploited. One consequence is that it is difficult to establish to what extent widely-used modelling tools, e.g. parton-shower Monte Carlo generators and detector simulations, reliably predict those features, an aspect that is critical for the quantitative interpretation of collider searches.
The purpose of this article is to introduce a representation of the internal structure of jets that helps bridge the fault-line between manually constructed observables and ML approaches. In particular, we ask whether it is possible to organise the information within a jet such that (a) it can be straightforwardly measured and understood in data, (b) it can be manually organised into transparent and physically motivated discrimination observables (without ML), and (c) it can serve as an input to ML for signal/background discrimination, specifically one whose main discriminating characteristics can be clearly identified and understood.
The representation that we use is inspired by Lund diagrams [46], which serve as a theoretical representation of the phase space within jets and are often used in discussions of Monte Carlo parton-shower algorithms and of the resummation of logarithmically enhanced terms in perturbation theory. In a Lund diagram the available phase space is mapped to a triangle in a two-dimensional (logarithmic) plane that shows the transverse momentum and the angle of any given emission with respect to its emitter. Each emission creates new phase space (a triangular leaf) for further emissions. One of the key observations of this paper is that Lund diagrams need not merely be a construct for theoretical calculations. They can be constructed for individual jets, essentially by following the clustering tree of the Cambridge/Aachen [47,48] jet algorithm. The pattern of emissions, notably within the first triangular phase-space region, the primary Lund plane, carries considerable information about the jet.

Lund diagrams and the primary Lund plane
To help understand how the primary Lund plane is constructed, Fig. 1 shows three representations for each of two jets.
The top representation shows the set of particles in the jet, with the direction and length of each line segment schematically representing the direction and scalar momentum of the corresponding particle. The black particle (a) is the primary particle, i.e. the one that initiated the jet. Particles (b) and (c) are emissions inside the jet.
The middle representation gives the full Lund diagram for each of the two jets. The phase space for emission from each particle is represented as a triangle in the $(\ln\Delta, \ln k_t)$ plane, where $\Delta$ and $k_t$ are respectively the angle and transverse momentum of an emission with respect to its emitter. The triangles are colour-coded to match the colours of the particles in the upper row. The black triangle represents the primary phase space, i.e. emission from (a) (our classification of which particle emits which other ones is based on the concept of angular ordering of emissions). Considering the left-hand jet, the blue particle (b) is represented as a blue point at the appropriate $(\Delta, k_t)$ coordinate on the (black) triangle associated with its emitter (a). The blue particle has its own phase-space region, the blue triangle, which is known as a secondary Lund triangle, or "leaf", in which the particle could have emitted, but in this case did not. Similarly for the red particle (c), which is also emitted from (a). In contrast, for the right-hand jet, (c) was emitted from (b), so its point appears on the (secondary) blue triangle associated with particle (b), while its red phase-space triangle emerges as a tertiary triangle, or leaf, off (b)'s triangle.
Finally, the bottom diagram shows the primary Lund plane, which contains just the positions of the emissions from (a), but no information about what further secondary emissions may have been produced. It is this simpler representation that we will use throughout most of the article.

Construction of the primary Lund plane
Our starting point for constructing the primary Lund plane is to (re-)cluster a jet's constituents with the Cambridge/Aachen (C/A) algorithm [47,48]. The C/A algorithm identifies the pair of particles $i$ and $j$ closest in rapidity, $y = \frac{1}{2}\ln\frac{E+p_z}{E-p_z}$ (with $E$ and $p_z$ the particle's energy and longitudinal momentum with respect to the colliding beams), and azimuth $\phi$, i.e. with the minimal value of $\Delta_{ij}^2 = (y_i - y_j)^2 + (\phi_i - \phi_j)^2$. It then recombines them into a "pseudojet" with momentum $p = p_i + p_j$. This procedure is repeated until all particles (and pseudojets) have been recombined, or are separated by $\Delta_{ij}$ larger than some parameter $R$.
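To make the C/A steps concrete, here is a deliberately naive Python sketch (cubic in the number of particles). It is only an illustration under stated assumptions: the analysis itself uses FastJet, and the names `PseudoJet`, `from_pt_y_phi`, `delta2` and `cambridge_aachen` are hypothetical helpers defined here, not FastJet's API. Recombination is plain four-momentum addition, and each merged object remembers its two parents so that the clustering tree can later be walked backwards.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)
class PseudoJet:
    """Four-momentum plus the two objects it was built from (None for an input particle)."""
    px: float
    py: float
    pz: float
    E: float
    parents: Optional[Tuple["PseudoJet", "PseudoJet"]] = None

    @property
    def pt(self) -> float:
        return math.hypot(self.px, self.py)

    @property
    def rap(self) -> float:
        # rapidity y = 1/2 ln[(E + pz) / (E - pz)]
        return 0.5 * math.log((self.E + self.pz) / (self.E - self.pz))

    @property
    def phi(self) -> float:
        return math.atan2(self.py, self.px)

    def __add__(self, other: "PseudoJet") -> "PseudoJet":
        # recombination p = p_i + p_j, remembering the parents for later declustering
        return PseudoJet(self.px + other.px, self.py + other.py,
                         self.pz + other.pz, self.E + other.E,
                         parents=(self, other))


def from_pt_y_phi(pt: float, y: float, phi: float) -> PseudoJet:
    """Massless particle built from (pt, rapidity, azimuth)."""
    return PseudoJet(pt * math.cos(phi), pt * math.sin(phi),
                     pt * math.sinh(y), pt * math.cosh(y))


def delta2(a: PseudoJet, b: PseudoJet) -> float:
    """Delta_ij^2 = (y_i - y_j)^2 + (phi_i - phi_j)^2, azimuth difference wrapped into [-pi, pi)."""
    dphi = (a.phi - b.phi + math.pi) % (2 * math.pi) - math.pi
    return (a.rap - b.rap) ** 2 + dphi ** 2


def cambridge_aachen(particles: List[PseudoJet], R: float) -> List[PseudoJet]:
    """Merge the closest pair in (y, phi) until every remaining pair is separated by more than R."""
    objs = list(particles)
    while len(objs) > 1:
        d2, i, j = min((delta2(objs[i], objs[j]), i, j)
                       for i in range(len(objs))
                       for j in range(i + 1, len(objs)))
        if d2 > R * R:
            break  # what remains are the final jets
        merged = objs[i] + objs[j]
        objs = [o for k, o in enumerate(objs) if k not in (i, j)] + [merged]
    return objs
```

For example, three nearly collinear particles at $\phi = 0$ plus one at $\phi = 3$ give two $R = 1$ jets, the harder one carrying the summed $p_t$ of the three collinear particles.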
To create a primary Lund plane representation of a jet we then work backwards through the C/A clustering. One starts with the full jet and then proceeds as follows:
1. Decluster the current object to produce two pseudojets, $p_a$ and $p_b$, labelled such that $p_{ta} > p_{tb}$, where $p_{ti}$ is the transverse momentum of $i$ with respect to the colliding beams. We consider $p_b$ to be the emission and $p_a + p_b$ to be the emitter. In the limit where $p_b$ carries little momentum relative to $p_a$, $p_a + p_b$ and $p_a$ can be thought of as the same particle, differing only through the loss of a small amount of momentum by the radiation of a gluon $p_b$.
2. Determine a number of variables associated with the declustering, e.g.
$$\Delta \equiv \Delta_{ab}, \qquad k_t \equiv p_{tb}\,\Delta_{ab}, \qquad m^2 \equiv (p_a + p_b)^2, \tag{2.1a}$$
$$z \equiv \frac{p_{tb}}{p_{ta} + p_{tb}}, \qquad \kappa \equiv z\,\Delta, \qquad \psi \equiv \tan^{-1}\frac{y_b - y_a}{\phi_b - \phi_a}. \tag{2.1b}$$
In the limit $p_{tb} \ll p_{ta}$ and $\Delta \ll 1$, $k_t$ is the transverse momentum of particle $b$ (the emission) relative to its emitter, $\psi$ is an azimuthal angle around the (sub)jet axis, and $z$ is the momentum fraction of the branching. In our default definition of the Lund plane, the coordinates associated with this declustering are $\ln\Delta$ and $\ln k_t$. One may, however, make other choices of coordinates, for example $\ln\Delta$ and $\ln\kappa$, or $\ln\Delta$ and $\ln k_t/p_{t,\rm jet}$ (with $p_{t,\rm jet}$ the jet transverse momentum).
3. Repeat the procedure by going to step 1 for the harder branch, $p_a$. This procedure gives a tuple of variables $\{k_t^{(1)}, \Delta^{(1)}, \ldots\}, \ldots, \{k_t^{(n)}, \Delta^{(n)}, \ldots\}$ for each of the branchings off the main emitter. The $k_t$ and $\Delta$ elements of the tuples (specifically their logarithms) can be interpreted as a set of coordinates of points in the Lund plane, corresponding to the full set of primary branchings, as in the lower row of Fig. 1. The tuple elements other than $k_t$ and $\Delta$ provide complementary information for each point. One could additionally follow the lower-$p_t$ branch at each declustering. This would effectively create secondary, tertiary, etc., Lund planes (or triangles), i.e. one for each emission, giving the full Lund diagram as in the middle row of Fig. 1. We postpone the study of full Lund diagrams to future work, although a brief discussion of the use of a secondary Lund plane is given in appendix B.
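The three declustering steps above can be sketched as a short loop over a clustering tree. This is a minimal illustration, not the authors' code: the tree is a hypothetical `Node` class holding a four-momentum and its two parents (as a C/A clustering would produce), and the azimuth convention for `psi` is one possible choice, assumed here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """One node of a clustering tree: a four-momentum with optional parents."""
    px: float
    py: float
    pz: float
    E: float
    parents: Optional[Tuple["Node", "Node"]] = None

    @property
    def pt(self) -> float:
        return math.hypot(self.px, self.py)

    @property
    def rap(self) -> float:
        return 0.5 * math.log((self.E + self.pz) / (self.E - self.pz))

    @property
    def phi(self) -> float:
        return math.atan2(self.py, self.px)

    def __add__(self, other: "Node") -> "Node":
        return Node(self.px + other.px, self.py + other.py,
                    self.pz + other.pz, self.E + other.E, (self, other))


def massless(pt: float, y: float, phi: float) -> Node:
    return Node(pt * math.cos(phi), pt * math.sin(phi),
                pt * math.sinh(y), pt * math.cosh(y))


def primary_lund(jet: Node) -> List[dict]:
    """Follow the harder branch, recording one tuple per primary declustering."""
    points = []
    node = jet
    while node.parents is not None:
        a, b = node.parents
        if b.pt > a.pt:              # step 1: label so that p_ta > p_tb
            a, b = b, a
        dy = b.rap - a.rap
        dphi = (b.phi - a.phi + math.pi) % (2 * math.pi) - math.pi
        delta = math.hypot(dy, dphi)
        z = b.pt / (a.pt + b.pt)
        m2 = node.E ** 2 - node.px ** 2 - node.py ** 2 - node.pz ** 2
        points.append({              # step 2: variables of this declustering
            "ln_inv_delta": math.log(1.0 / delta),
            "ln_kt": math.log(b.pt * delta),     # k_t = p_tb * Delta
            "z": z,
            "kappa": z * delta,
            "psi": math.atan2(dy, dphi),         # assumed azimuth convention
            "m": math.sqrt(max(m2, 0.0)),
        })
        node = a                     # step 3: repeat on the harder branch
    return points
```

Because C/A clusters in angle, the recorded points come out ordered from large to small $\Delta$, so `ln_inv_delta` increases along the returned list.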

Averaged Lund plane density and basic analytical properties
The simplest analysis of the Lund plane is to examine the average density of points per jet and per unit area in the $\ln k_t$–$\ln\Delta$ plane, which we denote
$$\rho(\Delta, k_t) \equiv \frac{1}{N_{\rm jet}}\frac{dn_{\rm emission}}{d\ln k_t\, d\ln 1/\Delta}. \tag{2.2}$$
One can also define a density in terms of dimensionless variables, e.g.
$$\tilde\rho(\Delta, \kappa) \equiv \frac{1}{N_{\rm jet}}\frac{dn_{\rm emission}}{d\ln\kappa\, d\ln 1/\Delta}. \tag{2.3}$$
The quantity $\rho(\Delta, k_t)$ is represented in Fig. 2a for a sample of (C/A, $R = 1$) jets with $p_t > 2$ TeV, simulated using the dijet process in Pythia 8.230 [49] with the Monash13 tune [50]. For the case of a quark-initiated jet (about 80% of the jets in the sample of Fig. 2a), to leading order in perturbative QCD and for $\Delta \ll 1$, one expects
$$\rho(\Delta, k_t) \simeq \frac{\alpha_s(k_t)\,C_F}{\pi}\,\bar z\,p_{gq}(\bar z), \qquad \bar z = \frac{k_t}{p_{t,\rm jet}\,\Delta}, \qquad p_{gq}(z) = \frac{1 + (1-z)^2}{z}, \tag{2.4}$$
where $0 < \bar z < 1$; $\bar z$ is an effective momentum fraction and coincides with $z$ in Eq. (2.1) when there is a single emission. For $\bar z \ll 1$ the $\bar z$-dependent factor in $\rho$ is equal to 2, and so the density of primary Lund emissions is just proportional to the strong coupling,
$$\rho(\Delta, k_t) \simeq \frac{2\,\alpha_s(k_t)\,C_F}{\pi}. \tag{2.5}$$
The upper diagonal edge in the figure is a consequence of the kinematic limit, $k_t < \frac{1}{2}p_{t,\rm jet}\Delta$. At low scales $\alpha_s(k_t)$ becomes large, which accounts for the bright red band around $k_t = 1$ GeV. At values of $\Delta \sim 1$, initial-state radiation (ISR) and multi-parton interactions (MPI/UE) increase the density, which is reflected in the contours of constant colour bending upwards to the left. The different regions are outlined schematically in Fig. 2b.
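The averaged density of Eq. (2.2) is just a normalised 2D histogram: count primary points per bin, then divide by the number of jets and the bin area. The sketch below does this with NumPy and, for comparison, evaluates the soft-collinear leading-order expectation $2\alpha_s(k_t)C_F/\pi$ for a quark jet. The function names are hypothetical, and the one-loop running coupling with fixed $n_f = 5$ and $\alpha_s(M_Z) = 0.118$ is an illustrative assumption, not the coupling used in the paper.

```python
import numpy as np


def lund_density(jets_points, x_edges, y_edges):
    """Binned estimate of rho = (1/N_jet) dn / (d ln 1/Delta  d ln k_t), Eq. (2.2).

    jets_points: one entry per jet, each a sequence of (ln 1/Delta, ln k_t) pairs
    x_edges, y_edges: bin edges in ln 1/Delta and ln k_t respectively
    """
    n_jet = len(jets_points)
    pts = np.concatenate([np.asarray(p, dtype=float).reshape(-1, 2) for p in jets_points])
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=[x_edges, y_edges])
    bin_area = np.outer(np.diff(x_edges), np.diff(y_edges))
    return counts / (n_jet * bin_area)


def rho_lo_soft(kt, alphas_mz=0.118, mz=91.1876, nf=5):
    """Soft-collinear LO density for a quark jet, 2 alpha_s(k_t) C_F / pi,
    using a one-loop running coupling (illustrative assumption)."""
    CF = 4.0 / 3.0
    b0 = (33.0 - 2.0 * nf) / (12.0 * np.pi)
    kt = np.asarray(kt, dtype=float)
    alphas = alphas_mz / (1.0 + b0 * alphas_mz * np.log(kt ** 2 / mz ** 2))
    return 2.0 * alphas * CF / np.pi
```

With this coupling the expected density is about 0.10 at $k_t = M_Z$ and rises towards small $k_t$, which is the trend behind the bright band near $k_t \sim 1$ GeV (where a one-loop, fixed-flavour coupling is of course only indicative).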
Beyond leading perturbative order, several further physical effects contribute to the structure of the Lund plane. The upper boundary gets smeared out because of degradation of the leading subjet energy as one declusters the jet. The leading subjet can also change flavour as one moves down the clustering tree, in particular when there is an emission close to the upper, kinematic boundary. This can then alter the density of emissions at smaller angles, i.e. subsequent declusterings. The underlying physics of these two effects is closely connected with small-$R$ resummations, cf. Refs. [51,52]. Non-global [53] and clustering [54,55] logarithms introduce correlations between regions of the Lund plane at similar $\Delta$ values but different $k_t$ values. For each effect that introduces a correlation, there is typically also an impact on the average Lund density beyond leading order. We leave the detailed study of these contributions to future work.

Use for measurements and constraints on Monte-Carlo generators
The Lund jet plane density $\rho$ in Eq. (2.2) can be directly measured experimentally and compared to analytic predictions and parton-shower Monte-Carlo simulations. Here we concentrate on the latter. For such quantitative studies it is convenient to examine slices of the Lund plane density at fixed $k_t$ and fixed $\Delta$. Two of each are shown in Fig. 3, illustrating the potential of the Lund plane for providing insight into event generators. The figure compares the output of three different generators, Pythia 8.230 (Monash13 tune), Sherpa 2.2.4 [56] and Herwig 7.1.1 [57] (angular-ordered shower). The slices at fixed $\ln k_t$ illustrate a somewhat different trend between the angular-ordered generator and the other two, which both have transverse-momentum-ordered showers. The differences span the $\pm 20$–30% range and we have found that they are robust against non-perturbative effects. The slices at fixed $\Delta$ illustrate the coverage of both the high-$k_t$, perturbative region, where the density is an infrared- and collinear-safe quantity, and the low-$k_t$, non-perturbative region. In the latter, for $k_t$ below a few GeV, one also sees differences between generators of about 15%. The ability to clearly identify separate perturbative and non-perturbative regions provides a powerful advantage relative to quantities such as jet shapes that have been measured in the past, and whose distributions tend, to some extent, to mix perturbative and non-perturbative sensitivity.
A final remark here is that at low $k_t$, the Lund plane density could be seen as providing an effective definition of the strong coupling in the infrared, which one might also be able to relate to the $\alpha_0$ parameter of Refs. [58,59].

Relation with other observables
Lund plane density observables are closely related to a wide range of other jet observables.
In some cases there is an exact relation between the Lund-plane density and that other observable. To illustrate two such cases, $z_g$ [60] and $N_{SD}$ [61], we use the Lund-plane density variant $\tilde\rho(\Delta, \kappa)$ from Eq. (2.3), defined using the dimensionless $\Delta$ and $\kappa$ variables of Eq. (2.1). The choice of $\tilde\rho$ versus the $\rho$ density defined in Eq. (2.2) depends on whether one privileges the study of emissions close to the collinear boundary ($\tilde\rho$ is preferable) or the presence of an explicit $k_t$ physical scale, e.g. to examine the transition to the non-perturbative region ($\rho$ is preferable).
The $z_g$ variable of Ref. [60], which is also being studied in heavy-ion collisions [62,63], can be directly deduced from the $\tilde\rho$ density. Recall that $z_g$ is defined as the momentum fraction that appears in the first $\beta = 0$ soft-drop (or, equivalently, modified mass-drop) occurrence [19,64], where the soft-drop procedure has a parameter $z_{\rm cut}$ indicating the $z$ values below which soft splittings are simply discarded. Recalling that $\kappa = z\Delta$ in Eq. (2.1b), the distribution of $z_g$ is given by Eq. (2.6), where the Sudakov-type factor $S(\Delta, z_{\rm cut})$ is also expressed in terms of the Lund-plane density, Eq. (2.7). Eq. (2.6) is an exact relation. Note that the average number of iterated soft-drop steps [61], $\langle N_{SD}\rangle$, is simply given by
$$\langle N_{SD}\rangle = -\ln S(0, z_{\rm cut}). \tag{2.8}$$
For other observables the relation between the Lund plane and the observable's distribution only holds up to some given logarithmic accuracy. For example, the cross section $\Sigma(B)$ for a jet in some $p_t$ bin to have a broadening [65] (also known as girth) smaller than some value $B$ is given, at leading-logarithmic accuracy (terms $\alpha_s^n \ln^{n+1} B$ in $\ln\Sigma(B)$), by Eq. (2.9). Analogous relations hold for the jet mass, angularities [66] and two-point energy-energy correlation functions [11].
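At the level of a single jet, both $z_g$ and the iterated soft-drop multiplicity are simple functionals of the primary declustering sequence. The sketch below assumes that sequence is given as $(\Delta, z)$ pairs in C/A order (largest $\Delta$ first), with $z$ the momentum fraction of the softer branch as in Eq. (2.1b). The function names are hypothetical, and the optional angular cut `theta_cut` of iterated soft drop is set to zero by default. Averaging `n_sd` over many jets gives $\langle N_{SD}\rangle$ of Eq. (2.8).

```python
from typing import Optional, Sequence, Tuple


def z_g(primary: Sequence[Tuple[float, float]], z_cut: float) -> Optional[float]:
    """beta = 0 soft drop (mMDT): z of the first primary declustering with z > z_cut.

    primary: (Delta, z) pairs ordered from large to small Delta.
    Returns None if no declustering passes the condition.
    """
    for _delta, z in primary:
        if z > z_cut:
            return z
    return None


def n_sd(primary: Sequence[Tuple[float, float]], z_cut: float, theta_cut: float = 0.0) -> int:
    """Iterated soft-drop multiplicity with beta = 0: number of primary
    declusterings with z > z_cut (and Delta > theta_cut)."""
    return sum(1 for delta, z in primary if z > z_cut and delta > theta_cut)


primary = [(0.8, 0.05), (0.4, 0.2), (0.1, 0.3), (0.05, 0.01)]
print(z_g(primary, 0.1), n_sd(primary, 0.1))  # 0.2 2
```

The same loop that yields $z_g$ stops at the first passing splitting, whereas $N_{SD}$ keeps counting along the primary sequence, which is why both are determined by the same Lund-plane density.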

Application to boosted-W tagging
We now turn to the potential of the Lund plane for identifying hadronically decaying boosted electroweak bosons, concentrating specifically on the example of $W$ identification. Fig. 4 shows the averaged (primary) Lund plane for hadronic $W$ decays ($p_t > 2$ TeV), to be compared to Fig. 2 for dijets. Two main differences are clearly visible to the eye. One is the diagonally oriented patch in the $W$ case, around $\ln 1/\Delta = 2.5$ and $\ln k_t/\text{GeV} = 4$, which is connected with the fixed-mass two-pronged structure of the $W$: lines of constant mass in the Lund plane are diagonals running up and to the right. The other important feature in the $W$ case is the considerable depletion of emissions in the upper-left region and below the $W$-mass structure. The depletion is principally a consequence of the colour-singlet nature of the $W$. We investigate two broad approaches to making use of the information in the Lund plane: one is a log-likelihood type approach, while the other uses machine learning.
Figure 4: Lund-plane emission density, $\rho_S(\Delta, k_t)$ (averaged primary Lund plane), for hadronically decaying boosted $W$ bosons in $WW$ events, using the same jet clustering and selection as in Fig. 2.
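The statement that lines of constant mass are up-right diagonals follows from a short small-angle estimate. This is a sketch assuming two massless prongs, $\Delta \ll 1$, $p_{t,\rm jet} = p_{ta} + p_{tb}$ and the variables $k_t$ and $z$ defined at the start of this section:
$$m^2 \simeq p_{ta}\,p_{tb}\,\Delta^2 = z(1-z)\,p_{t,\rm jet}^2\,\Delta^2, \qquad k_t = z\,p_{t,\rm jet}\,\Delta \quad\Longrightarrow\quad \ln k_t = \ln\frac{1}{\Delta} + \ln\frac{m^2}{p_{t,\rm jet}} - \ln(1-z).$$
At fixed $m$ and slowly varying $z$, $\ln k_t$ therefore grows one-for-one with $\ln 1/\Delta$. For $m \approx 80$ GeV, $p_{t,\rm jet} = 2$ TeV and $z = 1/2$, the point at $\ln 1/\Delta = 2.5$ has $\ln k_t/\text{GeV} \approx 2.5 + 1.16 + 0.69 \approx 4.4$, close to the patch seen in Fig. 4.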

Log-likelihood use of Lund Plane
The log-likelihood approach uses two main inputs. The first requires the identification of the "leading" emission, $\ell$, which in the $W$ case is likely to be associated with the two-prong decay. We take this leading emission to be the first emission in the Lund declustering sequence that satisfies $z > z_{\rm cut}$ with $z_{\rm cut} = 0.025$, which corresponds to the emission that would be selected by the mMDT tagger [19] with the same $z_{\rm cut}$, or equivalently by the Soft-Drop (SD) procedure [64] with $\beta = 0$ and that $z_{\rm cut}$. We define a log-likelihood function $\mathcal{L}_\ell$ using the ratio of $dN_X/dm^{(\ell)}dz^{(\ell)}$ ($X = S, B$), the differential distribution in the mass and $z$ variables of the leading emission, $(m^{(\ell)}, z^{(\ell)})$, for a simulated signal sample $S$ ($W$ bosons) with $N_S$ jets, and the analogous quantity for a background (QCD dijet) sample $B$ with $N_B$ jets:
$$\mathcal{L}_\ell(m^{(\ell)}, z^{(\ell)}) = \ln\left[\frac{\frac{1}{N_S}\frac{dN_S}{dm^{(\ell)}\,dz^{(\ell)}}}{\frac{1}{N_B}\frac{dN_B}{dm^{(\ell)}\,dz^{(\ell)}}}\right].$$
In practice we bin logarithmically in $m^{(\ell)}$ and $z^{(\ell)}$ to construct a discretised approximation to $\mathcal{L}_\ell(m^{(\ell)}, z^{(\ell)})$.
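A discretised $\mathcal{L}_\ell$ can be sketched with two NumPy histograms filled in $\ln m$ and $\ln z$. Because signal and background share the same bins, the bin widths (and the Jacobian of the logarithmic binning) cancel in the ratio, so only the normalised counts matter. The function names and the `floor` used to regularise empty bins are illustrative assumptions; the bin widths quoted later in this section ($0.025$ in $\ln m$, $0.2$ in $\ln z$) would simply be passed in through the edge arrays.

```python
import numpy as np


def leading_loglik(m_s, z_s, m_b, z_b, ln_m_edges, ln_z_edges, floor=1e-12):
    """Binned L_ell(m, z) = ln[(1/N_S dN_S/dm dz) / (1/N_B dN_B/dm dz)].

    m_s, z_s: leading-emission mass and z for the N_S signal jets
    m_b, z_b: the same for the N_B background jets
    floor: ad hoc regulator for empty bins
    """
    h_s, _, _ = np.histogram2d(np.log(m_s), np.log(z_s), bins=[ln_m_edges, ln_z_edges])
    h_b, _, _ = np.histogram2d(np.log(m_b), np.log(z_b), bins=[ln_m_edges, ln_z_edges])
    p_s = h_s / len(m_s)  # per-jet fractions; identical bins, so widths cancel
    p_b = h_b / len(m_b)
    return np.log(np.maximum(p_s, floor) / np.maximum(p_b, floor))


def lookup(table, ln_m_edges, ln_z_edges, m, z):
    """Table value for one jet's leading emission (indices clamped to the grid)."""
    i = np.clip(np.searchsorted(ln_m_edges, np.log(m), side="right") - 1, 0, table.shape[0] - 1)
    j = np.clip(np.searchsorted(ln_z_edges, np.log(z), side="right") - 1, 0, table.shape[1] - 1)
    return table[i, j]
```

A jet whose leading emission falls where signal dominates gets a positive $\mathcal{L}_\ell$; one in a background-dominated bin gets a negative value.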
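The second likelihood input is designed to bring sensitivity to the pattern of non-leading ($n\ell$) emissions, i.e. the pattern of additional radiation within the primary Lund plane that decorates the basic two-prong structure. It involves a function
$$\mathcal{L}_{n\ell}(\Delta, k_t; \Delta^{(\ell)}) = \ln\frac{\rho_S^{(n\ell)}(\Delta, k_t; \Delta^{(\ell)})}{\rho_B^{(n\ell)}(\Delta, k_t; \Delta^{(\ell)})},$$
where $\rho_X^{(n\ell)}$ is determined just over the non-leading emissions, as a function of the angle $\Delta^{(\ell)}$ of the leading emission, with $X = S$ corresponding to the $W$ signal and $X = B$ to the QCD background sample. Our overall log-likelihood signal-background discriminator for a given jet is then given by
$$\mathcal{L}_{\rm tot} = \mathcal{L}_\ell(m^{(\ell)}, z^{(\ell)}) + \sum_{i \neq \ell}\mathcal{L}_{n\ell}(\Delta^{(i)}, k_t^{(i)}; \Delta^{(\ell)}) + \mathcal{N}(\Delta^{(\ell)}), \tag{3.4}$$
where the normalisation term $\mathcal{N}$ is, up to an overall constant,
$$\mathcal{N}(\Delta^{(\ell)}) = \int d\ln k_t\, d\ln\frac{1}{\Delta}\left[\rho_B^{(n\ell)}(\Delta, k_t; \Delta^{(\ell)}) - \rho_S^{(n\ell)}(\Delta, k_t; \Delta^{(\ell)})\right].$$
In the sum over non-leading emissions, $i \neq \ell$ in Eq. (3.4), each non-leading emission $i$ contributes information (through the $\mathcal{L}_{n\ell}(\Delta^{(i)}, k_t^{(i)}; \Delta^{(\ell)})$ term) about whether its corresponding region of the Lund plane tends to be more populated by signal or background emissions. The normalisation term $\mathcal{N}$ accounts for the average difference in the number of non-leading emissions between signal and background jets.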
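One way to see where a normalisation term of this kind comes from: if non-leading emissions are independent and Poisson-distributed with densities $\rho_S$ and $\rho_B$, the log of the ratio of the two probabilities for an observed set of points is $\sum_i \ln(\rho_S/\rho_B) - \int(\rho_S - \rho_B)$. The sketch below evaluates that expression on a grid. It assumes the binned densities have already been selected for the jet's $\Delta^{(\ell)}$ bin, treats the integral as a sum over bins, and uses hypothetical names throughout; it is an illustration of the structure of $\mathcal{L}_{\rm tot}$, not the authors' implementation.

```python
import numpy as np


def total_loglik(L_lead, nonleading, rho_s, rho_b, x_edges, y_edges, floor=1e-12):
    """L_tot = L_ell + sum_{i != ell} ln(rho_S / rho_B)(point_i) + N.

    L_lead: the jet's leading-emission log-likelihood L_ell
    nonleading: (ln 1/Delta, ln k_t) of the non-leading primary emissions
    rho_s, rho_b: binned non-leading densities for this jet's Delta^(ell) bin
    N is the discretised integral of (rho_B - rho_S) (Poisson normalisation).
    """
    bin_area = np.outer(np.diff(x_edges), np.diff(y_edges))
    total = L_lead + np.sum((rho_b - rho_s) * bin_area)
    for x, y in nonleading:
        i = np.searchsorted(x_edges, x, side="right") - 1
        j = np.searchsorted(y_edges, y, side="right") - 1
        if 0 <= i < rho_s.shape[0] and 0 <= j < rho_s.shape[1]:  # skip points off the grid
            total += np.log(max(rho_s[i, j], floor) / max(rho_b[i, j], floor))
    return float(total)
```

With a background density twice the signal density everywhere on a $2\times 2$ grid of unit bins, a jet with no non-leading emissions scores $+2$ (signal-like), while each additional emission lowers the score by $\ln 2$.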
It is instructive to consider the conditions under which Eq. (3.4) would be the optimal discriminator that can be constructed from the sequence of primary Lund-plane declusterings: (1) the identification of the leading emission associated with the $W$'s two-prong structure should be correct; (2) non-leading emissions in the Lund plane should effectively be independent of each other, which is the basis of the sum over $i \neq \ell$ in Eq. (3.4); (3) that pattern of independent emissions should depend on $\Delta^{(\ell)}$ but not on $m^{(\ell)}$. Each of these approximations has its imperfections, but none is expected to be particularly badly violated.
To help illustrate how the log-likelihood approach works in practice, Fig. 5 shows the leading-emission density, i.e. the normalised distribution in $m^{(\ell)}$ and $z^{(\ell)}$, for background ($X = B$) and signal ($X = S$) jets. The background is diffuse, while the signal is peaked around $m^{(\ell)} = M_W$ and concentrated at larger $z^{(\ell)}$ values, as one would expect. The non-leading emission density, $\rho_B^{(n\ell)}$, is shown for background (dijets) in Fig. 6a, for jets where the leading emission has $1.5 < \ln 1/\Delta^{(\ell)} < 2$ and $z^{(\ell)} > z_{\rm cut}$ (roughly $\ln k_t/\text{GeV} > 2.5$). For similar angles and lower $k_t$ values there is a modest depletion in the number of emissions. This is a partial shadow cast by the leading emission: non-leading emissions with similar $\Delta$ and $\psi$ to the leading emission will be clustered with it. The other main feature of note in Fig. 6a is the empty area in the upper-left region of the plot: given that the emission classified as leading had $1.5 < \ln 1/\Delta^{(\ell)} < 2$, there cannot have been any emissions with a larger $\Delta$ and $z > z_{\rm cut}$.
Fig. 6b shows the $\mathcal{L}_{n\ell}$ likelihood function. Most of its discriminating power comes from the extensive dark blue region. Each emission present in this region gives a negative $\mathcal{L}_{n\ell}$ contribution to $\mathcal{L}_{\rm tot}$, which drives $\mathcal{L}_{\rm tot}$ to be more background-like (i.e. negative). Conversely, if there are few emissions, the positive contribution of the $\mathcal{N}$ term results in a more signal-like (positive) final $\mathcal{L}_{\rm tot}$ value. Note that the dark blue region of Fig. 6b stretches down to $k_t$ values below a few GeV, and since that region tends to contain a significant number of emissions in the background case (cf. Fig. 6a), one expects that some of the sensitivity in $W$ vs. QCD discrimination will come from low-$k_t$ non-perturbative effects. This highlights the importance of direct experimental measurements of such regions. One can explicitly check the influence of the low-$k_t$ region on the tagging performance by imposing a minimum $k_t$ cut in the construction of the Lund plane. This is discussed further in Section 3.5 below.
In our practical implementation of the log-likelihood approach, we use $\ln m$ bins of size 0.025 and $\ln z$ bins of size 0.2 for $\mathcal{L}_\ell$; for $\mathcal{L}_{n\ell}$ we take bins in $\ln k_t$ and $\ln\Delta$ of size 0.2 and bins in $\ln\Delta^{(\ell)}$ of size 0.5. The likelihood functions are calculated using 500 000 simulated signal and background jets, while performance is evaluated on an independent sample of 200 000 signal and background jets.

Machine-learning use of Lund Plane
Our second approach to using the Lund-plane information for W tagging is to provide it as an input to a variety of machine learning (ML) methods.
The input can be provided in the form of a sequence of $\{\ln 1/\Delta, \ln k_t\}$ pairs; we use this kind of sequence with dense (DNN) and Long Short-Term Memory (LSTM) neural networks [68]. In practice the sequence is zero-padded to form a $60 \times 2$ dimensional matrix. The DNN consists of four layers of size 200 with ReLU activation, and a final two-dimensional layer with softmax activation. For the LSTM network, we use a cell with 128-dimensional output connected to a dropout layer with rate 20%, with a final dense layer of dimension two and softmax activation. In addition to the $\ln 1/\Delta$ and $\ln k_t$ variables, one could add variables such as $\psi$, the logarithm of the subjet mass ($\ln m$), or the particle multiplicity in the subjet. With realistic experimental resolutions, we have found that they bring a small additional gain in background rejection, in the 10–25% range. Keeping in mind that the variability in performance associated with different training choices is also in the 10% range, we chose not to include these extra variables for our final results. However, they could be further investigated in future work.
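The sequence input and the two network shapes described above can be sketched as follows. This is an illustration under assumptions: the padding helper is plain NumPy, while the model builders use today's `tf.keras` API (the paper used Keras 2.0.8 with TensorFlow 1.2.1), import TensorFlow only when called, and use hypothetical function names. Layer sizes follow the text; the use of the He-uniform initialiser in the DNN mirrors the training description below.

```python
import numpy as np


def pad_lund_sequence(points, max_len=60):
    """Zero-pad (or truncate) a sequence of (ln 1/Delta, ln k_t) pairs to shape (max_len, 2)."""
    out = np.zeros((max_len, 2), dtype=np.float32)
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 2)[:max_len]
    out[: len(pts)] = pts
    return out


def build_dnn(max_len=60):
    """Four dense ReLU layers of width 200, then a 2-way softmax."""
    from tensorflow import keras  # lazy import: only needed to build the model
    layers = keras.layers
    model = keras.Sequential(
        [keras.Input(shape=(max_len, 2)), layers.Flatten()]
        + [layers.Dense(200, activation="relu", kernel_initializer="he_uniform")
           for _ in range(4)]
        + [layers.Dense(2, activation="softmax")]
    )
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model


def build_lstm(max_len=60):
    """LSTM cell with 128-dimensional output, 20% dropout, then a 2-way softmax."""
    from tensorflow import keras
    layers = keras.layers
    model = keras.Sequential([
        keras.Input(shape=(max_len, 2)),
        layers.LSTM(128),
        layers.Dropout(0.2),
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```

A batch for either network is then `np.stack([pad_lund_sequence(seq) for seq in jets])`, with one-hot labels for the two classes.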
Alternatively one can create a two-dimensional Lund image for each jet (in which only a few pixels are turned on) and provide it as an input to a convolutional neural network (CNN), where additional information such as the azimuthal angle can be encoded through the pixel intensity by adding new channels. Each jet is represented as a $50 \times 50$ pixel image. These images are used to train a neural network consisting of three two-dimensional convolutional layers with ReLU activation and 128 output filters each, which are each connected to a max-pooling layer and a spatial dropout layer with rate 5%. The first convolution window is of size $10 \times 10$ pixels, with the following two layers having windows of size $4 \times 4$. The last convolutional layer is connected to a dense network with 256 neurons and another dropout layer leading to a final two-dimensional output layer with softmax activation. This network can also be trained on jet images, where each pixel corresponds to a bin in $(y, \phi)$-space around the jet axis. In this case the pixel intensity is given by the normalised scalar $p_t$ sum of particles within that phase-space region.
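A Lund image is simply a 2D histogram of one jet's primary points. The sketch below rasterises them into a $50 \times 50$ single-channel image and builds a CNN with the layer pattern described above. The axis ranges of the image, the $2\times 2$ pooling size and the 20% rate of the final dropout layer are assumptions (the text does not specify them), the names are hypothetical, and the model uses today's `tf.keras` API, imported only when the builder is called.

```python
import numpy as np


def lund_image(points, npix=50, x_range=(0.0, 7.0), y_range=(-3.0, 7.0)):
    """Rasterise (ln 1/Delta, ln k_t) points into an (npix, npix, 1) count image.
    Rows index ln k_t, columns index ln 1/Delta; the ranges are illustrative."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    img = np.zeros((npix, npix, 1), dtype=np.float32)  # channels-last layout
    ix = np.floor((pts[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * npix).astype(int)
    iy = np.floor((pts[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * npix).astype(int)
    keep = (ix >= 0) & (ix < npix) & (iy >= 0) & (iy < npix)  # drop points off the image
    np.add.at(img[:, :, 0], (iy[keep], ix[keep]), 1.0)
    return img


def build_cnn(npix=50):
    """Conv2D(128) layers with 10x10, 4x4, 4x4 windows, each followed by max pooling
    and 5% spatial dropout, then Dense(256), dropout and a 2-way softmax."""
    from tensorflow import keras  # lazy import
    layers = keras.layers
    blocks = []
    for window in (10, 4, 4):
        blocks += [layers.Conv2D(128, (window, window), activation="relu"),
                   layers.MaxPooling2D((2, 2)),           # assumed pool size
                   layers.SpatialDropout2D(0.05)]
    model = keras.Sequential(
        [keras.Input(shape=(npix, npix, 1))] + blocks
        + [layers.Flatten(), layers.Dense(256, activation="relu"),
           layers.Dropout(0.2),                           # assumed rate
           layers.Dense(2, activation="softmax")]
    )
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```

With "valid" convolutions and $2\times 2$ pooling, the spatial size shrinks as $50 \to 41 \to 20 \to 17 \to 8 \to 5 \to 2$, so three such blocks fit comfortably in a $50\times 50$ input. An extra channel (e.g. for $\psi$) could be added by widening the last image axis.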
Our machine learning is implemented in Keras 2.0.8 [70], using TensorFlow 1.2.1 [71] as the backend. All model weights are initialised with a He uniform variance scaling initialiser [72], and each training is performed using a batch size of 128, with Adam as optimisation algorithm [73] and a categorical cross-entropy loss function. The parameters for the machine learning are similar, where relevant, to those in Refs. [35,36], though we used a greater number of pixels for the images and larger networks. Training is carried out on a sample of 500 000 jets for each of the W and background samples. During each epoch 80% of the sample is used for adapting the network weights. At the end of each epoch performance is tested on a validation sample consisting of 10% of the events (that were not used during the epoch's training). If that performance has not improved over the past 4 epochs training halts. The maximum number of epochs is 15 and training typically halts at epoch 10–15. The final performance of the network is then evaluated using a further independent 10% of the sample.
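The 80/10/10 split and the stopping rule can be written down independently of any framework. The sketch below is a hedged illustration: it assumes "performance" means a validation loss to be minimised, and `split_indices` and `stopping_epoch` are hypothetical names. In `tf.keras` the same rule is usually expressed with `keras.callbacks.EarlyStopping(monitor="val_loss", patience=4)` together with `epochs=15` in `model.fit`.

```python
import numpy as np


def split_indices(n, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Random train / validation / test index split (80% / 10% / 10% by default)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


def stopping_epoch(val_losses, patience=4, max_epochs=15):
    """1-based epoch at which training halts: once the validation loss has not
    improved for `patience` consecutive epochs, or at `max_epochs`."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return min(len(val_losses), max_epochs)
```

For instance, validation losses that improve for three epochs and then stall trigger a halt four epochs later, at epoch 7, while a loss that keeps improving runs to the cap of 15 epochs.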
Other recent work has also made use of declustering sequences with machine learning tools.Ref. [38] has used recursive neural networks on the complete declustering tree (with various clustering algorithms), using the momenta of the subjets at each stage as inputs.
Ref. [39] has used the anti-$k_t$ clustering sequence as a way of ordering all constituent particle momenta, which are then provided in that order to an LSTM.

Jet-shape discriminant
In addition to the log-likelihood and machine-learning based approaches, we will also include a comparison with an optimised choice of jet-shape discriminator. For this purpose, we apply the Soft-Drop algorithm ($\beta = 2$, $z_{\rm cut} = 0.05$) and use the resulting groomed jet to calculate both the jet mass and the $D_2$ observable [13] with $\beta = 2$ (itself very similar to $C_2$ [11] for any given jet mass). The jet mass and $D_2$ are then given as inputs to a boosted decision tree (BDT) in the TMVA [74] package. This pair of observables, used with just a mass cut rather than a BDT,⁹ was found to be close to optimal in terms of background rejection among a comparison of 88 shape-mass combinations in the recent Les Houches (LH) study [75] (Fig. III.29), for a fixed signal efficiency of 0.4. Its performance is substantially above that of the default ATLAS and CMS jet-shape discriminant choices. We refer to it as $D_2^{[\rm loose]}$.¹⁰ We could also have chosen the dichroic $D_2$ variable, used as a benchmark in [75], whose performance is only slightly worse but which is more resilient against non-perturbative effects. We will return to the question of resilience in section 3.5.
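For reference, $D_2^{(\beta)}$ is a ratio of energy correlation functions, $D_2 = e_3/e_2^3$, with $e_2 = \sum_{i<j} z_i z_j R_{ij}^\beta$ and $e_3 = \sum_{i<j<k} z_i z_j z_k (R_{ij}R_{ik}R_{jk})^\beta$, where $z_i = p_{ti}/\sum_k p_{tk}$ and $R_{ij}$ is the rapidity-azimuth distance. The brute-force sketch below computes it from a list of $(p_t, y, \phi)$ constituents. It is an illustration only: it applies no soft-drop grooming (the text computes $D_2$ on the groomed jet), the function name is hypothetical, and practical implementations such as the FastJet contrib code are far faster than this cubic loop.

```python
import itertools
import math


def d2(constituents, beta=2.0):
    """D2^(beta) = e3 / e2^3 from (pt, y, phi) constituents (no grooming applied)."""
    pt_tot = sum(c[0] for c in constituents)
    z = [c[0] / pt_tot for c in constituents]

    def r(i, j):
        dphi = (constituents[i][2] - constituents[j][2] + math.pi) % (2 * math.pi) - math.pi
        return math.hypot(constituents[i][1] - constituents[j][1], dphi)

    n = len(constituents)
    e2 = sum(z[i] * z[j] * r(i, j) ** beta
             for i, j in itertools.combinations(range(n), 2))
    e3 = sum(z[i] * z[j] * z[k] * (r(i, j) * r(i, k) * r(j, k)) ** beta
             for i, j, k in itertools.combinations(range(n), 3))
    return e3 / e2 ** 3
```

Since $e_3$ and $e_2^3$ both scale as $R^{3\beta}$ and are built from momentum fractions, $D_2$ is unchanged by a common rescaling of all angles or of all transverse momenta; a pure two-prong configuration gives $e_3 = 0$ and hence $D_2 = 0$, which is why small $D_2$ favours two-pronged $W$ jets.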

Simulation, detector effects and reconstruction
In evaluating the performance of the different methods, an important consideration concerns the inclusion of realistic experimental resolutions. This is relevant both for reconstructing the $W$ mass and for the radiation in the rest of the Lund plane. The baseline that we adopt for our comparisons is to use the Delphes [76] fast detector simulation, version 3.4.1, with the delphes_card_CMS_NoFastJet.tcl card to simulate both detector effects and particle-flow reconstruction. The particle-flow outputs have artefacts on angular scales associated with the hadronic and electromagnetic calorimeters, and these have an adverse effect on performance, both in terms of mass resolution and availability of Lund-plane information. Accordingly we use a "subjet-particle rescaling algorithm" (SPRA1) at the level of small-radius ($R_h = 0.12$) subjets to retain angular information from electromagnetic calorimeter deposits and charged tracks at small angular scales, while retaining full hadronic calorimetry energy information at larger scales. The SPRA algorithm is closely related to earlier methods that proposed jet-wide, subjet and hadronic-calorimeter rescaling of charged tracks or electromagnetic calorimeter deposits [77][78][79][80][81][82][83][84]. The details of the Delphes particle-flow effect on the Lund plane, of the SPRA algorithm and of the improvements it brings are given in appendix C.
Our results will use simulated dijet events as the background and $WW$ events as the signal, selecting jets clustered with the C/A algorithm with radius $R = 1$ using FastJet 3.3.0 [85], with $p_t > 2$ TeV and $|y| < 2.5$, at a centre-of-mass energy of $\sqrt{s} = 14$ TeV. We use Pythia 8.230 with its Monash 13 tune to generate the events. The samples coincide with those used in the recent Les Houches study [75]. We choose to concentrate on a high-$p_t$ sample for two reasons: (1) the LHC is increasingly focusing on this region, and (2) one expects high-$p_t$ jets to contain the most information, because it is at high $p_t$ that the colour-singlet nature of the boson has the most impact on the radiation pattern. Results at lower $p_t$ are given in Appendix B.
Note that we have not included pileup in our simulations. We therefore work under the assumption that methods for pileup mitigation such as SoftKiller [86], PUPPI [87], constituent subtractor [88] or machine-learning approaches [89] can successfully remove the contamination in the regions that are critical for discrimination. It is also possible to use area subtraction [90,91], given that at each stage of the declustering one has (sub)jets with a well-defined area. However, area subtraction is more likely to be susceptible to fluctuations than other methods. A further possibility is to supplement pileup mitigation with methods such as filtering [3], trimming [92] or recursive soft-drop [93], applied only to larger subjets, in order to reduce their contamination from pileup.
⁹ The Lund-likelihood approach can sensibly be used with a fixed mass window, simply by discretising the $\mathcal{L}_\ell$ likelihood ratio to have a single mass bin. However, it is not so straightforward to force machine-learning methods to use a fixed mass window. Therefore, to obtain meaningful comparisons across all types of methods, we must use the $D_2$ variable in combination with the full mass information.
¹⁰ In the LH report, this observable is denoted as D

Results
Results for the performance of the different tagging methods of sections 3.1–3.3 are shown in Fig. 7. The upper panel shows the background rejection factor for each method as a function of signal tagging efficiency. The lower panel shows the ratio of that rejection factor to the one obtained for the Lund-likelihood method (the blue line). Four of the methods are based on machine learning, as discussed in section 3.2: three of them use Lund-plane inputs (Lund image with CNN, Lund+DNN, Lund+LSTM), while the other uses a normal jet image (with CNN). Finally, the plot also includes the jet-shape plus mass approach, labelled $D_2^{[\rm loose]}$.
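Each point of such a curve is obtained by choosing a cut on a tagger's output that keeps a given fraction of signal jets and measuring the fraction of background jets that survive; the rejection factor is its inverse. A minimal sketch, assuming a score where larger means more signal-like (true of $\mathcal{L}_{\rm tot}$ and of a softmax signal output); the function name is hypothetical.

```python
import numpy as np


def background_rejection(sig_scores, bkg_scores, sig_eff=0.4):
    """1/eps_B at the score threshold that keeps a fraction sig_eff of signal jets."""
    sig = np.sort(np.asarray(sig_scores, dtype=float))[::-1]   # most signal-like first
    n_keep = max(1, int(np.ceil(sig_eff * len(sig) - 1e-9)))   # guard against rounding up
    threshold = sig[n_keep - 1]
    eps_b = np.mean(np.asarray(bkg_scores, dtype=float) >= threshold)
    return np.inf if eps_b == 0 else 1.0 / eps_b
```

Scanning `sig_eff` from 0 to 1 traces out the full rejection-versus-efficiency curve for one method, and ratios of these values between methods give the lower panel.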
Overall, the LSTM approach with Lund inputs performs best across nearly the full range of signal efficiencies. Taking $\epsilon_W = 0.4$ as a reference, the Lund-likelihood background rejection is 0.7–0.8 times that of the LSTM, while the other machine-learning methods are slightly worse than the Lund-likelihood method. The $D_2$-based shape discrimination is a factor of two worse than the Lund-likelihood method. At higher (lower) signal efficiencies, the machine-learning approaches appear to perform relatively slightly better (worse).
The pattern of performance is fairly insensitive to the details of both the Lund-likelihood procedure and the machine learning. For example, in the Lund-likelihood approach, using the subjet that gives a mass closest to the $W$ mass, rather than the first one to pass the mMDT $z_{\rm cut}$ condition, affects the background rejection only at the $\sim 10\%$ level (making it better at high efficiencies and worse at lower efficiencies). Similarly, as mentioned in section 3.2, adding mass and azimuth ($\psi$) information to the LSTM has only a modest 10–25% effect after accounting for detector effects (after detector simulation, this gain is present only when one uses SPRA), which does not appear to be particularly significant relative to other training uncertainties.
We also note that the ROC curves become noisy at small signal efficiencies. This can at least in part be attributed to statistical uncertainties associated with the finite size of our training/testing samples, in particular when estimating the background rejection factor, where only a small fraction of the events pass the tagger. For example, the Lund-likelihood method uses a sample of 200 000 events for testing, which corresponds to a statistical uncertainty of about 30% at ε_W = 0.2. The LSTM uses 50 000 events for the testing phase, i.e. four times fewer, so its expected statistical uncertainty is twice as large (given similar background rejection).
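As a rough cross-check of the factor of two quoted above, here is a minimal sketch assuming that the number of background events passing the tagger is binomially distributed (our assumption, not a statement of how the uncertainties in Fig. 7 were evaluated). The relative uncertainty then scales as 1/√N at fixed efficiency, so four times fewer test events doubles it. The background efficiency `eps_qcd` is a hypothetical value chosen only for illustration.

```python
import math


def relative_uncertainty(n_test: int, efficiency: float) -> float:
    """Relative binomial uncertainty on an efficiency measured on n_test events.

    Assumes the number of passing events is binomial, so that
    sigma_eps / eps = sqrt((1 - eps) / (n_test * eps)). To first order the
    same relative uncertainty applies to the rejection factor 1/eps.
    """
    if n_test <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need n_test > 0 and 0 < efficiency <= 1")
    return math.sqrt((1.0 - efficiency) / (n_test * efficiency))


# Hypothetical background efficiency, for illustration only.
eps_qcd = 1e-4
likelihood_err = relative_uncertainty(200_000, eps_qcd)  # Lund-likelihood test sample
lstm_err = relative_uncertainty(50_000, eps_qcd)         # LSTM test sample
print(f"likelihood: {likelihood_err:.2f}, LSTM: {lstm_err:.2f}, "
      f"ratio: {lstm_err / likelihood_err:.2f}")
```

The ratio of the two uncertainties is √(200 000/50 000) = 2 independently of the efficiency chosen, which is the scaling used in the text.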

Resilience to non-perturbative effects
One can argue that performance is not the only feature one may request from a boosted-object tagger. In particular, one may require that the tagger remain relatively insensitive to model-dependent non-perturbative effects. Such insensitivity could translate into a reduced uncertainty on the determination of the tagger's signal efficiency and background rejection rates. It could also make it possible to understand the tagger's behaviour with first-principles perturbative QCD calculations.
To carry out studies of sensitivity to non-perturbative effects, we compare performance between parton and hadron level. Parton-level results cannot be sensibly passed through a detector simulation, so the study must be carried out with actual particles (i.e. partons or hadrons). However, as we discussed in section 3.4 and appendix C, real detector effects can have a significant impact on the mass resolution in particular, which can affect the conclusions of any multivariate study that uses the mass. Accordingly, we carry out two sets of studies in parallel. In the first set of studies, we classify jets in terms of whether they satisfy a loose requirement on the (possibly groomed) jet mass, 65 < m < 105 GeV, and then make no further use of mass information. Such a study is fairly realistic in terms of how much mass information is accessible in a detector, but cannot be performed with machine learning, because the latter is likely to "cheat" and learn the mass information from other variables in the jet. To be able to also examine machine learning, we therefore carry out a second set of studies, in which full particle-level information is available, allowing reconstruction of the W mass peak. All methods then exploit the unrealistically good particle-level mass resolution on that W mass peak.

Figure 8: Plots of performance (ε_W/√ε_QCD), for fixed (hadron+MPI) signal efficiency ε_W = 0.4, versus resilience to non-perturbative effects (ζ of Eq. (3.6)). Grey triangles (and the red one) correspond to the full range of shape observables studied in the LH 2017 study [75]. The blue circles and black triangles correspond to the Lund-likelihood and Lund-LSTM methods respectively, with each point along a line corresponding to a different lower cut on the Lund-plane ln(k_t/GeV) value, below which declusterings are ignored (i.e. not passed to the LSTM or likelihood method; training is repeated for each different k_t cut). In (a) the shape observables are used together with a cut on a mass variable, 65 < m < 105 GeV (the mass may be groomed or ungroomed, depending on the point); for the Lund likelihood, Eq. (3.1) is evaluated with a single (groomed) mass bin, covering the same mass range as for the shapes, plus an outflow mass bin. In (b) shape variables are combined with the full particle-level resolution mass information through a boosted decision tree (BDT) and the cut that defines ε_W = 0.4 is placed on the BDT output; for the Lund-likelihood and LSTM methods, full-resolution Lund-plane information is used (including the mass for the likelihood method).
In Fig. 8, we show the performance achieved by the different tagging approaches versus their resilience to underlying-event and hadronisation corrections. This is calculated following the procedure introduced in section III.2 of the 2017 Les Houches proceedings [75]. The performance, ε_W/√ε_QCD, is plotted versus the resilience ζ, which is calculated using both hadron+MPI-level and parton-level efficiencies (all computed for a set of cuts on a shape variable, or multivariate tagger output, that gives a hadron+MPI-level signal efficiency ε_W = 0.4):

\[
\zeta = \left[\left(\frac{\Delta\epsilon_W}{\langle\epsilon_W\rangle}\right)^2 + \left(\frac{\Delta\epsilon_{\rm QCD}}{\langle\epsilon_{\rm QCD}\rangle}\right)^2\right]^{-1/2}, \qquad (3.6)
\]

where Δε = ε^(hadron+MPI) − ε^(parton) and ⟨ε⟩ = ½(ε^(hadron+MPI) + ε^(parton)). The left-hand plot shows the results obtained in a specific mass bin, comparing our likelihood method with the results from the LH report [75]. The right-hand plot shows the results with full mass information, and includes results with machine learning. Both parton-level and hadron+MPI-level efficiencies are calculated using a discriminator determined/trained using hadron+MPI-level events (this statement holds for all likelihood, LSTM and BDT-based results).
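The resilience of Eq. (3.6) and the performance measure ε_W/√ε_QCD can be evaluated in a few lines. The sketch below is illustrative only: the function names and input efficiencies are hypothetical, and returning infinity when neither efficiency changes between parton and hadron+MPI level is our own convention.

```python
import math


def relative_shift(eps_hadron_mpi: float, eps_parton: float) -> float:
    """Delta eps / <eps>, with Delta eps = eps_hadron+MPI - eps_parton
    and <eps> = (eps_hadron+MPI + eps_parton) / 2."""
    mean = 0.5 * (eps_hadron_mpi + eps_parton)
    return (eps_hadron_mpi - eps_parton) / mean


def resilience(eps_w_had: float, eps_w_parton: float,
               eps_qcd_had: float, eps_qcd_parton: float) -> float:
    """Resilience zeta of Eq. (3.6); infinite if neither efficiency moves."""
    total = (relative_shift(eps_w_had, eps_w_parton) ** 2
             + relative_shift(eps_qcd_had, eps_qcd_parton) ** 2)
    return math.inf if total == 0.0 else 1.0 / math.sqrt(total)


def performance(eps_w: float, eps_qcd: float) -> float:
    """Figure of merit eps_W / sqrt(eps_QCD), the vertical axis of Fig. 8."""
    return eps_w / math.sqrt(eps_qcd)


# Hypothetical efficiencies: the signal efficiency is unchanged, while the
# background efficiency moves from 0.009 (parton) to 0.011 (hadron+MPI).
zeta = resilience(eps_w_had=0.40, eps_w_parton=0.40,
                  eps_qcd_had=0.011, eps_qcd_parton=0.009)
print(f"zeta = {zeta:.2f}, performance = {performance(0.40, 0.011):.2f}")
```

A 20% relative shift in a single efficiency gives ζ = 5; larger ζ means a tagger that is less sensitive to non-perturbative effects.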
Figure 8 shows grey triangles for each of the 88 combinations of a single shape variable and mass used in the LH 2017 report [75] (the shape and mass being combined via a BDT in the right-hand plot). The grey line is the upper envelope of those points. The specific D_2^[loose] variant discussed in section 3.3 is highlighted in red, and one can see that it has the best performance among all shape+mass taggers.
For methods that use the Lund-plane information, one can impose a lower limit, k_t,cut, on the value of k_t for which Lund-plane declusterings are considered. Declusterings with lower k_t values are simply ignored, both at the training stage and subsequently when evaluating performance and resilience. For the Lund-LSTM method, the tagger is trained separately for each k_t,cut value. Larger values of k_t,cut are expected to yield taggers that are more resilient to non-perturbative effects. The results for the Lund-likelihood and Lund-LSTM methods are shown as blue and black points respectively (linked by lines) in Fig. 8, each point corresponding to a specific value of the k_t cut.
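To make the k_t cut concrete, here is a minimal sketch of primary declustering with a lower cut on ln k_t. It assumes a hypothetical `Node` class holding an already available C/A clustering history (obtaining that history, e.g. with a jet-clustering library, is not shown), with parent p_t values supplied by hand. It uses ∆ as the rapidity-azimuth distance and k_t = p_tb ∆ with b the softer branch, and returns only (ln 1/∆, ln k_t/GeV) pairs rather than the full set of Lund variables.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A (sub)jet in a C/A clustering history; a and b are its parents (None for a particle)."""
    pt: float   # transverse momentum [GeV]
    rap: float  # rapidity
    phi: float  # azimuth
    a: Optional["Node"] = None
    b: Optional["Node"] = None


def delta_rap_phi(x: Node, y: Node) -> float:
    """Distance in the rapidity-azimuth plane, with the azimuth difference wrapped to [-pi, pi)."""
    dphi = (x.phi - y.phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(x.rap - y.rap, dphi)


def primary_lund(jet: Node, ln_kt_cut: float = -math.inf) -> List[Tuple[float, float]]:
    """(ln 1/Delta, ln k_t/GeV) for each primary declustering with ln k_t >= ln_kt_cut."""
    points = []
    node = jet
    while node.a is not None and node.b is not None:
        if node.a.pt >= node.b.pt:
            harder, softer = node.a, node.b
        else:
            harder, softer = node.b, node.a
        delta = delta_rap_phi(harder, softer)
        kt = softer.pt * delta  # k_t = p_tb * Delta, with b the softer branch
        if math.log(kt) >= ln_kt_cut:  # declusterings below the cut are ignored
            points.append((math.log(1.0 / delta), math.log(kt)))
        node = harder  # the primary plane follows the harder branch only
    return points


# Toy history: a 10 GeV emission at Delta = 0.49, then a 100 GeV one at Delta = 0.1.
core = Node(1000.0, 0.01, 0.0, a=Node(900.0, 0.0, 0.0), b=Node(100.0, 0.1, 0.0))
jet = Node(1010.0, 0.02, 0.0, a=core, b=Node(10.0, 0.5, 0.0))
print(primary_lund(jet))                 # two declusterings
print(primary_lund(jet, ln_kt_cut=2.0))  # only the k_t = 10 GeV one survives
```

Because declusterings below the cut are skipped rather than merged, the sequence of harder branches that is followed does not depend on `ln_kt_cut`; only the recorded points change.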
Without a k_t cut for the Lund-based taggers, performances qualitatively mirror those in Fig. 7 at the corresponding value of ε_W = 0.4: the Lund-LSTM method performs best, followed by the Lund-likelihood method and then D_2^[loose]. Quantitative differences relative to Fig. 7 are a consequence of the lack of detector simulation and the use of a broad mass bin (Fig. 8a) or full mass resolution (Fig. 8b). The quantitative differences are especially large in the latter case, as one would expect (e.g. for the LSTM, ε_W/√ε_QCD ≃ 20 at ε_W = 0.4 translates to 1/ε_QCD ∼ 2500, compared to 1/ε_QCD ≃ 700 in Fig. 7).
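The conversion quoted in the parenthesis above is simple arithmetic: with ε_W = 0.4 and ε_W/√ε_QCD ≃ 20, one has √ε_QCD ≃ 0.02, i.e. 1/ε_QCD ≃ 2500. A small helper (hypothetical names, for illustration) performs the conversion in both directions and shows that the 1/ε_QCD ≃ 700 of Fig. 7 corresponds to ε_W/√ε_QCD ≈ 10.6.

```python
import math


def rejection_from_significance(significance: float, eps_w: float) -> float:
    """Invert significance = eps_W / sqrt(eps_QCD) to get the rejection 1/eps_QCD."""
    eps_qcd = (eps_w / significance) ** 2
    return 1.0 / eps_qcd


def significance_from_rejection(rejection: float, eps_w: float) -> float:
    """eps_W / sqrt(eps_QCD) for a given background rejection 1/eps_QCD."""
    return eps_w * math.sqrt(rejection)


print(rejection_from_significance(20.0, 0.4))             # LSTM value read off Fig. 8b
print(f"{significance_from_rejection(700.0, 0.4):.1f}")   # Fig. 7 value
```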
For the Lund-likelihood method, imposing a low k_t cut, ln(k_t,cut/GeV) < −1, has little impact on the performance or resilience relative to the situation without any cut. Further raising k_t,cut initially leads to a rapid loss in performance and only a modest improvement in resilience. This suggests that the non-perturbative region contains information that helps discriminate boosted W jets from QCD jets. For k_t,cut = 1 GeV (ln k_t,cut = 0), performance is slightly better than that of the best shape variable at comparable resilience. Only for yet higher values of k_t,cut does resilience improve substantially, and then the Lund-likelihood performance remains above that of the shape variables (well above for Fig. 8a). Thus the Lund-likelihood method appears to perform well not just in terms of raw performance but also, with a k_t cut, in terms of performance for a given degree of sensitivity to non-perturbative effects.
For the Lund-LSTM method, even a small cut on k_t rapidly leads to a loss of performance. For ln k_t,cut ≳ −1, its performance falls below that of the Lund-likelihood method, and that remains the case as k_t,cut is further increased. In fact, for ln k_t,cut ≳ −0.5, the performance of the Lund-LSTM method even starts to fall slightly below that of the best shape variables. This is somewhat puzzling and hints at a potential fragility of machine-learning approaches.
Overall, we see that while an ML-based approach can achieve substantially better performance, the models obtained are not particularly resilient to non-perturbative corrections. We note, however, that other training methods, e.g. based on adversarial networks [94][95][96], could improve the robustness of the derived taggers to specific effects such as hadronisation, MPI and pileup.
While we have focused here on resilience to corrections from hadronisation and MPI, one could similarly study the resilience of the methods against pileup or detector effects.

Conclusions
The Lund plane offers a powerful new way to study and exploit the internal structure of jets. In contrast to traditional shape observables, it connects much more directly to individual regions of phase space. This makes it useful across a range of applications in jet physics. It also brings many declustering-based jet observables, such as the iterated soft-drop multiplicity and z_g, into a single unified framework.
One way of studying the Lund plane is in terms of its average density, as a function of angle and transverse momentum. This density is amenable to calculation within both resummed and fixed-order perturbative methods. We limited our discussion of such a calculation to first order (section 2.2), and identified a number of the contributions that would become relevant at higher orders. Experimentally, we believe that much of the Lund-plane phase space can be reliably determined. This conclusion is based on Delphes fast-detector simulations in conjunction with subjet-particle rescaling algorithms (SPRA, Appendix C), which recover information at small angles that might otherwise be obscured by finite calorimeter resolution. This offers a clear potential for carrying out experimental measurements of the pattern of radiation in both the perturbative and non-perturbative regimes. One application of such measurements would be to constrain Monte Carlo simulation programs, which, as we saw in section 2.3, show up to 30% differences from one program to another in their predictions of the Lund-plane density. Another application would be to directly identify which kinematic regions of a jet's radiation pattern are modified in heavy-ion collisions, thus shedding light on the mechanisms of partonic energy loss in a hot, dense medium.
A use case for the Lund plane that we have explicitly examined is the tagging of boosted electroweak bosons. Compared to the jet-image type inputs that have so far been the mainstay of "visual" machine-learning approaches to jet-substructure tagging (e.g. boosted-W tagging), the Lund plane makes many of the exploitable features immediately visible to the human eye. With certain machine-learning methods (notably LSTMs), the Lund-plane inputs appear to yield superior W-tagging performance compared to jet images. This is despite the fact that, by discarding information about secondary leaves of the Lund diagram, we actually provide less information to the machine-learning methods than comes with jet images. We note that, for reliable comparisons of the relative quantitative performance of different methods, it was important to take detector effects into account.
The fundamental information that is contained in the Lund plane, i.e. the kinematics of declustering sequences, has been used in other recent work on machine learning [38,39]. However, the Lund plane as a visualisation provides powerful insight into the physical structure of that information and into how that information differs according to the origin of the jet, cf. Fig. 4 for W jets versus Fig. 2 for QCD jets. In particular, the Lund plane's simplicity, and the relatively moderate degree of correlation between different parts of the plane, have the consequence that much of the performance obtained by machine-learning algorithms can be reproduced using conceptually simple log-likelihood approaches. This opens the prospect for a substantial degree of experimental and theoretical understanding of the robustness of the Lund-plane information for tagging. That understanding may also be useful for the construction of high-performance decorrelated taggers [99].
There is a potential for a range of other applications, including top tagging, quark-gluon discrimination, further improvements of boosted electroweak boson tagging, or extensions to the recently proposed soft-drop photon isolation approach [100]. Furthermore, Lund-plane type studies need not be restricted to final-state jets, but may also be informative for the initial state, for example to help discriminate different mechanisms of Higgs-boson production.
Finally, while we have restricted most of our discussion here to the primary Lund plane (other than a brief discussion in appendix B), one cannot help but wonder about the potential benefits to be had from exploiting the structure of the full Lund diagram for jets, cf. the middle row of Fig. 1. This may be relevant both for developing our generic understanding of the structure of jets and for certain tagging applications, for example with recursive neural networks (as in Ref. [38]) or tree-LSTM architectures [101] to capture the full clustering tree in the machine-learning training.
We therefore look forward to a wide range of studies with Lund-diagram related observables in future work.
gsalam/2017-lund-from-MC for a corresponding implementation. In heavy-ion collisions, background (i.e. UE) contamination appears to be a non-negligible issue, as it may also be in high-pileup pp collisions, depending on the precise pileup-mitigation scheme being used. Various potential approaches to address this were highlighted in section 3.4.

A The Lund plane for (C/A-reclustered) anti-k_t jets
Throughout the main text we have determined the Lund plane for jets obtained with an initial Cambridge/Aachen clustering. One could instead use the anti-k_t [102] algorithm to find the initial jets, and then recluster their constituents with the C/A algorithm in order to obtain the Lund plane. The averaged primary Lund plane obtained with this procedure is shown in Fig. 9. It is almost identical to Fig. 2, except near ln R/∆ = 0 and ln k_t = 0, where the anti-k_t jets appear to have an additional structure: an up-right going diagonal structure for ln(k_t/GeV) ∼ 0 and ln R/∆ ≲ 0.75.
A reasonable hypothesis is that this structure is associated with the clustering of soft (mostly underlying-event) particles near the edge of the jet. To help understand this in more detail, Fig. 10 shows the rapidity-azimuth distribution of particles in a single C/A jet (upper row) versus a C/A-reclustered anti-k_t jet (lower row), at various stages of the declustering. At each stage, the softer subjet is shown in red. The declustering steps are aligned such that the last step shown corresponds to a similar pair of subjets in the two sequences. At the earlier stages, for C/A, one sees that large values of ∆ are associated with large-area softer (red) subjets. In contrast, with reclustered anti-k_t jets, the softer subjets at large values of ∆ tend to have smaller areas, constrained by the circular boundary of the original anti-k_t jet. Smaller areas imply a smaller amount of p_t coming from the underlying event. Therefore, in the Lund plane, the peak of subjets at large ∆ (small ln R/∆) should come at lower k_t for the reclustered anti-k_t jets than for the C/A jets.
Figure 10: Illustration of declustering sequences for a Cambridge/Aachen jet (upper row) and the same neighbourhood of an event clustered with the anti-k_t algorithm and subsequently reclustered with Cambridge/Aachen (lower row). The jets include ghost particles [90,91] so as to illustrate the area of the jets involved at each declustering stage. The plot shows each stage of the declustering, with the softer subjet shown in red (b in the notation of section 2.1) and the harder subjet shown in blue (a).

It is natural to ask whether the pattern of softer-subjet area versus ∆ seen in Fig. 10 holds beyond the case of a single jet. To answer this question, Fig. 11 shows the average subjet area as a function of ln R/∆ for C/A jets (in blue) and C/A-reclustered anti-k_t jets (in red). The bands represent the standard deviation of the subjet areas. The event sample and jet selection are identical to those used in Fig. 9. In the C/A case, the softer-subjet area increases for smaller ln R/∆, i.e. for increasing ∆, and is consistent with an area A_b that scales as A_b ∼ ∆^2. If the density of p_t per unit area from the underlying event is ρ, then one expects the k_t of typical softer subjets to go as k_t ∼ ρ A_b ∆ ∼ ∆^3. In contrast, for ln R/∆ < 0.5, the typical area of reclustered anti-k_t softer subjets tends to decrease as ln R/∆ decreases. The scaling near ln R/∆ = 0 is found to be roughly A_b ∼ ∆^(−2.6), which would lead to k_t ∼ ∆^(−1.6). Note, however, that the area scaling behaviours given here include a component where the subjet is moderately hard, and so the scaling behaviours for pure underlying-event jets may differ in their details.
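The scaling argument above (k_t ∼ ρ A_b ∆, so an area exponent α gives a k_t exponent α + 1) can be checked numerically on a toy power law. This is a sketch only: ρ, the normalisation and the grid of ∆ values are arbitrary illustrative choices, not fits to Fig. 11.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of ln(y) versus ln(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def kt_exponent(area_exponent: float, rho: float = 1.0, norm: float = 1.0) -> float:
    """Power of Delta in k_t ~ rho * A_b * Delta, given A_b = norm * Delta**area_exponent."""
    deltas = [0.2 + 0.08 * i for i in range(10)]  # illustrative grid of angles
    kts = [rho * norm * d ** area_exponent * d for d in deltas]
    return loglog_slope(deltas, kts)


print(kt_exponent(2.0))   # C/A-like area scaling, A_b ~ Delta^2
print(kt_exponent(-2.6))  # reclustered anti-k_t near ln R/Delta = 0
```

The extra power of ∆ comes from k_t = p_tb ∆, so the conclusion depends only on the area exponent, not on the (unknown) values of ρ or the normalisation.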
A point to note about reclustered anti-k_t jets is that it is possible to have ∆ > R, cf. the points at negative ln R/∆ in Fig. 11. This occurs only rarely and tends to be driven by specific configurations of hard particles in the jet.
A final comment is that since the difference in structure between the Lund planes of C/A and C/A-reclustered anti-k_t jets lies in a region that is anyway dominated by soft particles from the underlying event, we expect it to have little impact on discrimination power if one uses reclustered anti-k_t rather than C/A jets. Explicit studies with the Lund-likelihood method bear out this expectation.
Note that if one uses anti-k_t (or k_t [103]) jets without first reclustering their constituents with the C/A algorithm, one should expect substantial modifications of the structure of the Lund plane. In particular, C/A-(re)clustered jets have the property that the expression for the density in Eq. (2.4) is expected to be modified at higher orders mainly by single-logarithmic factors α_s^n L^n (at least at leading N_c). In contrast, directly using the anti-k_t and k_t (de)clustering sequences to determine the Lund plane could be expected to lead to double-logarithmic modifications to the density in Eq. (2.4), i.e. factors α_s^n L^(2n), associated with strong correlations between different parts of the Lund plane.

B Moderate energy jets and secondary Lund planes
For most of this article, jets have been considered with a p_t > 2 TeV cut on the transverse momentum. However, when considering lower-energy jets, the peak in the primary Lund plane associated with the W splitting, cf. Fig. 4, moves to the left, and the shadow region to its left, associated with the colour-singlet nature of the W, becomes less visible. This is expected to reduce the performance achieved by taggers based just on the primary Lund-plane variables (though the larger fraction of gluon-induced background jets at lower p_t may partially counteract this).
The reduced performance at lower p_t of primary-Lund-plane methods is visible in Fig. 12, which shows the signal efficiency against the background rejection for p_t > 500 GeV jets, using the primary Lund-plane log-likelihood method, the primary Lund-plane LSTM method and also the D_2^[loose] observable (which is sensitive also to emissions beyond primary Lund-plane ones). Those curves are to be compared to the corresponding ones in Fig. 7 for p_t > 2 TeV jets. Using ε_W = 0.4 as a reference point, one sees that all methods have worse background rejection at lower p_t. The loss is a factor of 5−7 for the primary-Lund-plane based methods, while it is only a little more than a factor of 2 for D_2^[loose]. As a result, D_2^[loose]'s performance at moderate p_t is comparable to that of the primary Lund-plane methods. At higher (lower) signal efficiencies, D_2^[loose] does worse (better) than the primary-Lund-plane methods.
The D_2 observable effectively takes into account information not just from the primary but also from secondary Lund planes, information that is discarded by the primary Lund-plane log-likelihood and LSTM methods. Fig. 13 shows the secondary Lund planes for the leading primary emission, defined as in section 3.1 as the first emission in the C/A declustering sequence that satisfies z > 0.025 (cf. the "mMDT(0.025)" label in the figures). The left-hand plot corresponds to QCD jets, the middle one to W jets. The right-hand plot, which shows the W/QCD secondary Lund-plane ratio, helps illustrate the nature of the discriminating information contained in the secondary Lund plane. In particular, the leading emission in QCD jets will tend to have more large-angle radiation, while for W jets the emissions are more likely to be relatively hard and collinear.

Figure 13: Averaged density, ρ^(2ndry)(∆, k_t), of the secondary Lund plane associated with the leading (mMDT(0.025)) emission, for jets in which the leading emission's angle satisfies 1 < ln 1/∆ < 1.5. From left to right: for jets in dijet events, for jets in WW events, and the ratio of the two. The percentages in square brackets indicate the fractions of jets for which the leading emission is in the given bin, for background and signal respectively.
To help test the hypothesis that D_2 does well because of the secondary information, we have attempted to explicitly add secondary Lund-plane information also to the log-likelihood and LSTM methods.
For the log-likelihood method, we found that the best performance came with two steps: (i) replacing the mMDT(0.025) identification of the leading emission with an identification based on finding the primary emission that had the smallest value of |ln(p_ta p_tb ∆^2/m_W^2) ln z|; (ii) combining the additional likelihood for the secondary plane, similar to the non-leading primary-plane likelihood, with the primary L_tot from Eq. (3.4) not by direct addition, but via a two-dimensional map of the two likelihoods. The fact that this last step was needed suggests that there may be scope for better understanding correlations between the primary and secondary Lund planes.
For the LSTM method, we start by identifying the secondary Lund plane using the same method as in the log-likelihood approach above. The primary and secondary Lund-plane declusterings are then given as input to two separate LSTMs with a dropout layer, with 128 units for the primary plane and 64 for the secondary one. The outputs of the LSTMs are then concatenated and passed to a dense layer with 100 units, and then to a final two-dimensional layer with softmax activation.

Figure 14: Reconstructed mMDT (z_cut = 0.025) W mass peak at particle level ("truth"), with Delphes particle flow [76], and additionally with the rescaling algorithms described in the text to improve resolution. The average and standard deviation for the reconstructed W peak at the different levels are shown based on a fit of a Gaussian distribution between 50 and 110 GeV. The generation and selection of jets is as described in section 3.4, using the WW process, in particular selecting jets with the requirement p_t > 2 TeV, |y| < 2.5.
The performances of the methods including the leading emission's secondary Lund plane are shown as dashed lines in Fig. 12, and are labelled "+2ndry". They improve the background rejection by 20−30% in the ε_W ∼ 0.3−0.6 range, such that our Lund-based tagging including the secondary plane now outperforms D_2 down to about ε_W ∼ 0.3. With the LSTM approach, the performance of D_2 is matched also at lower values of ε_W. Note that at high p_t we did not find a significant benefit from including secondary Lund-plane information.
We leave a more extensive study of the secondary Lund plane and its impact on jet tagging for future work, keeping in mind also that today's parton showers may not always correctly reproduce the patterns of correlations between emissions [104].
C Detector effects and the subjet-particle rescaling algorithm (SPRA)
At particle ("truth") level, in a W sample with p_t,jet > 2 TeV, the mass of the (sub)jet selected by the mMDT procedure is very sharply peaked around the true W mass, with an effective resolution of about 1.5 GeV. In contrast, if one uses the Delphes fast detector simulation [76], with particle flow (PF) and the delphes_card_CMS_NoFastJet.tcl card, one finds an mMDT mass resolution of about 9 GeV. This is illustrated in Fig. 14 (we return to the SPRA curves below). Such a large difference in resolution between particle and detector level can have a big impact on conclusions about performance, especially given that both ML and likelihood-based approaches tend to derive considerable discrimination power from the mass variable. This is true even if the mass is not directly passed as an input to an ML approach, because the mass can quite effectively be deduced from the Lund-plane k_t and ∆ variables.
Detector effects can also have a significant impact on the Lund plane at angular scales commensurate with the hadronic calorimeter spacing and below. This is illustrated in Fig. 15a, which shows the ratio of the Lund plane for dijets as obtained with Delphes particle flow relative to the particle-level truth. In the lower-left corner there is a prominent dark blue region indicating missing Lund-plane subjets at detector level: this can be interpreted as a consequence of finite p_t thresholds (a given p_t maps onto a downward-right-going diagonal). In the right-hand part of the Lund plane, just below the kinematic limit, there are two prominent enhanced regions (in red), with corresponding deficits at lower k_t: their positions in ln 1/∆ coincide with the intrinsic angular resolution scales of the hadronic (HCal) and electromagnetic (ECal) calorimeters, which in the central part of the detector have η × φ segmentations of 0.087 × 0.087 and 0.0174 × 0.0174 respectively. Those segmentations translate to ln 1/∆ values of about 2.4 and 4.1.
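The conversion from calorimeter segmentation to ln 1/∆ quoted above amounts to setting ∆ equal to the cell size; a short check under that simplifying assumption:

```python
import math


def ln_inverse_angle(cell_size: float) -> float:
    """ln(1/Delta), taking the angular scale Delta equal to the cell size."""
    return math.log(1.0 / cell_size)


# Central-detector eta x phi segmentations quoted in the text.
for name, cell in {"HCal": 0.087, "ECal": 0.0174}.items():
    print(f"{name}: ln 1/Delta = {ln_inverse_angle(cell):.2f}")
```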
The origin of the enhancements is relatively straightforward to understand. Consider the structure associated with the hadronic calorimeter scale, ∆_HCal. On average, about 10% of particles in jets are undecayed neutral hadrons (for example K_L). Schematically, the particle-flow algorithm can identify the energy deposit from such particles as the difference between the energy in a given hadronic calorimeter tower and that observed in the charged tracks that enter the tower. The Delphes PF implementation assigns that energy difference to a point in η, φ that is randomly distributed over the calorimeter tower area. If the neutral hadron has a true separation ∆_true ≪ ∆_HCal and transverse momentum k_t,true relative to the jet core, the reconstructed separation and transverse momentum will be

\[
\Delta_{\rm reco} \sim \Delta_{\rm HCal}, \qquad k_{t,{\rm reco}} \sim k_{t,{\rm true}}\,\frac{\Delta_{\rm HCal}}{\Delta_{\rm true}},
\]

where the scaling of the transverse momentum, k_t, relative to the jet core, follows from the assumption that the neutral hadron's transverse momentum p_t relative to the beam direction is correctly determined (k_t = p_t ∆, cf. Eq. (2.1a)). For a jet core containing O(10−20) particles, there will typically be at least one neutral hadron, which will be reconstructed at an angular scale ∼∆_HCal and with a k_t that is larger than its true value. It is this that creates the red bump around ln 1/∆ ≃ ln 1/∆_HCal ≃ 2.4: with the particle-flow algorithm there is nearly always something in that region, whereas at truth level there is not. This is arguably also the origin of the long tail to high masses for the Delphes curve in Fig. 14. An analogous phenomenon occurs at the electromagnetic calorimeter granularity scale. The depletions at lower k_t values may be shadows induced by the enhanced regions (given an emission at some ∆, a fraction of emissions at lower k_t and similar ∆ and ψ get clustered with it and so are assigned to its secondary Lund plane).
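The schematic mapping above can be written as a small function. It is a sketch under stated assumptions: the reconstructed angle is set exactly to ∆_HCal (whereas the Delphes PF implementation places the deposit randomly within the tower), p_t is taken to be measured correctly, and the function name and its default ∆_HCal = 0.087 are our own illustrative choices.

```python
def reconstructed_neutral(kt_true: float, delta_true: float,
                          delta_hcal: float = 0.087) -> tuple:
    """Schematic (Delta_reco, k_t,reco) for a neutral hadron with Delta_true << Delta_HCal."""
    if not 0.0 < delta_true < delta_hcal:
        raise ValueError("sketch only covers 0 < Delta_true < Delta_HCal")
    pt = kt_true / delta_true           # p_t = k_t / Delta, assumed correctly measured
    delta_reco = delta_hcal             # simplification: deposit placed at the tower scale
    return delta_reco, pt * delta_reco  # k_t,reco = k_t,true * Delta_HCal / Delta_true


delta_reco, kt_reco = reconstructed_neutral(kt_true=0.5, delta_true=0.01)
print(f"Delta: 0.01 -> {delta_reco}, k_t: 0.5 GeV -> {kt_reco:.2f} GeV")
```

A hadron emitted at ∆ = 0.01 thus moves to ln 1/∆ ≈ 2.4 with its k_t enhanced by a factor ∆_HCal/∆_true ≈ 9, which is the migration that populates the red bump.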
with a single HCal cell were rescaled to match the total HCal + ECal energy. Ref. [78] extended the procedure, applying the scaling within minijets. Nowadays, CMS has an approach referred to as "split PF photons+neutrals" [83], which is conceptually similar insofar as it distributes neutral-hadron energy across the ECal cells (it also includes tracking improvements for high-energy jets). Schaetzel and Spannowsky investigated rescaling the charged tracks in a jet to match the jet's total calorimeter energy ("track-flow" in the nomenclature of Ref. [82]; here we will call it charge rescaling). A functionally equivalent procedure, track-assisted mass, has been studied by ATLAS [84] to improve its mass resolution. Other studies of the question include Refs. [80-82,105].
To mitigate the impact of detector granularity, we adopt a subjet-particle rescaling algorithm (SPRA) that is similar in spirit to that of Ref. [78]. We have two variants, SPRA1 and SPRA2. For SPRA1, we take a jet and recluster its Delphes particle-flow objects into subjets using the C/A algorithm with radius R_h = 0.12, commensurate with the hadronic calorimeter granularity. Taking each subjet in turn, we scale each PF charged-particle (h±) and photon (γ) candidate that it contains by a factor f_1, and discard the other particles (i.e. neutral-hadron candidates). If the subjet does not contain any photon or charged-particle candidates, we instead retain all of the subjet's particles with their original momenta. After having applied this procedure to each subjet, we recluster the full set of resulting particles, i.e. from all subjets, into a single large jet and evaluate the mass and Lund plane on that set of particles. The SPRA2 variant is similar but carries out two levels of rescaling. After applying the SPRA1 algorithm, we recluster the resulting particles with a radius R_e = 0.03, commensurate with the electromagnetic calorimeter granularity. Taking each (R_e) subjet in turn, we scale each PF charged-particle (h±) candidate that it contains by a factor f_2, and discard the other particles (i.e. photons and possibly some neutral-hadron candidates). If the subjet does not contain any charged-particle candidates, we instead retain all of the subjet's particles with their original momenta. Again, after having applied the procedure to each subjet, we take all the particles and produce a single large-radius jet from them. Fig. 14 shows that there is some gain in mass resolution from the SPRA1 algorithm, from about 9 GeV to 7 GeV. There is, however, only limited further gain in going to the double-rescaling SPRA2 algorithm.
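A minimal sketch of a single rescaling level, assuming the subjets have already been formed (C/A with R_h = 0.12 for SPRA1, or R_e = 0.03 for the second SPRA2 level); the clustering itself and the final reclustering into one large jet are not shown. The text above does not spell out f_1 and f_2: here we assume each factor is the ratio of the subjet's total p_t to the p_t of the retained candidates, so that the subjet's scalar p_t sum is preserved. The `PFCandidate` class and its `kind` labels are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List


@dataclass(frozen=True)
class PFCandidate:
    pt: float    # transverse momentum [GeV]
    rap: float   # rapidity
    phi: float   # azimuth
    kind: str    # hypothetical labels: "charged", "photon" or "neutral"


SPRA1_KINDS = frozenset({"charged", "photon"})  # kept at the R_h = 0.12 level
SPRA2_KINDS = frozenset({"charged"})            # kept at the R_e = 0.03 level


def rescale_subjet(subjet: List[PFCandidate], kept_kinds: FrozenSet[str]) -> List[PFCandidate]:
    """Keep only candidates of kept_kinds, scaled so the subjet's total p_t is preserved.

    If the subjet contains none of kept_kinds, all its particles are kept unchanged.
    """
    kept = [p for p in subjet if p.kind in kept_kinds]
    if not kept:
        return list(subjet)
    factor = sum(p.pt for p in subjet) / sum(p.pt for p in kept)  # assumed form of f_1 / f_2
    # Scaling p_t at fixed (rap, phi) scales the momentum of a massless particle.
    return [replace(p, pt=p.pt * factor) for p in kept]


def spra_level(subjets: List[List[PFCandidate]], kept_kinds: FrozenSet[str]) -> List[PFCandidate]:
    """Apply rescale_subjet to every subjet and merge the results into one particle list."""
    out: List[PFCandidate] = []
    for subjet in subjets:
        out.extend(rescale_subjet(subjet, kept_kinds))
    return out


subjets = [
    [PFCandidate(30.0, 0.00, 0.0, "charged"),
     PFCandidate(10.0, 0.01, 0.0, "photon"),
     PFCandidate(20.0, 0.02, 0.0, "neutral")],
    [PFCandidate(5.0, 0.50, 0.1, "neutral")],  # no h or gamma: kept as is
]
particles = spra_level(subjets, SPRA1_KINDS)
print([(p.kind, p.pt) for p in particles])
```

In the first subjet the neutral candidate's 20 GeV is redistributed over the charged and photon candidates (factor 1.5), so the energy is kept but its position now follows the finer-grained tracks and ECal deposits.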
If we now consider the Lund-plane reconstruction, Fig. 15, we see that SPRA1 (Fig. 15c) eliminates most of the structure associated with the hadronic calorimeter scale, while SPRA2 (Fig. 15d) also alleviates some of the structure associated with the electromagnetic calorimeter scale. Overall, the conclusion is that with the help of the SPRA algorithms, a large part of the Lund plane can be measured fairly reliably, with detector effects that remain limited to within 20−30%. This is visible also in the plots of Lund-plane slices in Fig. 16.
An even simpler approach is to adopt a jet-wide track rescaling, where every charged track is multiplied by a factor

\[
f_{\rm chg} = \frac{p_{t,{\rm jet}}}{\sum_{i \in {\rm jet}(h^\pm)} p_{t,i}}, \qquad ({\rm C.4})
\]

and only those rescaled tracks are used as an input. This performs fairly well, cf. the Lund-plane density ratio in Fig. 15b. A final test of the SPRA algorithms is shown in Fig. 17, which compares the background rejection power of the log-likelihood W-tagger, as a function of signal efficiency, for truth particles and for Delphes PF with and without SPRA. A first observation is that for a signal efficiency of 0.5, the background rejection is about six times worse with Delphes PF than with truth particles (the factor is even larger with machine-learning taggers). The SPRA1 algorithm brings about a factor of two improvement relative to plain PF. The further gain from the SPRA2 algorithm is limited (and perhaps not statistically significant). Accordingly, for our main W-tagger performance results in section 3.5 we use SPRA1, which is arguably also the most similar to the procedure used by CMS in Ref. [83]. However, at higher p_t's or for measurements of the Lund plane, it is probably advantageous to use SPRA2. Fig. 17 also shows charge rescaling, which performs less well than the SPRA approaches, though still better than Delphes particle flow alone.

Figure 17: Performance of the Lund-likelihood discriminator, at truth level and with Delphes PF, with and without the SPRA algorithms, and also with charge rescaling. As a function of signal efficiency, the plot shows the ratio of background rejection relative to that obtained for Delphes PF+SPRA1. The log-likelihood maps have been determined separately for each setup (truth, Delphes PF, etc.). The generation and selection of jets is as described in section 3.4, using the dijet process, selecting jets with the requirement p_t > 2 TeV, |y| < 2.5.
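Equation (C.4) can be sketched in the same spirit. Assumptions: p_t,jet is approximated by the scalar sum of constituent transverse momenta (close to the vector sum for a collimated jet), the `PFCandidate` class and its `kind` labels are hypothetical, and a jet without charged tracks raises an error because f_chg is then undefined.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PFCandidate:
    pt: float    # transverse momentum [GeV]
    rap: float   # rapidity
    phi: float   # azimuth
    kind: str    # hypothetical labels: "charged", "photon" or "neutral"


def charge_rescale(jet: List[PFCandidate]) -> List[PFCandidate]:
    """Keep only charged tracks, each multiplied by f_chg of Eq. (C.4)."""
    charged = [p for p in jet if p.kind == "charged"]
    if not charged:
        raise ValueError("jet has no charged tracks; f_chg is undefined")
    pt_jet = sum(p.pt for p in jet)  # scalar-sum approximation to p_t,jet
    f_chg = pt_jet / sum(p.pt for p in charged)
    return [replace(p, pt=p.pt * f_chg) for p in charged]


jet = [PFCandidate(50.0, 0.00, 0.00, "charged"),
       PFCandidate(25.0, 0.05, 0.02, "charged"),
       PFCandidate(25.0, 0.10, 0.00, "neutral")]
tracks = charge_rescale(jet)
print([round(p.pt, 2) for p in tracks])
```

Unlike SPRA, a single jet-wide factor cannot account for local variations in the neutral fraction, which is consistent with charge rescaling performing less well than the subjet-based approaches.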

Figure 1: Different representations for two jets. Top: the particles inside the jet. Middle: the full Lund diagram. Bottom: the primary Lund plane. See text for further details.

Figure 2: (a) The average primary Lund plane density, ρ, for jets clustered with the C/A algorithm and R = 1 having $p_t > 2$ TeV and $|y| < 2.5$, in a simulated QCD dijet sample. (b) Schematic representation of the different regions of the Lund plane.
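To make the definition of ρ concrete, the Python sketch below maps each primary declustering to Lund-plane coordinates and histograms them per jet. It assumes the usual primary-Lund-plane variables, ∆ = √(∆y² + ∆φ²) between the harder subjet a and the softer subjet b, and $k_t = p_{t,b}\,\Delta$, together with the normalisation ρ = (1/N_jet) dn_emission/(d ln 1/∆ d ln $k_t$). The C/A declustering itself, which would supply the (harder, softer) subjet pairs along the harder branch, is assumed to come from a jet library and is not shown; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

import numpy as np

# (pt, rapidity y, azimuth phi) of a subjet.
Subjet = Tuple[float, float, float]
# One jet = its primary declusterings, each a (harder, softer) subjet pair.
Jet = Sequence[Tuple[Subjet, Subjet]]


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def lund_coordinates(harder: Subjet, softer: Subjet) -> Tuple[float, float]:
    """Return (ln 1/Delta, ln kt) for one declustering, with kt = pt_b * Delta."""
    _, y_a, phi_a = harder
    pt_b, y_b, phi_b = softer
    delta = math.hypot(y_a - y_b, delta_phi(phi_a, phi_b))
    return math.log(1.0 / delta), math.log(pt_b * delta)


def average_lund_density(jets: Sequence[Jet], edges_x, edges_y):
    """rho = (1/N_jet) dn_emission / (d ln 1/Delta d ln kt) on the given bin edges."""
    if not jets:
        raise ValueError("need at least one jet")
    coords = [lund_coordinates(a, b) for jet in jets for a, b in jet]
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    counts, ex, ey = np.histogram2d(xs, ys, bins=[edges_x, edges_y])
    bin_area = np.outer(np.diff(ex), np.diff(ey))
    return counts / (len(jets) * bin_area)
```

For example, a softer subjet with $p_t = 10$ GeV at ∆ = 0.5 from the harder branch gives $k_t = 5$ GeV, i.e. the point (ln 2, ln 5) ≈ (0.69, 1.61) on the plane.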

Figure 3: Emission density along slices of the Lund plane, at fixed $k_t$ (top) and ∆ (bottom), comparing three event generators.

Figure 5: Distribution of the leading emission

Figure 7: Background rejection ($1/\epsilon_{\mathrm{QCD}}$) versus signal efficiency ($\epsilon_W$), per jet, for different W-tagging methods. The lower panel shows the ratio to the Lund+likelihood method.
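The Lund+likelihood method rests on the idea that different regions of the primary Lund plane are largely decorrelated, so that per-emission information can simply be added. A minimal Python sketch of one such binned log-likelihood is shown below. It assumes signal and background densities $\rho_S$, $\rho_B$ tabulated on common bins in (ln 1/∆, ln $k_t$), sums $\ln(\rho_S/\rho_B)$ over a jet's primary emissions, skips emissions outside the grid, and regularises empty bins with a floor. These choices, and the function name, are illustrative assumptions rather than the exact construction used in section 3.

```python
import numpy as np


def lund_log_likelihood(emissions, rho_sig, rho_bkg, edges_x, edges_y, floor=1e-6):
    """Sum of ln(rho_sig/rho_bkg) over a jet's primary emissions.

    emissions: shape (n, 2) array-like of (ln 1/Delta, ln kt) values.
    rho_sig, rho_bkg: densities of shape (len(edges_x)-1, len(edges_y)-1).
    Emissions outside the grid are skipped; `floor` keeps empty bins finite.
    """
    pts = np.asarray(emissions, dtype=float).reshape(-1, 2)
    # Bin index of each emission (half-open bins [lo, hi)).
    ix = np.searchsorted(edges_x, pts[:, 0], side="right") - 1
    iy = np.searchsorted(edges_y, pts[:, 1], side="right") - 1
    inside = ((ix >= 0) & (ix < len(edges_x) - 1) &
              (iy >= 0) & (iy < len(edges_y) - 1))
    ix, iy = ix[inside], iy[inside]
    sig = np.maximum(np.asarray(rho_sig, dtype=float)[ix, iy], floor)
    bkg = np.maximum(np.asarray(rho_bkg, dtype=float)[ix, iy], floor)
    return float(np.sum(np.log(sig / bkg)))


# Usage with a toy 2x1 grid: the first bin is twice as signal-like.
edges_x, edges_y = np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0])
rho_s = np.array([[2.0], [1.0]])
rho_b = np.array([[1.0], [1.0]])
# Third emission lies outside the grid and is ignored; result is ln 2 ≈ 0.693.
print(lund_log_likelihood([(0.5, 0.5), (1.5, 0.5), (5.0, 0.5)], rho_s, rho_b, edges_x, edges_y))
```

Cutting on this per-jet score at different thresholds traces out a rejection-versus-efficiency curve of the kind shown in the figure.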

Figure 9: The averaged primary Lund plane density (a) for jets initially obtained with anti-$k_t$ clustering (whose constituents are then reclustered with the Cambridge/Aachen algorithm) and its ratio (b) to the averaged Lund plane density for jets originally obtained with Cambridge/Aachen clustering (Fig. 2). Note the structure around $\ln 1/\Delta = 0$ and $\ln k_t = 0$ that is present here and not in Fig. 2.
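Ratio plots such as Fig. 9(b), or the detector-to-truth comparisons of Lund-plane densities discussed in this appendix, divide one averaged density map by another bin by bin. A small Python helper for this is sketched below; marking bins with no reference entries as NaN, and summarising a map by its largest deviation from unity, are illustrative choices, and the helper names are hypothetical.

```python
import numpy as np


def density_ratio(rho_num, rho_den):
    """Bin-by-bin ratio rho_num / rho_den; bins with rho_den <= 0 are set to NaN."""
    num = np.asarray(rho_num, dtype=float)
    den = np.asarray(rho_den, dtype=float)
    ratio = np.full(num.shape, np.nan)
    # Only divide where the reference bin is populated; other bins stay NaN.
    np.divide(num, den, out=ratio, where=den > 0)
    return ratio


def max_deviation(ratio):
    """Largest |ratio - 1| over the populated bins (NaN bins are ignored)."""
    return float(np.nanmax(np.abs(ratio - 1.0)))


r = density_ratio([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [2.0, 4.0]])
print(r)                 # NaN where the reference bin is empty
print(max_deviation(r))  # 0.5
```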

Figure 11: Average area (points), and standard deviation (band), of the softer subjet (subjet b) in the Lund-plane declusterings, shown as a function of ∆ for C/A jets (blue) and for C/A-reclustered anti-$k_t$ jets (red).

Fig. 4, moves to the left and the shadow region to its left, associated with the colour-singlet nature of the W, become less visible. This is

Figure 12: Background rejection ($1/\epsilon_{\mathrm{QCD}}$) versus signal efficiency ($\epsilon_W$), with a transverse-momentum cut on the jets of 500 GeV. The lower panel shows the ratio to the Lund+likelihood method. The solid curves are to be compared to the corresponding ones in Fig. 7 for jets with $p_t > 2$ TeV (note the different scale in the lower panel).
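The quantity on the vertical axis of these plots can be obtained from per-jet discriminant values by fixing the cut that keeps a fraction $\epsilon_W$ of signal jets and counting the QCD jets that pass it. The Python sketch below assumes that larger discriminant values are more signal-like and places the cut at a quantile of the signal distribution; the function name is hypothetical.

```python
import numpy as np


def background_rejection(sig_scores, bkg_scores, sig_eff):
    """Return 1/eps_QCD at the cut that keeps a fraction sig_eff of signal jets.

    Assumes larger scores are more signal-like. Returns inf if no background passes.
    """
    sig = np.asarray(sig_scores, dtype=float)
    bkg = np.asarray(bkg_scores, dtype=float)
    cut = np.quantile(sig, 1.0 - sig_eff)  # about sig_eff of signal has score >= cut
    bkg_eff = float(np.mean(bkg >= cut))
    return np.inf if bkg_eff == 0.0 else 1.0 / bkg_eff


# Toy usage: a cut at the signal median (2.5) keeps 1 of 4 background jets.
print(background_rejection([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 2.6], 0.5))  # 4.0
```

The lower panels are then simply the ratio of two such rejections evaluated at the same $\epsilon_W$.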

Figure 16: Lund-plane slices comparing the truth result to Delphes PF with and without the SPRA rescalings. The slices are shown at fixed $k_t$ as a function of ∆ (left) and at fixed ∆ versus $k_t$ (right). The artefacts visible in the top-left plot at the scales of the hadronic (∆ ∼ 0.087) and electromagnetic (∆ ∼ 0.0174) calorimeters are brought well under control by the SPRA1 and SPRA2 rescalings, respectively. The generation and selection of jets is as described in section 3.4, using the dijet process, selecting jets with the requirement $p_t > 2$ TeV, $|y| < 2.5$.
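To locate these calorimeter artefacts on the Lund plane, assuming its horizontal axis is $\ln 1/\Delta$ as in the primary Lund-plane plots, the two angular scales quoted in the caption can be converted directly. The short Python computation below does only this arithmetic; the function name is illustrative.

```python
import math


def lund_x(delta: float) -> float:
    """Horizontal Lund-plane coordinate ln(1/Delta)."""
    return math.log(1.0 / delta)


# Angular scales quoted in the Figure 16 caption.
for name, delta in [("hadronic", 0.087), ("electromagnetic", 0.0174)]:
    print(f"{name}: Delta = {delta}, ln(1/Delta) = {lund_x(delta):.2f}")
# hadronic: Delta = 0.087, ln(1/Delta) = 2.44
# electromagnetic: Delta = 0.0174, ln(1/Delta) = 4.05
```

The two scales differ by a factor of 5, i.e. by ln 5 ≈ 1.61 units in $\ln 1/\Delta$.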