Tagging Boosted Ws with Wavelets

We present a new technique for distinguishing the hadronic decays of boosted heavy particles from QCD backgrounds based on wavelet transforms. As an initial exploration, we illustrate the technique in the particular case of hadronic $W$ boson decays, comparing it to the ``mass drop'' cut currently used by the LHC experiments. Wavelet cuts make use of complementary information and, in combination with the mass drop cut, result in an improvement of $\sim$7% in the discovery reach for hadronic $W$ boson final states over a wide range of transverse momenta.


Introduction and motivation
Now that experiments have discovered a light Higgs boson whose properties are roughly in line with Standard Model (SM) expectations [1,2], attention naturally turns to the question of stabilizing the electroweak scale and physics beyond the SM [3][4][5][6]. We now know that a weakly-coupled scalar boson exists, and protecting its mass from large quantum corrections is critical. The physics which achieves this goal is very likely to be coupled to the electroweak sector, and particularly to the weak gauge bosons, which are thus natural messengers to new physics. The usual strategy to identify weak bosons at a hadron collider is to identify their leptonic decays, as hard leptons unassociated with jets are rare and thus have smaller backgrounds. However, there are compelling reasons to consider hadronic decays as well. Hadronic W decays make up roughly two thirds of all decays, and their inclusion in searches can dramatically improve statistics. The primary challenge to this goal is the enormous rate for QCD production of jets, leading to large numbers of jet pairs whose invariant mass "by accident" reconstructs to something close to the mass of the W boson.
New electroweak physics must be somewhat heavy in order to evade current constraints from colliders, suggesting that decays are likely to produce relativistic electroweak bosons. This boosted feature in turn leads to properties that provide handles one can exploit to sift true W s from the QCD background. A boosted W decays into two jets whose typical angular separation is characterized by the mass and momentum of the parent boson. In the limit of extreme boost, the two child jets tend to merge into a single cluster of hadronic energy, but retain the two hard kernels. These hard subjets are the key to distinguishing hadronically decaying W bosons from the QCD background, and their exploitation has formed a very productive industry in collider physics over the past few years, with strategies having been developed [7][8][9][10][11][12] for tagging top quarks [13][14][15][16][17][18][19][20][21][22][23], Higgs bosons [24][25][26][27], heavy gauge bosons [28][29][30][31][32], and even hypothetical particles [33,34].
In this work we explore an alternate approach to boosted object tagging. Previous strategies have focused on simple variables such as the jet mass, and on deconvolving the jet algorithms to understand how a jet is built from its constituents, as ways of understanding the high-scale process which gave rise to the jet. These approaches have been refined in various ways as our understanding of soft and collinear QCD has improved, and have always taken their motivation from the underlying physics that one is trying to identify.
We step back from the physics-inspired tagging techniques and attempt to apply to this problem a well-developed tool which has been successfully used in many other fields. This tool-driven approach to tagging leads to very different observables which are nonetheless sensitive to the differences in the substructure of the events that we are trying to identify. The technology we bring to bear is the wavelet transform, a well-understood mathematical technique which has been successfully applied to many scientific analyses (such as mapping the fluctuations in the CMB) as well as computing uses such as data compression and noise reduction in images and audio. As we will demonstrate below, combining these observables with preexisting boosted object identification techniques leads to a modest improvement in the acceptance for weak bosons at identical jet rejection rates.
In the next section (Sec. 2), we will introduce the wavelet transform and discuss some of its uses and relevant properties for our purposes. In Sec. 2.1 we present our methods for utilizing the wavelet transform as a boosted W boson tagger, the results of which are presented in Sec. 3. Finally, we will conclude and discuss directions of possible future work using these techniques in Sec. 4.

Wavelet Analysis of Jet Physics
Wavelets are a type of localized Fourier transform, interpolating between the two extremes of presenting information purely in the bases of position and frequency. They have been employed in many different fields, as disparate as cardiology, image processing, CMB physics, and data compression and denoising. In applications to collider physics, there is a natural mapping of calorimetric data onto a grayscale image, where the brightness of the image pixel corresponds to the energy deposited into the corresponding calorimeter cell.
The simplest wavelet transform which is applicable to a fundamentally discrete two dimensional problem such as a calorimeter is the discrete wavelet transform using two dimensional Haar wavelets. In this case each type of initial "mother" wavelet is chosen to be two pixels in size in each direction and convolved with the data such that each pixel has been sampled once by each type of wavelet. In addition to the map of wavelet convolutions, a residual map is formed as the average of the data over each 2 × 2 area. This averaged data then has the same procedure applied to it, effectively sampling the original data with a wavelet size of four pixels. This is performed iteratively until all scales contained within the data have been probed by the appropriate wavelet. In this way, the average of all pixels combined with the complete set of wavelet coefficients constitutes a (lossless) representation of the original image in terms of its frequency content, with each map of the power at a given frequency saturating the resolution appropriate for that frequency. Already, it is clear that in addition to the small scale structure associated with local clusters of energy in the calorimeter, the wavelet transform also characterizes global properties of the event such as the summed hadronic energy and jet momentum imbalance.
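The iterative procedure described above can be sketched in a few lines of pure Python. This is a minimal illustration, assuming a square image whose side is a power of two and an unnormalized averaging convention (division by 4 in each 2 × 2 block); the function names `haar_step`, `haar_unstep`, `haar_decompose` and `haar_reconstruct` are hypothetical and are not taken from any library or from the analysis code used in this work.

```python
def haar_step(image):
    """Apply one level of the 2D Haar transform to a square image (even side).

    Returns four half-size maps: the 2x2 block average (the residual map)
    and the horizontal, vertical and diagonal difference maps.
    """
    half = len(image) // 2
    avg = [[0.0] * half for _ in range(half)]
    hor = [[0.0] * half for _ in range(half)]
    ver = [[0.0] * half for _ in range(half)]
    dia = [[0.0] * half for _ in range(half)]
    for i in range(half):
        for j in range(half):
            a, b = image[2 * i][2 * j], image[2 * i][2 * j + 1]
            c, d = image[2 * i + 1][2 * j], image[2 * i + 1][2 * j + 1]
            avg[i][j] = (a + b + c + d) / 4.0
            hor[i][j] = (a - b + c - d) / 4.0  # left column minus right column
            ver[i][j] = (a + b - c - d) / 4.0  # top row minus bottom row
            dia[i][j] = (a - b - c + d) / 4.0  # diagonal minus anti-diagonal
    return avg, hor, ver, dia


def haar_unstep(avg, hor, ver, dia):
    """Invert haar_step, rebuilding the image at twice the resolution."""
    half = len(avg)
    image = [[0.0] * (2 * half) for _ in range(2 * half)]
    for i in range(half):
        for j in range(half):
            s, h, v, d = avg[i][j], hor[i][j], ver[i][j], dia[i][j]
            image[2 * i][2 * j] = s + h + v + d
            image[2 * i][2 * j + 1] = s - h + v - d
            image[2 * i + 1][2 * j] = s + h - v - d
            image[2 * i + 1][2 * j + 1] = s - h - v + d
    return image


def haar_decompose(image):
    """Iterate haar_step until only the global average remains."""
    details = []
    current = image
    while len(current) > 1:
        current, hor, ver, dia = haar_step(current)
        details.append((hor, ver, dia))  # finest scale first
    return current[0][0], details


def haar_reconstruct(average, details):
    """Rebuild the original image from the average and all detail maps."""
    current = [[average]]
    for hor, ver, dia in reversed(details):
        current = haar_unstep(current, hor, ver, dia)
    return current
```

The global average together with the detail maps at every scale is enough to rebuild the input exactly, which is the sense in which the representation is lossless.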
There are a number of challenges to effectively applying this strategy to searches for local features such as jet substructure indicative of a boosted W boson decay. From a purely practical point of view, the hadronic calorimeter (HCAL) cells contain far less positional information than is actually available. Vast improvements in angular resolution are possible by incorporating particle flow data from the electromagnetic calorimeter (ECAL) and tracker into the reconstruction of hadronic energy within each cell (e.g. [35]). We will discuss defining an appropriate choice of 'pseudo-calorimeter', which can simplify the wavelet analysis of substructure, below.
Another issue is that the discrete wavelet transform is not translationally invariant, which has the unfortunate consequence that a feature of a particular size can manifest in differently sized wavelets depending upon where it happens to lie in the detector. For instance, if there were a dataset of four pixels which contained a perfect copy of one of the Haar wavelets in two of those pixels, it might be seen in both the two- and the four-pixel wavelets if it were placed in the central two pixels, or only in the two-pixel wavelet if it were in any other position. The stationary wavelet transform effectively computes the discrete wavelet transform for all possible choices of origin within the image, which regains the property of (discrete) translational invariance at the cost of keeping some redundant information. This forces all structures to appear in the smallest size wavelets that can successfully capture them (as well as all larger sizes).
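The four-pixel example above can be checked directly with a one-dimensional toy. The sketch below is illustrative only: it assumes an unnormalized Haar convention and periodic (cyclic) shifts of the origin, and the names `haar_1d` and `stationary_haar_1d` are hypothetical.

```python
def haar_1d(signal):
    """Decimated 1D Haar transform of a signal whose length is a power of two.

    Returns the detail (difference) coefficients level by level,
    finest scale first.
    """
    levels = []
    current = list(signal)
    while len(current) > 1:
        pairs = range(len(current) // 2)
        detail = [(current[2 * i] - current[2 * i + 1]) / 2.0 for i in pairs]
        average = [(current[2 * i] + current[2 * i + 1]) / 2.0 for i in pairs]
        levels.append(detail)
        current = average
    return levels


def stationary_haar_1d(signal):
    """Compute the decimated transform for every cyclic choice of origin.

    Assumption: the signal is treated as periodic, so each origin is a
    cyclic shift of the input.
    """
    signal = list(signal)
    return [haar_1d(signal[s:] + signal[:s]) for s in range(len(signal))]


# A Haar wavelet (+1, -1) aligned with the pixel grid shows up only at the
# two-pixel scale ...
aligned = haar_1d([1, -1, 0, 0])
# ... while the same pattern in the central two pixels also leaks into the
# four-pixel scale.
central = haar_1d([0, 1, -1, 0])
```

Because the stationary version collects the coefficients for every origin, the full set of outputs is the same for the aligned and the central pattern, up to a relabeling of the origin.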

W -Tagging with Wavelets
While it could be possible to proceed without an explicit choice of jet algorithm, we find that it simplifies the analysis to begin by clustering all of the jets in a given event using an algorithm which finds the interesting regions of jet activity. This allows us to take advantage of jet grooming techniques to reduce background from stray radiation that is unlikely to be associated with the jet itself, and makes contact with existing substructure strategies to identify boosted W s.
In practice, we consider the Cambridge-Aachen [36,37] jet algorithm$^{1}$, with $R = 1.0$, where $R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ is the cone size as defined in the pseudo-rapidity-azimuth ($\eta$-$\phi$) plane. The jet is then pruned [7] to reduce background from pile-up by re-clustering it subject to a veto of soft and large angle recombinations between pseudo-jets in the clustering process. There are two cut parameters that are used to define the pruning algorithm, $R_{\rm cut}$ and $z_{\rm cut}$, as defined in [7]. We choose fixed benchmark values of $R_{\rm cut} = 0.25$ and $z_{\rm cut} = 0.1$ in our analysis.
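As a rough illustration of the distance measure and the pruning veto, the following sketch evaluates the angular separation of two pseudo-jets and decides whether a given recombination would be vetoed. It assumes the commonly used form of the pruning criterion, in which a merging is vetoed when $z < z_{\rm cut}$ and $\Delta R > D_{\rm cut}$ with $D_{\rm cut} = R_{\rm cut} \cdot 2 m_J / p_{T,J}$; that form is not spelled out in the text here, so it should be checked against [7]. The function names are hypothetical.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Separation in the eta-phi plane, with the azimuth wrapped to [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)


def pruning_vetoes(pt1, pt2, pt_merged, dr, jet_mass, jet_pt,
                   z_cut=0.1, r_cut=0.25):
    """Return True if the recombination of two pseudo-jets would be vetoed.

    Assumed criterion (to be checked against the pruning reference):
    the merging is soft (z < z_cut) and at large angle (dr > d_cut),
    with d_cut = r_cut * 2 * m_jet / pt_jet of the original jet.
    """
    d_cut = r_cut * 2.0 * jet_mass / jet_pt
    z = min(pt1, pt2) / pt_merged
    return z < z_cut and dr > d_cut
```

For example, a 5 GeV pseudo-jet merging at $\Delta R = 0.5$ with a 95 GeV pseudo-jet inside an 80 GeV, 500 GeV-$p_T$ jet would be vetoed under these assumptions, while a balanced 40/60 GeV merging would not.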
Having identified and cleaned up a boosted $W$ candidate jet, we map its energy decomposition into the $\eta$-$\phi$ plane as a monochrome "jet image". A typical hadronic $W$ event has two distinct "hot spots" in this image, whereas a typical QCD jet has a single hot spot with some ambient radiation around it. We can simplify the substructure processing by choosing the resolution of this image appropriately, such that typical $W$ bosons, for a wide range of $p_T$, are expected to span roughly the same number of pixels in the images. Since the partons from a $W$ decay are typically expected to have a lab frame angular separation of $\sim 2m_W/p_T$, choosing a resolution which depends on the jet $p_T$ as given in Eq. (2.1) has the consequence that $W$s at all $p_T$ are expected to span roughly the same number of pixels (8) in each of our "images". At the lowest $p_T$ we consider (200 GeV), this resolution is about the resolution of the HCAL itself, whereas at $p_T \sim 1000$ GeV it is about 10 times better, corresponding to a typical ECAL resolution. Based on this choice of resolution, we construct the region of interest as a $32 \times 32$ grid centered around the axis of the jet being studied.
The next step is to decompose the image by convolving it with a set of wavelet filters. Each filter is a $2^n \times 2^n$ pixel image (with $n$ ranging from 1 to 5). The filters are uniform images with the value of each pixel being $1/p_T$. For each scale $n$, we find the window in our $32 \times 32$ image that maximizes the overlap between the filter and the jet image. This filter is known in the image processing literature as the "father" Haar wavelet. Unlike the mother wavelet filter, which measures the difference in a $2^n \times 2^n$ window, the father wavelet measures the average across the window. We construct the overlaps of the filter with windows that include the pixel containing the jet axis. Thus, for $n = 2$ we need to consider 4 windows, for $n = 3$ there are 16 windows of interest, and so on.
For each $n$, we find the particular father wavelet coefficient that maximizes the overlap of the filter with the image. We collect all these coefficients for different window sizes and label them as $f_n$.
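The construction of the jet image and of the spectrum $f_n$ can be sketched as follows. Several choices here are assumptions made for illustration rather than details taken from the analysis: the pixel size is taken to be the $2m_W/p_T$ separation divided evenly over 8 pixels (the helper `pixel_size` is hypothetical and is not claimed to reproduce Eq. (2.1)), the image is normalized by the jet $p_T$ so that a window sum is directly the overlap with a uniform $1/p_T$ filter, and every window position containing the jet-axis pixel is scanned, which may count windows differently from the text.

```python
import math

M_W = 80.4  # GeV, W mass as quoted in the text


def pixel_size(jet_pt, span_pixels=8):
    """Hypothetical resolution: spread the ~2 m_W / pT separation over 8 pixels."""
    return 2.0 * M_W / (jet_pt * span_pixels)


def jet_image(constituents, jet_eta, jet_phi, jet_pt, size=32):
    """Bin (pt, eta, phi) constituents into a size x size image around the axis.

    Each pixel holds the fraction of the jet pT deposited in it.
    """
    step = pixel_size(jet_pt)
    half = size // 2
    image = [[0.0] * size for _ in range(size)]
    for pt, eta, phi in constituents:
        dphi = (phi - jet_phi + math.pi) % (2.0 * math.pi) - math.pi
        row = math.floor(dphi / step) + half
        col = math.floor((eta - jet_eta) / step) + half
        if 0 <= row < size and 0 <= col < size:
            image[row][col] += pt / jet_pt
    return image


def father_spectrum(image, axis=None, max_n=5):
    """Largest window sum f_n over 2^n x 2^n windows containing the axis pixel."""
    size = len(image)
    if axis is None:
        axis = (size // 2, size // 2)
    # Summed-area table: table[r][c] is the sum of image rows < r, columns < c.
    table = [[0.0] * (size + 1) for _ in range(size + 1)]
    for r in range(size):
        for c in range(size):
            table[r + 1][c + 1] = (image[r][c] + table[r][c + 1]
                                   + table[r + 1][c] - table[r][c])
    spectrum = []
    for n in range(1, max_n + 1):
        w = 2 ** n
        best = 0.0
        for r0 in range(max(0, axis[0] - w + 1), min(axis[0], size - w) + 1):
            for c0 in range(max(0, axis[1] - w + 1), min(axis[1], size - w) + 1):
                total = (table[r0 + w][c0 + w] - table[r0][c0 + w]
                         - table[r0 + w][c0] + table[r0][c0])
                best = max(best, total)
        spectrum.append(best)
    return spectrum
```

In a toy image with two equal hot spots six pixels apart, the spectrum stays at one half for the two smallest windows and reaches one once the window is large enough to hold both, while a single hot spot on the axis gives a flat spectrum.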
The spectrum of coefficients $f_n$ has an interesting behavior for hadronically decaying $W$s. The spectral coefficients start off small but then experience a jump as first one prong of the $W$ is enclosed ($f \sim 0.5$) and then a second jump (to $f \sim 1$) when the second prong is captured. In either case, the spectrum is characterized by large jumps in the spectral coefficients for window sizes of $\sim 4 \times 4$ (for the first prong) and $\sim 8 \times 8$ (for the second prong). This is in contrast to the case of an ordinary QCD jet, which typically has a single prong, along which the jet axis must inevitably be closely aligned. Thus, the typical spectral coefficients start with $f$ already close to 1 and then quickly approach 1 as $n$ increases. This suggests that distinguishing a boosted $W$ jet from a QCD jet can make use of the (discrete) second derivatives of the spectral coefficients at $n = 2$ and $n = 3$, which measure the "kinkiness" of the spectrum at the relevant scales of the image. We define the wavelet parameter $w_j$ as the larger of the absolute value of these two quantities for a given image:
$$ w_j \equiv \max\left( \left| f_3 - 2 f_2 + f_1 \right|, \; \left| f_4 - 2 f_3 + f_2 \right| \right). \qquad (2.2) $$
The distinction between QCD and $W$ jets is that we expect a large value of $w_j$ for $W$-jets but not for QCD jets. Below, we explore imposing a cut $w_j > w_{\rm cut}$ for the jet to be classified as a $W$-jet.
These features are illustrated in Figs. 1a and 2a, which show jet images for a typical $W$ jet (with $p_T = 1104$ GeV) and a QCD jet (with $p_T = 870$ GeV), respectively. In each image, the pixel containing the jet axis is indicated by the black dot. Because of the choice of image resolution via Eq. (2.1), the image of the $W$ event shows two hot spots separated by $\sim 8$-$10$ pixels, with the jet axis slightly closer to one of the prongs. The filters of different sizes that maximize the overlap with the image and contain the jet axis are shown as the blue outlined squares of the appropriate size. The spectral coefficients $f_n$ are plotted as a function of the window size in Figs. 1b and 2b.
Comparing the two spectra, we see that the $W$ event exhibits the characteristic kinks corresponding to picking up first one prong of the $W$ and then the second, leading to a large value of $w_j$, whereas the QCD event has a spectrum that is very nearly flat and a correspondingly small value of $w_j$.
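The wavelet parameter can be computed from a spectrum in a few lines. The sketch below assumes that the discrete second derivative is the central difference $f_{n+1} - 2f_n + f_{n-1}$ with unit spacing in $n$, and that the spectrum is stored as a list starting at $f_1$; both function names are hypothetical, and the example spectra are made-up numbers chosen only to mimic the two qualitative shapes described above.

```python
def second_difference(spectrum, n):
    """Central second difference at scale n; spectrum[0] holds f_1."""
    return spectrum[n] - 2.0 * spectrum[n - 1] + spectrum[n - 2]


def wavelet_parameter(spectrum):
    """Larger of the absolute second differences at n = 2 and n = 3."""
    return max(abs(second_difference(spectrum, 2)),
               abs(second_difference(spectrum, 3)))


# Illustrative spectra (not simulation output): a two-prong shape with a
# jump between n = 2 and n = 3, and a nearly flat single-prong shape.
w_like = [0.5, 0.5, 1.0, 1.0, 1.0]
qcd_like = [0.90, 0.95, 1.0, 1.0, 1.0]

w_value = wavelet_parameter(w_like)      # kinked spectrum gives a large w_j
qcd_value = wavelet_parameter(qcd_like)  # flat spectrum gives a small w_j
```

A cut $w_j > w_{\rm cut}$ then keeps the kinked spectrum and rejects the flat one for any threshold between the two values.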

Mass Drop Tagging
We compare our wavelet-based $W$ tagger with a current procedure used by the CMS (e.g. Ref. [30]) and ATLAS (e.g. Ref. [39]) experiments based on jet mass and mass drop cuts [24]. Other techniques that have been proposed to identify boosted hadronic $W$ decays include cuts on the 2-subjettiness ($\tau_2/\tau_1$) [8] and Q-jets [11]. A multivariate analysis using a combination of observables has been suggested in [29]. In assessing the performance of the wavelet tagger, we compare to results based on the jet mass and mass drop cuts as benchmarks. In the spirit of Ref. [29], we consider the wavelet tagger in tandem with the jet mass and mass drop, to explore the potential gain in acceptance and fake rejection. It is worth keeping in mind that better results could perhaps be obtained by combining our tagger with more of these other approaches through a multivariate strategy.
The initial steps concerning the jet selection and pruning are essentially the same as applied above in Sec. 2.1. From there, a basic cut is applied to the mass of the jet ($m_J$), since at parton level the constituents of a boosted $W$ jet will tend to have an invariant mass near $m_W \approx 80.4$ GeV. As a benchmark, we choose the cut applied by the CMS diboson resonance search [30], which selects boosted hadronic $W$s by requiring 70 GeV $< m_J <$ 100 GeV.
One can dramatically improve the separation of hadronic $W$s from QCD jets via a mass-drop tagging algorithm [24]. The basic idea is that some step in the $W$-jet clustering must typically involve combining the two parton level sub-jets from $W$-decay into a single fat $W$-jet. This would mean that the typical pseudo-jet mass before this combination step should be small compared to the pseudo-jet mass after combination, whereas no such effect should be expected for a QCD jet. This algorithm can be understood as a series of steps:
1. The last step of the clustering is undone: $j \to j_1, j_2$, with $m_{j_1} > m_{j_2}$, where $j_1$ and $j_2$ are the pseudo-jets in the previous clustering step.
2. If there is a large mass drop, $\mu \equiv m_{j_1}/m_j < \mu_{\rm cut}$, and the splitting is sufficiently symmetric, $y \equiv \min(p_{T j_1}^2, p_{T j_2}^2)\,\Delta R_{j_1 j_2}^2 / m_j^2 > y_{\rm cut}$, then $j$ is identified as the $W$ candidate with $j_1$ and $j_2$ its child subjets. Here, $p_T$ is the transverse momentum of the pseudo-jet and $\Delta R$ denotes the separation in $\eta$-$\phi$ space of the pseudo-jets.
3. If there is no large mass drop at this level, redefine $j$ to be equal to $j_1$ and go back to step 1.
The mass drop algorithm is a function of two parameters, $\mu_{\rm cut}$ and $y_{\rm cut}$, which characterize, respectively, how much the jet mass shifts due to a single round of clustering and how symmetrically the energy is partitioned between the two subjets. The CMS analysis [30] uses $\mu_{\rm cut} = 0.25$. We find that in the range $0.1 < \mu_{\rm cut} < 0.4$, the tagging efficiency is mildly sensitive to changes in $y_{\rm cut}$, but that changes in $\mu_{\rm cut}$ result in large changes of performance.$^{2}$ We fix $y_{\rm cut} = 0.09$ and scan over $\mu_{\rm cut}$, defining a family of mass drop performance points, with varying $W$ acceptance and jet fake rates.
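The three steps of the mass drop procedure can be sketched on a toy clustering history. This is an illustration under stated assumptions, not the FastJet implementation: the `PseudoJet` class, `massless`, `merge` and `mass_drop_tag` are hypothetical names, the clustering history is represented simply by each merged object remembering its two parents, and the separation is computed in pseudo-rapidity as in the text.

```python
import math


class PseudoJet:
    """Minimal four-vector that remembers the two pseudo-jets it was built from."""

    def __init__(self, e, px, py, pz, parents=None):
        self.e, self.px, self.py, self.pz = e, px, py, pz
        self.parents = parents

    @property
    def pt(self):
        return math.hypot(self.px, self.py)

    @property
    def mass(self):
        m2 = self.e ** 2 - self.px ** 2 - self.py ** 2 - self.pz ** 2
        return math.sqrt(max(m2, 0.0))

    @property
    def eta(self):
        return math.asinh(self.pz / self.pt)

    @property
    def phi(self):
        return math.atan2(self.py, self.px)


def massless(pt, eta, phi):
    """Build a massless pseudo-jet from (pt, eta, phi)."""
    return PseudoJet(pt * math.cosh(eta), pt * math.cos(phi),
                     pt * math.sin(phi), pt * math.sinh(eta))


def merge(a, b):
    """Four-vector sum that records a and b as parents."""
    return PseudoJet(a.e + b.e, a.px + b.px, a.py + b.py, a.pz + b.pz,
                     parents=(a, b))


def delta_r(a, b):
    dphi = abs(a.phi - b.phi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a.eta - b.eta, dphi)


def mass_drop_tag(jet, mu_cut=0.25, y_cut=0.09):
    """Return (j, j1, j2) for the first splitting passing both cuts, else None."""
    j = jet
    while j.parents is not None:
        # Step 1: undo the last clustering, ordering the parents by mass.
        j1, j2 = sorted(j.parents, key=lambda p: p.mass, reverse=True)
        if j.mass > 0.0:
            # Step 2: mass drop and symmetry conditions.
            mu = j1.mass / j.mass
            y = min(j1.pt, j2.pt) ** 2 * delta_r(j1, j2) ** 2 / j.mass ** 2
            if mu < mu_cut and y > y_cut:
                return j, j1, j2
        # Step 3: otherwise follow the heavier parent and repeat.
        j = j1
    return None
```

With these definitions, a symmetric pair of 250 GeV massless subjets separated by 0.32 in azimuth combines to a mass near 80 GeV and is tagged, whereas a 490 GeV subjet accompanied by a 10 GeV emission fails the symmetry requirement.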

Results
We generate a sample of boosted $W$ bosons as part of the $W^+W^-$ diboson rate, and one composed of high $p_T$ QCD jets via dijet production at the LHC design energy of $\sqrt{s} = 14$ TeV. Both samples are generated by MadGraph 5 [40] at parton level and showered using Pythia 8 [41] with the default tune. We use MadGraph to decay the $W$ bosons at parton level in order to retain the full angular correlations. The resulting jets from either process are clustered according to the CA algorithm ($R = 1.0$) using all final state particles from the shower other than neutrinos by employing the SpartyJet [42] wrapper for FastJet [43]. The resulting jets are pruned as described above in Section 2.1. We divide the jets into 7 $p_T$ bins of width 200 GeV each, and present results as a function of the $p_T$ band.
For a given choice of cuts, we define the $W$ (signal) acceptance fraction as $s$ and the QCD jet (background) acceptance to be $b$. Two quantities that serve as figures-of-merit as a function of the control parameters are $s/\sqrt{b}$ and $s/b$. The quantity $s/\sqrt{b}$ characterizes the performance of a search, where it directly translates into an enhancement factor of the discovery significance compared to the case where no $W$-tagging is employed (in the Gaussian regime), while $s/b$ better characterizes high-precision measurements which benefit from greater purity of signal. For each choice of figure-of-merit, we optimize its value as a function of the $W$-acceptance fraction $s$. We can scan through different $s$ and $b$ by adjusting the cut parameters that define a given tagging algorithm. For example, for the mass drop cut, we (after applying the jet mass cut described above) fix $y_{\rm cut} = 0.09$. This leaves $\mu_{\rm cut}$ as the control parameter for the mass drop tagger, which we vary to sweep through different values of $s$ and $b$, resulting in curves of each figure-of-merit as a function of $s$.
Figure 3: (a) Log of number of QCD jet events (with arbitrary normalization) that pass a given value of $\mu_{\rm cut}$ and $w_{\rm cut}$ with $800 < p_T < 1000$ GeV. (b) Log of number of hadronic $W$ events (with arbitrary normalization) that pass a given value of $\mu_{\rm cut}$ and $w_{\rm cut}$ with $800 < p_T < 1000$ GeV.
For the wavelet tagger (which we use in conjunction with the mass drop tagger), we begin with the same jet mass window cut and $y_{\rm cut}$ as in the conventional tagger. Both $\mu_{\rm cut}$ and $w_{\rm cut}$ are scanned to see what fraction of QCD jets and what fraction of $W$ jets pass the selection cuts. The distributions of the number of events in the 800 GeV $< p_T <$ 1000 GeV band that pass a given value of $\mu_{\rm cut}$ and $w_{\rm cut}$ are shown in Figs. 3a and 3b for QCD and $W$ jets, respectively.
In each $p_T$ band, we determine the cut parameters resulting in the optimal value of $s/\sqrt{b}$ for a given $s$. We find that to a good approximation a fixed value of $w_{\rm cut} = 0.16$ serves for all $p_T$ bands, up to high $W$ acceptances ($s \sim 0.7$). Varying $\mu_{\rm cut}$ tunes the value of $s$ and determines the corresponding value of $s/\sqrt{b}$. The resulting tagging efficiency plots for the wavelet tagger are shown in Fig. 4a for jets identified in the 800 GeV $< p_T <$ 1000 GeV band. For comparison, we also show the efficiency curve based on the mass drop tagger alone. The black dot indicates the efficiency point corresponding to the application of the jet mass cut without the mass drop improvement. A separate scan determines the optimal cut values for $s/b$. In this case, the optimal choice for the wavelet cut is $w_{\rm cut} = 0.23$. Once again, varying $\mu_{\rm cut}$ adjusts $s$ and determines $b$. The resulting efficiency curve is shown in Fig. 4b for jets in the 800 GeV $< p_T <$ 1000 GeV band. The ratio of peak values of $s/\sqrt{b}$ with the wavelet + mass drop cut compared to the mass drop cut alone is shown in Fig. 5, indicating a fairly constant (with respect to $p_T$) improvement in the search sensitivity of 6-7%.
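The scan over cut parameters and the two figures-of-merit can be illustrated with a short sketch. It assumes each jet that has already passed the jet mass window and the $y_{\rm cut}$ requirement is summarized by a pair $(\mu, w_j)$; the function names and the dictionary keys are hypothetical, and any numbers fed to it in an example are toy values rather than results of this work.

```python
import math


def acceptance(jets, mu_cut, w_cut=None):
    """Fraction of jets with mu < mu_cut and, if requested, w_j > w_cut."""
    passed = [1 for mu, w in jets
              if mu < mu_cut and (w_cut is None or w > w_cut)]
    return len(passed) / len(jets)


def scan_mu_cut(signal, background, mu_cuts, w_cut=None):
    """Sweep mu_cut and tabulate s, b, s/sqrt(b) and s/b at each point."""
    rows = []
    for mu_cut in mu_cuts:
        s = acceptance(signal, mu_cut, w_cut)
        b = acceptance(background, mu_cut, w_cut)
        if b > 0.0:  # skip points where the figures-of-merit are undefined
            rows.append({
                "mu_cut": mu_cut,
                "s": s,
                "b": b,
                "s_over_sqrt_b": s / math.sqrt(b),
                "s_over_b": s / b,
            })
    return rows


def peak(rows, key="s_over_sqrt_b"):
    """Largest value of the chosen figure-of-merit along the scan."""
    return max(row[key] for row in rows)
```

Running the scan once with `w_cut=None` and once with a fixed `w_cut`, and dividing the two peak values of $s/\sqrt{b}$, gives the kind of ratio plotted in Fig. 5.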

Outlook
In this work, we have examined wavelets, a simple tool that is well understood in the field of image processing, to identify boosted hadronic $W$s. Our technique maps the energy flow around a jet into a grayscale image, and then deconstructs that image into a 1-D spectrum of coefficients. Jumps in that spectrum, parameterized by the wavelet parameter $w_j$, can distinguish boosted $W$ bosons from QCD jets. In tandem with the mass drop tagger currently in use by CMS, the wavelet cut results in a modest improvement of 6-7% in the efficiency for signal divided by the square-root of the background, $s/\sqrt{b}$, over a wide range of jet $p_T$. We chose to begin with $W$ jets, but nothing in the technique is particularly specific to that application; one could imagine applying wavelet technology to searches for hadronic decays of boosted $Z$ bosons, Higgs bosons, and top quarks. In fact, the wavelet's ability to deconstruct multiple scales at once could have interesting applications to decays with multi-scale features such as are present in top decays, or to tease out ancillary information such as polarizations. This multi-scale capability further implies that wavelets present an opportunity to look at more global event properties as well.
Figure 4: (a) Efficiency curves of $s/\sqrt{b}$ vs $s$ using the wavelet tagger vs the conventional jet mass + mass drop tagger. We have also shown the efficiency point corresponding to using the jet mass cut alone. (b) Efficiency curves of $s/b$ vs $s$ using the wavelet tagger and the conventional jet mass + mass drop tagger. We have also shown the efficiency point corresponding to using the jet mass cut alone. All efficiency curves are shown in the $p_T$ band with $800 < p_T < 1000$ GeV. We can see a clear improvement of the figures-of-merit when using the wavelet tagger over the conventional jet mass + mass drop tagger.
Figure 5: Ratio of peaks of the efficiency curves of $s/\sqrt{b}$ vs $s$ for the wavelet tagger over the conventional jet mass + mass drop tagger plotted as a function of jet $p_T$. We can see a 6-7% improvement in the discovery reach using the wavelet tagger.
Wavelets are a powerful signal analysis tool and have been used in a wide variety of applications across different disciplines. We have only scratched the surface of possible applications to collider physics in this work. We intend to release a FastJet plugin to facilitate application of this technique to future jet substructure studies.