Identification of boosted, hadronically decaying W bosons and comparisons with ATLAS data taken at $\sqrt{s} = 8$ TeV

This paper reports a detailed study of techniques for identifying boosted, hadronically decaying $W$ bosons using 20 fb$^{-1}$ of proton-proton collision data collected by the ATLAS detector at the LHC at a centre-of-mass energy $\sqrt{s} = 8$ TeV. A range of techniques for optimising the signal jet mass resolution are combined with various jet substructure variables. The results of these studies in Monte Carlo simulations show that a simple pairwise combination of groomed jet mass and one substructure variable can provide a 50% efficiency for identifying $W$ bosons with transverse momenta larger than 200 GeV while maintaining multijet background efficiencies of 2-4% for jets with the same transverse momentum. These signal and background efficiencies are confirmed in data for a selection of tagging techniques.


Introduction
The high collision energies at the Large Hadron Collider (LHC) can result in the production of particles with transverse momenta¹, $p_{\mathrm{T}}$, much larger than their mass. Such particles are boosted: their decay products are highly collimated, and for fully hadronic decays they can be reconstructed as a single hadronic jet [1]. A useful rule of thumb is $\Delta R \sim 2M/p_{\mathrm{T}}$: twice the mass divided by the $p_{\mathrm{T}}$ is roughly equal to the angular separation of the two decay products (strictly, its minimum value, reached when the two products share the momentum equally). Heavy new particles, as predicted in many theories beyond the Standard Model, can be a source of highly boosted particles.
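As a rough numerical illustration of this rule of thumb, the sketch below evaluates the opening angle of a W boson decay at a few transverse momenta. It assumes massless decay products and small angles, in which case $M^2 \approx z(1-z)\,p_{\mathrm{T}}^2\,\Delta R^2$ for momentum fractions $z$ and $1-z$, so that $\Delta R \approx M/(p_{\mathrm{T}}\sqrt{z(1-z)})$; the symmetric case $z = 1/2$ gives $2M/p_{\mathrm{T}}$. The value $M_W \approx 80.4$ GeV and the function names are illustrative choices, not part of the analysis.

```python
import math

W_MASS_GEV = 80.4  # approximate W boson mass (illustrative input)


def opening_angle(mass_gev: float, pt_gev: float, z: float = 0.5) -> float:
    """Estimate Delta R between two massless decay products that carry
    momentum fractions z and 1 - z of the parent pT (small-angle limit)."""
    if not 0.0 < z < 1.0:
        raise ValueError("z must lie strictly between 0 and 1")
    return mass_gev / (pt_gev * math.sqrt(z * (1.0 - z)))


def rule_of_thumb(mass_gev: float, pt_gev: float) -> float:
    """The 2M/pT estimate, i.e. opening_angle() for a symmetric decay."""
    return 2.0 * mass_gev / pt_gev


if __name__ == "__main__":
    for pt in (200.0, 350.0, 500.0, 1000.0):
        print(f"pT = {pt:6.0f} GeV: 2M/pT = {rule_of_thumb(W_MASS_GEV, pt):.2f}, "
              f"Delta R(z = 0.2) = {opening_angle(W_MASS_GEV, pt, 0.2):.2f}")
```

Asymmetric decays open wider than $2M/p_{\mathrm{T}}$, which is one reason why a jet radius close to this estimate can fail to capture all of the decay products.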
The work presented here is the result of a detailed study of a large number of techniques and substructure variables that have, over recent years, been proposed as effective methods for tagging hadronically decaying boosted particles. In 2012, the ATLAS experiment collected 20.3 fb$^{-1}$ of proton-proton collision data at a centre-of-mass energy of $\sqrt{s} = 8$ TeV, providing an opportunity to determine which of the many available techniques are most useful for identifying boosted, hadronically decaying W bosons. In the studies presented here, jets that contain the W boson decay products are referred to as W-jets.
A brief overview of the existing jet grooming and substructure techniques, along with references to more detailed information, is provided in Sect. 2. The ATLAS detector is described in Sect. 3, and details of the Monte Carlo (MC) simulations are given in Sect. 4. The event selection procedure and object definitions are given in Sect. 5.
The body of the work detailing the W-jet tagging performance studies is divided into a broad study using MC (Sect. 6) and a detailed study of selected techniques in data (Sect. 7).
In Sect. 6 a two-stage optimisation procedure is adopted. First, more than 500 jet reconstruction and grooming algorithm configurations are investigated at a basic level, studying only the groomed jet mass distributions. Second, 27 configurations that are well-behaved and show potential for W-jet tagging are investigated using pairwise combinations of mass and one substructure variable.
In Sect. 7, one of the four most promising jet grooming algorithms and three substructure variables are selected as a benchmark for more detailed studies of the W-jet tagging performance in data. Jet mass and energy calibrations are derived and uncertainties are evaluated for the mass and the three selected substructure variables. Signal and background efficiencies are measured in $t\bar{t}$ events and multijet events, respectively. Efficiencies in different MC simulations and event topologies are compared, and various sources of systematic uncertainty and their effects on the measurements are discussed.
In Sect. 8 the conclusions of all the studies are presented.

¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis.
2 A brief introduction to jets, grooming, and substructure variables

Jet grooming algorithms
The jet grooming algorithms studied here fall into three main categories: trimming [2], pruning [3,4] and split-filtering [5]. Within each category there are several tunable configuration parameters, in addition to the choice of initial jet reconstruction algorithm, Cambridge/Aachen [6] (C/A) or anti-$k_t$ [7], and of the jet radius parameter $R$. The FastJet [8] package is used for jet reconstruction and grooming. Jet grooming algorithms generally serve two purposes: (i) to remove contributions from pileup (additional pp interactions in the same or adjacent bunch crossings within the detector readout window), and (ii) to reveal hard substructure within jets resulting from massive particle decays by removing the soft component of the radiation.
The three major categories of jet grooming algorithms are described below:
• Trimming: Starting with the constituents of jets initially reconstructed using the C/A or anti-$k_t$ algorithm, smaller 'subjets' are reconstructed using the $k_t$ algorithm [9] with a radius parameter $R = R_{\mathrm{sub}}$, and are removed if they carry less than a fraction $f_{\mathrm{cut}}$ of the original, ungroomed, large-$R$ jet $p_{\mathrm{T}}$. For reference, the recommended trimming configuration from prior ATLAS studies [10] is anti-$k_t$, $R = 1.0$, with $f_{\mathrm{cut}} \geq 5\%$ and $R_{\mathrm{sub}} = 0.3$.
• Pruning: The constituents of jets initially reconstructed with the C/A or anti-$k_t$ algorithm are re-clustered with the C/A algorithm using two parameters, $R_{\mathrm{cut}}$ and $Z_{\mathrm{cut}}$. The $k_t$ algorithm was used for re-clustering in previous studies [10], but was not found to be as effective. In each pairwise clustering, the secondary constituent is discarded if it is (i) wide-angled: $\Delta R_{12} > R_{\mathrm{cut}} \times 2M/p_{\mathrm{T}}$, where $\Delta R_{12}$ is the angular separation of the two subjets and $M$ and $p_{\mathrm{T}}$ are the jet mass and transverse momentum; or (ii) soft: $f_2 < Z_{\mathrm{cut}}$, where $f_2$ is the $p_{\mathrm{T}}$ fraction of the softer constituent with respect to the $p_{\mathrm{T}}$ of the pair. A configuration of the pruning algorithm is favoured by the CMS experiment for W-jet tagging [11,12], using C/A jets with $R = 0.8$ and pruning with $Z_{\mathrm{cut}} = 10\%$ and $R_{\mathrm{cut}} = 1/2$.
• Split-filtering: This algorithm has two stages: the first (splitting) is based on the jet substructure, and the second (filtering) is a grooming stage to remove soft radiation. For the first stage, C/A jets are de-clustered through the clustering history of the jet. This declustering is an exact reversal of the C/A clustering procedure, and can be thought of as splitting the jet into two pieces. The momentum balance, $\sqrt{y_{12}}$, is defined as:
$$\sqrt{y_{12}} = \frac{\min(p_{\mathrm{T}1}, p_{\mathrm{T}2})}{m_{12}}\,\Delta R_{12}, \qquad (1)$$
where $p_{\mathrm{T}1}$ ($p_{\mathrm{T}2}$) is the $p_{\mathrm{T}}$ of the piece with the highest (lowest) $p_{\mathrm{T}}$, $m_{12}$ is the invariant mass of the two pieces, and $\Delta R_{12}$ is their angular separation.
The mass-drop fraction $\mu_{12}$ is the fraction of mass carried by the piece with the highest mass:
$$\mu_{12} = \frac{\max(m_1, m_2)}{m_{12}}, \qquad (2)$$
where $m_1$ and $m_2$ are the masses of the two pieces. If the requirements on the mass-drop, $\mu_{12} < \mu_{\mathrm{max}}$, and on the momentum balance, $\sqrt{y_{12}} > \sqrt{y_{\mathrm{min}}}$, are met, then the jet is accepted and proceeds to the filtering stage. Otherwise the de-clustering procedure continues with the highest-mass piece: this is now split into two pieces and the $\mu_{12}$ and $\sqrt{y_{12}}$ requirements are checked again. This process continues iteratively. In the filtering stage, the constituents of the surviving jet are reclustered with a subjet size of $R_{\mathrm{sub}} = \min(0.3, \Delta R_{12})$, where $\Delta R_{12}$ is taken from the splitting stage. Any remaining radiation outside the three hardest subjets is discarded. This algorithm differs somewhat from pruning and trimming in that it involves both grooming and jet selection. A version of this algorithm is favoured by ATLAS diboson resonance searches [13-15].

Figure 1: Key to the various distance measures used in the calculation of substructure variables. The large black circle represents a jet in (η, φ) space. The small, filled (orange) circles represent the constituents from which the jet is reconstructed. The various distance measures indicated are used by one or more of the algorithms described in the text. The abbreviation 'wta' stands for 'winner-takes-all'.
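The accept/reject logic of the trimming and splitting stages can be sketched in a few lines. The sketch assumes that the $k_t$ reclustering (for trimming) and the C/A declustering (for split-filtering) have already been performed, so each function only sees the relevant pieces; the `Piece` type and the function names are hypothetical. It uses the definitions $\sqrt{y_{12}} = \min(p_{\mathrm{T}1}, p_{\mathrm{T}2})\,\Delta R_{12}/m_{12}$ and $\mu_{12} = \max(m_1, m_2)/m_{12}$. The filtering stage and the pruning criteria are not shown.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    """A (sub)jet reduced to the quantities the criteria need (hypothetical type)."""
    pt: float    # transverse momentum [GeV]
    mass: float  # invariant mass [GeV]


def trim(subjets, jet_pt, f_cut):
    """Trimming: keep only k_t subjets (radius R_sub, clustered beforehand)
    that carry at least a fraction f_cut of the ungroomed large-R jet pT."""
    return [s for s in subjets if s.pt >= f_cut * jet_pt]


def sqrt_y12(p1, p2, m12, delta_r12):
    """Momentum balance: min(pT1, pT2) * Delta R12 / m12."""
    return min(p1.pt, p2.pt) * delta_r12 / m12


def mu12(p1, p2, m12):
    """Mass-drop fraction: mass of the heavier piece over the pair mass."""
    return max(p1.mass, p2.mass) / m12


def passes_split(p1, p2, m12, delta_r12, mu_max, sqrt_y_min):
    """Splitting-stage test applied at each de-clustering step."""
    return (mu12(p1, p2, m12) < mu_max
            and sqrt_y12(p1, p2, m12, delta_r12) > sqrt_y_min)
```

Since the invariant mass of a pair is never smaller than the larger of the two piece masses, $\mu_{12} \leq 1$, so a setting of $\mu_{\mathrm{max}} = 100\%$ effectively leaves the selection to the momentum-balance requirement.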

Substructure variables
Substructure variables are jet properties designed to uncover hard substructure within jets. An important difference between substructure variables comes from the choice of distance measure used in their calculation. The various distance measures available are illustrated in Fig. 1. The jet axis is usually defined as the thrust axis (along the jet momentum vector), but can also be defined as the 'winner-takes-all' axis, which lies along the momentum vector of the constituent with the largest momentum.
The many jet substructure techniques can be roughly categorised as follows:
• Jet shapes use the relative positions and momenta of jet constituents with respect to each other, rather than defining subjets. The jet mass, $M$, the energy correlation ratios $C_2^{(\beta)}$ [16] and $D_2^{(\beta)}$ [17,18], the mass-normalised angularity $a_3$ [19], and the planar flow, $P$ [19], all satisfy this description. The calculations of the jet mass and energy correlation ratios are described later in this section.
• Splitting scales use the clustering history of the jet to define substructures ('natural subjets'). The splitting scales studied here are $\sqrt{d_{12}}$ [20] and its mass-normalised form $\sqrt{z_{12}}$ [21], and the momentum balance and mass-drop variables $\sqrt{y_{12}}$ and $\mu_{12}$, defined above in the description of the split-filtering algorithm. The soft-drop level $L_{\mathrm{SD}}^{(\beta)}$ [22] also belongs in this class of variables.
• Subjettiness variables [23,24] force the constituents into substructure templates to test how well they fit ('synthetic subjets'), and are related to how likely it is that the jet is composed of $N$ subjets. The calculations of two forms of 2-subjettiness, $\tau_2$ and $\tau_2^{\mathrm{wta}}$, and of the corresponding ratios $\tau_{21}$ and $\tau_{21}^{\mathrm{wta}}$, are given later in this section. The dipolarity [25], $D$, uses a related method to define hard substructure.
• Centre-of-mass jet shapes transform the constituents into the rest frame of the jet and then use them with respect to the jet axis. The variables considered are thrust, $T_{\mathrm{min}}$, $T_{\mathrm{maj}}$, sphericity, $S$, and aplanarity, $A$, which have been used in a previous ATLAS measurement [26].
• Quantum-jet variables: The quantum jets ('Q-jets') method [27] forms a class of its own, using a non-deterministic approach to jet reconstruction. More information on the use of this method by ATLAS can be found in Ref. [28].
The variables found in the following studies to be most interesting in terms of W-jet tagging are described here in more detail.

Jet Mass:
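The mass of a jet is given by the difference between the squares of the summed energies $E_i$ and summed momenta $\vec{p}_i$ of its constituents:
$$M^2 = \Big(\sum_{i \in J} E_i\Big)^2 - \Big|\sum_{i \in J} \vec{p}_i\Big|^2. \qquad (3)$$
For a two-body decay, the jet mass can be approximated as:
$$M^2 \approx p_{\mathrm{T}1}\, p_{\mathrm{T}2}\, \Delta R_{12}^2, \qquad (4)$$
where $p_{\mathrm{T}1}$ and $p_{\mathrm{T}2}$ are the transverse momenta of the two decay products and $\Delta R_{12}$ is their angular separation; the approximation holds for small opening angles and (nearly) massless decay products.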
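The exact jet-mass definition and the two-body approximation can be compared numerically. The sketch below assumes massless constituents given as ($p_{\mathrm{T}}$, $\eta$, $\phi$) tuples with $p_{\mathrm{T}}$ in GeV; the helper names are illustrative. For two massless constituents the exact result is $M^2 = 2p_{\mathrm{T}1}p_{\mathrm{T}2}(\cosh\Delta\eta - \cos\Delta\phi)$, and $M^2 \approx p_{\mathrm{T}1}\,p_{\mathrm{T}2}\,\Delta R_{12}^2$ is its small-angle expansion.

```python
import math


def four_vector(pt, eta, phi):
    """Massless four-vector (E, px, py, pz) from pT [GeV], eta and phi."""
    return (pt * math.cosh(eta), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))


def jet_mass(constituents):
    """Exact mass: M^2 = (sum E)^2 - |sum p|^2 over (pt, eta, phi) constituents."""
    vectors = [four_vector(*c) for c in constituents]
    e, px, py, pz = (sum(v[k] for v in vectors) for k in range(4))
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def two_body_mass(c1, c2):
    """Small-angle approximation M ~ sqrt(pT1 * pT2) * Delta R12."""
    (pt1, eta1, phi1), (pt2, eta2, phi2) = c1, c2
    return math.sqrt(pt1 * pt2) * delta_r(eta1, phi1, eta2, phi2)
```

For example, for constituents (200 GeV, 0, 0) and (150 GeV, 0.3, 0.2) the two estimates agree to better than 1%.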

N-subjettiness:
The "N-subjettiness" [23,24] jet shape variables describe to what degree the substructure of a given jet $J$ is compatible with being composed of $N$ or fewer subjets. The 0-, 1- and 2-subjettiness are defined as:
$$\tau_0(\beta) = \sum_{i \in J} p_{\mathrm{T}i}\, R^{\beta}, \qquad (5a)$$
$$\tau_1(\beta) = \frac{1}{\tau_0(\beta)} \sum_{i \in J} p_{\mathrm{T}i}\, \Delta R_{a_1,i}^{\beta}, \qquad (5b)$$
$$\tau_2(\beta) = \frac{1}{\tau_0(\beta)} \sum_{i \in J} p_{\mathrm{T}i}\, \min\left(\Delta R_{a_1,i}^{\beta}, \Delta R_{a_2,i}^{\beta}\right), \qquad (5c)$$
where $\Delta R_{a_N,i}$ is the distance between constituent $i$ and the axis $a_N$ (for $\tau_1$, the jet axis), $R$ is the jet radius parameter, and the parameter $\beta$ can be used to give a weight to the angular separation of the jet constituents. In the studies presented here, $\beta = 1$ is used. The calculation of $\tau_N$ thus requires the definition of $N$ axes. The sums run over the constituents $i$ of the jet $J$, such that the normalisation factor $\tau_0$ (Eq. 5a) is approximately the jet $p_{\mathrm{T}}$ multiplied by the $\beta$-exponentiated jet radius.
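A direct transcription of the normalised N-subjettiness sums is sketched below. Finding the axes (exclusive $k_t$ subjets, or the winner-takes-all directions) is deliberately left out: the axes are passed in as ($\eta$, $\phi$) pairs, which is a simplifying assumption of this sketch, and the function names are hypothetical. The ratio $\tau_{21} = \tau_2/\tau_1$ follows directly.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def tau_n(constituents, axes, jet_radius, beta=1.0):
    """Normalised N-subjettiness for N = len(axes).

    constituents: (pt, eta, phi) tuples; axes: (eta, phi) tuples.
    Each constituent contributes pt * (distance to its nearest axis)^beta
    (for beta > 0 this equals the minimum over axes of distance^beta),
    and the sum is divided by tau_0 = sum(pt) * jet_radius^beta."""
    tau0 = sum(pt for pt, _, _ in constituents) * jet_radius ** beta
    total = sum(
        pt * min(delta_r(eta, phi, a_eta, a_phi) for a_eta, a_phi in axes) ** beta
        for pt, eta, phi in constituents
    )
    return total / tau0


def tau_21(constituents, axes_1, axes_2, jet_radius, beta=1.0):
    """Ratio tau_2 / tau_1; small values indicate a two-prong jet."""
    return tau_n(constituents, axes_2, jet_radius, beta) / tau_n(
        constituents, axes_1, jet_radius, beta)
```

With axes placed on the hardest constituents (in the spirit of the winner-takes-all choice), a jet made of two hard constituents plus a little soft radiation gives $\tau_{21} \ll 1$.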
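Recent studies [29] have shown that an alternative axis definition can increase the discrimination power of these variables. The 'winner-takes-all' axis uses the direction of the hardest constituent in the exclusive $k_t$ subjet instead of the subjet axis, which changes the distance measure $\Delta R_{a_1,i}$ used in the calculation. The ratios of the N-subjettiness functions found with the standard subjet axes, $\tau_{21}$, and with the 'winner-takes-all' axes, $\tau_{21}^{\mathrm{wta}}$, are dimensionless variables that have been shown in particle-level MC to be particularly useful in identifying two-body structures within jets:
$$\tau_{21} = \frac{\tau_2}{\tau_1}, \qquad \tau_{21}^{\mathrm{wta}} = \frac{\tau_2^{\mathrm{wta}}}{\tau_1^{\mathrm{wta}}}. \qquad (6)$$

Energy correlation ratios:
The 1-point, 2-point and 3-point energy correlation functions for a jet $J$ are given by (with $\mathrm{ECF}(0,\beta) = 1$ by convention):
$$\mathrm{ECF}(0,\beta) = 1, \qquad (7a)$$
$$\mathrm{ECF}(1,\beta) = \sum_{i \in J} p_{\mathrm{T}i}, \qquad (7b)$$
$$\mathrm{ECF}(2,\beta) = \sum_{i<j \in J} p_{\mathrm{T}i}\, p_{\mathrm{T}j}\, (\Delta R_{ij})^{\beta}, \qquad (7c)$$
$$\mathrm{ECF}(3,\beta) = \sum_{i<j<k \in J} p_{\mathrm{T}i}\, p_{\mathrm{T}j}\, p_{\mathrm{T}k}\, (\Delta R_{ij}\, \Delta R_{ik}\, \Delta R_{jk})^{\beta}, \qquad (7d)$$
where the parameter $\beta$ is used to give weight to the angular separation of the jet constituents. In the above functions, the sums run over the constituents of the jet $J$, such that the 1-point correlation function, Eq. (7b), is approximately the jet $p_{\mathrm{T}}$. Likewise, for $\beta = 2$ the 2-point correlation function of a jet with two constituents is approximately the squared mass of a particle undergoing a two-body decay, in collider coordinates.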
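The energy correlation functions and the ratios built from them can be sketched as follows. The sketch assumes the standard normalisations $e_2^{(\beta)} = \mathrm{ECF}(2,\beta)/\mathrm{ECF}(1,\beta)^2$ and $e_3^{(\beta)} = \mathrm{ECF}(3,\beta)/\mathrm{ECF}(1,\beta)^3$, with $C_2^{(\beta)} = e_3^{(\beta)}/(e_2^{(\beta)})^2$ and $D_2^{(\beta)} = e_3^{(\beta)}/(e_2^{(\beta)})^3$; constituents are ($p_{\mathrm{T}}$, $\eta$, $\phi$) tuples and the function names are hypothetical. The brute-force sum over all pairs and triples scales as $n^3$ in the number of constituents, which is fine for an illustration but not for production use.

```python
import itertools
import math


def delta_r(c1, c2):
    """Angular distance between two (pt, eta, phi) constituents."""
    dphi = math.remainder(c1[2] - c2[2], 2.0 * math.pi)
    return math.hypot(c1[1] - c2[1], dphi)


def ecf(constituents, n, beta=1.0):
    """n-point energy correlation function ECF(n, beta).

    Each n-tuple of distinct constituents contributes the product of their pT
    times the product of all pairwise Delta R values, raised to the power beta."""
    if n == 0:
        return 1.0
    total = 0.0
    for combo in itertools.combinations(constituents, n):
        pt_product = math.prod(c[0] for c in combo)
        # For n = 1 there are no pairs, and math.prod of an empty sequence is 1.
        angle_product = math.prod(delta_r(a, b) for a, b in itertools.combinations(combo, 2))
        total += pt_product * angle_product ** beta
    return total


def c2_d2(constituents, beta=1.0):
    """Return (C2, D2) built from the normalised e2 and e3."""
    e1 = ecf(constituents, 1, beta)
    e2 = ecf(constituents, 2, beta) / e1 ** 2
    e3 = ecf(constituents, 3, beta) / e1 ** 3
    return e3 / e2 ** 2, e3 / e2 ** 3
```

A jet with only two constituents has no triples, so $e_3 = 0$ and both ratios vanish; small values of $C_2$ and $D_2$ are therefore the two-prong-like region.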
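An abbreviated form of these definitions can be written as:
$$e_2^{(\beta)} = \frac{\mathrm{ECF}(2,\beta)}{\mathrm{ECF}(1,\beta)^2}, \qquad e_3^{(\beta)} = \frac{\mathrm{ECF}(3,\beta)}{\mathrm{ECF}(1,\beta)^3}. \qquad (8)$$
These ratios of the energy correlation functions can be used to generate the dimensionless variables $C_2^{(\beta)}$ [16] and $D_2^{(\beta)}$ [17,18], which have been shown in particle-level MC to be particularly useful in identifying two-body structures within jets:
$$C_2^{(\beta)} = \frac{e_3^{(\beta)}}{\big(e_2^{(\beta)}\big)^2}, \qquad D_2^{(\beta)} = \frac{e_3^{(\beta)}}{\big(e_2^{(\beta)}\big)^3}. \qquad (9)$$

The W-tagging efficiency in multijet background events is studied on the same multijet samples as used for the optimisation studies, using Pythia (8.165) with the AU2 tune and the CT10 PDF set, and also a Herwig++ (2.6.3) sample with the EE3 tune [47] and the CTEQ6L1 [45] PDF set. These are the samples used for the comparisons with data in Sect. 7.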
The effects of differences between these samples due to using the leading jets (for the MC-based optimisation) or both leading and sub-leading jets (for the multijet background efficiency measurement in data) are discussed in Sect. 7.2.

Object reconstruction and event selection
In the studies presented here, calorimeter jets are reconstructed from three-dimensional topological clusters (topoclusters) [48] which have been calibrated using the Local Cluster Weighting (LCW) scheme [49].
In MC simulated events, truth jets are built from generator-level particles that have a lifetime longer than 10 ps, excluding muons and neutrinos. Jets are reconstructed using one of two iterative recombination jet reconstruction algorithms [50,51], C/A or anti-$k_t$. The $k_t$ algorithm is also used by the jet trimming algorithm to reconstruct subjets.
In all following discussions, the term constituents means particles in the case of truth jets and LCW topoclusters in the case of calorimeter jets.
For the MC-based optimisation studies discussed in Sect. 6, events are characterised using the leading jet, reconstructed from generator-level particles with the C/A, R = 1.2 algorithm.
Objects used to select $t\bar{t}$ events in data and MC for the studies in Sect. 7 include reconstructed leptons (electrons and muons), missing transverse momentum ($E_{\mathrm{T}}^{\mathrm{miss}}$), small-$R$ jets (reconstructed with the anti-$k_t$ algorithm with radius parameter $R = 0.4$), trimmed anti-$k_t$, $R = 1.0$ jets and $b$-tagged jets, defined below.
• Electrons: Electron candidates are reconstructed from energy deposits in the EM calorimeter matched to reconstructed tracks in the ID. Candidates are required to be within $|\eta| < 2.47$, excluding the barrel/endcap transition region of the EM calorimeter, $1.37 < |\eta| < 1.52$, and must have a transverse energy $E_{\mathrm{T}} > 25$ GeV. They are required to satisfy tight identification criteria [52] and isolation requirements [53]: the scalar sum of the $p_{\mathrm{T}}$ of charged tracks within a cone of size $\Delta R = \min(10~\mathrm{GeV}/E_{\mathrm{T}}, 0.4)$ around the electron candidate, excluding its own track, must be less than 5% of the $p_{\mathrm{T}}$ of the electron.
• Muons: Muons are reconstructed by matching MS tracks to ID tracks. Muons are required to be within $|\eta| < 2.5$ and to have $p_{\mathrm{T}} > 25$ GeV. In order to reject non-prompt muons from hadron decays, the significance of their transverse impact parameter must satisfy $|d_0|/\sigma_{d_0} < 3$, the longitudinal impact parameter must satisfy $|z_0| < 2$ mm, and the scalar sum of the $p_{\mathrm{T}}$ of the charged tracks within a cone of size $\Delta R = \min(10~\mathrm{GeV}/p_{\mathrm{T}}, 0.4)$ around the muon candidate, excluding its own track, must be less than 5% of the $p_{\mathrm{T}}$ of the muon.
• Trigger leptons: Events are selected by requiring an unprescaled single-lepton trigger for the electron and muon channels. Two single-electron triggers, with transverse energy thresholds of $E_{\mathrm{T}} > 24$ GeV for isolated electrons and $E_{\mathrm{T}} > 60$ GeV without isolation criteria, are used in combination with two single-muon triggers, with transverse momentum thresholds of $p_{\mathrm{T}} > 24$ GeV for isolated muons and $p_{\mathrm{T}} > 36$ GeV without isolation criteria. The selected muon (electron) must be matched to a trigger and is required to fulfil $p_{\mathrm{T}} > 25$ (20) GeV and $|\eta| < 2.5$. Events are rejected if any other electron or muon satisfying the identification criteria is found in the event.
• Missing transverse momentum, $E_{\mathrm{T}}^{\mathrm{miss}}$, and transverse mass, $m_{\mathrm{T}}^{W}$: The missing transverse momentum is calculated from the vector sum of the transverse energies of topological clusters in the calorimeter [54]. The energies of clusters associated with the reconstructed electrons and small-$R$ jets are replaced by the calibrated energies of these objects. Muon $p_{\mathrm{T}}$ values determined from the ID and the muon spectrometer are also included in the calculation. The $E_{\mathrm{T}}^{\mathrm{miss}}$ is required to exceed 20 GeV. The transverse mass is reconstructed from the $E_{\mathrm{T}}^{\mathrm{miss}}$ and the transverse momentum of the lepton as $m_{\mathrm{T}}^{W} = \sqrt{2 p_{\mathrm{T}} E_{\mathrm{T}}^{\mathrm{miss}} (1 - \cos\Delta\phi)}$, where $\Delta\phi$ is the azimuthal angle between the lepton and the missing transverse momentum, and the sum must satisfy $E_{\mathrm{T}}^{\mathrm{miss}} + m_{\mathrm{T}}^{W} > 60$ GeV.
• Small-$R$ jets (anti-$k_t$, $R = 0.4$): Using locally calibrated topological clusters as input, small-$R$ jets are formed using the anti-$k_t$ algorithm with a radius parameter $R = 0.4$. Small-$R$ jets are required to be within $|\eta| < 2.5$ and to have $p_{\mathrm{T}} > 25$ GeV. To reject jets with significant pileup contributions, the jet vertex fraction [55], defined as the scalar sum of the $p_{\mathrm{T}}$ of tracks associated with the jet that are assigned to the primary vertex divided by the scalar sum of the $p_{\mathrm{T}}$ of all tracks associated with the jet, is required to be greater than 0.5 for jets with $p_{\mathrm{T}} < 50$ GeV. At least one small-$R$ jet must be found. In addition, at least one small-$R$ jet must lie within $\Delta R = 1.5$ of the lepton. The leading small-$R$ jet within $\Delta R = 1.5$ of the lepton is defined as the "leptonic-top jet" and denoted $j_t$. Jets have to satisfy specific cleaning requirements [56] to remove calorimeter signals coming from non-collision sources or calorimeter noise. Events containing any jets that fail these requirements are rejected.
The output of the MV1 [57] algorithm is used to identify small-R jets containing b-hadrons. Small-R jets are tagged as b-jets if the MV1 weight is larger than the value corresponding to the 70% b-tagging efficiency working point of the algorithm. At least one small-R jet must be tagged as a b-jet. Loose b-jets are defined as having an MV1 weight larger than the value corresponding to the 80% working point. All loose b-jets must be separated by ∆R > 1.0 from the W-jet candidate.
• Trimmed $R = 1.0$ jets: Using locally calibrated topological clusters as inputs, anti-$k_t$, $R = 1.0$ jets are groomed using the trimming algorithm with parameters $f_{\mathrm{cut}} = 5\%$ and $R_{\mathrm{sub}} = 0.2$. The pseudorapidity, energy and mass of these jets are calibrated using a simulation-based calibration scheme as mentioned in Sect. 6.4. At least one trimmed anti-$k_t$, $R = 1.0$ jet with $p_{\mathrm{T}} > 200$ GeV and $|\eta| < 1.2$ is required. If more than one jet satisfies these criteria, the leading jet is used to reconstruct the W boson candidate, $J_W$. This candidate, $J_W$, has to be well separated from the leptonic-top jet, $j_t$.
• Overlapping jets and leptons: An overlap removal procedure is applied to avoid double-counting of leptons and anti-$k_t$, $R = 0.4$ jets, along with an electron-in-jet subtraction procedure to recover prompt electrons that are used as constituents of a jet. If an electron lies within $\Delta R < 0.4$ of the nearest jet, the electron four-momentum is subtracted from that of the jet. If the subtracted jet fails to meet the small-$R$ jet selection criteria outlined above, the jet is marked for removal. If the subtracted jet satisfies the jet selection criteria, the electron is removed and its four-momentum is added back into the jet. Next, muons are removed if $\Delta R(\mu, \mathrm{jet}) < 0.04 + 10~\mathrm{GeV}/p_{\mathrm{T},\mu}$, using jets that are not marked for removal after the electron subtraction process.
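Several of the kinematic requirements above reduce to one-line formulas. The sketch below collects the variable-size isolation cone, the transverse mass $m_{\mathrm{T}}^{W}$ with the $E_{\mathrm{T}}^{\mathrm{miss}}$ requirements, and the muon-jet overlap condition, with all momenta in GeV. The function names are hypothetical, and object reconstruction, identification and the electron-in-jet subtraction are not modelled.

```python
import math


def isolation_cone(pt_gev, scale_gev=10.0, r_max=0.4):
    """Variable-size isolation cone: Delta R = min(10 GeV / pT, 0.4).
    Pass E_T for electrons and pT for muons."""
    return min(scale_gev / pt_gev, r_max)


def transverse_mass(lepton_pt, met, delta_phi):
    """m_T^W = sqrt(2 pT(lepton) E_T^miss (1 - cos(delta_phi))), in GeV."""
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(delta_phi)))


def passes_met_requirements(lepton_pt, met, delta_phi):
    """E_T^miss > 20 GeV and E_T^miss + m_T^W > 60 GeV."""
    return met > 20.0 and met + transverse_mass(lepton_pt, met, delta_phi) > 60.0


def muon_overlaps_jet(delta_r_mu_jet, muon_pt):
    """True if the muon is removed: Delta R(muon, jet) < 0.04 + 10 GeV / pT(muon)."""
    return delta_r_mu_jet < 0.04 + 10.0 / muon_pt
```

Note that the cone shrinks for harder leptons: at 25 GeV it is already at its 0.4 cap, while at 50 GeV it is 0.2.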
For the measurement of the multijet background efficiency, a different selection is used to ensure a multijet-enriched sample. The multijet sample is selected using a single, unprescaled, $R = 1.0$ jet trigger that is 80% efficient for jets with $p_{\mathrm{T}} > 450$ GeV. No grooming is applied to jets at the trigger level. For events with a leading jet above the trigger threshold, both the leading and the sub-leading jets are used for this performance study, making it applicable to jets with $p_{\mathrm{T}}$ down to 200 GeV. At least one anti-$k_t$, $R = 1.0$ jet, trimmed with $f_{\mathrm{cut}} = 5\%$ and $R_{\mathrm{sub}} = 0.2$, is required to have $p_{\mathrm{T}} > 200$ GeV and $|\eta| < 1.2$. Events containing fake jets from noise in the calorimeter or from non-collision backgrounds, according to Refs. [58,59], are rejected.
For the $t\bar{t}$ and multijet background selections, good data quality is required for events in data, meaning that all ATLAS subdetectors, as well as the trigger and data acquisition system, are required to be fully operational. Events are required to have at least one reconstructed primary vertex with at least five associated tracks, and this vertex must be consistent with the LHC beam spot.

A comprehensive comparison of techniques in Monte Carlo simulations
The initial phase of this study evaluates the performance of a large number of grooming and tagging algorithms in MC simulated events.
To account for correlations between the W boson $p_{\mathrm{T}}$ and the resulting jet substructure features, events are categorised by the $p_{\mathrm{T}}$ of the leading (highest-$p_{\mathrm{T}}$) jet reconstructed with the C/A [6] algorithm with radius parameter $R = 1.2$, using stable particles as inputs. The ranges in this ungroomed truth jet $p_{\mathrm{T}}$, $p_{\mathrm{T}}^{\mathrm{Truth}}$, are 200-350 GeV, 350-500 GeV and 500-1000 GeV. This large, ungroomed jet is considered a rough proxy for the W boson, and this choice does not introduce a bias towards any particular grooming configuration for the $p_{\mathrm{T}}^{\mathrm{Truth}}$ ranges in question. Only events with a C/A, $R = 1.2$ truth jet within $|\eta| < 1.2$ are considered, ensuring that jets are within the acceptance of the tracking detector, which is necessary for the derivation of the systematic uncertainties.
First, in Sect. 6.1, more than 500 jet reconstruction and grooming algorithm configurations are selected based on prior studies [10,11,60-63]. The leading-groomed-jet mass distributions for W-jet signal and multijet background in MC are examined. An ordered list is built rating each configuration based on the background efficiency. The background efficiency at this grooming stage is denoted $\epsilon^{G}_{\mathrm{QCD}}$, and it is measured within a mass window that provides a signal efficiency of 68%, denoted $\epsilon^{G}_{W} = 68\%$. The best performers in each category described in Sect. 2.1 (trimming, pruning, split-filtering) are retained for the next stage: a total of 27 jet collections.
Observations about pileup dependence are summarised in Sect. 6.2. Jet grooming reduces the pileup dependence of the jet mass and helps distinguish W-jets from jets initiated by light quarks and gluons by improving the mass resolution, but on its own does not provide strong background rejection. Further information from the distribution of energy deposits within a jet can be used to improve the ratio of signal to background.
In the second stage, 26 substructure variables are studied for all 27 selected jet collections. These studies are detailed in Sect. 6.3. Substructure variables can be calculated using jet constituents before or after grooming; in these studies all variables are calculated from the groomed jet's constituents, which reduces their potential sensitivity to pileup conditions.

Table 1: Details of the different trimming, pruning and split-filtering configurations that were tried in order to define the best grooming algorithms. All combinations of the grooming parameters are explored in these studies.
The aim of these studies is to find an effective combination of groomed jet mass and one substructure variable. The background efficiency $\epsilon^{G\&T}_{\mathrm{QCD}}$ (where G&T indicates grooming plus tagging) versus the signal efficiency $\epsilon^{G\&T}_{W}$ is calculated for all variables in each configuration, and background efficiencies for 'medium' (50%) and 'tight' (25%) signal efficiency working points are determined. Four grooming algorithms and three tagging variables are identified as having a particularly low background efficiency at the medium signal efficiency working point, $\epsilon^{G\&T}_{W} = 50\%$.
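The working-point procedure can be illustrated with a simple counting sketch. It assumes unweighted events, a single groomed-mass window, and a substructure variable for which small values are signal-like (as for $\tau_{21}$ or $D_2^{(\beta)}$). Efficiencies are normalised to all pre-selected jets, so the mass window and the substructure requirement together must reach the target signal efficiency. The function name and input layout are hypothetical.

```python
import math


def background_eff_at_working_point(sig, bkg, window, target_sig_eff):
    """Return (cut, signal efficiency, background efficiency).

    sig, bkg: lists of (groomed_mass, variable) for all pre-selected jets.
    A jet is tagged if its mass lies in `window` and variable <= cut; the cut
    is the smallest value at which the target signal efficiency is reached
    (ties in the variable can push the achieved efficiency slightly above it)."""
    if not sig or not bkg:
        raise ValueError("signal and background samples must be non-empty")
    if not 0.0 < target_sig_eff <= 1.0:
        raise ValueError("target signal efficiency must be in (0, 1]")
    lo, hi = window
    in_window = sorted(v for m, v in sig if lo <= m <= hi)
    # round() guards against floating-point noise in target * N.
    n_needed = math.ceil(round(target_sig_eff * len(sig), 9))
    if n_needed > len(in_window):
        raise ValueError("mass window alone is below the target efficiency")
    cut = in_window[n_needed - 1]

    def tagged_fraction(sample):
        return sum(lo <= m <= hi and v <= cut for m, v in sample) / len(sample)

    return cut, tagged_fraction(sig), tagged_fraction(bkg)
```

Calling this with `target_sig_eff=0.5` and `0.25` reproduces the 'medium' and 'tight' working points described above, under the stated assumptions.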
In Sect. 6.4 the conclusions of these preliminary studies of combined groomed mass and substructure taggers are presented.

Performance of grooming algorithms
A set of more than 500 jet reconstruction and grooming algorithm configurations (introduced in Sect. 2.1) is explored within the parameter space summarised in Table 1.
The signal and background mass distributions for a selection of grooming configurations in the range $200 < p_{\mathrm{T}}^{\mathrm{Truth}} < 350$ GeV are shown in Fig. 2. A Gaussian fit to the W boson mass peak (with the W boson mass as the initial value of the mean) is shown. Two alternative signal mass window definitions are considered:
1. The 1σ boundaries of the Gaussian fit.
2. The smallest interval that contains 68% of the integral.
Comparing the extent of these two mass windows allows an estimation of how closely the signal mass peak resembles a Gaussian distribution. The W-jet mass is required to be within the boundaries defined by this latter definition of the signal window; this leads, by definition, to a baseline signal efficiency of G W = 68% for all algorithms.
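The second window definition can be computed by sorting the signal masses and sliding a window that spans a fixed number of consecutive values, keeping the narrowest one. The sketch below assumes unweighted simulated events (a weighted version would accumulate event weights instead of counts) and uses a hypothetical function name. For a perfectly Gaussian peak the result coincides with the ±1σ interval, since that interval contains about 68% of the integral.

```python
import math


def smallest_window(masses, fraction=0.68):
    """Smallest interval [lo, hi] containing at least `fraction` of the
    (unweighted) signal jet masses."""
    values = sorted(masses)
    n = len(values)
    # round() guards against floating-point noise such as 0.68 * 100.
    k = math.ceil(round(fraction * n, 9))
    if n == 0 or not 0 < k <= n:
        raise ValueError("need a non-empty sample and 0 < fraction <= 1")
    # Each candidate window spans k consecutive sorted values; keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: values[i + k - 1] - values[i])
    return values[best], values[best + k - 1]
```

Because the window is chosen by width alone, it tracks the densest part of the distribution, so a skewed peak yields a window that is not centred on the Gaussian mean.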
The groomed jet mass distributions for leading jets are examined for all combinations of grooming configurations for W-jet signal and multijet background. The background efficiency, $\epsilon^{G}_{\mathrm{QCD}}$, is defined as follows:
• The denominator is the total number of pre-selected events from the multijet background sample, where the pre-selection requires an ungroomed C/A, $R = 1.2$ truth jet with $p_{\mathrm{T}}^{\mathrm{Truth}} > 200$ GeV and $|\eta^{\mathrm{Truth}}| < 1.2$.
• The numerator is the number of pre-selected events where the groomed jet mass falls in the window that contains 68% of the W-jet signal, $\epsilon^{G}_{W} = 68\%$.

The minimisation of $\epsilon^{G}_{\mathrm{QCD}}$ is the primary criterion for ordering the algorithms according to their performance. In addition, the mass distributions can reveal a number of pathologies: features that identify obviously unsuitable configurations, make it impossible to derive a jet mass calibration, or indicate the need for additional pileup removal techniques. These are:
(i) The $\epsilon^{G}_{W} = 68\%$ window does not contain the W boson mass [64]. An example of this is shown in Fig. 3(a).
(ii) The signal mass distribution is strongly non-Gaussian. An example of this is shown in Fig. 3(b).
(iii) The background mass distribution has an irregular shape (e.g. it has local maxima) in the region of the signal peak. An example of this is also shown in Fig. 3(b).
(iv) The jet mass after grooming is strongly affected by pileup. Configurations in which the average jet mass increases by more than 1 GeV per additional reconstructed primary vertex ($N_{\mathrm{PV}}$) are rejected. This issue is discussed in Sect. 6.2.
Algorithms that are susceptible to any of these pathologies are removed from the list of well-behaved algorithm configurations.
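The background-efficiency definition and the first pathology check translate directly into counting code. In this sketch each multijet event is reduced to its ungroomed truth-jet $p_{\mathrm{T}}$ and $\eta$ and its groomed leading-jet mass, events are unweighted, and $m_W \approx 80.4$ GeV is an assumed input; the function names are hypothetical.

```python
W_MASS_GEV = 80.4  # approximate W boson mass, an assumed input


def background_efficiency(events, window, pt_min=200.0, eta_max=1.2):
    """eps^G_QCD for unweighted multijet events.

    events: (truth_pt, truth_eta, groomed_mass) per event, where the truth
    quantities refer to the ungroomed C/A, R = 1.2 truth jet.
    Denominator: pre-selected events; numerator: those whose groomed jet
    mass falls inside the 68% signal window."""
    lo, hi = window
    masses = [m for pt, eta, m in events if pt > pt_min and abs(eta) < eta_max]
    if not masses:
        raise ValueError("no pre-selected events")
    return sum(lo <= m <= hi for m in masses) / len(masses)


def window_contains_w_mass(window, w_mass=W_MASS_GEV):
    """Pathology (i): a window that excludes the W mass is rejected."""
    lo, hi = window
    return lo <= w_mass <= hi
```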
The W boson tagging efficiency performance is studied independently for three different ranges in the $p_{\mathrm{T}}$ of the ungroomed truth jet reconstructed with the C/A, $R = 1.2$ algorithm. General observations include:
• The jets reconstructed with $R = 0.6$ and $R = 0.8$ are too small to contain all the decay products of a W-jet for $p_{\mathrm{T}} < 500$ GeV and $p_{\mathrm{T}} < 350$ GeV, respectively. The reconstructed jet mass is often much smaller than 80 GeV, indicating that some of the W boson decay products are not clustered, and the 68% signal mass window is wider, resulting in a higher background efficiency. Small-radius jets can, however, perform well at high $p_{\mathrm{T}}$.
• In the highest p T bin, 500-1000 GeV, the various configurations result in a similar performance.
The unique features of each grooming category are presented below.

Trimming:
Various trimming configurations are studied, varying the algorithm and size of the initial jet (C/A with $R = 0.6$ to $1.2$, anti-$k_t$ with $R = 0.8$ to $1.2$), and the $R_{\mathrm{sub}}$ and $f_{\mathrm{cut}}$ parameters summarised in Table 1. The background rejection and the boundaries of the 68% signal mass windows obtained with a subset of trimming configurations for the range $350 < p_{\mathrm{T}} < 500$ GeV are shown in Fig. 4 for anti-$k_t$, $R = 1.0$ and C/A, $R = 1.0$ jets. The systematic uncertainties resulting from the uncertainty on the jet mass and energy scale (described in detail in Sect. 7.5) are provided to indicate whether the differences in performance between the grooming configurations are significant. The following characteristics are noted:
• C/A and anti-$k_t$ jets have a similar performance under the same configurations.
• Larger values of $f_{\mathrm{cut}}$ can lead to a significantly lower background efficiency.
• The dependence of the performance on $R_{\mathrm{sub}}$ is less significant, but the background efficiency does decrease somewhat for smaller $R_{\mathrm{sub}}$ values.
Based on the performance of these algorithms, the trimming implementations considered for further investigation are given in Table 2. Although promising, configurations with $R_{\mathrm{sub}} = 0.1$ are not pursued further in these studies, as this size approaches the granularity limit of the hadronic tile calorimeter and would require further studies to control the systematic uncertainties properly.

Pruning:
The performance of pruning is studied using both the C/A and anti-$k_t$ algorithms for the initial large-$R$ ($R = 0.6$ to $1.2$) jet finding, and C/A for the reclustering procedure. The background efficiencies and 68% signal mass windows obtained with a subset of pruning configurations for the range $350 < p_{\mathrm{T}} < 500$ GeV are shown in Fig. 5. Several observations can be made:
• Using the C/A algorithm as the re-clustering algorithm for pruning is consistently better than using the $k_t$ algorithm, for the same values of the $R_{\mathrm{cut}}$ and $Z_{\mathrm{cut}}$ parameters.
• Pruning with smaller $R_{\mathrm{cut}}$ and/or higher $Z_{\mathrm{cut}}$ can be overly harsh, resulting in W-jet mass peaks at values lower than 80 GeV.
• The background efficiency does not depend strongly on $R_{\mathrm{cut}}$ or on $Z_{\mathrm{cut}}$, but there is evidence for a $p_{\mathrm{T}}$ dependence of the optimal $Z_{\mathrm{cut}}$, with $Z_{\mathrm{cut}} = 0.15$ being preferable for the ranges $200 < p_{\mathrm{T}} < 350$ GeV and $350 < p_{\mathrm{T}} < 500$ GeV, and $Z_{\mathrm{cut}} = 0.10$ being preferred for $p_{\mathrm{T}} > 500$ GeV.
• For all pruning configurations, the performance is significantly worse in the lowest p T bin.
Based on the performance of all the algorithms, the eight combinations retained for further studies are given in Table 3.

Split-filtering:
Split-filtering is studied with C/A jets with $R = 1.2$ and $1.0$, and various values of the parameters $\sqrt{y_{\mathrm{min}}}$, $R_{\mathrm{sub}}$ and $\mu_{\mathrm{max}}$. The background efficiencies and 68% signal mass windows obtained with a subset of split-filtering configurations for the range $350 < p_{\mathrm{T}} < 500$ GeV are shown in Fig. 6 and Fig. 7.
Observations from the results of these studies include the following:
• Larger $\sqrt{y_{\mathrm{min}}}$ values tend to result in lower background efficiencies.
• The performance depends on $\sqrt{y_{\mathrm{min}}}$, and the optimal requirement varies with jet $p_{\mathrm{T}}$. For $\sqrt{y_{\mathrm{min}}} \geq 0.09$, the background efficiency is relatively stable.
• For $\sqrt{y_{\mathrm{min}}} > 0.09$, there is no strong dependence of the performance on $R_{\mathrm{sub}}$ or $\mu_{\mathrm{max}}$.
A total of 11 split-filtering jet collections are considered for further study, all with $\mu_{\mathrm{max}} = 100\%$ and $R_{\mathrm{sub}} = 0.3$. These are given in Table 4.

Pileup dependence
The influence of pileup on the reconstructed groomed jets is examined during the first stage of algorithm optimisation, and configurations that show large susceptibility to pileup after grooming are discarded. A number of methods [61,65-71] are available for reducing the effects of pileup, either on their own or combined with grooming; these techniques are not considered in this study. Most grooming configurations almost completely remove the effects of pileup from the mean jet mass, as illustrated in Fig. 8, which shows the correlation between the average jet mass $\langle M \rangle$ and the number of primary vertices for a well-behaved trimming configuration. The significant correlation between the average ungroomed jet mass and the number of reconstructed primary vertices is absent for trimmed jets in both signal and background.
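The pileup criterion, pathology (iv) in Sect. 6.1, can be checked with an ordinary least-squares fit of jet mass against the number of reconstructed primary vertices. The sketch assumes one ($N_{\mathrm{PV}}$, $M$) pair per jet and unweighted events; fitting the per-jet points gives the same slope as a fit to the profiled means $\langle M \rangle$ weighted by the number of jets in each $N_{\mathrm{PV}}$ value. The function names are hypothetical, and the 1 GeV-per-vertex default mirrors the criterion stated earlier.

```python
def mass_vs_npv_slope(npv, mass):
    """Least-squares slope d<M>/dN_PV, in GeV per primary vertex."""
    n = len(npv)
    if n < 2 or n != len(mass):
        raise ValueError("need at least two paired (N_PV, M) values")
    mean_x = sum(npv) / n
    mean_y = sum(mass) / n
    sxx = sum((x - mean_x) ** 2 for x in npv)
    if sxx == 0:
        raise ValueError("N_PV must take more than one value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(npv, mass))
    return sxy / sxx


def too_pileup_dependent(npv, mass, max_slope_gev=1.0):
    """Pathology (iv): reject if <M> rises by more than 1 GeV per vertex."""
    return mass_vs_npv_slope(npv, mass) > max_slope_gev
```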
The pileup dependence of the mean jet mass obtained with all 27 grooming configurations selected for the second stage of the optimisation studies is shown in Fig. 9 for the range $350 < p_{\mathrm{T}} < 500$ GeV, in terms of the fitted slope $\delta\langle M\rangle/\delta N_{\mathrm{PV}}$. In general, the average masses of jets with larger radii have a more pronounced pileup dependence, and the trimmed jet mass has a weaker pileup dependence than the masses obtained with the pruning and split-filtering algorithms. For all jet algorithms, the pileup dependence is much reduced with respect to that of ungroomed jets.

• The N-subjettiness observables $\tau_2$, $\tau_2^{\mathrm{wta}}$, $\tau_{21}$ and $\tau_{21}^{\mathrm{wta}}$ are also described in detail in Sect. 2.2.
• Planar flow [19], $P$, is a measure of how uniformly the energy of a jet is distributed in the plane perpendicular to its axis.
• The distribution of the angularity, $a_3$, is expected to peak sharply close to zero for a balanced two-body decay, such as that of a W boson, with a broader tail for jets initiated by quarks and gluons. The general formula for the mass-normalised angularity can be found in Ref. [19].
• Splitting scales [20] are computed within the jet clustering algorithm and can be calculated for any jet from its constituents. The splitting scale $\sqrt{d_{12}}$ is calculated for a jet (re)clustered with the $k_t$ algorithm and is the $k_t$ distance between the two proto-jets of the final clustering step.
• The variable $\sqrt{z_{12}}$ [21] is a variant of the original splitting scale $\sqrt{d_{12}}$ that also uses the jet mass.

• The momentum balance [5], $\sqrt{y_{12}}$, and the mass-drop fraction, $\mu_{12}$, are defined at the first declustering step that satisfies a minimum mass-drop and momentum-balance requirement, and are only available for jets groomed with the split-filtering algorithm.
• The soft-drop algorithm [22] declusters the jet, following the path of highest $p_{\mathrm{T}}$ through the clustering history. At each declustering step the following condition is tested:

$$ z_g > z_{\mathrm{cut}}\, r_g^{\beta}, \tag{10} $$

where $z_g = \min(p_{\mathrm{T}1}, p_{\mathrm{T}2})/(p_{\mathrm{T}1} + p_{\mathrm{T}2})$ is the fractional momentum of the softer of the two branches, and $r_g = \Delta R_{12}/R_0$ is the angular separation of the two branches relative to the radius parameter $R_0$ of the initial jet algorithm. Nine values of the $z_{\mathrm{cut}}$ parameter between 4% and 20% are explored here; they are given in Table 5. The $\beta$ values chosen are $-1.0$, $-0.75$ and $-0.5$. The condition of Eq. (10) with $z_{\mathrm{cut}} = 4\%$ is first applied to the first declustering step. If it is not satisfied, the algorithm continues to the next step in the jet's clustering history, and so on, checking at each step whether the condition is satisfied. If it is never satisfied, the soft-drop level $L_{\mathrm{SD}}(\beta)$ is zero. If it is satisfied, $L_{\mathrm{SD}}(\beta) = 1$, and the algorithm remains at this point in the clustering history and tests the same condition with the harder momentum requirement $z_{\mathrm{cut}} = 6\%$. If this condition is not satisfied, the algorithm again continues to the next step in the clustering history, and so on, with each successive $z_{\mathrm{cut}}$ that is satisfied incrementing $L_{\mathrm{SD}}(\beta)$ by one.
• The dipolarity [25], D, is a measure of the colour flow between two hard centres within a jet.
• Jet shape variables are computed in the centre-of-mass frame of a jet, which can increase the separation power between W-jets and jets in multijet events. Sphericity, $S$, aplanarity, $A$, and thrust minor and major, $T_{\mathrm{min}}$ and $T_{\mathrm{maj}}$, already used in a previous ATLAS measurement [26], as well as the ratio of the second- to zeroth-order Fox-Wolfram moments, $R_2^{\mathrm{FW}}$ [72], are considered.
• For a jet clustered with a given sequential-recombination algorithm, the Q-jets technique [27] reclusters the jet many times, making a probabilistic rather than deterministic choice of which pair to merge at each clustering step. As a result, any jet observable, such as the mass, has a distribution for a given jet. The Q-jets configuration optimised in Ref. [28] is adopted in this study. The high mass of W-jets tends to persist through the reclustering, while the mass of QCD jets fluctuates. An observable sensitive to this behaviour is the coefficient of variation of the mass distribution for a single jet, called the volatility [27,28], $\nu_Q^{\alpha}$. The superscript $\alpha$ denotes the …

The distributions of these variables are shown in Figs. 10–12 for anti-$k_t$, $R = 1.0$ jets trimmed with $f_{\mathrm{cut}} = 0.05$ and $R_{\mathrm{sub}} = 0.2$, after applying the 68% signal efficiency mass window requirement. This grooming algorithm is referred to in the remainder of this paper as 'R2-trimming'. At this stage no jet mass calibrations have been applied for any of the grooming configurations. Also shown are the correlations between the jet mass and each of these variables, separately for the W-jet signal and the multijet background, in both cases before the 68% signal efficiency mass window requirement is applied. No truth-matching between the subjets and the quarks from the W decay is required, so the signal sample contains both full W-jets and jets made of fragments of the W decay, generally because the W decay is not completely captured in the $R = 1.0$ jet. These background-like jets within the signal sample are particularly visible in the low-mass region of Fig. 10(b), where the distributions echo those seen in the background sample.
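Several of the observables listed above are simple functions of the two branches at one declustering step. The following sketch computes $\sqrt{d_{12}}$, $\sqrt{y_{12}}$, $\mu_{12}$, $z_g$ and $r_g$ for a single splitting and implements the soft-drop level $L_{\mathrm{SD}}(\beta)$ procedure described above. It rests on assumptions: the `Splitting` container and all function names are hypothetical; $\sqrt{d_{12}} = \min(p_{\mathrm{T}1}, p_{\mathrm{T}2})\,\Delta R_{12}$ assumes a $k_t$ radius normalisation of one; and $\sqrt{y_{12}} = \min(p_{\mathrm{T}1}, p_{\mathrm{T}2})\,\Delta R_{12}/m_{12}$ and $\mu_{12} = \max(m_1, m_2)/m_{12}$ are the usual mass-drop definitions, assumed here rather than quoted from this paper. The clustering history is supplied as a list of splittings already ordered along the highest-$p_{\mathrm{T}}$ branch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Splitting:
    """One declustering step, parent -> two branches (hypothetical container)."""
    pt1: float      # transverse momentum of branch 1 [GeV]
    pt2: float      # transverse momentum of branch 2 [GeV]
    delta_r: float  # angular separation Delta R_12 of the branches (assumed > 0)
    m1: float       # mass of branch 1 [GeV]
    m2: float       # mass of branch 2 [GeV]
    m12: float      # mass of the parent (combined) object [GeV]


def sqrt_d12(s: Splitting) -> float:
    # k_t splitting scale, assuming a radius normalisation of 1.
    return min(s.pt1, s.pt2) * s.delta_r


def sqrt_y12(s: Splitting) -> float:
    # Momentum balance of the splitting, normalised to the parent mass.
    return min(s.pt1, s.pt2) * s.delta_r / s.m12


def mu12(s: Splitting) -> float:
    # Mass-drop fraction: heavier branch mass over parent mass.
    return max(s.m1, s.m2) / s.m12


def z_g(s: Splitting) -> float:
    # Momentum fraction carried by the softer branch.
    return min(s.pt1, s.pt2) / (s.pt1 + s.pt2)


def r_g(s: Splitting, r0: float) -> float:
    # Branch separation relative to the original jet radius R0.
    return s.delta_r / r0


def passes_soft_drop(s: Splitting, z_cut: float, beta: float, r0: float) -> bool:
    # Soft-drop condition z_g > z_cut * r_g**beta, Eq. (10).
    return z_g(s) > z_cut * r_g(s, r0) ** beta


def soft_drop_level(history: Sequence[Splitting], z_cuts: Sequence[float],
                    beta: float, r0: float = 1.0) -> int:
    """Number of successive z_cut values satisfied along the clustering history.

    history : splittings from the first declustering step onwards,
              following the highest-pT branch
    z_cuts  : z_cut values ordered from loosest to tightest
    """
    level = 0
    step = 0
    for z_cut in z_cuts:
        # Move down the history until the current z_cut is satisfied.
        while step < len(history) and not passes_soft_drop(history[step], z_cut, beta, r0):
            step += 1
        if step == len(history):
            break  # never satisfied: the level stops increasing
        level += 1  # stay at this step and test the next, harder z_cut
    return level
```

For example, with the illustrative values `z_cuts = [0.04, 0.06, 0.10, 0.20]` (not the Table 5 list) and $\beta = -1$, a first splitting with $z_g = 0.05$ at $r_g = 0.5$ fails the 4% requirement ($0.05 < 0.08$), and a following splitting with $z_g = 0.3$ at $r_g = 0.5$ passes 4%, 6% and 10% but not 20%, so $L_{\mathrm{SD}} = 3$ if no later splitting satisfies the 20% requirement.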
The background rejection power (1/background efficiency) is shown in Fig. 13 at the $\epsilon_W^{\mathrm{G\&T}} = 50\%$ working point for each substructure variable, applied inside the mass window determined by the grooming, for each of the 27 grooming configurations and for the range $350 < p_{\mathrm{T}} < 500$ GeV. In addition to the background rejection at a particular signal efficiency working point, full rejection versus efficiency curves (Receiver Operating Characteristic, 'ROC', curves) are produced for each combination. An example showing the relationship between the W-jet signal efficiency and the multijet background rejection for the range $350 < p_{\mathrm{T}} < 500$ GeV is shown in Fig. 14. The maximum efficiency for each algorithm is by definition 68%, since the tagging criteria are applied after requiring the jet mass to be within the mass window defined by the grooming.
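The construction of such curves, and of a working point such as $\epsilon_W^{\mathrm{G\&T}} = 50\%$, can be sketched as follows. This is an illustration under assumptions, not the analysis code: the function names are hypothetical, the inputs are NumPy arrays, efficiencies are unweighted jet fractions, and a jet is assumed to pass the substructure requirement when the variable is below the cut, as for two-prong discriminants where W-jets populate low values.

```python
import numpy as np


def roc_in_mass_window(sig_mass, sig_var, bkg_mass, bkg_var, window, cuts):
    """Signal efficiency and background rejection after mass window + cut.

    Efficiencies are relative to all jets, so the signal efficiency can
    never exceed the fraction of signal jets inside the mass window.
    """
    lo, hi = window
    sig_in = (sig_mass >= lo) & (sig_mass <= hi)
    bkg_in = (bkg_mass >= lo) & (bkg_mass <= hi)
    sig_eff, bkg_rej = [], []
    for cut in cuts:
        eff_s = np.mean(sig_in & (sig_var < cut))
        eff_b = np.mean(bkg_in & (bkg_var < cut))
        sig_eff.append(eff_s)
        bkg_rej.append(1.0 / eff_b if eff_b > 0 else np.inf)
    return np.array(sig_eff), np.array(bkg_rej)


def working_point_cut(sig_mass, sig_var, window, target=0.50):
    """Cut on the substructure variable giving a combined efficiency `target`."""
    lo, hi = window
    in_window = (sig_mass >= lo) & (sig_mass <= hi)
    eff_window = in_window.mean()  # about 0.68 for the grooming mass windows
    if target > eff_window:
        raise ValueError("target exceeds the mass-window efficiency")
    # Fraction of in-window signal jets that must also pass the cut.
    return np.quantile(sig_var[in_window], target / eff_window)
```

Setting the cut to infinity recovers the mass-window efficiency, which is why every curve ends at 68%.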

Summary of grooming and substructure in MC
Four grooming configurations, given in Table 6, show consistently high performance in all $p_{\mathrm{T}}$ ranges. The jet $\eta$, mass and energy calibrations are derived for these four configurations using the simulation-based calibration scheme used as standard by ATLAS in previous studies [10]. The mass window sizes for calibrated jets, the background efficiencies for $\epsilon_W^{\mathrm{G}} = 68\%$ and the values of $\delta\langle M\rangle/\delta N_{\mathrm{PV}}$ in the range $200 < p_{\mathrm{T}} < 350$ GeV are also given in Table 6.
Since the first algorithm in Table 6 is the only one of the four with negligible pileup dependence across all $p_{\mathrm{T}}$ ranges (only the central $p_{\mathrm{T}}$ range is shown in Fig. 9), it is adopted for all subsequent studies.
The best substructure variables for use with R2-trimmed jets at the $\epsilon_W^{\mathrm{G\&T}} = 50\%$ working point, which provide background efficiencies $\epsilon_{\mathrm{QCD}}^{\mathrm{G\&T}} \sim 2\%$ (background rejection power $\sim 50$ in Fig. 13) for jets with $p_{\mathrm{T}} > 350$ GeV, are given in Table 7. Studies of the R2-trimmed grooming configuration and the three preferred substructure variables are described in the next section, where the results obtained from Monte Carlo simulations are compared to data.

Detailed studies of selected techniques in data
This section describes a comparison of the W-jet and multijet tagging efficiencies measured using the three tagging variables $C_2^{(\beta=1)}$, $D_2^{(\beta=1)}$ and $\tau_{21}^{\mathrm{wta}}$ … The differences between the $t\bar{t}$ final state examined in this section and the $W'$ final state used in the preliminary optimisation studies are given in Sect. 7.2. The systematic uncertainties are discussed in Sect. 7.3, and the distributions of mass and substructure variables in data and MC are presented in Sect. 7.4. The signal and background efficiency estimation procedures and their uncertainties are detailed in Sect. 7.5. A summary of the signal and background tagging efficiencies measured in data and compared to MC is given in Sect. 7.6.

Table 7: The mass windows for calibrated R2-trimmed jets that provide $\epsilon_W^{\mathrm{G}} = 68\%$, and the requirements on the three substructure variables that result in the lowest background efficiencies $\epsilon_{\mathrm{QCD}}^{\mathrm{G\&T}}$ when combined with the mass windows to provide $\epsilon_W^{\mathrm{G\&T}} = 50\%$.
In all the following studies, events are categorised according to the $p_{\mathrm{T}}$ of the leading reconstructed R2-trimmed jet, in three ranges: [200, 250], [250, 350] and [350, 500] GeV. This categorisation differs from that used in the first stage of the optimisation in Sect. 6, which uses ungroomed C/A, $R = 1.2$ truth jets and different ranges; the selection extends only to 500 GeV here because there are insufficient data above 500 GeV in the 2012 dataset. The lowest $p_{\mathrm{T}}$ range used in the preliminary optimisation stage, [200, 350] GeV, is now divided in two, since the 2012 dataset contains an abundance of top-quark decays in this range.

Sample compositions and definitions
Signal W-jets are extracted from $t\bar{t}$ events in data and in the MC samples detailed in Sect. 4. The $t\bar{t}$ production cross-section is scaled to match the value obtained from NNLO calculations [73]. An additional reweighting is then applied to the $t\bar{t}$ MC, using the generator-level $p_{\mathrm{T}}$ of the top quark and the $p_{\mathrm{T}}$ of the $t\bar{t}$ system, to reproduce the $p_{\mathrm{T}}$ dependence of the measured cross-section [74].
The dominant backgrounds to the $t\bar{t}$ event topology come from $t\bar{t}$ production in which the W boson decay is only partially reconstructed, with or without contamination from radiation outside the top quark decay (such as hard gluon emission or non-tagged b-jets). Generator-level information from the $t\bar{t}$ and $Wt$ samples is used to distinguish the cases in which the W candidate jet is matched to a genuine W boson from those in which it is matched to other jets (referred to as top quark background events). An event is categorised as W signal when both partons from the W boson decay are within $\Delta R = 1.0$ of the jet axis; otherwise, it is labelled as non-W background.
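The signal definition above is a geometric truth-matching requirement. A minimal sketch, assuming the jet axis and the two W-decay partons are available as $(\eta, \phi)$ pairs: the function names are hypothetical, and "within $\Delta R = 1.0$" is implemented as a strict inequality. The only subtlety is wrapping $\Delta\phi$ into $[-\pi, \pi)$.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane, with delta-phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def is_w_signal(jet_eta, jet_phi, w_partons, max_dr=1.0):
    """True if both W-decay partons lie within max_dr of the jet axis.

    w_partons : (eta, phi) pairs for the two partons from the W decay,
                taken from the generator record
    """
    if len(w_partons) != 2:
        return False
    return all(delta_r(jet_eta, jet_phi, eta, phi) < max_dr
               for eta, phi in w_partons)
```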
The leading non-top background process is the production of W bosons in association with jets. The W+jets contribution is estimated using a data-driven charge asymmetry method [75]. Alpgen + Pythia MC samples provide the event kinematics, while the relative flavour contributions and overall normalisation are determined from data. The flavour fractions are found in a control region with no b-tagged jet requirement in which, instead of a large-R jet, events are required to have exactly two small-R jets. The relative contributions from each jet flavour are found using the charge asymmetry, and these flavour fractions are fixed for W+jets events in the signal region before the b-tagged jet requirement is applied. Finally, an overall normalisation is obtained by scaling the simulated W+jets charge asymmetry to match the charge asymmetry in data, after other charge-asymmetric backgrounds are accounted for using MC.
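The final normalisation step amounts to a single scale factor: the charge asymmetry in data, after subtracting the MC prediction for other charge-asymmetric backgrounds, divided by the W+jets charge asymmetry predicted by simulation. The sketch below shows only this ratio; the function and argument names are hypothetical, and the flavour-fraction determination described above is not included.

```python
def wjets_charge_asymmetry_scale(data_pos, data_neg, mc_wjets_pos, mc_wjets_neg,
                                 other_pos=0.0, other_neg=0.0):
    """Scale factor for the simulated W+jets yield from the charge asymmetry.

    data_pos, data_neg         : data event counts with a positive / negative lepton
    mc_wjets_pos, mc_wjets_neg : simulated W+jets yields for the two lepton charges
    other_pos, other_neg       : MC yields of other charge-asymmetric backgrounds
    """
    data_asym = (data_pos - other_pos) - (data_neg - other_neg)
    mc_asym = mc_wjets_pos - mc_wjets_neg
    if mc_asym <= 0:
        raise ValueError("simulated W+jets sample shows no positive charge asymmetry")
    return data_asym / mc_asym
```

The method exploits the fact that W+ bosons are produced more often than W− bosons in pp collisions, while most other backgrounds are charge-symmetric.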
The contribution from multijet events is estimated by using loose lepton identification criteria and deriving the contribution of non-prompt leptons with the matrix method [76,77]. This method relies on the fact that the tight lepton identification criteria select primarily prompt leptons, while loose leptons that do not satisfy the tight criteria come primarily from backgrounds. The probability that a non-prompt lepton from multijet production that satisfies the loose criteria also satisfies the tight criteria is measured in data, in control regions dominated by multijet events, with the prompt-lepton contributions subtracted using MC. The corresponding probability for leptons from prompt sources (such as W bosons) is derived from MC samples, corrected using data-to-MC correction factors derived from $Z \to \ell\ell$ events. Once these probabilities are known, an event weight is calculated and applied to data events passing the loosened lepton identification criteria to provide an estimate of the multijet contribution.
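For a single lepton, the event weight follows from solving the two linear equations that relate the loose and tight event counts to the numbers of prompt and non-prompt leptons. In the sketch below, `eff_real` and `eff_fake` are the probabilities that a prompt or a non-prompt lepton passing the loose criteria also passes the tight criteria; the names are hypothetical, and this textbook single-lepton form is not necessarily the exact implementation of Refs. [76,77].

```python
def matrix_method_weight(is_tight, eff_real, eff_fake):
    """Multijet (non-prompt) weight for one event with a loose lepton.

    Solving N_loose = N_real + N_fake and
            N_tight = eff_real * N_real + eff_fake * N_fake
    gives the non-prompt contribution to the tight sample,
        eff_fake * (eff_real * N_loose - N_tight) / (eff_real - eff_fake),
    which is reproduced by summing this weight over all loose events
    (tight events included).
    """
    if not eff_real > eff_fake:
        raise ValueError("require eff_real > eff_fake")
    tight = 1.0 if is_tight else 0.0
    return eff_fake * (eff_real - tight) / (eff_real - eff_fake)
```

Tight events receive a small negative weight and loose-but-not-tight events a positive one; only the sum over the sample is meaningful.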

Event topology effects in Monte Carlo simulations
The preliminary MC-based optimisation studies in Sect. 6 use a signal composed of well-isolated W-jets from the hypothetical process $W' \to WZ \to qq\ell\ell$ provided by Pythia, and a background sample of jets initiated by light quarks or gluons, also provided by Pythia. In the following sections, efficiencies are measured in data, so the $t\bar{t}$ final state is used as the source of W-jets. As described in Sect. 4, the main $t\bar{t}$ signal processes are provided by either Powheg-BOX + Pythia or MC@NLO + Herwig, and the multijet background is provided by Pythia or Herwig++.
Although the backgrounds in both event topologies are Pythia multijet events, they differ in that the background efficiencies measured in data include a minimum leading-jet $p_{\mathrm{T}}$ requirement of 450 GeV, to ensure full efficiency with respect to the trigger used. With this selection, the lower $p_{\mathrm{T}}$ ranges, [200, 250] and [250, 350] GeV, are composed entirely of sub-leading jets, and the highest range, [350, 500] GeV, is a mixture of leading and sub-leading jets. Jets softer than the sub-leading jet are not considered. The background sample used for the studies in Sect. 6 is not compared with data, so there are no trigger requirements and the leading jet is always used. A higher average jet mass is observed with the leading + sub-leading jet selection than with the leading-jet selection, which in turn leads to a higher background efficiency in the studies summarised in Sect. 7 than in those of Sect. 6.4. These differences matter because leading and sub-leading jets have different flavour compositions (light-quark versus gluon), and gluon-initiated jets have a higher average mass than quark-initiated jets [78].
The signal event topologies differ more obviously: the $W'$ process potentially produces more isolated W-jets than those found in the $t\bar{t}$ final state. The W bosons produced in the $W'$ decay are also generally longitudinally polarised, potentially making them easier to distinguish from the multijet background than W-jets from top quark decays, which are produced in both the longitudinal and transverse modes [11,63].
The signal efficiency versus background rejection curves in the two event topologies, including the differences in both signal and background, are shown in Fig. 15. The curves for tagging W-jets from the $W'$ decay against a leading-jet background indicate better performance in this event topology, with the size of the difference depending on the substructure variable used for tagging. Figure 16 shows the curves again, but with the leading jet from the Pythia multijet background used in both cases, thus removing the differences in background efficiencies and isolating the differences arising from the signal event topologies. With identical background compositions, the performance is generally slightly better in the Powheg-BOX $t\bar{t}$ sample. The mass distributions for the different signal and background samples are compared in Fig. 17 for the lowest and highest $p_{\mathrm{T}}$ ranges. The signal distributions also include the R2-trimmed leading-jet mass from $t\bar{t}$ events provided by MC@NLO + Herwig. The mass shape differences are less pronounced at higher $p_{\mathrm{T}}$, although the difference in $\epsilon_W^{\mathrm{G}}$ between the signal event topologies is still a non-negligible 10% even in the highest $p_{\mathrm{T}}$ range.

Figure 17: The R2-trimmed jet mass distributions for signal W-jet candidates in the ranges (a) $200 < p_{\mathrm{T}} < 250$ GeV and (b) $350 < p_{\mathrm{T}} < 500$ GeV, and for multijet background candidates (c, d) in the same ranges. The W-jets are taken from the $W' \to WZ$ process (solid black) and from $t\bar{t}$ events provided by Powheg-BOX (dotted red). Two kinds of Pythia multijet samples are shown: the solid black line is for the leading jets only, and the dotted red line is for the leading and sub-leading jets. The ratios between the models are shown at the bottom. The inclusion of sub-leading jets, which are more likely to be initiated by gluons, results in higher-mass jets. The vertical lines represent the signal mass window.

Systematic uncertainties
The sources of systematic uncertainty common to both the signal and background efficiency measurements include the jet mass scale (JMS), jet mass resolution (JMR), jet energy scale (JES), jet energy resolution (JER) and jet substructure variable scale (JSS).
The JER uncertainty is taken from previous studies [79] and is parameterised as a function of $p_{\mathrm{T}}$; it is approximately 10% in the $p_{\mathrm{T}}$ ranges presented here. The JMR uncertainty is also taken from previous studies [10], where it was determined from the data/MC differences in the widths of the W-jet mass peaks in $t\bar{t}$ events, and is fixed at 20%. The JMS, JES and JSS are varied up and down by $\pm1\sigma$, using the standard deviation derived from the double-ratio method, which is described in detail below using the JSS as an example.
The systematic uncertainty on the JSS is needed to derive the full systematic uncertainties on the signal and background efficiencies. The uncertainties are derived in situ by comparing the calorimeter-measured jet energy, mass and substructure variables with the same quantities measured by well-calibrated and completely independent detectors, in both data and MC, using the double ratio

$$ R_X = \frac{\langle X^{\mathrm{jet}}/X^{\mathrm{ref}}\rangle_{\mathrm{data}}}{\langle X^{\mathrm{jet}}/X^{\mathrm{ref}}\rangle_{\mathrm{MC}}}, \tag{11} $$

where $X$ denotes a jet variable, $X^{\mathrm{jet}}$ its value for the calorimeter jet and $X^{\mathrm{ref}}$ its value for the matched reference object. In this case, track-jets are used as reference objects, since tracks from charged hadrons are well measured and independent of the calorimeter. In addition, the use of track-jets, where tracks are required to come from the hard-scattering vertex, suppresses pileup effects. A geometrical matching in the $\eta$-$\phi$ plane is applied to associate track-jets with calorimeter jets. This approach was widely used in the measurement of the jet mass and substructure properties of jets in the 2011 data [10]. Performance studies have also shown excellent agreement between the measured positions of clusters and tracks in data, indicating no systematic misalignment between the calorimeter and the inner detector. The technique achieves a precision of around 3–7% in the central detector region, limited by systematic uncertainties arising from the inner-detector tracking efficiency and from the MC modelling of the charged and neutral components of jets.
The double ratio of Eq. (11) is computed for two MC generators, Pythia and Herwig++, and the larger of the two data/MC disagreements is taken as a modelling uncertainty. The total uncertainty is then obtained by adding this modelling uncertainty in quadrature to the tracking efficiency uncertainty. Specific uncertainties for tracks inside the core of dense jets are not needed here, because only jets with $p_{\mathrm{T}} < 1$ TeV are considered. The scale uncertainties for the jet energy, mass and substructure variables are derived in ranges of $p_{\mathrm{T}}$, $\eta$ and $M/p_{\mathrm{T}}$ of the reconstructed calorimeter jet.
… and $\tau_{21}^{\mathrm{wta}}$ in the range $350 < p_{\mathrm{T}} < 500$ GeV. The mean values of the single-ratio $X^{\mathrm{jet}}/X^{\mathrm{ref}}$ distributions are shown as a function of the jet mass, along with the distributions of $X^{\mathrm{jet}}/X^{\mathrm{ref}}$ themselves within the relevant $\epsilon_W^{\mathrm{G}} \sim 68\%$ mass window. Large discrepancies between data and MC are observed for low-mass jets, while for masses around 80 GeV the data/MC agreement is within 5%. In the $X^{\mathrm{jet}}/X^{\mathrm{ref}}$ distributions, the tails show discrepancies between data and MC, but the agreement is good for values of the ratio close to one, which account for the large majority of events. In summary, the scale uncertainty of the three jet substructure variables ranges between 1% and 5% in the different kinematic regions. Additional, sub-dominant systematic uncertainties come from the MC sources listed in Table 9 and described in Sect. 7.5 in terms of their effect on the final measured signal and background efficiencies. The full systematic uncertainty on the mass and substructure variables is obtained by adding the scale, resolution, statistical and MC uncertainties in quadrature.
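The double ratio and the way the generator comparison feeds into the scale uncertainty can be sketched in a few lines. The assumptions are labelled: the function names are hypothetical, the single ratios are unweighted means over matched calorimeter-jet/track-jet pairs, the data/MC disagreement is measured as the deviation of $R_X$ from unity, and the binning in $p_{\mathrm{T}}$, $\eta$ and $M/p_{\mathrm{T}}$ is omitted.

```python
import math

import numpy as np


def mean_single_ratio(x_jet, x_ref):
    """Mean of X_jet / X_ref over matched calorimeter-jet / track-jet pairs."""
    return float(np.mean(np.asarray(x_jet, float) / np.asarray(x_ref, float)))


def double_ratio(data_jet, data_ref, mc_jet, mc_ref):
    """R_X = <X_jet/X_ref>_data / <X_jet/X_ref>_MC, as in Eq. (11)."""
    return mean_single_ratio(data_jet, data_ref) / mean_single_ratio(mc_jet, mc_ref)


def jss_scale_uncertainty(data_jet, data_ref, mc_samples, tracking_unc):
    """Relative scale uncertainty from several MC generators.

    mc_samples   : list of (x_jet, x_ref) pairs, one per generator
    tracking_unc : relative uncertainty from the tracking efficiency
    """
    # Modelling term: the largest deviation of the double ratio from unity.
    modelling = max(abs(double_ratio(data_jet, data_ref, jet, ref) - 1.0)
                    for jet, ref in mc_samples)
    # Combine with the tracking-efficiency term in quadrature.
    return math.hypot(modelling, tracking_unc)
```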

Mass and substructure distributions in tt events
The jet mass distribution for the leading R2-trimmed jets in events satisfying the pre-selection criteria of Sect. 5 is shown in Fig. 19. The data and the Powheg-BOX + Pythia and MC@NLO + Herwig simulations agree within the uncertainties detailed in Sect. 7.3. Distributions of the three tagging variables $C_2^{(\beta=1)}$, $D_2^{(\beta=1)}$ and $\tau_{21}^{\mathrm{wta}}$ …

The jet mass distributions of the W boson candidates satisfying or failing the medium signal efficiency requirement for each of the three substructure variables are shown in Fig. 21. The mass distribution for jets failing the $C_2^{(\beta=1)}$ requirement (Fig. 21(a)) is notably different from those for jets failing the $D_2^{(\beta=1)}$ and/or $\tau_{21}^{\mathrm{wta}}$ requirements: it has a significantly higher mass peak and a conspicuously absent low-mass tail. This effect can be understood by referring back to Fig. 10(b): the correlation between the mass and $C_2^{(\beta=1)}$ is strong for background jets with low masses, while there is no clear correlation in the signal mass region. The $C_2^{(\beta=1)}$ variable therefore performs well when combined with a mass window, but is not very effective without the mass constraint.

Signal and background efficiencies and uncertainties
Background efficiencies are measured in a multijet-enriched sample of data, using the large-R trigger and event selection described in Sect. 5.
The systematic uncertainties on the background efficiency measurements in multijet events are summarised in Table 8. The uncertainties are propagated coherently through to the measurement and then added in quadrature. The background efficiency uncertainty due to the JSS uncertainty can be as large as $\sim 25\%$ for jets with $p_{\mathrm{T}} > 500$ GeV and is about 15–20% in the lower $p_{\mathrm{T}}$ ranges for the scale uncertainty on $D_2^{(\beta=1)}$. The background efficiency uncertainties from the JMS are in general larger than those from the JES, being of the order of 6–10% and 2–9%, respectively. The impact of the JER and JMR uncertainties is much smaller than that of the scale uncertainties.
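Coherent propagation means that each $\pm1\sigma$ scale shift is applied to all jets at once, the tagging efficiency is recomputed, and the resulting changes from different sources are then combined in quadrature. The sketch below illustrates this for the JMS and JSS only, under assumptions: multiplicative scale shifts, a pass condition of the substructure variable below the cut, symmetrisation by taking the larger of the up and down changes, and hypothetical function names.

```python
import numpy as np


def tag_efficiency(mass, var, window, cut, mass_scale=1.0, var_scale=1.0):
    """Fraction of jets inside the mass window that also have var < cut."""
    m = np.asarray(mass, float) * mass_scale
    v = np.asarray(var, float) * var_scale
    lo, hi = window
    return np.mean((m >= lo) & (m <= hi) & (v < cut))


def scale_uncertainties(mass, var, window, cut, sigma_mass, sigma_var):
    """Relative efficiency uncertainties from mass and substructure scale shifts."""
    nominal = tag_efficiency(mass, var, window, cut)
    variations = {
        "JMS": [dict(mass_scale=1 + s) for s in (sigma_mass, -sigma_mass)],
        "JSS": [dict(var_scale=1 + s) for s in (sigma_var, -sigma_var)],
    }
    result = {}
    for source, shifts in variations.items():
        changes = [abs(tag_efficiency(mass, var, window, cut, **kw) - nominal)
                   for kw in shifts]
        result[source] = max(changes) / nominal  # symmetrised relative change
    # Quadrature sum of the individual sources.
    result["total"] = float(np.sqrt(sum(u ** 2 for u in result.values())))
    return result
```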
Signal efficiencies are extracted from data by performing template fits to the mass distributions of jets that satisfy or fail the requirement on the given tagging variable. The signal template is constructed from Powheg-BOX + Pythia $t\bar{t}$ events, requiring both partons from the W boson decay in the event record to be within $\Delta R = 1.0$ of the jet axis. The background mass templates are composed of W boson decays from top quarks in which not all the decay products fall inside the jet cone, and of the other non-W backgrounds; these are also estimated using Powheg-BOX + Pythia. The normalisations of both templates are allowed to float.
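Because both normalisations float, each fit reduces to finding the signal and background yields whose weighted sum of templates best matches the data histogram, separately for passing and failing jets; the signal efficiency is then the fitted signal yield in the passing sample divided by the total fitted signal yield. The sketch uses an unconstrained linear least-squares fit for brevity, whereas a binned likelihood fit is more appropriate for histograms with few entries; the function names are hypothetical.

```python
import numpy as np


def fit_signal_yield(data_hist, sig_template, bkg_template):
    """Fit data = n_sig * sig_shape + n_bkg * bkg_shape and return n_sig.

    Templates are normalised to unit area, so the fitted coefficients are
    event yields. Both normalisations float freely (no constraints).
    """
    sig = np.asarray(sig_template, float)
    bkg = np.asarray(bkg_template, float)
    design = np.column_stack([sig / sig.sum(), bkg / bkg.sum()])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(data_hist, float), rcond=None)
    return coeffs[0]


def signal_efficiency(data_pass, data_fail, sig_pass, bkg_pass, sig_fail, bkg_fail):
    """Tagging efficiency from separate fits to passing and failing jets."""
    n_pass = fit_signal_yield(data_pass, sig_pass, bkg_pass)
    n_fail = fit_signal_yield(data_fail, sig_fail, bkg_fail)
    return n_pass / (n_pass + n_fail)
```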
The statistical uncertainty on the efficiency measured in data includes the statistical uncertainty of the templates. For most sources of systematic uncertainty, the fit is repeated with templates modified by $\pm1\sigma$. For the JMS, this variation is between $\pm0.5\sigma$ and $\pm1.0\sigma$; this reduction with respect to the uncertainty obtained with the standard double-ratio technique is made possible by fitting the mass distributions in data with a number of different templates. The templates are obtained by shifting the jet mass up and down by fractions (0.25–1.0) of $\sigma$. The fit quality $\chi^2/\mathrm{ndf}$ is calculated for each template, and a parabolic fit is performed to $\chi^2/\mathrm{ndf}$ as a function of the fraction of $\sigma$. The fraction of $\sigma$ at which $\chi^2/\mathrm{ndf}$ rises by one unit from its minimum is used as the uncertainty on the JMS for the signal efficiency calculation.
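The profiling step can be sketched as a quadratic fit followed by solving for the one-unit rise. The function name is hypothetical, the scan is assumed to include negative fractions for the downward mass shifts, and the "one-unit rise in $\chi^2/\mathrm{ndf}$" reading of the procedure is an interpretation of the text above.

```python
import numpy as np


def jms_profile_uncertainty(fractions, chi2_ndf):
    """Parabolic fit of chi2/ndf versus the JMS shift (in units of sigma).

    Returns (best-fit shift, uncertainty), where the uncertainty is the
    distance from the minimum at which the parabola rises by one unit.
    """
    a, b, _c = np.polyfit(fractions, chi2_ndf, 2)
    if a <= 0:
        raise ValueError("chi2/ndf is not convex in the scanned range")
    x_min = -b / (2.0 * a)
    # a * (x - x_min)**2 = 1  =>  |x - x_min| = 1 / sqrt(a)
    return x_min, 1.0 / np.sqrt(a)
```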
The full set of contributions to the systematic uncertainty on the signal efficiency, after applying the mass and medium $D_2^{(\beta=1)}$ tagging requirements, is summarised in Table 9. As for the background efficiency, the JSS contributes the largest uncertainty, varying between 3% and 5% for the $D_2^{(\beta=1)}$ scale. The contribution from the JMR is $\sim 3\%$. The contribution from the JER is less significant than that from the JMR, being negligible in the lowest $p_{\mathrm{T}}$ range and $\sim 1\%$ for jets with $250 < p_{\mathrm{T}} < 500$ GeV. The contribution from JMS variations is also $\sim 1\%$ (symmetrised as a result of the profiling technique), increasing to $\sim 10\%$ in the highest $p_{\mathrm{T}}$ range ($350 < p_{\mathrm{T}} < 500$ GeV). The uncertainty from the JES is around 2–4%.
In addition to the scale and resolution uncertainties, two other types of uncertainty are considered for the signal efficiency measurement: (a) $t\bar{t}$ modelling, namely initial-state radiation (ISR), final-state radiation (FSR), … The generator uncertainty is taken as the difference between the signal efficiency measured using the MC@NLO + Herwig signal mass templates and that obtained with the default Powheg-BOX + Pythia ones. These uncertainties are between 1% and 3%. The modelling uncertainty of the QCD radiation is estimated using AcerMC [80] v3.8 plus Pythia v6.426 MC samples by varying the parameters controlling the ISR and FSR in a range consistent with a previous ATLAS measurement [81]. The resulting uncertainties on the signal efficiency increase with jet $p_{\mathrm{T}}$ and are 2–6%. The normalisation uncertainties for the main background sources are evaluated using a $\pm1\sigma$ variation of the cross-section. They are negligible with respect to the scale and resolution uncertainties, and are below 1% for the $t\bar{t}$ signal and the W+jets background.

Summary of W boson tagging efficiencies in data and MC
The W-jet tagging efficiency in $t\bar{t}$ events using the R2-trimmed jet mass window and the medium and tight $C_2^{(\beta=1)}$ selections is measured in top-enriched data and in MC provided by Powheg-BOX + Pythia and MC@NLO + Herwig. The background efficiency with the same selection is measured in multijet-enriched data and in Pythia and Herwig++ simulations. The results of these measurements are shown in Fig. 22. In both the signal and background efficiency distributions, the ratio of data to each of the two MC models is shown in the lower panels. The corresponding signal and background efficiency distributions for $D_2^{(\beta=1)}$ and $\tau_{21}^{\mathrm{wta}}$ are shown in Figs. 23 and 24, respectively. Systematic uncertainties from background modelling are included for the signal data points, while the background efficiencies involve no background modelling and their points show only statistical uncertainties. Good agreement is observed between data and predictions.

Table 9: Relative systematic uncertainties (in %) on the W-jet tagging efficiency from different sources after tagging with the R2-trimmed mass and medium $D_2^{(\beta=1)}$ requirement, which results in a signal efficiency $\epsilon_W^{\mathrm{G\&T}} \approx 50\%$. The uncertainties on scales (JMS, JES and JSS denote the mass, energy and substructure scale uncertainties) and normalisations can be in both directions and so result in pairs of efficiency uncertainties, but here the JMS is symmetrised as part of the profiling technique described in the text. The contributions from each source are added in quadrature to obtain the total uncertainty on $\epsilon_W^{\mathrm{G\&T}}$. The mass and energy resolution uncertainties are denoted JMR and JER, respectively, and ISR/FSR denotes the uncertainties from the modelling of initial/final-state radiation.

Figure 24: W boson tagging efficiencies in ranges of jet $p_{\mathrm{T}}$ for (left) signal W-jets in $t\bar{t}$ events and (right) multijet background. The $\epsilon_W^{\mathrm{G\&T}} \sim 50\%$ working points obtained with the combined mass window and $\tau_{21}^{\mathrm{wta}}$ requirements are shown in (a) and (b), and the $\sim 25\%$ working points in (c) and (d). The deviations from 50% and 25% in (a) and (c), respectively, are due to the optimisation being based on W-jets in a different topology, $W' \to WZ$, as discussed in the text. The lower panels show the ratios of the efficiency measured in data to the efficiencies in two different MC simulations.
The signal efficiency at the medium working point is not exactly 50% because the selection requirements for the $\epsilon_W^{\mathrm{G\&T}} = 50\%$ working point are calculated using W-jets from $W' \to WZ \to qq\ell\ell$ events and are applied here to W-jets in $t\bar{t}$ events.
The data points are the result of fits using templates extracted from Powheg-BOX + Pythia; the difference with respect to the results that would be obtained using templates from MC@NLO + Herwig is added in quadrature as an additional source of systematic uncertainty.
The $D_2^{(\beta=1)}$ tagger has the smallest background efficiency for the medium and tight working points in all $p_{\mathrm{T}}$ ranges except the lowest, $200 < p_{\mathrm{T}} < 250$ GeV. The background efficiencies decrease with increasing $p_{\mathrm{T}}$, except for the $C_2^{(\beta=1)}$ tagger, whose background efficiency increases for jets in the range $250 < p_{\mathrm{T}} < 350$ GeV. This behaviour can be explained by the stronger $p_{\mathrm{T}}$ dependence of the $C_2^{(\beta=1)}$ tagger compared with the $D_2^{(\beta=1)}$ and $\tau_{21}^{\mathrm{wta}}$ taggers. For the signal efficiencies, the uncertainty bands of the ratios account for the correlations in the systematic uncertainties between data and MC. In general, the data agree better with Powheg-BOX + Pythia than with MC@NLO + Herwig. For the medium working point, the two MC models agree within $1\sigma$ except in the range $200 < p_{\mathrm{T}} < 250$ GeV, while for the tight working point ($\epsilon_W^{\mathrm{G\&T}} \sim 25\%$) the efficiency predicted by MC@NLO + Herwig is $1.5\sigma$ to $2\sigma$ higher than both the Powheg-BOX + Pythia prediction and the measurements in data. There is a potential bias towards Powheg-BOX + Pythia, as this generator provides the signal template used in the background subtraction needed to define the signal efficiency in data. However, even when MC@NLO + Herwig is used for the templates in the subtraction, Powheg-BOX + Pythia gives a better description of the signal efficiency measured in data. The differences between the MC signal efficiencies stem from differences between the signal mass distributions of the models: the mass peak has a different width, so the fraction of signal in the mass window (which is the same for both Monte Carlo samples) is already significantly different after the requirement on the groomed jet mass is applied (see for example Fig. 17). Figure 25 shows the $t\bar{t}$ MC efficiency versus rejection curves together with the data measurements at the medium and tight working points, including systematic uncertainties on the signal and background efficiencies.
Generally good agreement between data and MC simulation is observed in all $p_{\mathrm{T}}$ ranges for these measurements.

Conclusions
Several combinations of jet grooming algorithms and tagging variables have been studied to find an optimal W-jet tagger in terms of (a) maximising the multijet background rejection power for given values of W-jet signal efficiency; (b) minimising systematic uncertainties and the effects of pileup; and (c) the quality of the modelling of the jet mass and substructure variables in Monte Carlo simulations.
The signal efficiency working point $\epsilon_W^{\mathrm{G}} = 68\%$ is chosen as a suitable baseline for the comparison of grooming algorithms. The best few configurations of trimming, pruning and split-filtering perform similarly at this working point, and anti-$k_t$, $R = 1.0$ jets trimmed with $f_{\mathrm{cut}} = 5\%$ and $R_{\mathrm{sub}} = 0.2$ ('R2-trimming') do particularly well in removing the pileup dependence. Cambridge-Aachen pruning also provides significant discrimination for W-jet tagging, as does split-filtering without the mass-drop requirement. The irrelevance of the mass-drop requirement was shown previously in phenomenological studies [82], and is verified here in MC samples with a full ATLAS detector simulation. Trimming with $R_{\mathrm{sub}} = 0.1$ shows promise in terms of the jet mass; it is not pursued further in these studies because it is challenging in terms of systematic uncertainties, as the subjets then tend to consist of single calorimeter clusters, but it may well be considered in future extensions of these studies (for example in tagging W bosons with $p_{\mathrm{T}} > 1$ TeV).
The energy correlation ratios $D_2^{(\beta=1)}$ and $C_2^{(\beta=1)}$ are found to be particularly good variables for tagging W-jets, as shown here in data for the first time. However, there is some evidence that the $C_2^{(\beta=1)}$ variable has a higher background efficiency for low-$p_{\mathrm{T}}$ jets. The N-subjettiness ratio $\tau_{21}^{\mathrm{wta}}$ performs similarly well, and better than its predecessor $\tau_{21}$.
The signal and background efficiencies obtained using pairwise combinations of the R2-trimmed mass and three different substructure variables are measured in $t\bar{t}$ and multijet events from 20.3 fb$^{-1}$ of $\sqrt{s} = 8$ TeV pp collisions recorded by ATLAS at the LHC. For signal efficiencies around 50% and background efficiencies around 2%, the various MC predictions generally agree with the data measurements within the uncertainties.
In some configurations, significant differences are observed between the signal and background efficiencies from different Monte Carlo predictions. This information can help improve the Monte Carlo simulations used in searches for physics beyond the Standard Model, and it highlights the potential for measurements such as these to be used in tuning Monte Carlo simulations.
These studies are necessarily limited in scope to comparing simple two-variable taggers, made up of a groomed mass window and a substructure variable requirement, both of which are sensitive to $p_{\mathrm{T}}$ and are therefore optimised for three different $p_{\mathrm{T}}$ ranges. Extensions of these studies could include combining three or more variables and using multivariate techniques to further increase the signal efficiency and/or reduce the background; investigating how these conclusions change if dedicated pileup-removal techniques are used alongside grooming; and varying the $\epsilon_W^{\mathrm{G}}$ baseline at which the grooming algorithms are compared.

The ATLAS Collaboration