Identification and rejection of pile-up jets at high pseudorapidity with the ATLAS detector

The rejection of forward jets originating from additional proton–proton interactions (pile-up) is crucial for a variety of physics analyses at the LHC, including Standard Model measurements and searches for physics beyond the Standard Model. The identification of such jets is challenging due to the lack of track and vertex information in the pseudorapidity range |η| > 2.5. This paper presents a novel strategy for forward pile-up jet tagging that exploits jet shapes and topological jet correlations in pile-up interactions. Measurements of the per-jet tagging efficiency are presented using a data set of 3.2 fb⁻¹ of proton–proton collisions at a centre-of-mass energy of 13 TeV collected with the ATLAS detector. The fraction of pile-up jets rejected in the range 2.5 < |η| < 4.5 is estimated in simulated events with an average of 22 interactions per bunch-crossing.
It increases with jet transverse momentum and, for jets with transverse momentum between 20 and 50 GeV, it ranges between 49% and 67% with an efficiency of 85% for selecting hard-scatter jets. A case study is performed in Higgs boson production via the vector-boson fusion process, showing that these techniques mitigate the background growth due to additional proton–proton interactions, thus enhancing the reach for such signatures.


Introduction
In order to enhance the capability of the experiments to discover physics beyond the Standard Model, the Large Hadron Collider (LHC) operates at the conditions yielding the highest integrated luminosity achievable. Therefore, the collisions of proton bunches result not only in large transverse-momentum transfer proton–proton (pp) interactions, but also in additional collisions within the same bunch crossing, primarily consisting of low-energy quantum chromodynamics (QCD) processes. Such additional pp collisions are referred to as in-time pile-up interactions. In addition to in-time pile-up, out-of-time pile-up refers to the energy deposits in the ATLAS calorimeter from previous and following bunch crossings with respect to the triggered event. In this paper, in-time and out-of-time pile-up are referred to collectively as pile-up (PU).
In Ref. [1] it was shown that pile-up jets can be effectively removed using track and vertex information with the jet-vertex-tagger (JVT) technique. The CMS Collaboration employs a pile-up mitigation strategy based on tracks and jet shapes [2]. A limitation of the JVT discriminant used by the ATLAS Collaboration is that it can only be used for jets within the coverage¹ of the tracking detector, |η| < 2.5. However, in the ATLAS detector, jets are reconstructed in the range |η| < 4.5. The rejection of pile-up jets in the forward region, here defined as 2.5 < |η| < 4.5, is crucial to enhance the sensitivity of key analyses such as the measurement of Higgs boson production in the vector-boson fusion (VBF) process. Figure 1a shows how the fraction of Z+jets events with at least one forward jet² with p_T > 20 GeV, an important background for VBF analyses, rises quickly with busier pile-up conditions, quantified by the average number of interactions per bunch crossing (⟨μ⟩). Likewise, the resolution of the missing transverse momentum (E_T^miss) components E_x^miss and E_y^miss in Z+jets events is also affected by the presence of forward pile-up jets. The inclusion of forward jets allows a more precise E_T^miss calculation but introduces a more pronounced pile-up dependence, as shown in Fig. 1b. At higher ⟨μ⟩, improving the E_T^miss resolution depends on rejecting all forward jets, unless the impact of pile-up jets specifically can be mitigated.
In this paper, the phenomenology of pile-up jets with |η| > 2.5 is investigated in detail, and techniques to identify and reject them are presented. The paper is organized as follows. Section 2 briefly describes the ATLAS detector, the event reconstruction and selection. The physical origin and classification of pile-up jets are described in Sect. 3. Section 4 describes the use of jet shape variables for the identification and rejection of forward pile-up jets. The forward JVT (fJVT) technique is presented in Sect. 5 along with its performance and efficiency measurements. The usage of jet shape variables in improving fJVT performance is presented in Sect. 6, while the application of forward pile-up jet rejection in a VBF analysis is discussed in Sect. 7. The conclusions are presented in Sect. 8.
¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the beam pipe. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2).
² The jet reconstruction is described in Sect. 2.

ATLAS detector
The ATLAS detector is a general-purpose particle detector covering almost 4π in solid angle and consisting of a tracking system called the inner detector (ID), a calorimeter system, and a muon spectrometer (MS). The details of the detector are given in Refs. [3][4][5].
The ID consists of silicon pixel and microstrip tracking detectors covering the pseudorapidity range of |η| < 2.5 and a straw-tube tracker covering |η| < 2.0. These components are immersed in an axial 2 T magnetic field provided by a superconducting solenoid.
The electromagnetic (EM) and hadronic calorimeters are composed of multiple subdetectors covering the range |η| < 4.9, generally divided into barrel (|η| < 1.4), endcap (1.4 < |η| < 3.2) and forward (3.2 < |η| < 4.9) regions. The barrel and endcap sections of the EM calorimeter use liquid argon (LAr) as the active medium and lead absorbers. The hadronic endcap calorimeter (1.5 < |η| < 3.2) uses copper absorbers and LAr, while in the forward (3.1 < |η| < 4.9) region LAr, copper and tungsten are used. The LAr calorimeter read-out [6], with a pulse length between 60 and 600 ns, is sensitive to signals from the preceding 24 bunch crossings. It uses bipolar shaping with positive and negative output, which ensures that the signal induced by out-of-time pile-up averages to zero. In the region |η| < 1.7, the hadronic (Tile) calorimeter is constructed from steel absorber and scintillator tiles and is separated into barrel (|η| < 1.0) and extended barrel (0.8 < |η| < 1.7) sections. The fast response of the Tile calorimeter makes it less sensitive to out-of-time pile-up.
The MS forms the outer layer of the ATLAS detector and is dedicated to the detection and measurement of high-energy muons in the region |η| < 2.7. A multi-level trigger system of dedicated hardware and software filters is used to select pp collisions producing high-p_T particles.

Data and MC samples
The studies presented in this paper are performed using a data set of pp collisions at √s = 13 TeV, corresponding to an integrated luminosity of 3.2 fb⁻¹, collected in 2015, during which the LHC operated with a bunch spacing of 25 ns. There are on average 13.5 interactions per bunch crossing in the data sample used for the analysis.
Samples of simulated events used for comparisons with data are reweighted to match the distribution of the number of pile-up interactions observed in data. The average number of interactions per bunch crossing ⟨μ⟩ in the data used as reference for the reweighting is divided by a scale factor of 1.16 ± 0.07. This scale factor takes into account the fraction of visible cross-section due to inelastic pp collisions as measured in the data [7] and is required to obtain good agreement with the number of inelastic interactions reconstructed in the tracking detector as predicted in the reweighted simulation. In order to extend the study of the pile-up dependence, simulated samples with an average of 22 interactions per bunch crossing are also used. Dijet events are simulated with the Pythia8.186 [8] event generator using the NNPDF2.3LO [9] set of parton distribution functions (PDFs) and the parameter values set according to the A14 underlying-event tune [10]. Simulated tt̄ events are generated with Powheg-Box v2.0 [11][12][13] using the CT10 PDF set [14]; Pythia6.428 [15] is used for fragmentation and hadronization with the Perugia2012 [16] tune that employs the CTEQ6L1 [17] PDF set. A sample of leptonically decaying Z bosons produced with jets (Z(→ ℓℓ)+jets) and VBF H → ττ samples are generated with Powheg-Box v1.0, and Pythia8.186 is used for fragmentation and hadronization with the AZNLO tune [18] and the CTEQ6L1 PDF set. For all samples, the EvtGen v1.2.0 program [19] is used to model the properties of the bottom and charm hadron decays. The effect of in-time as well as out-of-time pile-up is simulated using minimum-bias events generated with Pythia8.186 to reflect the pile-up conditions during the 2015 data-taking period, using the A2 tune [20] and the MSTW2008LO [21] PDF set. All generated events are processed with a detailed simulation of the ATLAS detector response [22] based on Geant4 [23] and subsequently reconstructed and analysed in the same way as the data.
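The pile-up reweighting described above can be illustrated with a minimal sketch. It assumes a simple histogram-ratio method, which the text does not specify; the function name `pileup_weights` and the binning are hypothetical, and the ±0.07 uncertainty on the scale factor is not propagated.

```python
import numpy as np

MU_SCALE = 1.16  # scale factor quoted in the text (its uncertainty is ignored here)


def pileup_weights(mu_data, mu_mc, bins):
    """Per-event weights for simulation (hypothetical helper).

    The data mu values are divided by MU_SCALE, then the ratio of the
    normalised data and simulation histograms is looked up for each
    simulated event.
    """
    mu_data_scaled = np.asarray(mu_data, dtype=float) / MU_SCALE
    mu_mc = np.asarray(mu_mc, dtype=float)
    h_data, _ = np.histogram(mu_data_scaled, bins=bins, density=True)
    h_mc, _ = np.histogram(mu_mc, bins=bins, density=True)
    # Bins that are empty in simulation get weight zero.
    ratio = np.divide(h_data, h_mc, out=np.zeros_like(h_data), where=h_mc > 0)
    idx = np.clip(np.digitize(mu_mc, bins) - 1, 0, len(ratio) - 1)
    return ratio[idx]
```

With such weights, the weighted ⟨μ⟩ histogram of the simulation reproduces the shape of the rescaled data histogram in every populated bin.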

Event reconstruction
The raw data collected by the ATLAS detector are reconstructed in the form of particle candidates and jets using various pattern recognition algorithms. The reconstruction algorithms used in this analysis are detailed in Ref. [1], while an overview is presented in this section.

Calorimeter clusters and towers
Jets in ATLAS are reconstructed from clusters of energy deposits in the calorimeters. Two methods of combining calorimeter cell information are considered in this paper: topological clusters and towers.
Topological clusters (topo-clusters) [24] are built from neighbouring calorimeter cells. The algorithm uses calorimeter cells with energy significance³ |E_cell|/σ_noise > 4 as seeds, combines all neighbouring cells with |E_cell|/σ_noise > 2 and finally adds neighbouring cells without any significance requirement. Topo-clusters are used as input for jet reconstruction.
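The seed, growth and perimeter steps can be sketched on a toy two-dimensional map of cell significances. This is a simplified illustration, not the ATLAS implementation: the real algorithm works on three-dimensional calorimeter geometry and includes cluster splitting, and the function name `topo_clusters` and the 8-neighbour connectivity are assumptions made here.

```python
from collections import deque

import numpy as np


def topo_clusters(signif, t_seed=4.0, t_grow=2.0):
    """Label cells of a 2D significance map with cluster ids (0 = unclustered)."""
    s = np.abs(np.asarray(signif, dtype=float))
    labels = np.zeros(s.shape, dtype=int)
    next_id = 0
    # Visit seed cells in order of decreasing significance.
    seeds = sorted(zip(*np.where(s > t_seed)), key=lambda c: -s[c])
    for seed in seeds:
        if labels[seed]:
            continue  # already absorbed into an earlier cluster
        next_id += 1
        labels[seed] = next_id
        queue = deque([seed])
        while queue:
            i, j = queue.popleft()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    n = (i + di, j + dj)
                    inside = 0 <= n[0] < s.shape[0] and 0 <= n[1] < s.shape[1]
                    if not inside or labels[n]:
                        continue
                    # Every neighbour of a seed or growth cell joins the cluster,
                    # but only cells above the growth threshold keep expanding it.
                    labels[n] = next_id
                    if s[n] > t_grow:
                        queue.append(n)
    return labels
```

In this sketch, a cell below the growth threshold is added as a perimeter cell but does not propagate the cluster further, which mirrors the last step described in the text.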
Calorimeter towers are fixed-size objects (Δη × Δφ = 0.1 × 0.1) [26] that ensure a uniform segmentation of the calorimeter information. Instead of building clusters, the cells are projected onto a fixed grid in η and φ corresponding to 6400 towers. Calorimeter cells which completely fit within a tower contribute their total energy to the single tower. Other cells extending beyond the tower boundary contribute to multiple towers, depending on the overlap fraction of the cell area with the towers. In the following, towers are matched geometrically to jets reconstructed using topo-clusters and are used for jet classification.
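The sharing of cell energy between towers by overlap fraction can be sketched as follows. The example assumes rectangular cells in the η–φ plane and ignores the φ wrap-around; the helper names `overlap_1d` and `add_cell` are hypothetical.

```python
import math
from collections import defaultdict

TOWER = 0.1  # tower size in eta and phi, as quoted in the text


def overlap_1d(lo, hi, step=TOWER):
    """Yield (bin index, fraction of the interval [lo, hi] inside that bin)."""
    first = math.floor(lo / step)
    last = math.ceil(hi / step) - 1
    for b in range(first, last + 1):
        inter = min(hi, (b + 1) * step) - max(lo, b * step)
        if inter > 0:
            yield b, inter / (hi - lo)


def add_cell(towers, energy, eta_lo, eta_hi, phi_lo, phi_hi):
    """Distribute a cell's energy over towers in proportion to the overlap area."""
    for i_eta, f_eta in overlap_1d(eta_lo, eta_hi):
        for i_phi, f_phi in overlap_1d(phi_lo, phi_hi):
            towers[(i_eta, i_phi)] += energy * f_eta * f_phi


towers = defaultdict(float)
# A cell straddling the boundary between two towers in eta shares its energy.
add_cell(towers, 10.0, 0.05, 0.15, 0.0, 0.1)
```

A cell fully contained in one tower yields a single overlap fraction of one, so its total energy goes to that tower, as stated in the text.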

Vertices and tracks
The event hard-scatter primary vertex is defined as the reconstructed primary vertex with the largest Σp_T² of constituent tracks. When evaluating performance in simulation, only events where the reconstructed hard-scatter primary vertex lies within |Δz| < 0.1 mm of the true hard-scatter interaction are considered. For the physics processes considered, the reconstructed hard-scatter primary vertex matches the true hard-scatter interaction more than 95% of the time. Tracks are required to have p_T > 0.5 GeV and to satisfy quality criteria designed to reject poorly measured or fake tracks [27]. Tracks are assigned to primary vertices based on the track-to-vertex matching resulting from the vertex reconstruction. Tracks not included in vertex reconstruction are assigned to the nearest vertex based on the distance |Δz × sin θ|, up to a maximum distance of 3.0 mm. Tracks not matched to any vertex are not considered. Tracks are then assigned to jets by adding them to the jet clustering process with infinitesimal p_T, a procedure known as ghost-association [28].
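The assignment of tracks that were not used in the vertex fit can be sketched as a nearest-vertex search in |Δz × sin θ|. The function name and the inputs (track z position and polar angle, list of vertex z positions, all in mm and radians) are illustrative assumptions.

```python
import math

MAX_DIST_MM = 3.0  # maximum |dz * sin(theta)| quoted in the text


def assign_track(track_z0, track_theta, vertex_z):
    """Return the index of the nearest vertex, or None if none is within 3 mm."""
    dists = [abs((track_z0 - zv) * math.sin(track_theta)) for zv in vertex_z]
    if not dists:
        return None
    best = min(range(len(dists)), key=dists.__getitem__)
    return best if dists[best] <= MAX_DIST_MM else None
```

Tracks for which this returns `None` correspond to the tracks that are not considered further.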

Jets
Jets are reconstructed from topo-clusters at the EM scale⁴ using the anti-k_t [29] algorithm, as implemented in FastJet 2.4.3 [30], with a radius parameter R = 0.4. After a jet-area-based subtraction of pile-up energy, a response correction is applied to each jet reconstructed in the calorimeter to calibrate it to the particle-level jet energy scale [1,25,31]. Unless noted otherwise, jets are required to have 20 GeV < p_T < 50 GeV. Higher-p_T forward jets are ignored due to their negligible pile-up rate at the pile-up conditions considered in this paper. Central jets are required to be within |η| < 2.5 so that most of their charged particles are within the tracking coverage of the inner detector. Forward jets are those in the region 2.5 < |η| < 4.5, and no tracks associated with their charged particles are measured beyond |η| = 2.5.
Jets built from particles in the Monte Carlo generator's event record ("truth particles") are also considered. Truth-particle jets are reconstructed using the anti-k_t algorithm with R = 0.4 from stable⁵ final-state truth particles from the simulated hard-scatter (truth-particle hard-scatter jets) or in-time pile-up (truth-particle pile-up jets) interaction of choice. A third type of truth-particle jet (inclusive truth-particle jets) is reconstructed by considering truth particles from all interactions simultaneously, in order to study the effects of pile-up interactions on truth-particle pile-up jets.
The simulation studies in this paper require a classification of the reconstructed jets into three categories: hard-scatter jets, QCD pile-up jets, and stochastic pile-up jets. Jets are thus truth-labelled based on a matching criterion to truth-particle jets. Similarly to Ref. [1], jets are first classified as hard-scatter or pile-up jets. Jets are labelled as hard-scatter jets if a truth-particle hard-scatter jet with p_T > 10 GeV is found within ΔR = √((Δη)² + (Δφ)²) of 0.3. The p_T > 10 GeV requirement is used to avoid accidental matches of reconstructed jets with soft activity from the hard-scatter interaction. In cases where more than one truth-particle jet is matched, p_T^truth is defined from the highest-p_T truth-particle hard-scatter jet within ΔR of 0.3.
Jets are labelled as pile-up jets if no truth-particle hard-scatter jet with p_T > 4 GeV is found within ΔR of 0.6. These pile-up jets are further classified as QCD pile-up jets if they are matched within ΔR < 0.3 to a truth-particle pile-up jet or as stochastic pile-up jets if there is no truth-particle pile-up jet within ΔR < 0.6, requiring that truth-particle pile-up jets have p_T > 10 GeV in both cases. Jets with 0.3 < ΔR < 0.6 relative to truth-particle hard-scatter jets with p_T > 10 GeV or ΔR < 0.3 of truth-particle hard-scatter jets with 4 GeV < p_T < 10 GeV are not labelled because their nature cannot be unambiguously determined. These jets are therefore not used for performance studies based on simulation.
⁵ Truth particles are considered stable if their decay length cτ is greater than 1 cm. A truth particle is considered to be interacting if it is expected to deposit most of its energy in the calorimeters; muons and neutrinos are considered to be non-interacting.
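The truth-labelling rules above can be summarised in a short sketch. The function names and the tuple layout of the inputs are hypothetical. The text does not name the case of a pile-up jet whose closest truth-particle pile-up jet lies at 0.3 < ΔR < 0.6, so the label returned for it here is an assumption.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def label_jet(jet, hs_truth, pu_truth):
    """jet: (eta, phi); truth jets: lists of (pt in GeV, eta, phi)."""

    def near(truth, pt_min, dr_max):
        return any(
            pt > pt_min and delta_r(jet[0], jet[1], eta, phi) < dr_max
            for pt, eta, phi in truth
        )

    if near(hs_truth, 10.0, 0.3):
        return "hard-scatter"
    if near(hs_truth, 4.0, 0.6):
        return "unlabelled"  # ambiguous overlap with hard-scatter activity
    if near(pu_truth, 10.0, 0.3):
        return "QCD pile-up"
    if not near(pu_truth, 10.0, 0.6):
        return "stochastic pile-up"
    return "pile-up, unclassified"  # case not named in the text (assumption)
```

The order of the checks matters: the hard-scatter match is tested first, so that a jet is only considered a pile-up candidate once no truth-particle hard-scatter jet above 4 GeV is found within ΔR of 0.6.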

Jet Vertex Tagger
The Jet Vertex Tagger (JVT) is built out of the combination of two jet variables, corrJVF and R_pT^0, that provide information to separate hard-scatter jets from pile-up jets. The quantity corrJVF [1] is defined for each jet as
$$\mathrm{corrJVF} = \frac{\sum p_{\mathrm{T}}^{\mathrm{trk}}(\mathrm{PV}_0)}{\sum p_{\mathrm{T}}^{\mathrm{trk}}(\mathrm{PV}_0) + p_{\mathrm{T}}^{\mathrm{PU}} / (k \cdot n_{\mathrm{trk}}^{\mathrm{PU}})},$$
where PV_i denotes the reconstructed event vertices (PV_0 is the identified hard-scatter vertex and the PV_i are sorted by decreasing Σp_T²), and Σp_T^trk(PV_0) is the scalar p_T sum of the tracks that are associated with the jet and originate from the hard-scatter vertex. The term p_T^PU = Σ_{i≥1} Σp_T^trk(PV_i) denotes the scalar p_T sum of the tracks associated with the jet and originating from pile-up vertices. To correct for the linear increase of p_T^PU with the total number of pile-up tracks per event (n_trk^PU), p_T^PU is divided by (k · n_trk^PU) with the parameter k set to 0.01 [1].⁶ The variable R_pT^0 is defined as the scalar p_T sum of the tracks that are associated with the jet and originate from the hard-scatter vertex divided by the fully calibrated jet p_T, which includes pile-up subtraction:
$$R_{p\mathrm{T}}^{0} = \frac{\sum p_{\mathrm{T}}^{\mathrm{trk}}(\mathrm{PV}_0)}{p_{\mathrm{T}}^{\mathrm{jet}}}.$$
This observable tests the compatibility between the jet p_T and the total p_T of the hard-scatter charged particles within the jet. Its average value for hard-scatter jets is approximately 0.5, as the numerator does not account for the neutral particles in the jet. The JVT discriminant is built by defining a two-dimensional likelihood based on a k-nearest neighbour (kNN) algorithm [32]. An extension of the R_pT^0 variable computed with respect to any vertex i in the event, R_pT^i = Σp_T^trk(PV_i) / p_T^jet, is also used in this analysis.
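The two JVT inputs can be computed as in the sketch below, which assumes the definitions of Ref. [1] as summarised in the text: corrJVF is the hard-scatter track p_T sum divided by itself plus the pile-up track p_T sum scaled by 1/(k · n_trk^PU), and R_pT^i is the track p_T sum from vertex i divided by the calibrated jet p_T. The function names are hypothetical, and returning `None` for a jet without usable tracks is a choice made for this example.

```python
def corr_jvf(pt_hs_tracks, pt_pu_tracks, n_pu_tracks_event, k=0.01):
    """corrJVF from the pT of jet tracks from PV0 and from pile-up vertices.

    n_pu_tracks_event is the total number of pile-up tracks in the event.
    Returns None when the denominator vanishes (choice made for this sketch).
    """
    hs = sum(pt_hs_tracks)
    pu = sum(pt_pu_tracks)
    pu_corr = pu / (k * n_pu_tracks_event) if n_pu_tracks_event > 0 else 0.0
    denom = hs + pu_corr
    return hs / denom if denom > 0 else None


def r_pt(pt_tracks_from_vertex, jet_pt):
    """Scalar pT sum of the jet's tracks from one vertex over the jet pT."""
    return sum(pt_tracks_from_vertex) / jet_pt
```

For a jet with 15 GeV of hard-scatter track p_T, 3 GeV of pile-up track p_T and 300 pile-up tracks in the event, the corrected pile-up term is 3 / (0.01 × 300) = 1 GeV, giving corrJVF = 15/16.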

Electrons and muons
Electrons are built from EM clusters and associated ID tracks. They are required to satisfy |η| < 2.47 and p_T > 10 GeV, as well as reconstruction quality and isolation criteria [33].

Origin and structure of pile-up jets
The additional transverse energy from pile-up interactions contributing to jets originating from the hard-scatter (HS) interaction is subtracted on an event-by-event basis using the jet-area method [1,36]. However, the jet-area subtraction assumes a uniform pile-up distribution across the calorimeter, while local fluctuations of pile-up can cause additional jets to be reconstructed. The additional jets can be classified into two categories: QCD pile-up jets, where the particles in the jet stem mostly from a single QCD process occurring in a single pile-up interaction, and stochastic jets, which combine particles from different interactions. Figure 2 shows an event with a hard-scatter jet, a QCD pile-up jet and a stochastic pile-up jet. Most of the particles associated with the hard-scatter jet originate from the primary interaction. Most of the particles associated with the QCD pile-up jet originate from a single pile-up interaction. The stochastic pile-up jet includes particles associated with both pile-up interactions in the event, without a single prevalent source.
While this binary classification is convenient for the purpose of description, the boundary between the two categories is somewhat arbitrary. This is particularly true in harsh pile-up conditions, with dozens of concurrent pp interactions, where every jet, including those originating primarily from the identified hard-scatter interaction, also has contributions from multiple pile-up interactions.
In order to identify and reject forward pile-up jets, a twofold strategy is adopted. Stochastic jets have intrinsic differences in shape with respect to hard-scatter and QCD pile-up jets, and this shape can be used for discrimination. On the other hand, the calorimeter signature of QCD pile-up jets does not differ fundamentally from that of hard-scatter jets. Therefore, QCD pile-up jets are identified by exploiting transverse momentum conservation in individual pile-up interactions.
The nature of pile-up jets can vary significantly depending on whether or not most of the jet energy originates from a single interaction. Figure 3 shows the fraction of QCD pile-up jets among all pile-up jets, when considering inclusive truth-particle jets. The corresponding distributions for reconstructed jets are shown in Fig. 4. When considering only in-time pile-up contributions (Fig. 3), the fraction of QCD pile-up jets depends on the pseudorapidity and p_T of the jet and the average number of interactions per bunch crossing ⟨μ⟩. Stochastic jets are more likely at low p_T and |η| and in harsher pile-up conditions. However, the comparison between Fig. 3, containing inclusive truth-particle jets, and Fig. 4, containing reconstructed jets, suggests that only a small fraction of stochastic jets are due to in-time pile-up. Indeed, the fraction of QCD pile-up jets decreases significantly once out-of-time pile-up effects and detector noise and resolution are taken into account. Even though the average amount of out-of-time energy is higher in the forward region, topo-clustering results in a stronger suppression of this contribution in the forward region. Therefore, the fraction of QCD pile-up jets increases in the forward region, and it constitutes more than 80% of pile-up jets with p_T > 30 GeV overall. Similarly, the minimum at around |η| = 1 corresponds to a maximum in the pile-up noise distribution [24], which results in a larger number of stochastic pile-up jets relative to QCD pile-up jets. The fraction of stochastic jets becomes more prominent at low p_T and it grows as the number of interactions increases. The majority of pile-up jets in the forward region are QCD pile-up jets, although a sizeable fraction of stochastic jets is present in both the central and forward regions.
In the following, each source of forward pile-up jets is addressed with algorithms targeting its specific features.

Stochastic pile-up jet tagging with time and shape information
Given the evidence presented in Sect. 3 that out-of-time pile-up plays an important role for stochastic jets, a direct handle consists of the timing information associated with the jet. The jet timing t_jet is defined as the energy-weighted average of the timing of the constituent clusters. In turn, the cluster timing is defined as the energy-weighted average of the timing of the constituent calorimeter cells. The jet timing distribution, shown in Fig. 5, is symmetric and centred at t_jet = 0 for both the hard-scatter and pile-up jets. However, the significantly wider distribution for stochastic jets reveals the large out-of-time pile-up contribution. For jets with 20 < p_T < 30 GeV, requiring |t_jet| < 12 ns ensures that 20% of stochastic pile-up jets are rejected while keeping 99% of hard-scatter jets. In the following, this is always applied as a baseline requirement when identifying stochastic pile-up jets.
Stochastic jets can be further suppressed using shape information. Being formed from a random collection of particles from different interactions, stochastic jets lack the characteristic dense energy core of jets originating from the showering and hadronization of a hard-scatter parton. The energy is instead spread rather uniformly within the jet cone. Therefore, pile-up mitigation techniques based on jet shapes have been shown to be effective in suppressing stochastic pile-up jets [2]. In this section, the challenges of this approach are presented, and different algorithms exploiting the jet shape information are described and characterized.
The jet width w is a variable that characterizes the energy spread within a jet. It is defined as
$$w = \frac{\sum_k \Delta R(\mathrm{jet}, k)\, p_{\mathrm{T}}^{k}}{\sum_k p_{\mathrm{T}}^{k}},$$
where the index k runs over the jet constituents and ΔR(jet, k) is the angular distance between the jet constituent k and the jet axis. The jet width is a useful observable for identifying stochastic jets, as the average width is significantly larger for jets with a smaller fraction of energy originating from a single interaction.
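As an illustration, the width can be computed from a list of constituents as in the sketch below, assuming the p_T-weighted definition w = Σ_k ΔR(jet, k) p_T^k / Σ_k p_T^k. The function names and the tuple layout of the constituents are hypothetical; the constituents may be truth particles, topo-clusters or towers.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def jet_width(jet_eta, jet_phi, constituents):
    """pT-weighted mean angular distance of constituents from the jet axis.

    constituents: iterable of (pt, eta, phi). Returns None if the pT sum is zero.
    """
    constituents = list(constituents)
    total_pt = sum(pt for pt, _, _ in constituents)
    if total_pt <= 0:
        return None
    weighted = sum(
        pt * delta_r(jet_eta, jet_phi, eta, phi) for pt, eta, phi in constituents
    )
    return weighted / total_pt
```

A jet made of a single constituent on the jet axis has zero width by construction, which illustrates why the cluster-based width loses discriminating power when the cluster multiplicity is low.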
In simulation the jet width can be computed using truth particles (truth-particle width), as a reference point to benchmark the performance of the reconstructed observable. At detector level, the jet constituents are calorimeter topo-clusters. In general, topo-clustering compresses the calorimeter information while retaining its fine granularity. Ideally, each cluster captures the energy shower from a single incoming particle. However, the cluster multiplicity in jets decreases quickly in the forward region, to the point where jets are formed by a single cluster and the jet width can no longer be defined. An alternative approach consists of using as constituents the 11 by 11 grid of calorimeter towers in η × φ, centred around the jet axis. The use of calorimeter towers ensures a fixed multiplicity given by the 0.1 × 0.1 granularity so that the jet width always contains jet shape information.
As shown in Fig. 6, the average jet width depends on the pile-up conditions. At higher pile-up values, a larger number of pile-up particles are likely to contribute to a jet, thus broadening the energy distribution within the jet itself. As a result, the width drifts towards higher values for hard-scatter, QCD pile-up, and stochastic jets. The difference in width between hard-scatter and QCD pile-up jets is due to the different underlying p_T spectra. The spectrum of QCD pile-up jets is softer than that of the hard-scatter jets for the process considered (tt̄); therefore, a significant fraction of QCD pile-up jets are reconstructed with p_T between 20 and 30 GeV because the stochastic and out-of-time component is larger than in hard-scatter jets.
Using calorimeter towers as constituents, it is possible to explore the p_T distribution within a jet with a fixed Δη × Δφ granularity. Figure 7 shows the two-dimensional p_T distribution around the jet axis for hard-scatter jets. The distribution […] tower constituents, is considered. The two-dimensional⁷ p_T distribution in the η–φ plane centred around the jet axis is fitted with a function
$$f = \alpha + \beta\,\Delta\eta + \gamma\, e^{-\frac{1}{2}\left(\frac{\Delta\eta}{0.1}\right)^{2} - \frac{1}{2}\left(\frac{\Delta\phi}{0.1}\right)^{2}}.$$
Both the width of the Gaussian component of the fit and the range in which the fit is performed are treated as jet-independent constants. The fit range, an 11 × 11 tower grid, balances an improved measurement of the constant (α) and linear (β) terms, obtained with a larger range, against a decreased risk of including outside pile-up fluctuations, obtained with a smaller range. On average, the jet tower p_T distribution is symmetric with respect to φ, and pile-up rejection at constant hard-scatter efficiency is improved by averaging the tower momenta at |Δφ| and −|Δφ| so that fluctuations are partially cancelled before performing the fit.
The constant (α) and linear (β) terms in the fit capture the average stochastic pile-up contribution to the jet p_T distribution, while the Gaussian term describes the p_T distribution from the underlying hard-scatter or QCD pile-up jet. The parameter γ therefore represents a stochastic pile-up-subtracted estimate of the p_T of such a hard-scatter or QCD pile-up jet in a ΔR = 0.1 core assuming a Gaussian p_T distribution of its constituent towers. By definition, γ does not depend on the amount of pile-up in the event, but only on the stochastic nature of the jet. In order to make the fitting procedure more robust, the Gaussian width parameter is fixed. While the width of a hard-scatter or QCD pile-up jet is expected to depend on the truth-particle jet p_T and η, such dependence is negligible in the p_T range relevant for these studies (20–50 GeV). Figure 8, showing projections of the tower distribution with the fit function overlaid, illustrates the characteristic peaking shape of pure hard-scatter jets compared with the flatter distribution in stochastic jets. The hard-scatter jet distribution displays the expected, sharply peaked distribution, while the stochastic pile-up jet distribution is flat with various off-centre features, reflecting the randomness of the underlying processes.
⁷ The simultaneous fit of both dimensions was found to perform better than the fit of a 1D projection.
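Because the Gaussian width and the fit range are fixed, the fit is linear in its three parameters and can be sketched with an ordinary least-squares solve. The example assumes a fit function of the form f = α + β Δη + γ exp(−½(Δη/σ)² − ½(Δφ/σ)²) with σ = 0.1, and an unweighted least-squares fit; the fit method is not specified in the text, and the function name `fit_gamma` is hypothetical.

```python
import numpy as np

SIGMA = 0.1  # fixed Gaussian width (assumed equal to the tower size)
STEP = 0.1   # tower granularity in eta and phi
HALF = 5     # 5 towers on each side of the central one: 11 x 11 grid


def fit_gamma(tower_pt):
    """Fit alpha + beta*deta + gamma*gauss(deta, dphi) to an 11 x 11 tower grid.

    tower_pt: array of shape (11, 11), axis 0 = delta-eta, axis 1 = delta-phi,
    centred on the jet axis. Returns (alpha, beta, gamma).
    """
    tower_pt = np.asarray(tower_pt, dtype=float)
    # Average the towers at +|dphi| and -|dphi| before fitting.
    sym = 0.5 * (tower_pt + tower_pt[:, ::-1])
    offsets = STEP * np.arange(-HALF, HALF + 1)
    deta, dphi = np.meshgrid(offsets, offsets, indexing="ij")
    gauss = np.exp(-0.5 * (deta / SIGMA) ** 2 - 0.5 * (dphi / SIGMA) ** 2)
    design = np.column_stack([np.ones(gauss.size), deta.ravel(), gauss.ravel()])
    coeffs, *_ = np.linalg.lstsq(design, sym.ravel(), rcond=None)
    return tuple(coeffs)
```

In this sketch a jet with a pronounced core yields a large γ, while a flat tower distribution is absorbed by α and β and leaves γ close to zero.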
The performance of the γ variable and of the cluster-based and tower-based widths is compared in Fig. 9, where the efficiency for stochastic pile-up jets is shown as a function of the hard-scatter jet efficiency. Each curve is obtained by applying an upper or lower bound on the jet width or γ, respectively, in order to select hard-scatter jets. The tower-based width outperforms the cluster-based width over the whole efficiency range, while the γ variable performs similarly to the tower-based width. The dependence of the hard-scatter efficiency and pile-up efficiency on the number of reconstructed vertices in the event (N_PV) and on η is shown in Fig. 10; the requirement for each discriminant is tuned so that an overall efficiency of 90% is achieved for hard-scatter jets. By construction, the performance of the γ variable is less affected by the pile-up conditions than the two width variables.
The γ parameter is a good discriminant for stochastic pile-up jets because it provides an estimate of the largest amount of p_T in the jet originating from a single vertex. If there is no dominant contribution, the p_T distribution does not feature a prominent core, and therefore γ is close to zero. With this approach, all jets are effectively considered as QCD pile-up jets, and γ is used to estimate their core p_T. Therefore, from this stage, the challenge of pile-up rejection is reduced to the identification and rejection of QCD pile-up jets, which is discussed in the following section.

QCD pile-up jet tagging with topological information
While it has been shown that pile-up mitigation techniques based on jet shapes are effective in suppressing stochastic pile-up jets, such methods do not address QCD pile-up jets that are prevalent in the forward region. This section describes the development of an effective rejection method specifically targeting QCD pile-up jets.

QCD pile-up jets originate from a single pp interaction where multiple jets can be produced. The total transverse momentum associated with each pile-up interaction is expected to be conserved;⁸ therefore all jets and central tracks associated with a given vertex can be exploited to identify QCD pile-up jets beyond the tracking coverage of the inner detector. The underlying assumption is that the transverse momentum of each pile-up interaction should be balanced, and any imbalance would be due to a forward jet from one of the interactions.
In order to properly compute the transverse momentum of each interaction, only QCD pile-up jets should be considered. Consequently, the challenge of identifying forward QCD pile-up jets using transverse momentum conservation with central pile-up jets requires being able to discriminate between QCD and stochastic pile-up jets in the central region.

A discriminant for central pile-up jet classification
Discrimination between stochastic and QCD pile-up jets in the central region can be achieved using track and vertex information. This section describes a new discriminant built for this purpose.
The underlying features of QCD and stochastic pile-up jets are different. Tracks matched to QCD pile-up jets mostly originate from a vertex PV_i corresponding to a pile-up interaction (i ≠ 0), thus yielding R_pT^i > R_pT^0 for a given jet. Such jets have large values of R_pT^i with respect to the pile-up vertex i from which they originated. Tracks matched to stochastic pile-up jets are not likely to originate from the same interaction, thus yielding small R_pT^i values with respect to any vertex i. This feature can be exploited to discriminate between these two categories. For stochastic pile-up jets, the largest R_pT^i value is expected to be similar in size to the average R_pT^i value across all vertices, while a large difference is expected for QCD pile-up jets, as most tracks originate from the same pile-up vertex.
Thus, the difference between the leading and median values of R_pT^i for a central jet, ΔR_pT, can be used for distinguishing QCD pile-up jets from stochastic pile-up jets in the central region, as shown in Fig. 11. A minimum ΔR_pT requirement can effectively reject stochastic pile-up jets. In the following, a ΔR_pT > 0.2 requirement is applied for central jets with p_T < 35 GeV. Above this threshold the fraction of stochastic pile-up jets is negligible, and all pile-up jets are therefore assumed to be QCD pile-up jets irrespective of their ΔR_pT value. The choice of threshold depends on the pile-up conditions. This choice is tuned to be optimal for the collisions considered in this study, with an average of 13.5 interactions per bunch crossing.
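A minimal sketch of this discriminant is given below. It assumes that R_pT^i is evaluated for every reconstructed vertex in the event, including vertices that contribute no tracks to the jet, which is not stated explicitly in the text; the function name and input layout are hypothetical.

```python
import statistics


def delta_r_pt(track_pt_by_vertex, jet_pt):
    """Leading minus median R_pT^i, and the index of the leading vertex.

    track_pt_by_vertex: one list of track pT values per vertex, containing
    the jet's tracks assigned to that vertex (possibly empty).
    """
    r_values = [sum(tracks) / jet_pt for tracks in track_pt_by_vertex]
    leading = max(r_values)
    return leading - statistics.median(r_values), r_values.index(leading)


# A jet whose tracks come mostly from vertex 1 has a large leading-median gap.
gap, vertex = delta_r_pt([[1.0], [12.0, 6.0], [0.5], []], jet_pt=30.0)
is_qcd_like = gap > 0.2  # requirement quoted in the text for pT < 35 GeV
```

The index of the leading vertex is returned as well, since the same quantity is used later to match each central jet to a vertex.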
The total transverse momentum of each vertex is thus computed by averaging, with a vectorial sum, the total transverse momentum of tracks and central jets assigned to the vertex. The jet-vertex matching is performed by considering the largest $R_{p_T}^{i}$ for each jet. The transverse momentum vector ($\vec{p}_T$) of a given forward jet is then compared with the total transverse momentum of each vertex in the event. If there is at least one pile-up vertex in the event with a large total vertex transverse momentum back-to-back in $\phi$ with respect to the forward jet, the jet itself is likely to have originated from that vertex. Figure 12 shows an example event, where a forward pile-up jet is back-to-back with respect to the total transverse momentum of the vertex from which it is expected to have originated.
Fig. 11 Distribution of $\Delta R_{p_T}$ for stochastic and QCD pile-up jets, as observed in dijet events with Pythia8.186 pile-up simulation

Forward jet vertex tagging algorithm
The procedure is referred to as forward jet vertex tagging (fJVT). The main parameters of the fJVT algorithm are the maximum JVT value, $\mathrm{JVT}_{\max}$, used to reject central hard-scatter jets, and the minimum $\Delta R_{p_T}$ requirement, used to ensure that the selected pile-up jets are QCD pile-up jets. $\mathrm{JVT}_{\max}$ is set to 0.14, corresponding to an efficiency of 93% for selecting pile-up jets in dijet events. The minimum $\Delta R_{p_T}$ requirement defines the operating point in terms of the efficiency for selecting QCD pile-up jets and the contamination from stochastic pile-up jets. A minimum $\Delta R_{p_T}$ of 0.2 is required, corresponding to an efficiency of 70% for QCD pile-up jets and 20% for stochastic pile-up jets in dijet events. The selected jets are then assigned to the vertex PV$_i$ corresponding to the highest $R_{p_T}^{i}$ value.
For each pile-up vertex $i$, $i \neq 0$, the missing transverse momentum $\vec{p}_{T,i}^{\,\mathrm{miss}}$ is computed as the weighted vector sum of the jet ($\vec{p}_T^{\,\mathrm{jet}}$) and track ($\vec{p}_T^{\,\mathrm{track}}$) transverse momenta:
$$\vec{p}_{T,i}^{\,\mathrm{miss}} = -\frac{1}{2}\left(\sum_{\mathrm{tracks}\,\in\,\mathrm{PV}_i} k\,\vec{p}_T^{\,\mathrm{track}} + \sum_{\mathrm{jets}\,\in\,\mathrm{PV}_i} \vec{p}_T^{\,\mathrm{jet}}\right)$$
The factor $k$ accounts for intrinsic differences between the jet and track terms. The track component does not include the contribution of neutral particles, while the jet component is not sensitive to soft emissions significantly below 20 GeV. The value $k = 2.5$ is chosen as the one that optimizes the overall rejection of forward pile-up jets.
The fJVT discriminant for a given forward jet, with respect to the vertex $i$, is then defined as the normalized projection of the missing transverse momentum on $\vec{p}_T^{\,\mathrm{fj}}$:
$$\mathrm{fJVT}_i = \frac{\vec{p}_{T,i}^{\,\mathrm{miss}} \cdot \vec{p}_T^{\,\mathrm{fj}}}{|\vec{p}_T^{\,\mathrm{fj}}|^{2}}$$
where $\vec{p}_T^{\,\mathrm{fj}}$ is the forward jet's transverse momentum. The motivation for this definition is that the amount of missing transverse momentum in the direction of the forward jet needed for the jet to be tagged should be proportional to the jet's transverse momentum. The forward jet is therefore tagged as pile-up if its fJVT value, defined as $\mathrm{fJVT} = \max_i(\mathrm{fJVT}_i)$, is above a threshold. The choice of threshold determines the pile-up rejection performance.
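As an illustration of the fJVT computation, the following Python sketch evaluates the per-vertex missing transverse momentum and the resulting discriminant. It assumes that the weighted vector sum takes the form $-\frac{1}{2}\left(k\sum \vec{p}_T^{\,\mathrm{track}} + \sum \vec{p}_T^{\,\mathrm{jet}}\right)$, which is a reading of the description in the text, and it represents each transverse momentum as a hypothetical `(px, py)` tuple in GeV. None of the names correspond to the ATLAS software.

```python
import math


def missing_pt(track_pts, jet_pts, k=2.5):
    """Missing transverse momentum of one pile-up vertex.

    track_pts, jet_pts: lists of (px, py) tuples in GeV for the tracks and
    central QCD pile-up jets assigned to the vertex. The -1/2 factor and the
    weight k on the track term are assumptions based on the text.
    """
    sum_x = k * sum(px for px, _ in track_pts) + sum(px for px, _ in jet_pts)
    sum_y = k * sum(py for _, py in track_pts) + sum(py for _, py in jet_pts)
    return (-0.5 * sum_x, -0.5 * sum_y)


def fjvt(forward_jet_pt, vertices, k=2.5):
    """Maximum over pile-up vertices of the normalized projection.

    forward_jet_pt: (px, py) of the forward jet in GeV.
    vertices: list of (track_pts, jet_pts) pairs, one per pile-up vertex.
    Returns -inf if no pile-up vertex is supplied.
    """
    fx, fy = forward_jet_pt
    norm2 = fx * fx + fy * fy
    best = -math.inf
    for track_pts, jet_pts in vertices:
        mx, my = missing_pt(track_pts, jet_pts, k)
        best = max(best, (mx * fx + my * fy) / norm2)
    return best


# Hypothetical event: a 30 GeV forward jet along +x, one vertex whose
# activity recoils along -x, and one vertex with unrelated activity along +y.
forward_jet = (30.0, 0.0)
recoiling_vertex = ([(-8.0, 0.0)], [(-40.0, 0.0)])
unrelated_vertex = ([], [(0.0, 25.0)])
example_fjvt = fjvt(forward_jet, [recoiling_vertex, unrelated_vertex])
```

For the recoiling vertex the missing transverse momentum is (30, 0) GeV, so the projection equals the forward jet's own momentum and the discriminant is 1; the unrelated vertex contributes a projection of zero.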
The fJVT discriminant tends to have larger values for QCD pile-up jets, while the distribution for hard-scatter jets falls steeply, as shown in Fig. 13. Figure 14 shows the efficiency of selecting forward pile-up jets as a function of the efficiency of selecting forward hard-scatter jets when varying the maximum fJVT requirement.

Performance
With maximum fJVT requirements of 0.5 and 0.4, hard-scatter efficiencies of 92% and 85% are achieved for pile-up efficiencies of 60% and 50%, respectively, for jets with $20 < p_T < 50$ GeV. The dependence of the hard-scatter and pile-up efficiencies on the forward jet $p_T$ is shown in Fig. 15. For low-$p_T$ forward jets, an upward fluctuation in the fJVT value is more likely, and therefore the efficiency for hard-scatter jets is slightly lower than for higher-$p_T$ jets. The hard-scatter efficiency depends on the number of pile-up interactions, as shown in Fig. 16, as busier pile-up conditions increase the chance of accidentally matching the hard-scatter jet to a pile-up vertex. The pile-up efficiency depends on the $p_T$ of the forward jets, due to the $p_T$-dependence of the relative numbers of QCD and stochastic pile-up jets.

Efficiency measurements
The fJVT efficiency for hard-scatter jets is measured in Z + jets data events, exploiting a tag-and-probe procedure similar to that described in Ref. [1].
For $Z(\to\mu\mu)$+jets events, selected by single-muon triggers, two muons of opposite charge with $p_T > 25$ GeV are required, such that their invariant mass lies between 66 and 116 GeV. Events are further required to satisfy event and jet quality criteria, and a veto on cosmic-ray muons.
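The dimuon requirements above can be sketched in Python as follows. The sketch neglects the muon mass, represents each muon as a hypothetical dictionary with `pt`, `eta`, `phi` and `charge` keys, and does not attempt to model the trigger, quality or cosmic-ray requirements.

```python
import math


def dimuon_mass(mu1, mu2):
    """Invariant mass in GeV of two muons, neglecting the muon mass."""
    d_eta = mu1["eta"] - mu2["eta"]
    d_phi = mu1["phi"] - mu2["phi"]
    return math.sqrt(2.0 * mu1["pt"] * mu2["pt"]
                     * (math.cosh(d_eta) - math.cos(d_phi)))


def passes_z_mumu(mu1, mu2, min_pt=25.0, mass_window=(66.0, 116.0)):
    """Opposite charge, pT > 25 GeV, and 66 < m_mumu < 116 GeV."""
    if mu1["charge"] * mu2["charge"] >= 0:
        return False
    if mu1["pt"] <= min_pt or mu2["pt"] <= min_pt:
        return False
    return mass_window[0] < dimuon_mass(mu1, mu2) < mass_window[1]


# Hypothetical back-to-back muon pair with equal pT.
mu_plus = {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": +1}
mu_minus = {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": -1}
```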

Fig. 14 Efficiency for pile-up jets in simulated $Z$+jets events as a function of the efficiency for hard-scatter jets for different jet $p_T$ ranges
Using the leading forward jet recoiling against the $Z$ boson as a probe, a signal region of forward hard-scatter jets is defined as the back-to-back region specified by $|\Delta\phi(Z,\mathrm{jet})| > 2.8$ rad. In order to select a sample pure in forward hard-scatter jets, events are required to have no central hard-scatter jets with $p_T > 20$ GeV, identified with JVT, and exactly one forward jet. The $Z$ boson is required to have $p_T > 20$ GeV, as events in which the $Z$ boson has $p_T$ less than the minimum defined jet $p_T$ have a lower hard-scatter purity. The above selection results in a forward hard-scatter signal region that is greater than 98% pure in hard-scatter jets relative to pile-up jets, as estimated in simulation.
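The region definition based on the azimuthal separation between the $Z$ boson and the probe jet can be sketched as follows. The function names are hypothetical; the 2.8 rad boundary is the signal-region requirement quoted above, while the 1.2 rad boundary is the pile-up-enriched control region used in the efficiency measurement.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def classify_region(phi_z, phi_jet, signal_min=2.8, control_max=1.2):
    """Return 'signal', 'control' or 'neither' from |delta phi(Z, jet)|."""
    abs_dphi = abs(delta_phi(phi_z, phi_jet))
    if abs_dphi > signal_min:
        return "signal"
    if abs_dphi < control_max:
        return "control"
    return "neither"
```

The wrapping step matters for pairs such as $\phi_Z = 3.0$ and $\phi_{\mathrm{jet}} = -3.0$, which are close in azimuth even though their naive difference is 6 rad.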
The fJVT distributions for data and simulation in the signal region are compared in Fig. 17. The data distribution is observed to have fewer jets with high fJVT than predicted by simulation, consistent with an overestimation of the number of pile-up jets, as reported in Ref. [1].
The pile-up jet contamination in the signal region, $N_{\mathrm{PU}}^{\mathrm{signal}}$ ($|\Delta\phi(Z,\mathrm{jet})| > 2.8$ rad), is estimated in a pile-up-enriched control region with $|\Delta\phi(Z,\mathrm{jet})| < 1.2$ rad, based on the assumption that the $|\Delta\phi(Z,\mathrm{jet})|$ distribution is uniform for pile-up jets. The validity of this assumption was verified in simulation. The pile-up jet rate in data is therefore used to estimate the contamination of the signal region, and the hard-scatter efficiency is measured in the signal region after accounting for the pile-up jets. The overall number of pile-up jets in the signal region and the number of pile-up jets satisfying the fJVT requirements are both estimated from simulation. Figure 18 shows the hard-scatter efficiency evaluated in data and simulation. The uncertainties correspond to a 30% uncertainty in the number of pile-up jets and a 10% uncertainty in the number of hard-scatter jets in the signal region. The uncertainties are estimated by comparing data and simulation in the pile-up- and hard-scatter-enriched regions, respectively. The hard-scatter efficiency is found to be underestimated in simulation, consistent with the simulation overestimating the pile-up activity in data. The level of disagreement is observed to be larger at low jet $p_T$ and high $|\eta|$ and can be as large as about 3%. The efficiencies evaluated in this paper are used to define a calibration procedure accounting for this discrepancy. The uncertainties associated with the calibration and resolution of the jets used to compute fJVT are estimated in ATLAS analyses by recomputing fJVT for each variation reflecting a systematic uncertainty.
The fJVT and $\gamma$ discriminants correspond to a twofold strategy for pile-up rejection targeting QCD and stochastic pile-up jets, respectively. However, as highlighted in Sect. 3, this classification is not well defined, as all jets have a stochastic component. Therefore, it is useful to define a coherent strategy that addresses both the stochastic and QCD nature of pile-up jets at the same time.
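The pile-up-subtracted efficiency measurement described in the efficiency measurements above can be sketched in Python. This is an illustration under stated assumptions rather than the expression used in the measurement: the control-region pile-up yield is extrapolated to the signal region by the ratio of the $|\Delta\phi|$ ranges, which follows from the uniformity assumption, and the efficiency is taken as a background-subtracted ratio. The optional subtraction of hard-scatter jets in the control region, and all function and argument names, are hypothetical.

```python
import math


def pileup_in_signal_region(n_jets_control, n_hs_control=0.0,
                            signal_min=2.8, control_max=1.2):
    """Extrapolate the control-region pile-up yield to the signal region.

    Assumes a flat |delta phi| distribution for pile-up jets, so the yield
    scales with the width of the |delta phi| interval: (pi - 2.8) / 1.2.
    n_hs_control is an assumed correction for hard-scatter jets leaking
    into the control region.
    """
    scale = (math.pi - signal_min) / control_max
    return (n_jets_control - n_hs_control) * scale


def hard_scatter_efficiency(n_pass, n_signal, n_pu_pass, n_pu_signal):
    """Fraction of hard-scatter jets passing fJVT after pile-up subtraction.

    n_signal, n_pass: all jets in the signal region and those passing fJVT.
    n_pu_signal, n_pu_pass: the corresponding pile-up jet estimates.
    """
    return (n_pass - n_pu_pass) / (n_signal - n_pu_signal)
```

With 120 pile-up jets in the control region, for example, the extrapolated signal-region contamination is 120 × (π − 2.8)/1.2, or about 34 jets.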
The $\gamma$ parameter discussed in Sect. 4 provides an estimate of the $p_T$ in the core of the jet originating from the single interaction contributing the largest amount of transverse momentum to the jet. Therefore, the fJVT definition can be modified to exploit this estimate by replacing the jet $p_T$ with $\gamma$, so that
$$\mathrm{fJVT}_\gamma = \max_i \frac{\vec{p}_{T,i}^{\,\mathrm{miss}} \cdot \vec{u}^{\,\mathrm{fj}}}{\gamma}$$
where $\vec{u}^{\,\mathrm{fj}}$ is the unit vector representing the direction of the forward jet in the transverse plane. Figure 19 shows the performance of fJVT$_\gamma$ compared with fJVT and $\gamma$ individually. The fJVT$_\gamma$ discriminant outperforms the individual discriminants over the whole efficiency range. In samples enriched in QCD pile-up jets ($30 < p_T < 50$ GeV), the fJVT$_\gamma$ performance is driven by the topology information, while it benefits from the shape information when rejecting stochastic pile-up jets. A multivariate combination of the fJVT and $\gamma$ discriminants was also studied and found to be similar in performance to fJVT$_\gamma$.
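A minimal sketch of the combined discriminant is given below. It assumes that the per-vertex missing transverse momenta are already available as hypothetical `(px, py)` tuples in GeV, that $\gamma$ is positive, and that the projection onto the forward-jet direction is divided by $\gamma$ in place of the jet $p_T$, following the description in the text.

```python
import math


def fjvt_gamma(forward_jet_phi, gamma, vertex_missing_pts):
    """Maximum over vertices of (p_miss . u_fj) / gamma.

    forward_jet_phi: azimuthal angle of the forward jet in radians.
    gamma: core-pT estimate of the jet in GeV (must be positive).
    vertex_missing_pts: non-empty list of (px, py) tuples in GeV.
    """
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    ux, uy = math.cos(forward_jet_phi), math.sin(forward_jet_phi)
    return max((mx * ux + my * uy) / gamma for mx, my in vertex_missing_pts)


# Hypothetical forward jet along +x with gamma = 20 GeV.
example = fjvt_gamma(0.0, 20.0, [(30.0, 0.0), (0.0, -12.5)])
```

In this toy case the first vertex has 30 GeV of missing transverse momentum along the jet direction, giving 30/20 = 1.5, while the second vertex is perpendicular to the jet and contributes zero.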

Impact on physics of Vector-Boson Fusion
In order to quantify the impact of forward pile-up rejection on a VBF analysis, the VBF $H \to \tau\tau$ signature is considered, in the case where both $\tau$ leptons decay leptonically. The pile-up dependence of the signal purity ($S/B$) is studied in a simplified analysis in the dilepton channel. Several other channels are used in the analysis of VBF $H \to \tau\tau$ by ATLAS; the dilepton channel is chosen for this study by virtue of its simple selection and background composition. The dominant background in this channel originates from $Z$+jets production, where the $Z$ boson decays leptonically, either to electrons, muons, or a leptonically decaying $\tau\tau$ pair. The rate of $Z$ bosons produced in association with two jets satisfying the requirements targeting the VBF topology is extremely low. The requirements include a large $\Delta\eta$ between the jets and a large dijet invariant mass $m_{jj}$. However, background events with forward pile-up jets often have large $\Delta\eta$ and $m_{jj}$, mimicking the VBF topology. As a consequence, the background acceptance grows almost quadratically with the number of pile-up interactions. This section illustrates the mitigation of this effect that can be achieved with the pile-up rejection provided by fJVT$_\gamma$.
The event selection used for this study was optimized using simulation without pile-up [26]:
• The event must contain exactly two opposite-charge same-flavour leptons $\ell^+\ell^-$ (with $\ell = e, \mu$) with $p_T > 15$ GeV;
• The invariant mass of the lepton pair must satisfy $m_{\ell^+\ell^-} < 66$ GeV or $m_{\ell^+\ell^-} > 116$ GeV;
• The magnitude of the missing transverse momentum must be larger than 40 GeV;
• The event must contain two jets with $p_T > 20$ GeV, one of which has $p_T > 40$ GeV. The absolute difference in rapidities $|\eta_{j_1} - \eta_{j_2}|$ must exceed 4.4 and the invariant mass of the two jets must exceed 700 GeV;
• For simulated VBF $H \to \tau\tau$ only, both jets are required to be truth-labelled as hard-scatter jets.
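The selection listed above can be written as a simple cut flow. The following Python sketch makes several assumptions that are not stated in the text: leptons and jets are hypothetical dictionaries, invariant masses are computed in the massless approximation, the two highest-$p_T$ jets above 20 GeV are taken as the VBF candidates, and the truth-labelling requirement for simulated signal is omitted.

```python
import math


def pair_mass(a, b):
    """Invariant mass in GeV of two objects in the massless approximation."""
    return math.sqrt(2.0 * a["pt"] * b["pt"]
                     * (math.cosh(a["eta"] - b["eta"])
                        - math.cos(a["phi"] - b["phi"])))


def passes_vbf_dilepton_selection(leptons, met, jets):
    """Apply the dilepton VBF requirements; all energies in GeV."""
    # Exactly two opposite-charge same-flavour leptons with pT > 15 GeV.
    selected = [lep for lep in leptons if lep["pt"] > 15.0]
    if len(selected) != 2:
        return False
    lep1, lep2 = selected
    if lep1["flavour"] != lep2["flavour"]:
        return False
    if lep1["charge"] * lep2["charge"] >= 0:
        return False
    # Veto the Z mass window: require m_ll < 66 GeV or m_ll > 116 GeV.
    if 66.0 <= pair_mass(lep1, lep2) <= 116.0:
        return False
    # Missing transverse momentum above 40 GeV.
    if met <= 40.0:
        return False
    # Two jets with pT > 20 GeV, the leading one with pT > 40 GeV.
    good_jets = sorted((j for j in jets if j["pt"] > 20.0),
                       key=lambda j: j["pt"], reverse=True)
    if len(good_jets) < 2 or good_jets[0]["pt"] <= 40.0:
        return False
    jet1, jet2 = good_jets[:2]
    # VBF topology: large separation in eta and large dijet mass.
    if abs(jet1["eta"] - jet2["eta"]) <= 4.4:
        return False
    return pair_mass(jet1, jet2) > 700.0


# Hypothetical VBF-like event.
example_leptons = [
    {"pt": 30.0, "eta": 0.5, "phi": 0.0, "flavour": "e", "charge": +1},
    {"pt": 20.0, "eta": -0.3, "phi": 1.0, "flavour": "e", "charge": -1},
]
example_jets = [
    {"pt": 70.0, "eta": 3.0, "phi": 0.0},
    {"pt": 45.0, "eta": -2.6, "phi": 3.0},
]
```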
The impact of pile-up mitigation is emulated by randomly removing hard-scatter and pile-up jets to match the performance of a fJVT$_\gamma$ requirement with 85% overall efficiency for hard-scatter jets with $20 < p_T < 50$ GeV, as estimated in $t\bar{t}$ simulation with an average $\langle\mu\rangle$ of 13.5. The efficiencies are estimated as a function of the jet $p_T$ and the average number of interactions per bunch crossing. Figure 20 shows the expected numbers of signal and background events, as well as the signal purity, as a function of $\langle\mu\rangle$. As $\langle\mu\rangle$ increases from 10 to 35, the expected number of background events grows by a factor of seven and the corresponding signal purity drops by a factor of eight, indicating that the presence of pile-up jets enhances the background acceptance. The slight decrease in signal acceptance is due to misidentification of pile-up jets as VBF jets. The fJVT$_\gamma$ algorithm mitigates the background growth, at the expense of a signal loss proportional to the hard-scatter jet efficiency. Therefore, the degradation of the purity due to pile-up can be effectively reduced. For the specific final state and event selection under consideration, where $Z$+jets production is the dominant background, this results in about a fourfold improvement in signal purity at $\langle\mu\rangle = 35$.
Figure caption (partial): Parameterized hard-scatter efficiency and pile-up efficiency are used. The lower panels display the ratio to the reference without pile-up rejection
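The emulation procedure described above, in which jets are removed at random according to parameterized efficiencies, can be sketched as follows. The efficiency parameterizations are passed in as hypothetical callables of the jet $p_T$ and $\langle\mu\rangle$; their actual form is not given in the text, and the truth label on each jet is an assumed input.

```python
import random


def emulate_pileup_rejection(jets, mu, hs_efficiency, pu_efficiency, rng):
    """Randomly keep jets according to parameterized efficiencies.

    jets: list of dicts with 'pt' (GeV) and 'is_hard_scatter' (bool).
    mu: average number of interactions per bunch crossing.
    hs_efficiency, pu_efficiency: callables (pt, mu) -> probability of a
        hard-scatter or pile-up jet surviving the requirement.
    rng: a random.Random instance, so that results are reproducible.
    """
    kept = []
    for jet in jets:
        if jet["is_hard_scatter"]:
            efficiency = hs_efficiency(jet["pt"], mu)
        else:
            efficiency = pu_efficiency(jet["pt"], mu)
        if rng.random() < efficiency:
            kept.append(jet)
    return kept


# Hypothetical flat parameterizations, for illustration only.
def flat_hs_efficiency(pt, mu):
    return 0.85


def flat_pu_efficiency(pt, mu):
    return 0.45


example_jets = [
    {"pt": 35.0, "is_hard_scatter": True},
    {"pt": 28.0, "is_hard_scatter": False},
    {"pt": 42.0, "is_hard_scatter": False},
]
surviving = emulate_pileup_rejection(
    example_jets, 13.5, flat_hs_efficiency, flat_pu_efficiency,
    random.Random(1234))
```

Passing an explicit `random.Random` instance keeps the emulation reproducible between runs, which is convenient when comparing yields with and without the rejection applied.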

Conclusions
The presence of multiple $pp$ interactions per bunch crossing at the LHC, referred to as pile-up, results in the reconstruction of additional jets besides those from the hard-scatter interaction. The ATLAS baseline strategy for identifying and rejecting pile-up jets relies on matching tracks to jets to determine the $pp$ interaction of origin. This strategy cannot be applied to jets beyond the tracking coverage of the inner detector. However, a broad spectrum of physics measurements at the LHC relies on the reconstruction of jets at high pseudorapidities. An example is the measurement of Higgs boson production through vector-boson fusion. The presence of pile-up jets at high pseudorapidities reduces the sensitivity to these signatures, because background events are incorrectly reconstructed as such final states.
The techniques presented in this paper allow the identification and rejection of pile-up jets beyond the tracking coverage of the inner detector. The strategy to perform such a task is twofold. First, the information about the jet shape is used to estimate the leading contribution to the jet above the stochastic pile-up noise. Then the topological correlation among particles originating from a pile-up interaction is exploited to extrapolate the jet vertex tagger, using track and vertex information, beyond the tracking coverage of the inner detector to identify and reject pile-up jets at high pseudorapidities. When using both shape and topological information, approximately 57% of forward pile-up jets are rejected for a hard-scatter efficiency of about 85% at the pile-up conditions considered in this paper, with an average of 22 pile-up interactions. In events with 35 pile-up interactions, typical conditions for the LHC operations in the near future, 37, 48, and 51% of forward pile-up jets are rejected using, respectively, topological information, shape information, and their combination, for the same 85% hard-scatter efficiency.
A procedure is defined and used to measure the efficiency of identifying hard-scatter jets in 3.2 fb$^{-1}$ of $pp$ collisions at $\sqrt{s} = 13$ TeV collected in 2015. The efficiencies are measured in data and estimated in simulation as a function of the jet kinematics. Discrepancies of up to approximately 3% are observed, mainly due to the modelling of pile-up events.
The impact of the forward pile-up rejection algorithms presented here is estimated in a simplified study of Higgs boson production through vector-boson fusion with decay into a $\tau\tau$ pair; the signal purity for the baseline selection under consideration, where $Z$+jets production is the dominant background, is enhanced by a factor of about four for events with 35 pile-up interactions.