Fuzzy Jets

Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets. To construct jets, the experimental collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative hierarchical clustering schemes known as sequential recombination. We propose a new class of algorithms for clustering jets that use infrared and collinear safe mixture models. These new algorithms, known as fuzzy jets, are clustered using maximum likelihood techniques and can dynamically determine various properties of jets like their size. We show that the fuzzy jet size adds additional information to conventional jet tagging variables. Furthermore, we study the impact of pileup and show that with some slight modifications to the algorithm, fuzzy jets can be stable up to high pileup interaction multiplicities.


Introduction
As the result of a proton-proton collision at a hadron collider, hundreds of particles are created and detected [1,2]. While some particles can be identified by their type, such as electrons [3,4] and muons [5,6], most of the detected particles are light hadrons produced in collimated sprays called jets. Jets are the consequence of high energy quarks or gluons fragmenting into colorless hadrons. Experimentally, jets are defined by clustering schemes which group together measured calorimeter energy deposits or reconstructed charged particle tracks. A jet algorithm is a clustering scheme that connects the measured objects with theoretical quantities that can be calculated and simulated. At a hadron collider, the natural coordinates for describing particles are p T , y, and φ, where p T is the magnitude of the momentum transverse to the proton beam, y is the rapidity, and φ is the azimuthal angle. Particles or calorimeter energy deposits are clustered using jet algorithms based on distance metrics on their coordinates in (p T , ρ) = (p T , y, φ). In order for a jet algorithm to be useful to experimentalists and theorists, the collection of jets should be IRC safe in the following sense: 1. Infrared safe (IR): if a particle i is added with |p T | → 0, the jets are unaffected.
2. Collinear safe (C): if a particle i with momentum p i is replaced with two particles j and k with momenta p j + p k = p i such that | ρ i − ρ j | = 0, then the jets are unaffected.
The jet algorithms most widely used at hadron colliders fall into a class of schemes known as sequential recombination [7]. These IRC safe schemes require metrics d on momenta d ij = d(p i , p j ) : (p i , p j ) → R + , d iB = d(p i ) : p i → R + and proceed as follows: 1. Assign each particle as a proto-jet.
2. Repeat until there are no proto-jets left: Let (k, ) = argmin i,j d(p i , p j ) and without loss of generality, d kB < d B . If d kB < d k , declare proto-jet k a jet and remove it from the list. Otherwise, combine proto-jets k and into a new proto-jet with momentum p new = p + p k .
One common prescription is called the Cambridge-Aachen (C/A) algorithm [8,9], which uses d ij = | ρ i − ρ j | 2 /R 2 and d iB = 1. The fixed quantity R is roughly the size of the jet in (y, φ). By far, the most ubiquitous jet algorithm used at the Large Hadron Collider (LHC) is the anti-k t algorithm [10] with d ij = min(p −2 T,i , p −2 T,j )| ρ i − ρ j | 2 /R 2 and d iB = p −2 T,i . The purpose of this paper is to introduce a new paradigm for jet clustering, called fuzzy jets, based on probabilistic mixture modeling. Section 2 introduces the statistical concept of a mixture model and describes the necessary modification to make the procedure IRC safe. Section 3 gives one efficient method for clustering fuzzy jets based on the Expectation-Maximization (EM) algorithm. Section 4.4 contains several examples comparing fuzzy jets with sequential recombination and Sec. 5 describes how one might mitigate the impact of overlapping proton-proton collisions (pileup). We conclude in Sec. 6 with some summary remarks and outlook for the future.

Mixture Model Jets
Mixture models [11] are a statistical tool for clustering which postulate a particular class of probability densities for the data to be clustered. Generically, for grouping m n-dimensional data points into k clusters, the mixture model density is where π j is the unknown weight of cluster j such that j π j = 1 and f (x i |θ j ) is a probability density on n-dimensions with unknown parameters θ j to be learned from the data. A common choice for f is the normal density Φ with θ j = (µ j , Σ j ) for µ j the n-dimensional mean and Σ j the n × n covariance matrix. In the mixture model paradigm, the θ j are the cluster properties; in the Gaussian case, µ j is the location of cluster j and Σ j describes its shape in the n-dimensional space. When clustering with a finite mixture, the number of clusters k must be specified ahead of time 1 , which is dual to the usual use of sequential recombination 2 in which k is learned and the size of jets is specified ahead of time. The standard objective in (frequentist) mixture modeling is to select the parameters θ j which maximize the likelihood (Eq. 2.1) of the observed dataset. Figure 1 illustrates what the learned event density might look like for k = 3 and Gaussian f = Φ in n = 2 dimensions.
is Mixture Modeling ution with parametersμ + fixed number of clusters k, learn ⇡ j (X |μ j ),X ⌘ 4-vector of particle odel for choosing particle locations by: f membership om variables ⇠ $ j where the dot · is a placeholder for the function argument. 1 There is a wealth of literature on the subject of choosing k, for a survey of methods, see [12]. The likelihood monotonically increases with k; as alternatives to maximum likelihood, one can for instance look for kinks in the likelihood as a function of k [13]. 2 It is similar to the exclusive form of the kT sequential recombination scheme [14]. The exclusive nature of the algorithm (and the minimization procedure used to find the jets) is similar to the XCone algorithm [15,16] that became public as this manuscript was in its final preparation.
An equivalent way of approaching mixture modeling is to view Eq. (2.1) as the density used to generate the data. We view the data as having been drawn randomly from the density specified in Eq. (2.1), with the following setup: 1. Throw n independent and identical k-sided dice with probability π j to land on side j = 1, ..., k and label the outcomes λ 1 , ..., λ n .
Once θ and π are learned by minimizing Eq. (2.1), we can compute q ij = Pr(λ i = j | x i ), the posterior probability that x i was generated by f (· | θ j ) or, intuitively, the posterior probability that x i belongs to cluster j. The q ij are the soft assignments of particles i to jet j and will play an important role in Sec. 3 when we show how to maximize the likelihood in Eq. (2.1). Jets produced with mixture modeling are called fuzzy jets because of the soft memberships -every particle can belong to every jet with some probability 3 . This can be seen explicitly in Fig. 1 where the densities of all three clusters are everywhere nonzero, so q ij > 0 for all j. The idea of probabilistic membership was recently studied in the context of the Q-jets algorithm [19] in which the same event is interpreted many times by injecting randomness into the clustering procedure. Unlike Q-jets, fuzzy jets allocates the soft membership functions deterministically throughout the clustering procedure. However, like Q-jets, there is an ambiguity in how to assign kinematic properties to the clustered jets. Fuzzy jets are defined by their shape (and location), not their constituents. This is in contrast to anti-k t jets, which are defined by their constituents without an explicit shape determined from the clustering procedure. One simple assignment scheme is to define the momentum of a fuzzy jet j as In other words, this procedure assigns every particle to its most probable associated jet. This scheme will be known as the hard maximum likelihood (HML) scheme, but is not the only possible assignment algorithm. The dual problem in sequential recombination is the jet area, which must be defined [20], whereas the jet kinematics are the 'natural' coordinates.
We now specialize the likelihood in Eq. (2.1) to the case of clustering particles into jets at a collider like the LHC. Consider a mixture model in two dimensions 4 with x i = ρ i . The resulting mixture model (MM) jets are inherently not IR safe: particle p T does not appear in the likelihood and therefore arbitrarily low energy particles can influence the clustering procedure. Therefore, we add a modification to the log likelihood: We have studied several possibilities for f , but for the remainder of this paper will specialize to (wrapped 4 ) Gaussian f = Φ. The resulting fuzzy jets are called modified Gaussian Mixture Model jets (mGMM) and are parameterized by the locations µ j , the covariance matrices Σ i , and the cluster weights π j . We initialize π j = 1/k and Σ j = I. Since practical procedures for maximizing the modified likelihood in Eq. (2.3) may converge to stationary points that are not globally optimal, the output of a fuzzy jet algorithm will depend on an initial setting of the cluster parameters θ and π. One simple procedure, used exclusively for the rest of the paper, is to seed fuzzy jets based on the output of a sequential recombination jet algorithm. This guarantees an IRC safe initial condition and therefore the entire procedure is IRC safe. We now discuss practically how one can find the maximum of the fuzzy jets likelihood.

Clustering Fuzzy Jets: the EM Algorithm
One iterative procedure for maximizing the mixture model likelihood in Eq. (2.1) is the Expectation-Maximization (EM) algorithm [21][22][23]. After initializing the cluster locations and prior density π, the following two steps are repeated: Expectation Given the current values of θ j , compute the fuzzy membership probabilities Maximization Given q ij , maximize the expected modified complete log likelihood over the parameters π, µ, Σ.
The expected modified complete log likelihood has the form Note that the expected modified complete log likelihood is not the same as the expected modified log likelihood, shown in Eq. (2.3). They differ in that the complete log likelihood has the second sum outside the logarithm while Eq. (2.3) has the sum inside the logarithm. The power of the EM algorithm is that maximizing the complete log likelihood results in fixed point iteration to monotonically improve the original log likelihood. This desirable property of the EM algortihm is still true when α > 0; for a proof, see Appendix B. Many choices for f have closed form maxima for the M step; in the Gaussian f = Φ case outlined above, the updates are given by whereq ij = q ij p α T i / n l=1 p lj p α T l . The well-known k-means clustering algorithm [24] can be recovered as the limit of expectation-maximization in a Gaussian mixture model with Σ = σ 2 I, σ 2 → 0. Figure 2 illustrates GMM clustering using the EM algorithm with k = 2 clusters. The EM algorithm readily accommodates constraints on the model parameters. One constraint we will consider throughout the rest of the paper is Σ j = σ 2 j I for all j, which requires the curves of constant likelihood in (y, φ) to be circular. We will see in the next section that the learned value of σ j is useful for distinguishing jets originating from different physics processes.          The circles represent data points, the triangles represent the estimated cluster locations µ j , and the ellipsoids are equidensity contours describing the shapes Σ j of the learned cluster distributions. In the E-step, bluer colors correspond to higher value of p i,blue jet .

Comparisons with Sequential Recombination and Jet Tagging
This section describes some numerical comparisons between sequential recombination and fuzzy jets. Section 4.1 summarizes the simulation details with some first event displays showing both fuzzy and sequential recombination jets. These two approaches to jet clustering are studied over an ensemble of events in Sec. 4.2. A third subsection, Sec. 4.3, illustrates that fuzzy jets captures new information about the hadronic final state, and in the fourth section, Sec 4.4, it is demonstrated that this new information can be used to classify the jet type.

Details of the Simulation
In order to study fuzzy jets in a realistic scenario, we run Monte Carlo (MC) simulations. Three physics processes are generated using Pythia 8.170 [25,26] at √ s = 8 TeV. Hadronic W boson and top quarks are used for studying hard 2-and 3-prong type jets, respectively. To simulate high p T hadronic W decays, W bosons are generated to decay exclusively into a W and Z boson which subsequently decay into quarks and leptons, respectively. The p T scale of the hadronically decaying W is set by the mass of the W which is tuned to 800 GeV for this study so that the p W T 400 GeV. In this p W T range, the W decay products are expected to merge within a cone of R 1.0 where ∆R 2 = ∆φ 2 + ∆η 2 ∼ 4m 2 W /p 2 T,W . A sample enriched in 3-prong type jets is generated with Z → tt, where the Z mass sets the energy scale of the hadronically decaying top quarks. In this analysis, we use m Z = 1.0 TeV, which sets p t T 500 GeV. To study the impact on signal versus background, QCD dijets are generated with a range ofp T that is approximately in the same range as the relevant signal process. In all distributions, the QCD p T spectrum is weighted to exactly match that of the signal to control for differences between signal and background due only to the p T spectrum differences. Pileup is simulated by overlaying additional independently generated minimum-bias interactions with each signal event. For the rest of this section, the number of pileup interactions n PU = 0. See Sec. 5 for studies of n PU > 0.
For a comparison to fuzzy jets, anti-k t jets are clustered using FastJet [27] 3.0.3. The signal processes are chosen such that jets with radius parameter R = 1 are most appropriate in capturing the decay products of the heavy particles. The anti-k t jets are trimmed [28] by re-clustering the constituents into R = 0.3 k t subjets and dropping those which have p subjet Anti-k t jets are also used to seed the fuzzy jet clustering; the p T threshold for this initialization is 5 GeV 5 , and the impact of this choice is studied in Appendix C.
To model the discretization and finite acceptance of a real detector, a calorimeter of towers with size 0.1 × 0.1 in (y, φ) extends out to y = 5.0. The total energy of the simulated particles incident upon a particular cell are added as scalars and the four-vector p j of any particular tower j is given by (4.1) To simulate a particle flow reconstruction, the sum in Eq. (4.1) contains only neutral particles for |y| < 2.5 and both charged and neutral particles for 2.5 < |y| < 5. Charged particles within |y| < 2.5 are individually added to the list of inputs for clustering, unless they originate from a pileup collision. Anti-k t jet momenta are corrected for pileup on average using area subtraction [20]. The median pileup density, ρ, is estimated by clustering hard scatter particles, neutral pileup particles, and charged pileup particles in the range |y| < 2.5 using k t R = 0.4 jets in FastJet with ghosted areas. A representative event display for a Z → tt event is shown in Figure 3. The top right plot in Figure 3 shows the anti-k t jets with p T > 5 GeV as filled in (partial) circles. The filled area is determined by the jet area and there are deviations from circles only one a low p T jet is close to a higher p T jet. The two top quarks are depicted as red stars, each of which sits at the center of two high p T jets. The top left plot in Figure 3 shows mGMM fuzzy jets. The fuzzy jets are depicted by their 1-σ contours. In contrast to the anti-k t jets, fuzzy jets vary widely in radial size. Gray crosses in the top left plot indicate the locations of the anti-k t jets shown in the top right plot. The long tail of the crosses point toward the fuzzy jet for which they were the seed. The two jets closest to the top quarks did not move a long distance from the seed location, though the size did change significantly from R = 1. The lowest p T fuzzy jet moved a long distance from the seed to the final location.
Another new feature of fuzzy jets compared to anti-k t jets is that they can overlap with each other. This is seen by the four jets with overlapping 1-σ contours in the top left plot of Figure 3. Overlapping mGMM jets are an expression of structure inadequately captured with a single Gaussian shape. The ability to learn features at different scales in the same event without relying on a size parameter like the anti-k t radius parameter can give mGMM fuzzy jets additional descriptive power over anti-k t and other traditional jet algorithms. This particular event will be used again for reference in Section 5 during a discussion on the performance of the technique in the presence of pileup interactions.

Kinematic Properties of Fuzzy Jets
Jets clustered according to the mGMM algorithm capture similar hard jet locations and jet energy (under HML) as those clustered by anti-k t R = 1. In Figure 4, the p T distribution for the highest p T jets for three different physics processes are plotted as given by anti-k t R = 1.0 and mGMM jets. The anti-k t p T distributions are re-weighted so that all three processes have identical distributions in the left plot. On the right, the distributions are in good correspondence with those in the left plot, though there is a slight shift of the peak. Additionally, the (y, φ) locations of the highest p T mGMM jets are in excellent In the top left plot, gray circles show the location and size of mGMM fuzzy jets after clustering, with the size of the circle indicating 1-σ contours in the detector; the black circle indicates the highest p T jet with HML particle assignment. The small filled colored circles are the particles, with the color and size indicating their energy. In each case, the events have been rotated in φ to place the truth top quark at φ = 3π/2, which is indicated by a red star. Anti-k t jet locations are shown with gray crosses in the left hand plot, the long tail of which points towards the mGMM jet for which it was a seed. In the top right plot, anti-k t R = 1.0 jets passing a 5 GeV p T cut are shown as discs under the particles indicating their active area, with centers the same as the crosses in the left hand side. Shades of gray in the anti-k t discs have no scale and are meant to aid the eye, but go from low p T (lighter) to high p T (darker).
correspondence with the locations of the anti-k t jets as was already discussed in reference to Figure 3.
The mGMM algorithm differs from the anti-k t algorithm in how the size and structure of clustered jets. This was already shown qualitatively in Figure 3: fuzzy jets come in a variety of sizes, and can overlap in complex ways. The matter is further complicated by the choice of particle assignment scheme for defining kinematic properties in the mGMM family of algorithms. The catchment area's volume and shape of a fuzzy jet depends in general on the full set of learned jet locations and model parameters, Σ. In contrast, for anti-k t jets, the catchment area is bounded from above by R and is only smaller when another high p T jet is nearby. The nonlocality of the mGMM clustering model can be observed quantitatively by examining jet mass, given in Eq. (4.2), which is sensitive to the distribution of energy within a jet. The jet mass distributions for both mGMM (HML assignment) and anti-k t jets are shown in Figure 5, with the same p T weighting as in Figure 4. Even though fuzzy jets learn the same core (i.e. p T ) for jets as anti-k t , they do not learn the same mass. The  . The jet p T for the leading anti-k t jet (left) and leading fuzzy jet under the HML particle assignment scheme (right). All the processes are re-weighted so that the anti-k t p T spectra are the same.
white dashed lines in Figure 5 mark the locations of the W boson and top quark masses at about 80 GeV and 175 GeV, respectively [29]. For both anti-k t and fuzzy jets, there are clear peaks at the W mass for the boosted W → qq from W simulated events and at the top quark mass for Z → tt simulated events. However, there are clear differences in the shape of these distributions. The W mass peak for W events is more peaked for fuzzy jets, though there is also a low-mass contribution to the distribution. For Z events, the top quark mass peak is less populated for fuzzy jets, which instead has shifted events to the W mass peak. This often happens when the tree-prong structure is learned by two (overlapping) fuzzy jets. The QCD multi-jet jet mass distribution is also qualitatively different between fuzzy jets and anti-k t jets, with the former shifted to lower values of the mass.

New Information from Fuzzy Jets
The properties Σ of a fuzzy jet can be useful in distinguishing jets resulting from different physics processes. In the simplest realization of mGMM jets already described above, Σ = σ 2 I, where σ is a measure of the size of the core of a jet. Although σ is a simple variable to construct from the wealth of data available after clustering with the mGMM algorithm, it captures at least some of the schematic differences in the likelihood for Z → tt and W → W Z relative to a QCD multijet background (shown below). The left plot of Figure 6 also shows the average σ over all fuzzy jets in an event. The generic fuzzy jet is rather independent of the physics process and tends to be quite large. This is because fuzzy jets capturing hard radiation tend to be small, but most of the fuzzy jets needed to capture the sparse radiation pattern from the underlying event need to be large. In contrast, the σ for the leading mGMM jets are shown the right plot of Figure 6 for each of the three physics processes. As expected, the decay relative size of the highest p T jets depends on the physics process. For the decay of a boosted heavy particle with mass m and p T , the radial size of the decay products scales as 2m/p T and thus since the p T distribution in Figure 6 is fixed, one would expect that the top quark jets have a larger σ than the W boson jets, which are in turn larger than the quark and gluon jets. This is reflected 6 in the three peaks in the left plot of Figure 6. The separation between the three physics processes it not 100% correlated with the naive scaling m/p T of the corresponding leading anti-k t jets. Figure 7 shows that there is a strong positive correlation between σ and the corresponding anti-k t mass over p T as expected. There are two peaks in the correlation for the Z → tt events because the anti-k t mass spectrum has peaks at both the top mass, and the W boson mass. While the correlations between the fuzzy jet σ and the anti-k t m/p T are non-negligible, they are far from unity and thus there may be additional information contained in the fuzzy jet σ that is useful for tagging the flavour of a jet.

Fuzzy Jets for Tagging
In this section, σ is compared with another class of jet substructure variables known to be useful for tagging: the N -subjettiness ratios [30]. N -subjettiness moments are defined over a set of N axes 7 , and calculated as: where d 0 is the normalization and R 0 is the radius of the jet. In practice, the useful variables for determining how much more i-pronged a jet is compared to j-pronged are the N -subjettiness ratios: The variable τ 21 is often used for the separation of W from QCD jets [31,32] and is a measure of the compatibility of a jet with a 2-prong hypothesis compared to a 1-prong hypothesis. Low value of τ 21 indicates that the jet likely has a 2-prong structure. Similarly, τ 32 is useful for top tagging in that it measures whether a 3-prong structure is a better description of a jet relative to a 2-prong structure. The rest of his section contains comparisons of the performance of σ relative to τ 21 for separating W from QCD jets, as well as σ relative to τ 32 for tagging Z → tt amongst a QCD jet background. In Figure 8, a k-nearest neighbors classifier was trained with 2fold cross validation in TMVA [33]. The left plot in Figure 8 demonstrates an increase in performance for discriminating Z → tt from QCD relative to using τ 32 alone. The fuzzy jet σ is roughly equally useful to the N -subjettiness ratio at a sigma efficiency of 0.85, and using both variables greatly improves background rejection. Similar results can be seen in the right plot of Figure 8, where σ boosts background rejection relative to τ 21 alone. In each case, the training and classification was performed in a mass window around the particles of interest, the top quark mass in the Z → tt sample and the W boson mass to discriminate W → qq from QCD.
The comparisons of the fuzzy jet σ and N -subjettiness are intended to be an illustrative example. As discussed in the opening of this section, σ is just one variable that can be constructed by using mGMM clustered jets. Expanded studies of the various learned parameters could come up with additional variables, or the full learned parameter set could be thrown into an off the shelf classifier or machine learning model.

Underlying Event and Pileup
As with any new jet algorithm or jet variable, understanding the effect of pileup vertices from additional proton-proton collisions is essential to make meaningful statements about how the method will be applicable to real data analyses at the LHC. Studying pileup in the context of mGMM jets is complicated by the effective catchment area of the jets. For hierarchical-agglomerative algorithms like anti-k t , the catchment area scales with the radius parameter. However fuzzy jets can have infinite catchment area because the likelihood for particle membership is nonzero for any finite distance and arrangement of Gaussian jets and particles. Furthermore, the catchment area can change depending on the other jets in an event. Although this effect also occurs in the hierarchical-agglomerative case, the effect is much more pronounced in the mGMM clustering algorithm, with some jets having finite catchment areas while others cluster infinite area. The challenge of pileup for fuzzy jets is illustrated in Figure 9, where the same event is shown with n PU = 0, and with n PU = 40. The event displays show the central region of the detector, where most of the decay products of the hard scatter lie. Qualitatively, it can be seen that the introduction of additional interaction vertices broadens all of the mGMM jets. This broadening clearly impacts the power of σ for differentiating QCD background from signal processes. The next sections explore two methods for mitigating the impact of pileup in relation to fuzzy jets, illustrated with the variable σ.

Changing α for Pileup Suppression
In Section 2, it was discussed that choosing α = 1 in the likelihood (Eq. (2.3)) guarantees IRC safety. With α = 1, the mGMM algorithm treats hard structure and soft structure linearly in the particle or tower p T . However, one can exploit the fact that σ is disproportionately a measure of the shape and extent of the leading jet hard structure to make the variable more resilient to the effects of pileup. In particular, choosing α > 1 stabilizes σ at high n PU because so long as the average input particle p T due to pileup is significantly smaller than the p T of the particles constituting the leading jet hard structure, the change in likelihood will be suppressed roughly according to (p T,hs /p T,PU ) α . An example of this effect is illustrated in Figure 10, which shows the same event as in Figure 9. The price for adjusting α is the loss of collinear safety. Varying α is not explored further, as Section 5.2 demonstrates a method for dealing with pileup effectively that does not rely on moving α away from the IRC safe value of one. There is little broadening between the n PU = 0 (left) and n PU = 40 (right) cases, but jets at the locations of the tops in the event are substantially narrower than in the case where α = 1, even with n PU = 0 (compare to Figure 9). Under the ML particle assignment, the α = 2 algorithm identifies the other top as the highest p T jet in the event, demonstrating the difficulty in dealing with fuzzy jet kinematics.

Tower Subtraction and the Event Jet: Effective Pileup Correction
Recent developments in pileup mitigation have led to several algorithms for correcting jet inputs before jet clustering beings. Such techniques include Pileup Per Particle Identification (PUPPI), Constituent Subtraction, and SoftKiller [34][35][36]. One simple input-correction scheme is to subtract from each calorimeter tower the estimated pileup p T density per unit area multiplied by the size of the tower in the detector. As a first step, ρ is calculated in the same way as described in Sec. 4.1. Tower momenta are then corrected according to Eq. (5.1), where p T,s is the corrected momentum, p T,o is the original momentum, and A is the area of the tower. In this study, all towers have area 0.1 × 0.1 in y-φ space.
While subtracting the average p T background from towers before clustering is a relatively safe way of reducing the effect of pileup, at least when the p T scales of the tower to tower fluctuations are small compared to the hard scatter p T scale, it would still be helpful to systematically address the question of catchment areas. The mGMM clustering algorithm provides a natural framework in which to think about pileup, however, because the algorithm deals fundamentally with likelihoods, and the pileup likelihood is to leading order uniform over the detector (this is the motivation for the area-subtraction technique). This is the motivation for modifying the mGMM likelihood using a technique we call the event jet.
In addition to learning k mGMM jets throughout clustering, the event jet includes another background contribution to the likelihood which attempts to capture the intuition of a uniform contribution of particle likelihood due to pileup. Constraints are further imposed on the likelihood on the event level jet so that it has constant likelihood during the clustering process, making the necessary modifications to the algorithm procedures simpler.
Practically, the effect of the event jet can be parameterized through the introduction of an algorithmic parameter γ. Particle membership probabilities change according to Eq. (5.2) with corresponding changes to the analytical M step for the Gaussian kernel type. The choice of γ is important, and it should reflect the fact that not all events are created equal in the sense that not all events have the same contributions due to pileup. Although there is no strict way of dealing with this issue, it is reasonable to replace γ by a meaningful combination of parameters which is sensitive to our estimates of the amount of pileup in a particular event. We have chosen to take γ = ρAγ w where ρ is our estimate of the p T density due to pileup, A is the calorimeter area, and γ w is a parameter of the algorithm controlling the strength of the event jet. Initial studies with the event jet indicate that introducing a ρ dependent γ is much more effective than a ρ independent one.
Studies of the pileup conditions similar to LHC Run I, with ∼ 20 pileup interactions, indicate that with a 5 GeV p T cut, γ w = 0.01 provides reasonable stability of the learned σ. This is demonstrated qualitatively in Figure 11, in which the tower and event jet corrections are applied to the same event shown in Figure 3 at both n PU = 0 and n PU = 40. Unlike any of the methods discussed previously, this method for correction maintains IRC safety, demonstrates very little jet broadening at n PU = 40, and is not drastically different in its qualitative features by comparison to the standard mGMM algorithm. Note that the assignment of towers to jets under the HML scheme is impacted with the event jet because many towers belong to the event jet with higher probability than any of the other fuzzy jets.
To preserve tower-to-jet assignments under pileup, a smaller value of γ w should be chosen. The event jet is useful instead because it changes the dynamics of clustering, making jets less sensitive to soft radiation far away from the jet axis during the EM update steps, and therefore increasing the stability of the hard core that is eventually clustered. A quantitative study of the pileup mitigation suggested qualitatively by Figure 11 requires an ensemble of events. Figure 12 shows how the mean and standard deviation of learned σ evolve with n PU . The uncorrected σ is shown in red downward pointing triangles while the tower subtraction and event jet corrections are shown in blue upward pointing triangles. For both Z → tt and QCD, the pileup dependence is dramatically reduced with the tower subtraction and the event jet. The uncorrected mean σ increases as a function of n PU as all of the fuzzy jets become the same size. The standard deviation of the uncorrected σ actually decreases beyond n PU ∼ 5 as all of the fuzzy jets become the same size. For modest levels of pileup, tower subtraction and event and the event jet maintain the mean and standard deviation of the σ distribution.

Conclusions
The modified mixture model algorithms provide a new way of looking at whole event structure. In contrast to the usual uses of hierarchical-agglomerative algorithms like anti-k t , the number of seeds is fixed ahead of time and their properties are learned during the clustering process. The learned parameters provide a new set of handles for distinguishing jets of different types. Even simple variables constructed out of the learned parameters of a mixture of isotropic Gaussian jets, like σ, offer complementary information to the n-subjettiness variables τ 21 and τ 32 for tagging W boson and top quark jets. Even though the variable σ is sensitive to changes in pileup conditions, small modifications to the fuzzy jets algorithm -correcting jet inputs and adding a pileup likelihood -can mitigate the impact of pileup.
Fuzzy jets are new paradigm for jet clustering in high energy physics. These IRC safe likelihood-based clustering schemes set the stage for many possibilities for future studies related to jet tagging, probabilistic clustering, and pileup suppression.

Acknowledgments
We would like to thank Jesse Thaler for useful discussions and helpful feedback on the manuscript. In addition, we thank Gavin Salam for useful comments on the algorithm description. This work is supported by the US Department of Energy (

A Wrapped Gaussian
In the EM algorithm described in Sec. 3, there are explicit (and implicit) dependencies on the topology. For instance, if a Gaussian density is used to model φ, then, in the E step, a particle with φ i near 2π will be deemed far from a cluster with location φ j near 0. To avoid this undesirable behavior and enforce the equivalence of the angles 0 and 2π, we associate φ with a wrapped Gaussian density and y with a standard Gaussian density: where Φ y is a normal distribution and µ φ (I) = µ φ + 2πI. In order to approximate the sum in Eq. (A.1), we take only the leading contribution by choosing µ φ (I * ) for I * = argmin I |φ − µ φ + 2πI |. As other contributions are exponentially suppressed, this is a good approximation and recovers continuity near 0 and 2π. Figure 13 illustrates the improved clustering behavior that results when φ is modeled using the wrapped Gaussian approximation in place of the standard Gaussian density.  Figure 13. A three-particle event display illustrating the results of fuzzy jet clustering using a Gaussian density for φ (left) and a wrapped Gaussian density approximation for φ (right).

B The EM algorithm
This appendix contains two derivations: the modified EM algorithm updates in Eq. (3.2) and the proof that the modified EM algorithm generically improves the original modified log likelihood Eq. (2.3) with every iteration. Recall the expected modified complete log likelihood (mmCLL) from Eq. (3.1): Viewing the mCLL as a function of µ, Σ and π for fixed λ and ρ we can maximize. For π, we optimize where the last term is needed so that the optimal π * is a probability. The derivative of this expression with respect to π j is and then summing the equation over j and using k j=1 q ij = 1 and the constraint equation k j=1 π j = 1, we find that The updates for µ and Σ follow from the standard derivation (by similarly taking derivatives of the mCLL with respect to components of these multi-dimensional objects) by noting that the only difference is that q ij → q ij p α T i and there are no Lagrange multipliers needed unlike for π * j . Finally, we prove the claim that the modified EM algorithm described in the body of the text monotonically improves the modified log likelihood in Eq. (2.3). First, we note that we can rewrite the (log) likelihood as C Controlling Jet Multiplicity with p T In contrast to most uses of hierarchical-agglomerative clustering algorithms, the number of fuzzy jets is fixed before clustering begins. Whereas a single traditional jet can reasonably be considered to correspond to a parton in appropriate cases, mGMM jets should not be, as several mGMM jets can together express structure of what would be one or several jets according to another algorithm. The choice of the number of jets used in mGMM jet clustering therefore controls the expressive power of the algorithm to look at the event structure. In practice, choosing too many jets does not greatly affect the value of the leading learned σ variable, because the additional jets learn finer features of the event structure. On the other hand, choosing too few jets is often problematic as can be seen in Figure 14 -the fuzzy jets need to grow in order to cover the full energy distribution in the event. Using anti-k t jets as seeds for fuzzy jets has the feature that the number of fuzzy jets change dynamically with the complexity of the event. The algorithm is not very sensitive to the exact locations of the anti-k t jets -studies which randomly perturbed the initial jet locations inside a disc of radius 1.0 found that σ was robust to such fluctuations, even on an event by event basis. However, the p T threshold for the seed anti-k t jets can have a significant impact on the fuzzy jets as this alters the number of seeds. The p T threshold for the anti-k t seeds is typically lower than the p T threshold one would use to consider anti-k t jets alone because the fuzzy jets algorithm needs enough seeds to populate the low energy regions of the detector. One way of mitigating the impact of the p T cut on the fuzzy jet clustering is to introduce an event jet, described in Section 5.2.  Figure 14. Changing the choice of the p T cut used to select seeds can make a vast difference in the values of the constructed variables, like σ. In this event, clustered on the left with a cut of 5 GeV resulting in five jets, and on the right with a cut of 50 GeV resulting four jets. Fewer degrees of freedom in the four jet case means a much larger learned value for the σ variable.

D A Leading Order Description of Fuzzy Jet σ
We have seen in Sec. 4.4 that the fuzzy jet σ is correlated with ρ = m/p T . We can build some intuition for this relationship by considering a leading order QCD calculation of σ.
Consider an isolated quark jet with energy E which radiates a gluon with angle θ 1 from the jet axis and with energy fraction z 1. Without loss of generality, suppose the quark is moving in the φ = 0 direction and the splitting happens in the φ = π/2 direction so that the four vector of the quark is q µ = E(1 − z)(1, 0, 0, 1), and the gluon four-vector is g µ = Ez(1, θ, 0, 1), to leading order. To this order, the jet mass is simply m = Ezθ 2 . What is σ? Consider k = 1 and something like the event-jet applied so that we can treat this jet in isolation from other hadronic activity in the event. Since k = 1, the soft memberships are all one, i.e., q i1 = 1 and there is only one step of the EM algorithm. The anti-k t jet has (y, φ) coordinates (0, θ), which could be used for the seed, but since k = 1, the seed is not used. The quark has coordinates (0, 0), and the gluon has coordinates (0, θ). We can compute the fuzzy jet coordinates in the (single) M step: Therefore, to leading order and k = 1, the learned σ is the jet mass. For k = 2, there are enough degrees of freedom to resolve the substructure of the hard splitting and so the relationship between the jet mass and σ breaks down.