Jet-images: computer vision inspired techniques for jet tagging

We introduce a novel approach to jet tagging and classification through the use of techniques inspired by computer vision. Drawing parallels to the problem of facial recognition in images, we define a jet-image using calorimeter towers as the elements of the image and establish jet-image preprocessing methods. For the jet-image processing step, we develop a discriminant for classifying the jet-images derived using Fisher discriminant analysis. The effectiveness of the technique is shown within the context of identifying boosted hadronic W boson decays with respect to a background of quark- and gluon-initiated jets. Using Monte Carlo simulation, we demonstrate that this technique introduces additional discriminating power over other substructure approaches, and gives significant insight into the internal structure of jets.


Introduction
Modern TeV scale colliders, such as the LHC [1], are capable of producing highly boosted jets from the decay of heavy particles. The large momentum and mass of the heavy particle may lead to a boosted system in which the heavy particle's decay products merge into a single jet. In order to discover and study such systems, it is vital to discriminate boosted heavy particle jets from QCD-initiated quark/gluon jets (which we will refer to as QCD jets for the remainder of the text). The techniques of jet substructure, which exploit the internal structure of the jet, have been developed for this task. Contemporary experimental and theoretical work in the field of jet substructure includes, but is not limited to, subjet finding and jet tagging / jet flavor identification algorithms [2][3][4][5][6][7][8][9][10][11][12], jet grooming and pileup mitigation [13][14][15], template and matrix element techniques [16,17], energy flow and event shapes [18][19][20][21][22][23][24], and probabilistic jet clustering [25,26]; more inclusive summaries can be found in references [27][28][29][30].
Typical approaches to jet flavor identification and jet tagging begin with a simplified analytic expression of the system's behavior, such as the two-pronged nature of the hadronic W boson decay, or the large particle multiplicity of gluon-initiated jets relative to quark-initiated jets. A variable which capitalizes on this analytic knowledge and is ideally not subject to large theoretical uncertainties is derived. In contrast, our method does not rely on an analytic description of the system, such as color-flow information, but instead uses a catalog of jets produced by Monte Carlo simulation as a training sample to learn the discriminating features between several categories of jets. The method begins by putting jets through a series of preprocessing steps to build a consistent representation of the jets.
After preprocessing, we employ a fast, linear, and powerful method for feature extraction and physical interpretation 1 .
Our methods for preprocessing and discrimination are inspired by techniques in the field of computer vision. Most notably, our task is similar to that of facial recognition, and we have developed jet-specific analogs to the algorithms used in facial recognition. As such, the work presented here serves both to build a connection between the fields of jet substructure and computer vision and to introduce the use of computer vision techniques in the analysis of hadronic final states. The basic algorithm presented here thus serves as a general proof of principle, and can be developed further in future studies.
In Section 2 we present the "jet-image" algorithm, which comprises the definition of the jet-image, the jet-image preprocessing steps, and the jet-image processing (that is, the methods of discrimination). Section 3 describes the samples used for our performance studies of the algorithm, and Section 4 presents the case study where the algorithm is employed for discriminating hadronically decaying W jets from QCD jets and is compared to other jet substructure techniques. In Section 5 we discuss the information that can be gained from our approach, and possible further applications and developments.

Algorithm
All operations described below are performed in Python (v2.7.2) using the Numpy [31] and Scipy [32] libraries. In addition, many of the classifiers are built using Scikit-Learn [33], and the figures are made using Matplotlib [34] and ROOT [35].

Jets as Images
For this study, we restrict the inputs to jet reconstruction strictly to calorimeter towers and, with little loss in generality, approximate the calorimeter as a single layered grid of towers 2 with spacing ∆η × ∆φ = 0.1 × 0.1 spanning [−2.5, 2.5] 3 in η and [0, 2π] in φ. Using common jet-finding algorithms [38][39][40][41][42], we identify the angular location of the jets (as the jet axis) and save all towers within a 2R by 2R square, centered on that location, for further examination. This yields the (2R/0.1 + 1) × (2R/0.1 + 1) grid 4 of towers which we denote the jet-image, an example of which can be seen in Figure 1a. Henceforth, we will refer to calorimeter towers which are included in a jet-image as pixels. It is important to note that with this jet-image representation the similarity of two jets can be established using the dot product between two jets. That is, an N × N pixel jet-image A can be recast into an N²-dimensional vector Ā, where the elements A_{i,j} = Ā_{i+N×(j−1)} with i, j ∈ {1, ..., N}, and the dot product between jet-images A and B is

$$A \cdot B = \bar{A}^{T} \bar{B} = \sum_{i,j=1}^{N} A_{i,j} B_{i,j}.$$

While this study focuses on jet-images built from calorimeter towers, any inputs which can be mapped to a pixelated image could be used to build jet-images. For instance, jet-images could be built from tracks, calorimeter clusters, truth particles, etc., all of which could in principle use much finer pixel granularity due to the higher resolution of the inputs. Once the jet-images are built from the inputs, the algorithms described in the following text are directly applicable.
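The pixelation and dot product described above can be sketched in a few lines of Numpy. This is a minimal illustration rather than the code used for the study: the function names `make_jet_image` and `jet_dot`, the nearest-pixel assignment of towers, and the convention that the first array index runs along η are all assumptions made for the example.

```python
import numpy as np

def make_jet_image(tower_eta, tower_phi, tower_et, jet_eta, jet_phi,
                   R=1.2, cell=0.1):
    """Fill a (2R/cell + 1)-square grid with tower E_T around the jet axis."""
    n = int(round(2 * R / cell)) + 1          # 25 pixels per side for R = 1.2
    half = n // 2                             # index of the central pixel
    image = np.zeros((n, n))
    d_eta = np.asarray(tower_eta, dtype=float) - jet_eta
    # Wrap the azimuthal difference into [-pi, pi) so jets near phi = 0 work.
    d_phi = (np.asarray(tower_phi, dtype=float) - jet_phi + np.pi) % (2 * np.pi) - np.pi
    i = np.rint(d_eta / cell).astype(int) + half
    j = np.rint(d_phi / cell).astype(int) + half
    inside = (i >= 0) & (i < n) & (j >= 0) & (j < n)
    # np.add.at accumulates correctly when several towers share a pixel.
    np.add.at(image, (i[inside], j[inside]), np.asarray(tower_et, dtype=float)[inside])
    return image

def jet_dot(a, b):
    """Dot product of two jet-images via their flattened vectors.

    Column-major flattening places pixel (i, j) at position i + N*j
    (0-based), matching the index convention quoted in the text.
    """
    return float(np.dot(a.ravel(order="F"), b.ravel(order="F")))
```

Because the dot product is a sum over matching pixels, any flattening order gives the same value as long as both images use the same one.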
Jet-images have several properties that make them useful inputs to jet flavor tagging techniques. First, each image has the same dimensionality: every jet can be described by a fixed set of numbers. This is an important property for the application of image processing techniques and for training classification algorithms. Second, as a low-level jet description, the jet-image uses all available information 5 for later discriminatory techniques rather than compressing the information into a set of derived variables. Third, the similarity between two jets can be easily and quickly computed using standard linear algebra computations. While the concept of analyzing jets as images is not new (see for instance [43][44][45][46]), this paper presents a computer vision inspired framework for processing and interpretation of jet-images.
Having recast the representation of the calorimeter information, we introduce the jet-image preprocessing steps and the discrimination techniques used for flavor identification.

Jet-Image Preprocessing
Approaches to facial recognition (see [47] and references therein for review) attempt to classify a face in an image by learning the structure of the face, i.e. learning the expected distribution of pixel intensities from a set of example images. The ability to accurately recognize a face in an image can be sensitive to lighting conditions, intensity, shadows, the orientation of the faces, facial expressions, noise, etc. These features introduce large variations in the pixel intensities but are not correlated with different categories and thus obfuscate the underlying discriminatory distributions. Image preprocessing steps are used to standardize the images and greatly improve later classification performance.
Analogously, our approach to jet tagging attempts to learn the structure of the transverse energy distribution of the pixels of the jet-image. The use of preprocessing is paramount to accurate jet-image classification, and as such we have developed a set of preprocessing steps specifically for this context. Many of these steps have parallels in facial recognition, which should become evident from the description. The jet-image preprocessing proceeds as follows:
1. Noise Reduction: The effects of noise in the jet-image from pileup are reduced using trimming [13]. Trimming is used due to its simplicity, the fact that it uses subjets which are used at later stages in the algorithm, and because it is seen to greatly reduce the impact of pileup on the algorithm performance. Different jet grooming techniques (see [29,42] and references therein) which mitigate the effects of pileup could also be used.

2. Point of Interest Finding: The positions of the leading regions of transverse energy deposition are located using the subjets resulting from the trimming procedure. Subjets are ordered by p_T to decide the relevance of the points of interest. The number of subjets required depends on the alignment procedure (step 3), but at least one subjet must be identified. This step, in the case of a 2-prong decay such as a W-jet, is similar to locating the eyes in an image of a face.

3. Alignment: The jet-image is aligned such that the relative pixel locations of primary features are always the same. This step exploits symmetries of the η − φ space and removes variations in jets which do not improve classification accuracy. This is analogous to aligning the eyes of a face within an image to always lie in the same pixels. There are three components to jet-image alignment:
• Rotation: Rotation is performed to remove the stochastic nature of the decay angle relative to the η − φ coordinate system. This alignment can be done very generally, by determining the principal axis [48] of the original image and rotating the image around the jet-energy centroid such that the principal axis is always vertical. Alternatively, process-specific information can be used. For two-body decay processes (such as the hadronic decay of a W boson) the direction connecting the axes of the leading two subjets can be rotated until the leading subjet is directly above the subleading subjet. However, using process-specific information can lead to acceptance loss, for example when the two decay products merge into a single subjet.
• Translation: Once rotated, the jet-image is translated such that the jet-image energy centroid or leading subjet is always centered in the same pixel. This procedure is analogous to adjusting the position of the eyes in a picture so that they always appear in the same pixels.
• Reflection: Once translated, the image is reflected over the vertical axis such that the side of the image with maximal transverse energy always appears on the right side of the image. This ensures that the hardest radiation always appears in similar locations and can be exploited by the training procedure for classification.
4. Equalization: Jet-images are normalized such that the dot product of a jet-image with itself (i.e. the sum of squared pixel contents) is equal to one. This removes the absolute energy scale dependence of jet-images, thus allowing for comparisons of jet-images with different energies. This step is analogous to standardizing the lighting conditions of images.
5. Binning: In many cases, the expected jet-images may vary significantly with a known variable; in this case, the variable can divide a class of jet-images into a set of sub-classes with more uniform jet-images. For instance, if the total transverse energy of the jet-image or the ∆R between the subjets causes significant variations, jet-images can be binned into different ranges of the variable. This is analogous to separating images based on the facial expression. A different discriminant can then be trained separately for each sub-class.
An example of the image preprocessing with jets from hadronically decaying W bosons can be seen in Figure 1, plotted using the η and φ coordinates of the pixels relative to the jet axis (before rotation) and using the rotated coordinate system Q_1 and Q_2 after rotation. The average of a large sample of W jet-images without preprocessing can be seen in Figure 1d, where there is no clear sign of the two-prong decay structure of the W jet. Figures 1a, 1b, and 1c illustrate the different stages of preprocessing on a single W jet-image. Finally, Figure 1e shows the average of the same large sample of W jet-images after preprocessing, with the two-prong structure of the W decay clearly visible.
This preprocessing prescription is generic enough to be applied to most jet flavor identification problems. However, the impact of the preprocessing steps on the system under consideration must be carefully considered and in some cases tailored. For example, while trimming, or more generally jet grooming, is important for LHC experiments which expect large amounts of pileup, it could be dropped for use in a cleaner experimental environment. Similarly, the rotation steps can be adjusted depending on the system under study (i.e. two-prong decays or three-prong decays).
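Three of the preprocessing steps listed above (subjet-based rotation, reflection, and equalization) are simple enough to sketch directly. The helper names below are hypothetical, and two conventions are assumed for the example only: rotation is applied to (Q_1, Q_2)-like coordinates before pixelation, and images are indexed [row, column] with columns running along the horizontal axis.

```python
import numpy as np

def rotation_angle(lead_xy, sublead_xy):
    """Angle that rotates the subleading-to-leading direction onto +y,
    i.e. places the leading subjet directly above the subleading one."""
    v = np.asarray(lead_xy, dtype=float) - np.asarray(sublead_xy, dtype=float)
    return np.pi / 2 - np.arctan2(v[1], v[0])

def rotate_points(xy, theta, centre=(0.0, 0.0)):
    """Rotate one point (2,) or many points (n, 2) by theta about centre."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    centre = np.asarray(centre, dtype=float)
    return (np.asarray(xy, dtype=float) - centre) @ rot.T + centre

def reflect(image):
    """Flip over the vertical axis so the more energetic half is on the right."""
    n = image.shape[1]
    left = image[:, : n // 2].sum()
    right = image[:, (n + 1) // 2:].sum()   # central column ignored for odd n
    return image[:, ::-1] if left > right else image

def equalize(image):
    """Scale the image so the sum of squared pixel contents equals one."""
    norm = np.sqrt(np.sum(image ** 2))
    return image / norm if norm > 0 else image
```

Rotating tower coordinates before filling the grid avoids interpolating pixel contents; rotating an already pixelated image would instead need a resampling choice that the text does not specify.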

Jet-Image Processing: Constructing the Discriminant
Having preprocessed the jet-images, we have a representation which can be thought of as a vector specifying a coordinate in a high-dimensional space. Linear classification of the jets is performed by projecting the vector onto the discriminant direction which ideally maximizes the separation between different classes of jets.
A number of linear classifiers can be used for discrimination. In facial recognition, Principal Component Analysis (PCA) [49] and Fisher's Linear Discriminant (FLD) [50,51] are common choices and their specified directions are known as eigenfaces and fisherfaces [52] respectively. While both methods have been investigated within the context of jet-images, FLD is observed to perform significantly better and thus this paper focuses on FLD exclusively. FLD identifies the plane in the high-dimensional feature space which maximizes the separation between the jet classes and simultaneously minimizes the scatter within each jet class 6. Since FLD uses knowledge of the within-class variations, it is less influenced by variations present in both classes than PCA.
FLD is trained using a set of preprocessed example jet-images from two classes and produces a discriminant F, which we denote the Fisher-jet, that has the same dimensionality as the example jet-images and thus can be viewed as a jet-image itself. Discrimination between classes for a jet-image, A, is then achieved by projecting A onto the Fisher-jet. That is,

$$D[A] = F \cdot A = \bar{F}^{T} \bar{A},$$

where D[A] is the discriminant output for the jet-image A, and F̄ and Ā are the vector representations of F and A, respectively, as discussed in Section 2.1. FLD produces a robust classifier which is fast to apply, fast to derive, and can itself be interpreted as a type of image. Therein lies much of the technique's power. For any system under study, the discriminant reduces the dimensionality of the problem, such that the properties of a jet can easily be expressed numerically and a cut on the discriminant value D can be used to separate classes. Since the discriminant output is computed by performing a projection of a jet-image onto the Fisher-jet, positive values in the Fisher-jet identify features which are indicative of a jet of class A whereas class B is highlighted by negative coefficients. The magnitude of the coefficient indicates the classification strength of the feature. To understand the behavior of the discriminant, the Fisher-jet image reveals what differences are being exploited in a simple and intuitive way. This is an important aspect of the technique. Such a visualization can be used to convince ourselves that the algorithm is detecting relevant differences between the two classes. Furthermore, the ability to visualize the classifier means that this approach is also useful as a step in exploratory analysis, and could be used to guide the construction of top-down analytical solutions to a problem. This aspect of our technique contrasts with widely used multivariate techniques in high-energy physics, such as neural networks and boosted decision trees, which predominantly use variables derived from the jet's kinematics and substructure rather than directly using the jet constituents.
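As an illustration of how a Fisher-jet might be derived and applied, the sketch below uses the textbook two-class Fisher solution, in which the discriminant direction is proportional to the inverse within-class scatter matrix applied to the difference of class means. The function names and the small ridge term are assumptions added for the example; the ridge only keeps the scatter matrix invertible when some pixels are empty in every training image, and the text does not say how that case was handled.

```python
import numpy as np

def train_fisher_jet(signal, background, ridge=1e-6):
    """Two-class Fisher discriminant on stacks of jet-images.

    signal, background: arrays of shape (n_jets, N, N), already preprocessed.
    Returns the Fisher-jet as an (N, N) array of unit norm.
    """
    xs = signal.reshape(len(signal), -1)
    xb = background.reshape(len(background), -1)
    mu_s, mu_b = xs.mean(axis=0), xb.mean(axis=0)
    # Within-class scatter: summed scatter of each class about its own mean.
    sw = (xs - mu_s).T @ (xs - mu_s) + (xb - mu_b).T @ (xb - mu_b)
    dim = sw.shape[0]
    # Ridge regularization (an assumption of this sketch, see lead-in).
    sw = sw + ridge * np.trace(sw) / dim * np.eye(dim)
    w = np.linalg.solve(sw, mu_s - mu_b)
    return (w / np.linalg.norm(w)).reshape(signal.shape[1:])

def discriminant(fisher_jet, image):
    """Project a jet-image onto the Fisher-jet (pixel-wise dot product)."""
    return float(np.sum(fisher_jet * image))
```

With this sign convention, positive pixels of the returned Fisher-jet favour the class passed as `signal` and negative pixels favour `background`, so the array can be drawn directly as an image.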

Samples
Monte Carlo samples are produced with Pythia8 (v8.176) [53], MadGraph (v1.5.11) [54], and Herwig++ (v2.6.3) [55], with proton-proton collisions at √s = 8 TeV. In all cases, the effect of pileup from multiple collisions in one crossing of the collider beams is simulated by adding the energy of particles from additional Pythia8 "minBias" interactions. The number of additional pileup interactions per bunch crossing is a random variable sampled from a Poisson distribution with mean µ. In this paper we consider the cases of no pileup and µ = 30.
The signal sample used for these studies is di-boson WW production, with one W decaying to a muon and neutrino, and one W decaying to a pair of quarks. The matrix element and shower for the boosted W samples are calculated with Pythia8. The boost of the system is controlled by applying a loose cut on the matrix-element pT, followed by a harder cut on the leading reconstructed jet p_T such that it falls within a 50 GeV-wide p_T window (e.g. 200 < p_T < 250 GeV). The minimum and maximum pT were set 50 GeV below and above the jet p_T interval being studied. The corresponding light and gluon jet background is simulated using Pythia8 with W + jets, where the W decays to a muon and a neutrino. Again, a loose cut on pT is used to set a generator-level boost for the system, followed by a harder cut on the leading reconstructed jet. Similar samples are generated with Herwig++ for validation studies.
Jet finding is performed with Fastjet (v3.03) [56] on the massless four-vectors defined by the calorimeter towers. For each generated event, the towers are clustered using the Cambridge/Aachen algorithm [40] with a radius R = 1.2 and only jets with p_T > 25 GeV are kept. We apply trimming with k_T R = 0.3 subjets and f_cut = 5% to reduce the impact of pileup interactions. This serves as the noise reduction step introduced in Section 2.2, and no additional pileup subtraction is performed.

Case Study: W Boson Jet Tagging
In this section we demonstrate the performance of this technique in classifying boosted W → qq jets versus QCD jets. This serves both as an illustration of the power of the technique, and as a pedagogical example.
The inputs are grids of 25×25 pixels (i.e. the jet-images of Section 2.1), centered around the reconstructed jet axes, as described in Section 3. Only the leading jet in each event is used and it is required to have mass M ∈ [65, 95] GeV. For image preprocessing, we use trimming to reduce noise, the leading 2 subjets to align the image, and translation (based on the jet-image energy centroid), reflection, and equalization to make the jet-images uniform.
After preprocessing, it is observed that the jet-images vary significantly both with the jet transverse momentum (p_T) and with the ∆R between the leading two subjets (∆R_jj). Neither of these variations is surprising, since jet p_T differences cause wide variations in the energy distribution within the jet and differences in ∆R_jj lead directly to differences in the separation between features in the jet-image. As a result, the jet-images are binned in both p_T and ∆R_jj. We use 50 GeV bins for jets with p_T ∈ [200, 500] GeV, and the ∆R_jj binning of each p_T bin is (∆R_jj < 0.4, ∆R_jj ∈ [0.4, 0.6], ∆R_jj ∈ [0.6, 0.8], ∆R_jj ∈ [0.8, 1.0], ∆R_jj ∈ [1.0, 1.2], ∆R_jj > 1.2). The ∆R_jj binning is chosen such that resolvable differences in subjet position on the order of the subjet radius would not end up in the same bin and smear the jet-images, but different optimizations could be explored 7.
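The (p_T, ∆R_jj) binning described above maps each jet to a pair of bin indices, which can then select the Fisher-jet to train or apply. The sketch below is one possible implementation with `np.digitize`; the name `bin_indices` is hypothetical, and treating shared bin edges as half-open intervals (lower edge included) is an assumption, since the text quotes closed intervals.

```python
import numpy as np

# Bin edges taken from the text: six 50 GeV p_T bins and six dR_jj bins.
PT_EDGES = np.arange(200.0, 501.0, 50.0)          # 200, 250, ..., 500 GeV
DR_EDGES = np.array([0.4, 0.6, 0.8, 1.0, 1.2])    # interior edges only

def bin_indices(pt, dr_jj):
    """Return (pt_bin, dr_bin) index arrays for arrays of jets.

    pt_bin runs 0..5 and is -1 for jets outside [200, 500) GeV.
    dr_bin runs 0..5, with 0 meaning dR_jj < 0.4 and 5 meaning dR_jj >= 1.2.
    """
    pt = np.asarray(pt, dtype=float)
    dr_jj = np.asarray(dr_jj, dtype=float)
    pt_bin = np.digitize(pt, PT_EDGES) - 1
    in_range = (pt >= PT_EDGES[0]) & (pt < PT_EDGES[-1])
    pt_bin = np.where(in_range, pt_bin, -1)
    dr_bin = np.digitize(dr_jj, DR_EDGES)
    return pt_bin, dr_bin
```

For example, a jet with p_T = 260 GeV and ∆R_jj = 0.7 lands in bin (1, 2), which corresponds to the p_T ∈ [250, 300] GeV, ∆R_jj ∈ [0.6, 0.8] bin used for illustration in the text.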
After preprocessing both the W and QCD jets, we train an FLD using these samples as signal and background respectively. A separate FLD is trained for each bin of (p_T, ∆R_jj). It should be noted that equal numbers of W and QCD jets are used in the training of each FLD, as this was seen to produce better results than using different numbers of events in each sample (or equivalently different priors for each sample).
The resulting discriminant Fisher-jet, recast to a 25×25 image, can be seen in Figure 2a for a single bin (p_T ∈ [250, 300] GeV, ∆R_jj ∈ [0.6, 0.8]), while the discriminant output when this Fisher-jet is applied to samples of W and QCD jets can be seen in Figure 2b. Jet images of several bins of p_T and ∆R can be found in Appendix A. As noted earlier, the sign of the pixel value in the Fisher-jet indicates which class the pixel helps identify, as seen in Figure 2a where red pixels indicate importance for W-jets while blue features imply importance for QCD jets. We can see in the figure that the presence of a hard primary subjet located at (Q_1 ∼ 1.5, Q_2 ∼ 1.25) is almost irrelevant for discrimination, as both classes have hard primary subjets. However, the presence of a second hard subjet near (Q_1 ∼ 0.75, Q_2 ∼ 1.25) is a strong indicator of a W-jet. Finally, we see that energy appearing around the two subjets is a strong indication of a QCD jet, which is essentially telling us that QCD subjets tend to be wider and have radiation surrounding the subjets.
The background rejection vs. signal efficiency curves for the FLD, computed using the 1-D likelihood ratios of the output distribution of the FLD for W-jets and QCD jets, can be seen in Figure 3a, along with the rejection vs. efficiency curves observed when using N-subjettiness (τ_2/τ_1) [7,8] computed analogously with the 1-D likelihood ratios. For the rejection vs. efficiency curve in Figure 3a, Fisher-jets are trained on jets satisfying p_T ∈ [250, 300] GeV in 6 bins of ∆R_jj, and a combined 1-D likelihood ratio distribution is computed by taking the likelihood ratio for each jet computed with respect to the appropriate ∆R_jj bin and merging these likelihood ratio values into a single distribution. The N-subjettiness distributions are not binned in ∆R_jj as this did not show any improvements in performance. Figure 3b shows the efficiency of W jets at a fixed QCD jet rejection of 10 as a function of jet p_T for the FLD (combining the 6 bins of ∆R_jj for each jet p_T bin) and for N-subjettiness. It can be seen that FLD outperforms N-subjettiness for the full range of jet p_T examined.
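A rejection vs. efficiency curve built from a 1-D likelihood ratio, as described above, can be sketched as follows for a single discriminant distribution. The function names, the number of histogram bins, and the treatment of empty bins are assumptions of this example; merging likelihood-ratio values across ∆R_jj bins is not shown.

```python
import numpy as np

def likelihood_ratio_roc(sig, bkg, bins=50):
    """Signal and background efficiencies when histogram bins are accepted
    in order of decreasing signal/background likelihood ratio."""
    sig = np.asarray(sig, dtype=float)
    bkg = np.asarray(bkg, dtype=float)
    edges = np.linspace(min(sig.min(), bkg.min()),
                        max(sig.max(), bkg.max()), bins + 1)
    ps = np.histogram(sig, bins=edges)[0] / len(sig)
    pb = np.histogram(bkg, bins=edges)[0] / len(bkg)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Bins with signal but no background get an infinite ratio.
        ratio = np.where(pb > 0, ps / pb, np.inf)
    ratio[(ps == 0) & (pb == 0)] = 0.0        # empty bins carry no information
    order = np.argsort(-ratio, kind="stable")
    return np.cumsum(ps[order]), np.cumsum(pb[order])

def efficiency_at_rejection(sig_eff, bkg_eff, rejection=10.0):
    """Largest signal efficiency with background efficiency <= 1/rejection."""
    ok = bkg_eff <= 1.0 / rejection
    return float(sig_eff[ok].max()) if ok.any() else 0.0
```

Background rejection is then the reciprocal of the background efficiency at each working point, so a rejection of 10 corresponds to keeping at most 10% of the QCD jets.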
It should be noted that the outputs of FLD and N-subjettiness are correlated, as shown in Figures 4a and 4b for W and QCD jets respectively, with a correlation coefficient of approximately 0.7 for both W and QCD jets. Thus, the Fisher-jet approach is able to combine in a linear way the information comprising the jet effectively, and capture much of the information of N-subjettiness and more. On the other hand, mass, which relies on quadratic relationships between the inputs, is a simple quantity which FLD does not reproduce, as shown in Figures 4c and 4d for W and QCD jets respectively. Since the Fisher-jet output is only slightly correlated with mass, with a correlation coefficient of approximately -0.25 for both W and QCD jets indicating a small degree of anti-correlation, the performance of the classifier does not change dramatically whether it is applied to a small window around the W mass, or to a sample without jet mass cuts.
To investigate the effect of pileup, which essentially acts as a source of noise within the jet-image, the Fisher-jets are trained on samples without pileup and subsequently applied to statistically independent samples with pileup 8. No significant degradation in performance is observed, likely due to the application of trimming. More in-depth studies of the impact of pileup on the Fisher-jet approach are left for future studies.

[Plot: axis "Background Rejection"; legend "Sig Eff @ Bkg Rej" at ×2, ×10, ×20, ×100]
To check the generator dependence of the classifier, a second sample of W and QCD jets is generated using Herwig++, which implements an independent description of the hard sub-process and subsequent showering. The Fisher-jet trained on the Pythia8 samples is then applied to the Herwig++ samples. The Fisher-jet discriminant output distributions for both W-jets and QCD jets from both Pythia8 and Herwig++ are shown in Figure 5 for jets with p_T ∈ [200, 250] GeV and ∆R ∈ [0.6, 0.8], and are seen to be extremely similar between the two generators.
underlying physics (in the parlance of computer vision, these are the clearest pictures without noise) and are observed to give the best discrimination performance even in samples with noise due to pileup.

Conclusion
We present a generic algorithm to extract discriminating information between different classes of jets using parallels to techniques used in the field of computer vision. The algorithm uses a representation of jets as images, applies preprocessing techniques to construct a consistent set of jet-images, and applies a Fisher linear discriminant, which has been trained on a collection of example jets, to process the jet-images. The application of the Fisher discriminant projects different classes of jets to opposite ends of the discriminant output spectrum, thereby allowing for separation of the classes. The resulting algorithm is a linear function of the inputs, is easy to understand, requires little prior knowledge to tune the discriminant, and does not rely on an analytic description of the system studied. This method is seen to be competitive with N-subjettiness when separating hadronically decaying boosted W bosons from QCD jets. Importantly, the resulting algorithm can be used as a tool to visually discover differences in samples with no complete analytical description. The jet-images and the Fisher-jet can be trivially visualized, allowing one to study the important features of different jet classes and to gain insight into the features which are most discriminating between different classes. In this way, the method provides a starting point for finding simple answers to high-dimensional problems.
This computer vision inspired approach stands out in its flexibility and its ease of inspection. While this study focuses on W vs QCD jets, initial studies have shown that the method can be easily applied to the H → bb system and provide powerful discrimination. The algorithm could be adapted for use in other problems of jet discrimination, such as identifying hadronically decaying tops. More generally, many problems in experimental physics contain a correspondence to the imaging problem discussed here, such as categorizing the ring images produced in Super-K [57], where our methods are likely directly applicable. In addition, since a consistent set of jet-images is produced by the preprocessing steps, non-linear classifiers could easily be used for jet-image processing to improve the discrimination performance.
Jet-images provide a connection between the fields of particle physics and computer vision, and are a departure from the ways that experimental high energy physics currently uses machine learning algorithms. However, this is merely a first step: many of the techniques of the vast field of computer vision have yet to be explored in the context of particle physics and may prove powerful within this domain.

Figure 1: The preprocessing of jet-images and the impact on the average jet-image for W jets in which the leading jet has p_T between 200 and 250 GeV. Note that the grid in Figure 1c appears shifted down to represent the jet-image before translation, which is subsequently translated such that the leading subjet lies in the location (Q_1 ∼ 1.5, Q_2 ∼ 1.25), as seen in the final average jet-image of Figure 1e.

Figure 2: A Fisher's linear discriminant presented as an image (left) and the distributions of the discriminant output when applied to W-jets and Light-jets (right), when the FLD is trained on jets with p_T ∈ [250, 300] GeV, mass M ∈ [65, 95] GeV, and separation between subjets of ∆R ∈ [0.6, 0.8].

Figure 3: Background rejection vs signal efficiency curves obtained by training discriminants on samples with p_T ∈ [250, 300] GeV (left), and the W-jet efficiency at fixed ×10 QCD jet rejection versus jet p_T (right).

Figure 4: The Fisher-jet discriminant output for the bin p_T ∈ [250, 300] GeV and ∆R ∈ [0.6, 0.8] plotted against: (4a) the output of N-subjettiness for W-jets, (4b) the output of N-subjettiness for QCD jets, (4c) the invariant mass of the jet for W-jets, and (4d) the invariant mass of the jet for QCD jets.

Figure 5: The Fisher-jet discriminant output, trained on a sample of Pythia8 W and QCD jets and applied to Pythia8 and Herwig++ samples.

Figure 6: Fisher linear discriminants presented as jet-images, or Fisher-jets, for several p_T and ∆R_jj bins for jets with mass M ∈ [65, 95] GeV.