Learning How to Count: A High Multiplicity Search for the LHC

We introduce a search technique that is sensitive to a broad class of signals with large final state multiplicities. Events are clustered into large radius jets and jet substructure techniques are used to count the number of subjets within each jet. The search consists of a cut on the total number of subjets in the event as well as the summed jet mass and missing energy. Two different techniques for counting subjets are described and expected sensitivities are presented for eight benchmark signals. These signals exhibit diverse phenomenology, including 2-step cascade decays, direct three body decays, and multi-top final states. We find improved sensitivity to these signals as compared to previous high multiplicity searches as well as a reduced reliance on missing energy requirements. One benefit of this approach is that it allows for natural data driven estimates of the QCD background.

The search for physics beyond the Standard Model is a central focus of LHC research. The motivations for extensions of the Standard Model are multi-faceted, spanning such diverse physics topics as the identity of dark matter, the radiative stability of the weak scale, the unification of forces, and the origin of the baryon asymmetry of the Universe. Apart from these specific theory inputs, which suggest certain classes of models, there is the generic goal of thoroughly probing the weak scale for new physics-whatever that might be. In order to cover the vast range of possibilities for new physics that open up when the input from theory is loosened, it is imperative for the LHC to carry out an extensive experimental program that is sensitive to the widest possible range of new physics signatures.
Since it is not possible to perform model independent searches for new physics-doing so is limited by both theoretical and experimental systematic uncertainties-it is necessary to design searches for new physics that are targeted at specific experimental signatures. Typically such searches are based on exploiting a single key handle that significantly reduces the dominant Standard Model backgrounds, namely those originating from QCD jet production. For instance, it is common to require hard electroweak particles such as leptons or photons or (large amounts of) missing energy. Requiring b-tagged jets, massive jet resonances or extremely high energy events can provide alternative avenues for parametrically reducing the QCD background. These considerations exist both because of triggering requirements necessary for permanently storing events to tape and because realistic systematic uncertainties in any case limit the degree to which increased integrated luminosity leads to increased sensitivity.
The past several years have seen increasing attention being paid to high multiplicities (i.e. requiring that events have more than ∼ 6 final state jets) as a way of parametrically reducing QCD contributions to searches for new physics. Historically this approach was motivated by searches for black holes at the LHC (see e.g. ref. [1]), but more recently high multiplicity searches have been advocated as an effective technique for helping to probe scenarios with natural supersymmetry as well as those with baryonic R-parity violating supersymmetry [2][3][4][5][6][7]. High multiplicity final states also arise in theories of strong dynamics where new colored objects can produce four to eight top quarks through the production of "coloron" vector resonances or colored technipions [8,9]. Finally, some models introduced to explain the magnitude of the tt asymmetry measured at the Tevatron also predict large final state multiplicites [10].
One of the challenges of using high multiplicity as a handle for reducing QCD backgrounds is that the background rate is intrinsically difficult to calculate. The current state of the art for tree-level jet production is 2 → 7, with 2 → 6 being the most that is typically feasible. Significantly, these tree-level calculations have unquantified uncertainties in their rates and distributions. One of the computational challenges is that high multiplicity final states have enormous configuration spaces; for instance the 10 jet final state has 28 dimensions, with the consequence that is unfeasible to densely populate the configuration space with Monte Carlo events. Many of the configuration space variables, such as the angular separations between jets, ∆R ij , have not been studied in depth and historically have been unreliably calculated with tree-level Monte Carlo.
One way that high multiplicity backgrounds are estimated is by extrapolating from lower multiplicities. There are well-known approximate empirical scaling relations connecting the N jet production rate to the N + 1 rate. There has been some progress in deriving these from first principles (see e.g. ref. [11,12]); nevertheless, this approach comes with large uncertainties in the rates that will cause these searches to quickly become systematically limited. Additionally, for large N the p T spectrum for the N th jet becomes increasingly soft and harder to measure accurately, leading to additional uncertainties.
Recently alternative approaches to gaining sensitivity to high multiplicity final states have been developed in the spirit of the "fat jet approach" introduced 1 in ref. [13,14]. In effect these proposals factorize the problem: instead of directly counting jets, the event is clustered into a fixed number of large radius jets (N = 4, 5, or 6) whose substructure is then further scrutinized. Because the jet radius is large, these N "fat" jets will incorporate most of the radiation in the central region of the detector. The particles that would have formed multiple small radius jets in the traditional approach are clustered together into large radius jets. These fat jets will automatically have substructure, some of which will appear to originate from hard 1 → 2 parton shower splittings. While such splittings certainly occur in QCD, they occur relatively rarely with the result that requiring multiple fat jets to have multiple hard splittings helps to separate signal events with high multiplicities from the dominant low multiplicity QCD background. The fat jet approach has an additional benefit in that it may be better suited to getting a good handle on systematic uncertainties in the QCD backgrounds. Large radius jets, particularly well separated ones, have properties that are relatively independent from one another, since their dynamics are largely driven by the parton shower, which is local in nature. For instance, the mass of one fat jet is not strongly correlated with the masses of other fat jets in the event. By exploiting the approximate independence of jets, QCD backgrounds can be estimated with a data driven analysis. Effectively the large configuration space we started out with has been factorized into a much smaller fat jet configuration space tied to N (approximately identical) configuration spaces encoding the fat jets' substructure. This factorization lends itself to the measurement of appropriate jet templates, which can then be combined with fat jet distributions to arrive at background estimates. This approach is of obvious practical importance to experimental searches, but it is also useful for theoretical calculations of the backgrounds, since searching for obvservables that reduce QCD backgrounds by six to ten orders of magnitude while acquiring enough statistics in Monte Carlo can be challenging to prohibitive.
Specifically, this study builds on a recent paper [16] that proposed M J , the sum over fat jet masses, as an effective observable for separating high multiplicity signals from Standard Model backgrounds. Jet mass is the simplest jet substructure observable, but it is also one of the coarsest. This article advocates a more refined use of jet substructure to probe high multiplicity events, in particular by counting the number of subjets inside each fat jet. This may seem similar to clustering events into small radius jets and counting the resulting number of jets, but the potential to systematically estimate QCD backgrounds with a data driven approach distinguishes it from the traditional approach.
This article is organized as follows. Sec. 2 presents the two subjet counting techniques introduced in this paper. Sec. 3 describes how the backgrounds were generated in Monte Carlo. Sec. 4 describes how the backgrounds were validated against ATLAS data and how the QCD backgrounds should be amenable to data driven estimates. Sec. 5 introduces eight benchmark signals and presents the expected sensitivity of our search strategy. A comparison to previous high multiplicity searches is also made. We conclude in Sec. 6 with some general discussion.

COUNTING SUBJETS
This section describes the two subjet counting techniques implemented in this study, n k T and n CA . The former is a straightforward application of the exclusive k T algorithm [17]. The latter counts subjets by recursively inspecting the structure of the Cambridge/Aachen [18] clustering tree of the fat jet. Both are implemented using FastJet 3 [19]. As we will see, searches incorporating these methods yield improved sensitivity to the high multiplicity signals considered in Sec. 5. The two algorithms result in very similar expected limits, with the difference that n CA does better than n k T in cases where the QCD backgrounds are more important.

Wide radius jets
Wide radius jets are now a standard technique in high energy physics searches. The basic structure of our high multiplicity search is built around such fat jets and is common to both subjet counting techniques. First the event is clustered into fat jets with R 0 = 1.2 using the anti-k T jet algorithm [20]. Next the fat jets are trimmed [21] with the parameters r cut = 0.3 and f cut = 0.05. While fat jets are particularly sensitive to pile-up, making some sort of jet grooming necessary, it has been shown that jet substructure observables such as jet mass and N-subjettiness [22] have a significantly reduced sensitivity to pile-up effects with this choice of parameters [23]. Next the leading fat jet is required to have p T > 100 GeV, while subleading fat jets are required to have p T > 50 GeV. Only those events with four or more such fat jets are considered. Then the subjet count n i of each of the four leading fat jets is calculated using one of the two algorithms described below. Finally cuts are made on the observables and the missing transverse energy ( / E T ).

Counting with k T
The exclusive k T algorithm is defined via two metrics, d ij and d iB , and a dimensionful resolution parameter d cut [24]. The jet-jet metric d ij and the jet-beam metric d i are defined as: Here, p T i is the transverse momentum of protojet i and ∆R 2 ij ≡ ∆y 2 ij + ∆φ 2 ij . The exclusive mode of the algorithm proceeds by sequentially clustering pairs of protojets, stopping once all the d ij and d i are above d cut . When the smallest metric is d ij , i and j are combined. When the smallest metric is d i , protojet i is set aside as a beam jet. The total number of subjets (in the exclusive sense) is then given by the number of protojets that are not beam jets and that remain once the clustering step has terminated. 2 For our particular application we define n k T by taking where p T J is the total transverse momentum of the fat jet and f k T is a dimensionless parameter that we take to be given by throughout. This value of f k T leads to good separation between signal and background, although for the range of signals considered the separation depends only weakly on the particular value used. Then for a given fat jet n k T is taken to be the number of subjets identified by exclusive k T with The dependence on the parameter p Tcut is rather weak for the massive fat jets of interest, which contain softer subjets only infrequently. A significant advantage of this definition of n k T is that a typical QCD jet will, due to its asymmetric energy sharing (a hard core surrounded by soft radiation), have a small number of subjets since much of the soft radiation will be clustered with the core (see Fig. 1). This is in contrast to a naive application of Cambridge-Aachen for reclustering, which can yield a large number of subjets even for a single-pronged QCD jet.

Counting with Cambridge-Aachen
The n k T algorithm introduced in Sec. 2.2 was entirely generic in its motivation and approach. The present method, denoted by n CA , aims instead to count the number of hard partons consistent with the decay of a massive particle. The n CA algorithm is explicitly constructed to: • identify massive substructure; and • 'undercount' the number of subjets within a given fat jet if the energy sharing among the subjets is very asymmetric.
The latter requirement is made because asymmetric sharing of energy between subjets is a telltale sign of subjets generated via the parton shower. The method is in the spirit of the various substructure algorithms that have emerged since the introduction of the BDRS procedure [14] and which make use of the information encoded in the clustering tree of the jet. In particular, it is closely related to an intermediate step in the HEPTopTagger [25,26]. We determine the number of subjets by unclustering the fat jet down to the mass scale m cut , throwing out subjets with an asymmetric energy sharing as defined by y cut . The number of identified subjets that then pass an additional p T cut yields n CA . In detail, the method is defined as follows: 1. Cluster a given fat jet using the Cambridge/Aachen algorithm.
2. To define n CA inspect the clustering tree of the fat jet via the following recursive procedure.
4. If m j < m cut or dR(j 1 , j 2 ) < R min consider j a subjet and exit the recursion.
6. Continue the recursion on j 1 and (if it is retained) j 2 .
7. When the recursion is complete, count the number of identified subjets with p T > p Tcut ; this number is n CA .
So, for example, an idealized two-pronged jet initiated by the hadronic decay of an energetic Z boson would yield (supposing that the decay angle is such as to yield a roughly symmetric energy sharing) a count n CA = 1 for m cut > m Z and n CA = 2 for m cut < m Z . Throughout this study we use the following parameters: These values lead to good separation between signal and background, although for the range of signals considered the separation provided by n CA depends only weakly on the particular values used.

Comparison of n k T and n CA
This section has introduced two distinct subjet counting techniques, and it is interesting to ask how they are related. A detailed comparison between the two algorithms is complicated by the fact that each is defined by several parameters. For simplicity we restrict ourselves to the parameter choices made above. Qualitatively, the two algorithms have a number of similar features.
On a jet-by-jet basis, there are strong correlations between n k T and n CA , with n CA typically yielding more subjets than n k T . Fig. 1 illustrates a pronounced example of the tendency for n CA to identify more subjets. For a given ensemble of jets, it is useful to define the normalized distribution P (n), which is the fraction of jets with n subjets. The left panel of Fig. 2 shows P (n) for both algorithms for a sample of leading QCD jets generated by MadGraph and with p T ≥ 100 GeV. Also illustrated is P (n) for a sample 3 of leading jets drawn from signal events where pair produced gluinos decay viag → ttχ (mg = 600 GeV and m χ = 60 GeV).
To study how n k T and n CA are correlated, it is useful to introduce the joint distribution From the joint distribution one can define the mean of n CA as a function n k T as well as the mean of n k T as a function of n CA : µ n CA (n k T ) = n CA n CA P (n k T , n CA ) and µ n k T (n CA ) = FIG. 1: Example of a fat (and very massive) QCD jet with p T = 910 GeV, m = 360 GeV and with its two n k T subjets (left) and its four n CA subjets (right) indicated. Note that this jet has not been trimmed to better illustrate the different treatment of soft radiation in n CA and n k T (the dark gray cells on the right do not belong to any identified subjets). The subjet distributions log 10 P n k T (n k T ) and log 10 P n CA (n CA ) in blue and red, respectively. Right: The blue and red distributions show δµ n CA (n k T ) and δµ n k T (n CA ), respectively, which are defined in Eq. 7. The error bars on δµ n CA (n k T ) and δµ n k T (n CA ) correspond to the standard deviations σ n CA and σ n k T . For both panels the solid lines correspond to a sample of leading QCD jets generated by MadGraph and with p T ≥ 100 GeV, while dashed lines correspond to a signal-like sample described in the text.
From this one can then define the quantities δµ n CA (n k T ) = µ n CA − n k T and δµ n k T (n CA ) = µ n k T − n CA (7) which are shown in the right panel of Fig. 2. It can be seen that that n k T and n CA track one another pretty closely for small n, but that for larger numbers of subjets n CA tends to pull further and further ahead of n k T . The correlation between n k T and n CA is somewhat tighter in the signal-like sample than in the QCD sample. For fixed n k T the distribution P (n k T , n CA ) is a function of n CA with standard deviation σ n CA (n k T ). The standard deviation σ n CA (n k T ) is a steadily rising function of n k T , as illus-trated by the error bars in the right panel of Fig. 2. For the QCD sample it grows from σ n CA (1) 0.3 to σ n CA (6) 1.0, indicating that for small n the two algorithms identify very similar numbers of subjets, but that as the amount of substructure grows there is less agreement between the two algorithms. Similarly the standard deviation σ n k T (n CA ) grows approximately linearly from σ n k T (1) 0.4 to σ n k T (8) 0.9. Note that this dispersion is logically distinct from the divergence in the mean seen in δµ n CA (n k T ).
Note that both n CA and n k T are peaked at 3 for the leading jet of the signal (see Fig. 2). This is as expected, since this signal contains up to 12 final state quarks, with the result that, if the leading four fat jets capture all of the decay products, an average of 3 subjets per fat jet are expected.
Interestingly, the subjet distributions for QCD fat jets are not governed by approximate "staircase scaling," as one might have expected. This is a scaling pattern defined by the condition that, if we define the ratio w(n) ≡ P (n + 1) P (n) (8) then w(n) is a constant independent of n. Instead, a significantly steeper distribution is seen. The variable is useful for characterizing deviations from constant w, with staircase scaling corresponding to r = 1. For the case of QCD fat jets the ratios w(n) and r(n) for both n k T and n CA are illustrated in Fig. 3. Since there are intrinsic energy scales in both n k T and n CA , low p T jets are sensitive to these scales. For p T ≥ 100 GeV, the w(n) distribution drops rapidly before asymptoting to a constant of w n CA 0.05 and w n k T 0.03, indicating staircase scaling with a very hard spectrum. Requiring higher p T jets makes the influence of the intrinsic scales in the counting algorithms less relevant. If the minimum p T of the jet is raised to 500 GeV, the scales inside the subjet counting algorithm will play a much smaller role. Even for these high p T jets staircase scaling is not observed; instead w(n) appears to have a staircase behavior and r(n) is approximately constant with r n CA 0.5 and r n k T 0.4. This can be called "double staircase scaling" and indicates a distribution of subjets that scales like P (n) ∼ r n 2 /2 . This is in contrast to the more traditional staircase scaling that gives P (n) ∼ w n demonstrating that subjet counting is not related in a straightforward manner to jet counting. Deviations from staircase scaling is well known and have even been explained in [12]; however, the deviations from staircase scaling noted here do not appear to be "Poisson" which predicts that r should asymptote to unity which does not appear to happen.
Finally, it is worth noting that some of the quantitative differences between n CA and n k T are being driven by the p T -ordering of the parton shower algorithms in PYTHIA and SHERPA. This is important because the definition of n CA is motivated by (the unclustering of) angular-ordered emissions, while n k T is motivated by p T -ordered emissions. As we will see The ratio w(n) for n k T (bottom) and n CA (top) for jets with p T ≥ 100 GeV (blue) and p T ≥ 500 GeV (red). "Staircase scaling" corresponds to a flat w(n). Right: The ratio r(n) for n k T (bottom) and n CA (top) for jets with p T ≥ 100 GeV (blue) and p T ≥ 500 GeV (red). "Double staircase scaling" corresponds to a flat r(n) and appears to be emerging in the high p T regime.
in Sec. 5, from the point of view of discriminating QCD backgrounds from high multiplicity signals, what is most important is the heaviness of QCD tails (i.e. the fraction of fat QCD jets with n k T , n CA 3). Preliminary studies indicate that a p T -ordered (angular-ordered) parton shower tends to lead to fatter QCD tails for n CA (n k T ). Consequently, the results in this paper, which rely on p T -ordered parton showers, probably place n k T at a relative advantage over n CA as far as QCD discrimination is concerned. Given the relatively better agreement between LHC jet substructure data and HERWIG++ as compared to PYTHIA 6.4 (at least as far as leading order Monte Carlo is concerned; see e.g. ref. [27]), we suspect that n CA is likely to outperform n k T on real data. Closer scrutiny of parton shower dependence would be an important part of any future experimental study.

MONTE CARLO CALCULATIONS
This article studies the use of the total number of subjets in an event as a way to separate new physics scenarios that produce many final state quarks and gluons from QCD and electroweak-scale backgrounds. The ultimate goal is to reduce the reliance upon missing transverse energy ( / E T ) and lepton requirements so as not to veto on signals that have neither, while still obtaining relatively low background search regions. Since / E T and leptons are the standard handles used to reduce QCD backgrounds, it is particularly important to have reliable estimates of multijet production rates and differential distributions. Of course, the same holds true for the non-QCD backgrounds. Consequently this section and the next play an important role in everything that follows. Throughout, the calculations are performed

Process
Order p T range (GeV) Cross Section (fb) Events Event Weight The non-QCD backgrounds used in this analysis. The subscript i in n i indicates the highest jet multiplicity considered in the matched sample. Thus tt + n 2 j is tt + 0j, tt + 1j, tt + 2 + j, where the last jet multiplicity is an inclusive process that can include higher jet multiplicities generated through the parton shower. The p T slicing is with respect to the leading massive object. The two samples marked with a * denote that resonant top production is excluded to avoid double counting. The last column indicates the ratio between the expected number of events at 30 fb −1 and the number of Monte Carlo events generated.
at a center of mass energy of √ s = 8 TeV.
The rest of this section is organized as follows. Sec. 3.1 describes the calculation of non-QCD backgrounds such as V +jets and tt+jets. Sec. 3.2 describes the two approaches used to generate QCD Monte Carlo events. Sec. 3.3 describes how detector effects and jet clustering are implemented. These latter two subsections are complemented by the discussion in Sec. 4, which focuses on data driven methods.

Non-QCD Backgrounds
The dominant Standard Model backgrounds are QCD, V +jets and tt+jets. Since, however, any backgrounds where there is an intrinsic mass scale are potentially important, the leading subdominant backgrounds were also computed for completeness, see Table 1. The non-QCD backgrounds used for our analyses were generated using MadGraph 4.5.1 [28][29][30] and showered and hadronized using PYTHIA 6.4 [31] . The five-flavor MLM matching scheme with a shower-k ⊥ scheme was used to account for the extra radiation [32].
It is the high p T tails of these backgrounds that will make the dominant contribution to signal regions. Since some of these backgrounds have quite large cross sections, it is not feasible to generate 30 fb −1 worth of Monte Carlo. Instead, the backgrounds are generated in different p T bins of the heavy particle, p T heavy , where a heavy particle is any one of t, W ± , Z 0 , or H 0 . 4 This ensures that the regions of phase space that result in the largest contamination of the signal region are computed with sizeable significance. Another important consideration is that events with small p T heavy can still have large H T as a consequence of high jet activity. Since these backgrounds are potentially important for regions of phase space with sizeable / E T , it is important for them to be computed with sufficient significance. However, if p T heavy is small, then so is the intrinsic / E T , with the consequence that the high H T /low p T heavy region of phase space is not important for the signal region because it is subdominant to the QCD contribution. In practice, the large event weights of the low p T heavy regions do not limit the statistical accuracy of the background estimate.
The resulting backgrounds are shown in Table 1, where the subscript i in n i indicates the highest jet multiplicity considered in the matched sample. The different p T heavy bins are listed in the third column. The five flavor matching scheme introduces one additional complication for diboson and single top production. This arises because there can be resonant top production in these two channels that has already been included in ordinary tt production. In order to exclude this double counting, a requirement of |m t − m bW | > 15Γ t is imposed.
The two most important non-QCD backgrounds for our search are V +jets and tt+jets. These backgrounds not only have large cross sections but are also jet rich and result in a reasonable amount of / E T . Backgrounds like tttt, ttV and ttH, which have jet multiplicities and / E T comparable to that of some of the benchmark signals we study, have small cross sections and make a negligible contribution to the total background (although we include them for completeness).

QCD
Several techniques are used to calculate the QCD contribution to signal regions. Sec. 3.2.1 describes a calculation of the QCD background using an MLM-matching scheme implemented in MadGraph and PYTHIA using unweighted events with up to four partons matched. This is a relatively low multiplicity method of generating backgrounds and relies heavily on the parton shower to generate high multiplicities; nevertheless, it is a standard calculational method that makes it easy to to get good Monte Carlo statistics over the entire signal region. Sec. 3.2.2 describes a calculation of the QCD background using a CKKW-matching scheme implemented in SHERPA using weighted events with up to six partons matched. This method allows significantly higher multiplicities to be generated and samples the high energy and high multiplicity tails with weighted events. The use of weighted events tends to hurt convergence and gives relatively poor Monte Carlo statistics.

MadGraph+PYTHIA
One set of QCD backgrounds was generated with MadGraph 4.5.1 [28][29][30] at O(α 4 s ) and showered with PYTHIA 6.4 [31] using MLM matching [32]. The events were generated in multiple exclusive samples varying both the matrix-element parton multiplicity from n j = 2 to n j = 4 and the scalar transverse energy H T using the 5-flavor matching scheme [33]. It is not practical to generate multiplicities higher than n j = 4 in MadGraph due to computational limitations. Since the event selection for our search strategy requires n j ≥ 4, all jet substructure will be modeled by the parton shower. This is clearly insufficient for gaining much confidence in the resulting background estimates. Nevertheless, the MadGraph events will serve as a useful crosscheck in what follows. In addition, the high statistics available from MadGraph will be very useful in creating missing energy templates in Sec. 4.1.

SHERPA
The second event generator used to generate QCD backgrounds is SHERPA 1.4.0 [34][35][36][37][38]. SHERPA uses CKKW matching [39] and is capable of generating up to n j = 6 at the matrix element level. SHERPA can therefore generate more hard substructure using matrix elements without relying on the parton shower. Consequently we will primarily be relying on SHERPA for our QCD background estimates. This sample was generated and fully described in [49] One of the drawbacks of SHERPA is that it is not straightforward to generate separate samples binned by H T . Instead weighted Monte Carlo events can be generated. The main problem with relying on weighted Monte Carlo is that it becomes more difficult to obtain convergence in the signal region. One frequently observes that a single (or handful of) high weight event(s) can make large contributions to the tails of distributions, with the consequence that statistical uncertainties in the tails become large.
Two separate weighting methods were used in generating the QCD backgrounds. The first is the default weighting procedure in SHERPA. The second skews the weights towards higher multiplicities, with w(n j ) = 4 n j −2 (10) for 2 ≤ n j ≤ 6. Thus, 256 times as many n j = 6 events are generated as compared to n j = 2 events. This allowed for the generation of relatively more high multiplicity events so as to better fill out the tails of the distributions. Together these two weighting methods resulted in a total of 4.8 × 10 6 events passing our basic fat jet requirements (see Sec. 2.1).

Comparison of MadGraph and SHERPA
In Figs. 4 and 5 the SHERPA results are compared to MadGraph+PYTHIA. We see that there is generally good agreement between the two. Even the subjet count N CA shows good agreement up to N CA = 8, a regime in which both generators (especially MadGraph) are relying on the parton shower to generate substructure. The biggest differences appear in the tails of the distributions. As discussed above, the presence of high weight events in SHERPA can lead to poor convergence. Whenever there is a large disagreement between the two generators in these figures, it seems to be largely driven by this effect (c.f. the large statistical uncertainties in the regions of largest disagreement).
The disagreement in the tails of the distributions between SHERPA and MadGraph + PYTHIA deserves further comment. While the naive expectation is that SHERPA should provide a better description in the high H T , M J and N J regime, there are significant statistical uncertainties in the SHERPA sample arising from slow convergence of the weighted Monte Carlo calculations. The features of the distributions also appear to be slightly pathological in their behavior, appearing as inflection points in regions that one would expect to be relatively smooth. In the case of N CA , some of the bins are not even monotonically decreasing. This article will rely on the SHERPA calculation for its background estimates; fortunately the design of the search regions in Sec. 5 will not be heavily influenced by these features. In Sec. 4, a data driven approach to calculating these backgrounds is presented, and the disagreement between the data driven approach and the straight Monte Carlo calculation will be, at least in part, related to these same features.

Detector mockup and jet clustering
After showering, all hadron-level events are passed to the PGS 4 [40] detector simulation, which parameterizes the detector response. The detector parameters used are those of the default ATLAS PGS card. The PGS output is clustered into 0.1 × 0.1 cells in η − φ space, and then each cell is represented as a massless four-vector pseudoparticle. Finally, those pseudoparticles with rapidities |y| < 2.5 are fed into FastJet 3, which we use for jet clustering [19].
The primary purpose of PGS is to give estimates of the missing energy that arises from imperfect detectors. As we will see in Sec. 4.1, for the QCD backgrounds this is best accomplished by parametrizing the PGS QCD missing energy spectrum in terms of template functions. In Sec. 4.1.1, our Monte Carlo background calculations are compared against published ATLAS data. There we will see that the QCD missing energy templates will need to be rescaled to obtain a better fit to the data. PGS is also useful for simulating lepton identification efficiencies. However, for the primary proposal of this paper, no lepton requirements or vetoes are made, and so lepton identification efficiencies will not play a large role.

Treatment of leptons
The goal of this paper is to develop a search strategy that can dramatically reduce Standard Model backgrounds while making the least number of assumptions about the characteristics of the final state apart from its having a high multiplicity. Consequently, while it may be advantageous to require or veto on something like b-tagged jets or isolated leptons for a particular signal model, we do not do so here. For the broad class of signals we are interested in probing, the high final state multiplicity may be exclusively hadronic in origin or it may include some number of leptons. For models with multiple possible cascade topologies (or indeed any with top quarks or electroweak gauge bosons in the final state) both the hadronic and semi-hadronic modes may be simultaneously present, and signal discovery may require sensitivity to both channels. Consequently throughout this study we treat leptonic energy as hadronic energy. That is to say that in both fat jet clustering and subjet counting, hadronic and leptonic energy are treated democratically. This helps to ensure that signal efficiencies are not unnecessarily degraded. It is interesting to ask whether alternative treatments of the leptons might lead to effective search strategies without having to sacrifice the relative inclusiveness of the present search. Doing so, however, lies outside the scope of this paper.

DATA DRIVEN BACKGROUNDS
High multiplicity searches push into regions of phase space that are challenging to model, particularly in the case of the pure QCD backgrounds. For this reason it is important to have as many handles on the backgrounds as possible. In particular a data driven extrapolation of the background from a control region to the signal region would be especially valuable for corroborating background estimates available from Monte Carlo. In this section we explore how such a data driven estimate might be made. In Sec. 4.1 we specialize to the particular case of missing energy. The resulting missing energy templates allow us to achieve significantly improved statistics in our Monte Carlo estimates of QCD missing energy acceptances. In Sec. 4.1.1 we compare our Monte Carlo background estimates to published ATLAS data. Finally in Sec. 4.2 we discuss the possibility of extending the template method to take into account M J and N J . These latter results are preliminary and are not used for the background estimates that enter in the expected limits in Sec. 5.

QCD missing energy templates
One of the purposes of this work is to reduce the dependence of new physics searches on missing energy requirements, which are particularly effective in reducing QCD backgrounds. Thus it is important that we model / E T , and in particular QCD / E T , as accurately as possible. QCD missing energy typically arises from two distinct sources at the LHC. The first is from neutrinos lost in semi-leptonic decays of bottom and charm quarks. This irreducible form of missing energy gives a long non-Gaussian tail to missing energy distributions but can be estimated through Monte Carlo calculations. The second form of missing energy arises from detector effects that result in particles being lost or otherwise mismeasured. This form of missing energy is usually parameterized as a response function on the jet-by-jet level and is typically Gaussian for several standard deviations. The typical amount of missing energy scales as the square root of the jet energy, although there is a small linear term that takes over at large jet energies. For QCD events it is this latter form of missing energy, which arises from detector effects, that is dominant.
This article uses the approach of creating a probability distribution function for the missing energy of QCD events as a function of H T (c.f. Sec. 4.2). This allows for a huge reduction in the number of Monte Carlo events necessary for accurate background estimates in the presence of missing energy requirements. Because the detector response is orthogonal to other jet properties that will be used (and is in any case being parametrized by PGS) this approach should faithfully reproduce the results of a much larger Monte Carlo calculation.
Specifically, a probability distribution function for missing energy significance, as a function of H T is constructed from the unweighted MadGraph QCD sample: Thus for the rest of this study (where SHERPA will be used for all our QCD background estimates), although the H T distribution will be modeled by SHERPA the mapping H T → / E T will be modeled by a combination of PGS and MadGraph. We use MadGraph for this purpose because doing so yields much better statistics (the ability to target specific H T bins is one advantage). This PDF can be used event-by-event to determine the probability that an event, e i , passes a specific / E T requirement: Fig. 6 shows the missing energy significance distributions for various H T windows. Note that these PDFs are just another way of parametrizing PGS's detector response. We will see below, in Sec. 4.1.1, that it will be necessary to scale these PDFs to ensure a better fit to published ATLAS data. Looking forward to Sec. 4.2, the factorization we have introduced is given succinctly by which can be compared to the analogous expression in Eq. 18 below.

Background validation
We validated our backgrounds against the signal regions used by an ATLAS search for squarks and gluinos [41]. This lead us to rescale our / E T PDFs (see Sec. 4.1 and below), since the ATLAS PGS parameters were resulting in too large QCD / E T acceptances. The results after the rescaling are shown in Table 2. Overall, our Monte Carlo background estimates agree with the data driven estimates given in the ATLAS study. In those cases where there is disagreement, our Monte Carlo always overestimates the ATLAS estimates so that the expected sensitivities presented in Sec. 5 should be reasonably conservative.
Each of the ATLAS signal regions includes a lepton veto. Because ATLAS lepton identification is only approximately reproduced by PGS, we expect larger differences between Monte Carlo estimates and ATLAS estimates for those backgrounds with leptons. This is indeed what we find in Table 2, where backgrounds with no leptons (like Z 0 /γ * +jets) match  the data well. For events with / E T arising from the decay of W ± bosons, such as W ± +jets and tt+jets, the comparisons are systematically off because PGS is not identifying isolated leptons with sufficiently high efficiency. This is not a major issue in the validation because the searches described in this article neither require nor veto on leptons.
As mentioned above, the PGS treatment of angular and energy smearing does not faithfully reproduce the response of the ATLAS detector. In particular, in the presence of / E T cuts the ATLAS PGS parameters result in an overestimate of QCD backgrounds by an order of magnitude solely due to the treatment of / E T . In order to better reproduce the QCD backgrounds, the / E T templates are rescaled by making the replacement The best fit to ATLAS estimates for QCD backgrounds in / E T -rich regions is with α = 0.8. This rescaled / E T template leads to significantly improved agreement between the Monte Carlo and ATLAS estimates of the QCD background.

Fat jet templates
The main obstacle to modeling 4-jet QCD production is the large dimensionality of the space of observables under consideration. The quantity we would like to understand is the 9-dimensional 4-(fat)jet differential cross section dσ 4J ( / E T , m i , n i ). Here / E T is the missing energy of the 4-jet event, the m i are the masses of the four jets, and the n i are the four subjet counts (using e.g. the n k T algorithm defined in Sec. 2). With dσ 4J in hand the chosen cuts on / E T , can be imposed, thus yielding the expected QCD background in the signal region.
To make progress it is useful to reduce the dimensionality of the problem. This can be done by making the assumption that each of the four jets is governed by a universal probability distribution ρ J (x, n; p T ) (16) with x ≡ m/p T which describes the probability of a fat jet having n subjets and a particular value of m/p T , as a function of the fat jet p T . The mass of a fat jet is correlated with the number of subjets it contains, since it is impossible to get multiple subjets without having a sizable mass; consequently ρ J does not factorize and it is necessary to construct a two dimensional PDF.
The assumption of universality is not completely valid, for one reason because quarkinitiated jets and gluon-initiated jets will have different distributions. The hope is that ensembles of jets will have similar ratios of quark-versus gluon-initiated jets and that the distribution functions will not be radically different. In practice, this is the case since it is challenging to distinguish quark-initiated jets from gluon-initiated jets and since it is difficult to construct selection criteria that isolate one from the other. The assumption of universality is even more aggressive, however, since it implies that these distribution functions are independent of their environment. This assumption is known to be violated to some degree, particularly as jets come closer together, but the pull of the environment on properties of fat jets tends to be less than O(10%) in magnitude. 6 In this section we will be satisfied to check these assumptions empirically with Monte Carlo calculations, leaving a more detailed study to future work. Note that we will not be applying the resulting background estimates when calculating the estimated sensitivity of our search strategy: the results presented in Sec. 5 will use the SHERPA Monte Carlo calculations described in Sec. 3.2.2. The results in this section are a first attempt at studying more aggressive uses of data driven approaches to QCD backgrounds.
The assumption underlying the form of this jet template is that a jet's substructure (e.g. its mass) is determined by its p T and is independent of other jets in the event. The full 4-jet distribution is then obtained via the product: Here the p T i are the transverse momenta of the four jets. Thus the 9-dimensional distribution dσ 4J ( / E T , m i , n i ) has been re-expressed as a function of the 5-dimensional distribution dσ 4J ( / E T , p T i ) and the 3-dimensional jet template ρ J (x, n; p T ). A further reduction can be made by assuming that / E T only depends on the quantity H T ≡ p T i . With this assumption we can introduce the / E T template P MET (y; H T ), with T thus ending up with the factorization Ultimately it is an experimental question whether such a factorization holds. At some level we certainly expect correlations between the four jets and deviations from the form of Eq. 18. For example, we would expect correlations to arise from color (re)connections as well as outof-jet radiation. The presence of significant pile-up (so long as it remains unsubtracted) would also tend to result in (positive) correlations between the jets. In the case that the correlations are large it may be necessary to systematically include corrections to Eq. 18. We anticipate that some kind of principal component analysis or form of tensor decomposition would be applicable. We leave this interesting question to future work. For the remainder of this section we would like to explore the degree to which the universality assumptions underlying Eq. 18 are valid in the only data sample available to us, namely the 4.8 million SHERPA events described in Sec. 3.2.2.
In a realistic experimental study one would presumably want to measure dσ 4J (p T i ) and ρ J (x, n; p T ) from independent samples. Given our somewhat limited statistics, however, we will instead 'measure' dσ 4J (p T i ) and ρ J (x, n; p T ) from the same 4-jet sample and use Eq. 18 to construct an estimate of the full 9-dimensional distribution. This will allow us to estimate acceptances after imposing / E T , M J and N J cuts. The degree to which this procedure reproduces cut acceptances in the raw event sample will reflect the viability of the jet template and / E T template ansätze. Given our somewhat limited statistics, it is difficult to judge whether deviations between the raw and template cut acceptances (see Fig. 7) are an indication of deviations from Eq. 18 or just statistical fluctuations. Nevertheless, the approximate agreement over 7 orders of magnitude of cut acceptance in Fig. 7 is a promising result, although more sophisticated statistical methods would likely be required for a robust experimental analysis.
When the jet template and / E T template ansätze are appropriate, they have the advantage of reducing the statistical uncertainties in dσ 4J ( / E T , m i , n i ). This follows directly from the reduced dimensionality of the problem. This reduction is especially significant in the tails of the distribution, where statistical uncertainties are parametrically reduced by virtue of the fact that-due to the convolution-they receive sizeable contributions from statistically rich parts of the component probability distributions. For example, a rare 4-jet event with N J = 12 will be dominated by contributions from four jets with three subjets each. The probability of a single fat jet having three subjets at a given m/p T and p T can be measured (or calculated) readily.
Note that in the above we have discussed template methods in the context of LHC data. Another possibility is to use template methods to extend Monte Carlo calculationsindeed this is precisely what we did in the previous section for the specific case of missing energy. In the context of Monte Carlo template methods have the obvious advantage of reducing statistical uncertainties in the tails of distributions. They also offer the possibility of extending lower multiplicity calculations to a higher multiplicity regime in the following sense. Take for example SHERPA, which can generate up to n j = 6 jets in the final state at the matrix element level. Thus a SHERPA dijet sample will include jets with up to five partons generated through matrix elements. When a fat jet template obtained from such a dijet sample is combined with a 4-jet distribution dσ 4J (p T i ), the resulting distributions will extend the nominal reach of SHERPA beyond n j = 6. Of course, the theoretical validity of such a method is a delicate matter. Given the importance of having reliable Monte Carlo estimates in the tails of distributions, however, such an approach deserves further study. For example, it would be important to investigate correlations between fat jets. Since SHERPA extends to n j = 6, one would be able to, among other possibilities, study 3-jet events and observe what happens as fat jets come closer together.

RESULTS
This section investigates the benefit of incorporating a subjet counting observable, namely N J , into high multiplicity searches based off the summed jet mass observable, M J . Sec. 5.1 discusses the models used to quantify the improvement in searches that results from incorporating N J . These are supersymmetric models whose phenomenology involves the pair production of gluinos that subsequently decay into the lightest supersymmetric particle. Both R-parity conserving and R-parity violating models are considered. We choose these benchmark signals because they are well known in the literature and are easy to implement in Monte Carlo event generators. Sec. 5.2 describes the signal and background distributions in signal-like regions. Sec. 5.3 describes the criterion that was used to create optimized search regions for the benchmark signals. Sec. 5.4 describes the expected sensitivity of the optimized search regions to the benchmark signals. Finally, Sec. 5.5 compares the optimized search regions to previous searches.

Benchmark signals
The goal of this work is to gain access to a large class of signals without specifically targeting any one signal. Nevertheless, it is useful to have some benchmark models to consider. While these benchmark models are plausible extensions of the Standard Model, more than anything else they are meant to exhibit features of theories that produce high multiplicity final states. For any single theory, there are numerous handles beyond large multiplicity that could allow for additional discrimination between signal and background. For instance, many of our benchmark signals have leptons and b-jets. These are powerful handles that can be used in conjunction with the methods in this article, but they are not

Model
Gluino Decay Electroweakino LSP Decay Final State Partons qqχ(+4) ttχ i (+12) χ 0 χ 2 (+8) Stable(+0) cbs(+6) generic to all signals that produce high multiplicity final states. Therefore, these additional handles will not be used. The eight benchmark models considered in this article arise from the pair production of gluinos. These benchmarks provide relatively straightforward ways of toggling the multiplicity of final state partons within a class of models that is easily implemented in all the standard Monte Carlo calculation packages. Each benchmark model is generated using MadGraph 4.5.1 [28][29][30] and with up to two additional jets: The MLM matching scheme with a shower-k ⊥ scheme was used to account for the extra radiation. The events were showered and hadronized using Pythia 6.4 [31]. K factors for the signals were calculated using Prospino 2.1 [42]. A collection of signals with diverse phenomenology is considered in order to better explore/delineate the efficacy of the subjet techniques used in this paper. This diversity arises through the variety of gluino decay topologies that are possible. The gluino can decay to light or heavy quarks; it can decay directly to the LSP or instead through a cascade. The theory can be R-parity conserving or R-parity violating. Decay topologies with cascade decays or decays involving top quarks as well as certain RPV topologies will lead to very high multiplicity events with 12 or more final state partons. Indeed one benchmark signal we consider (G 7 ) has a spectacular 26 final state partons. The expected sensitivities to all the signals presented here will be given in Sec. 5.4.
In more detail, all the processes outlined here start with a gluino decaying to either a light quark or a top quark pair and a neutralino: The neutralino χ may be the LSP χ 0 or one of the heavier electroweakinos. For simplicity, only decays to the LSP and to the NNLSP χ 2 are considered, where the latter decay chain results in a 2-step cascade: Finally the LSP may or may not decay into jets. The constraints on R-parity violation are much weaker for decays into heavy flavor. If the LSP is lighter than 200 GeV, then the decays will be dominantly with the λ ijk U c i D c j D c k flavor structure (ijk) = (2, 3, 2). The resulting decay topology is If the mass of the LSP χ 0 is above 200 GeV, then it is possible for the dominant R-parity violating decay mode to be tbs, resulting in four more final state partons over the cbs decay mode. To keep the number of benchmark signals to a manageable number, we do not include this decay mode in any of our benchmarks. All of these possibilities taken together result in eight different gluino decay topologies that span a range of final state parton multiplicities, see Table 3. 7 For all signals, we choose the LSP mass using the formula For the signal models involving heavier electroweakinos we choose the intermediate masses as follows: Note that if the gluinos decay to light quarks and the LSP, the final state will have between 4 (RP conserving) and 10 (RPV) partons, which will make these topologies hard to discriminate against the tt background, especially in the former case. In the case of cascade decays or decays involving top quarks, however, there will be at least 12 partons in the final state and the method outlined in this article should prove more effective. Moreover, if R-parity is violated, each LSP will decay to three quarks, thus adding 6 jets to the final state. Cuts on the total number of subjets could then provide a competitive replacement for MET cuts for these kinds of signals.
These signals are simply meant as benchmark models to test the sensitivity of our search to high multiplicity final states. The search presented here should prove effective for any signal implying the existence of final states with 8 or more final state jets.

Distributions for signals and backgrounds
One of the goals of this paper is to investigate the degree to which / E T cuts for the signals of interest can be substituted (more realistically, loosened) by requiring particular jet substructure. That this is challenging can be inferred from Fig. 8, which shows the / E T distributions of the various backgrounds with the / E T distributions of two benchmark signals superimposed. Although the dominant QCD background is significantly reduced by a / E T cut of order 150 GeV, any loosening of this cut dramatically increases the number of QCD events in the search region.
The remaining backgrounds, most of which have intrinsic / E T , require additional cuts to be suppressed. As shown in Fig. 8, a cut on the sum of the jet masses of order 300 GeV is effective. That a cut on M J does not exhaust the discrimination available from jet substructure can be seen in Fig. 9, which illustrates how even at large values of M J the N CA and N kT distributions of a typical high multiplicity signal are well separated from that of the QCD background. The observation of similar behavior for the N-subjettiness ratio τ 3 /τ 2 [43] suggests that this separation should hold for real QCD data as well.
Thus N J cuts should be complementary to M J cuts, allowing for the possibility that / E T cuts could be loosened. This complementarity is made more explicit in the bottom row of Fig. 8, which shows the distributions of N CA and N kT for the various backgrounds and two benchmark signals after imposing / E T and M J cuts. Once these cuts have been imposed, the dominant remaining background comes from tt+jets (and to a lesser extent V +jets), as it has the largest number of final state partons. In order for tt+jets to pass the / E T cut, the W ± bosons have to decay semi-leptonically; while to pass the M J cut, the final state partons must be distributed in phase space such that they form massive jets upon fat jet clustering. The combination of all these cuts strongly suppresses the various backgrounds. The events considered are required to have at least four jets with p T > 50 GeV and the p T of their leading jet must be greater than 100 GeV.

Optimizing search strategies
The simplified models introduced in Sec. 5.1 can be used to develop broad search strategies that cover the model space. This section describes the method that was used to construct the minimal number of signal regions necessary to cover the entire space of simplified models. The method used was introduced in ref. [44], developed further further in ref. [45], and is based off the variable "efficacy," which is defined below.
In order to demonstrate the usefulness of N J cuts, we present two separately optimized search strategies. The first uses only M J and / E T cuts, while the second uses N J , M J and / E T cuts. Since the first set of searches is a subset of the second, the second will always do better. The degree to which the more complex search strategy can be judged superior (if at all) will depend on the resulting sensitivities, the number of search regions required, and the sorts of cuts favored by the introduction of N J .
The two search strategies are defined aŝ where the values of each of the cut requirements are taken from the following sets: This results in 1625 search regions forĈ and 27, 625 search regions for C. The optimized search strategies will make use of only a small subset of these search regions. A given signal region or set of cuts, C i , will yield an expected limit on the cross section times branching ratio, σ × B, for a given simplified model at the 95% C.L. given by Here (M ) i is the efficiency of C i for the model M and L is the integrated luminosity, while ∆(B) i is the maximum number of allowed signal events at the 95% C.L. if B background events are expected after the cuts and in fact fit the data. We take where Stat(B) is the Poisson limit on B and syst is the systematic uncertainty. Throughout we will take syst = 30%. The Monte Carlo statistical uncertainties in B and (M ) i , i.e. δB and δ (M ) i , are taken into account by making the replacements which result in conservative limits.
The optimal limit on a model M is then given by where the number of search regions is N cuts = 1625 or 27, 625 depending on whether N J cuts are being used. It is natural to quantify the "goodness" of a cut C i by how close it is to optimal. For this purpose, we introduce the efficacy of a cut This is the ratio of the expected limit on the production cross section using a particular cut C i divided by the expected limit on the cross section using the optimal set of cuts. An efficacy of 1.0 is ideal. Thus the best search strategy for covering a collection of model points {M j } will be a combination of cuts {C i } such that E is close to one for each M j for at least one of the C i . This article will use E ≤ E crit = 1.5 as the criterion for optimizing the number of search regions. That is, each model point M will be covered by at least one cut that yields a limit on σ × B that is within a factor of 1.5 of the optimal limit. The efficacy approach has several advantages: • it ensures near optimal coverage over the range of signals; • it allows for a fair comparison between different sets of observables; • it allows for a reasonable comparison to the ATLAS high multiplicity search, which makes use of 6 search regions; and • each signal is grouped with like signals on the basis of which search region it is covered by.
Finding a search strategy that covers all models with a desired efficacy is computationally challenging because the configuration space is enormous, with 2 Ncuts possible search strategies. Since a brute force search is not feasible, we use a genetic algorithm to construct the minimal set of search regions needed to cover the entire space of models. This algorithm, which we find to be quite effective for the task at hand, is described in App. A and is based off a genetic algorithm described in detail in [45].

Expected sensitivity
Expected sensitivities to the various benchmark signals at √ s = 8 TeV are depicted in  These are presented as expected 95% exclusion limits on σ×B (the production cross section times the branching ratio into that particular gluino decay topology) as a function of the gluino mass and for an integrated luminosity of 30 fb −1 . As expected, the performance of the M J + / E T +N J search depends strongly on the final state multiplicity as well as the intrinsic / E T of the signal. The results of the subjet counting search are best revealed by comparing the optimal search regions for the M J + / E T search to those of the M J + / E T +N J search. For the case of N CA the former has 6 search regions, while the latter has 5 search regions, see Tables 4  and 5. Interestingly, the efficacy criterion groups the signals into roughly similar signal classes, with the difference that some of the M J and / E T cuts move around once N CA cuts are introduced.
The first class of signals consists of G 0 alone, has intrinsic / E T from the stable (non-RPV) LSP and only 4 final state partons. Consequently the cuts (and expected limits, see Fig. 10) do not change substantially after the introduction of a (trivial) N CA cut.
The second class of signals consists of G 4 alone, which differs from G 0 in that the LSP undergoes the RPV decay χ → cbs. Consequently there is no intrinsic / E T , and both search strategies cover G 4 with search regions that have trivial / E T cuts. Since, however, G 4 is a high multiplicity signal with 10 final state partons, the corresponding N CA search region imposes a significant cut N CA ≥ 13 with a loosened M J cut. This results in an expected limit on σ × B that is better by a factor 2 to 4 compared to the optimized M J + / E T search (see Fig. 11) but that is nevertheless weaker than what would be needed to exclude the benchmark gluino cross section.
The third class of signals consists of G 5 , G 6 and G 7 . These signals have intrinsic / E T from top quarks or electroweak gauge bosons produced in the gluino decay chain. They also have especially large final state multiplicities, since the LSPs at the end of the decay chain end in the RPV decay χ → cbs. The inclusion of a cut N CA ≥ 12 − 14 improves the expected limit by a factor of 2 − 5 depending on the specific signal and gluino mass. The / E T cut is loosened by 50 GeV, while the M J cut is lowered by 200 − 300 GeV. This represents a modest success in trading our reliance on / E T cuts for a more refined use of jet substructure observables.
The fourth and final class of signals consists of G 1 , G 2 and G 3 . These signals have large intrinsic / E T because the LSPs at the end of the gluino decay chain are stable and because top quarks and/or electroweak gauge bosons are produced in the decay chain. The top quarks and electroweak gauge bosons also ensure that the final state multiplicity is high. The inclusion of a N CA cut improves the expected limit on σ × B by a factor 2 − 4 for low and intermediate gluino masses, with little or no improvement at large gluino masses. The inclusion of a N CA cut also loosens (in most places) the requirements on M J and / E T by 50 − 200 GeV and 50 − 150 GeV, respectively. This demonstrates that for these signals with significant / E T there is more room for loosening / E T requirements in favor of N J requirements.

Comparison with previous searches
This section presents a comparison of the techniques proposed in this article to previous searches. This is not meant to be a complete survey of all the searches that have been performed at the LHC and that are sensitive to high multiplicity signals. Two searches are considered. The first, presented in Sec. 5.5.1, is an ATLAS search that requires up to 9 R = 0.4 jets with missing energy. The second search, presented in Sec. 5.5.2, is a search for "black holes" at CMS. These two searches are different attempts at gaining access to high multiplicity final states. We find that the methods presented in this article are competitive.  [47] (dashed black) as well as the NLO gluino production cross section (grey solid) are also shown. The systematic uncertainty on the background is assumed to be 30%. Note that the CMS limit is rescaled by a factor of 5.

ATLAS High Multiplicity Search
A comparison is made with ATLAS's most up-to-date high multiplicity search, which makes use of 5.8 fb −1 at 8 TeV [46]. This search clusters events into R = 0.4 anti-k T jets and looks at 6 search regions: • n j ≥ 7 with p T ≥ 55 GeV It is further required that events contain no isolated leptons and that / E T / √ H T > 4 GeV , and for the M J + / E T +N k T search (dash-dotted brown) for signal G 4 , which has 10 final state partons and no intrinsic / E T . The exclusion limit given by the CMS black hole search [47] (dashed black) as well as the NLO gluino production cross section (grey solid line) are also shown. The systematic uncertainty on the background is assumed to be 30%. The ATLAS limit is not shown because it is orders of magnitude worse than the others due to its strict / E T requirement.
rescaling the expected number of background events and the corresponding uncertainties (as given by ATLAS) to the new luminosity and computing the exclusion limit as outlined in Sec 5.3. Such a linear rescaling, which assumes that systematic uncertainties do not come to dominate the limits, is probably overly optimistic. We find that for those benchmark signals with large final state multiplicities and intrinsic / E T (G 1,2,3 and G 5,6,7 ), i.e. the sorts of signals that the ATLAS search is designed to be sensitive to, the M J + / E T +N CA search generally outperforms the ATLAS search, particularly at higher gluino masses, where the exclusion limit on σ × B is improved by a factor 2 − 5. For these benchmark signals this corresponds to an extended gluino mass reach of order 100 GeV to 250 GeV. While it is difficult to know the extent to which this promising result can be realized at ATLAS or CMS, it is worth emphasizing that whatever the final search sensitivity should turn out to be, it is already valuable to have a search strategy that is governed by different systematic uncertainties.

CMS Black Hole Search
The CMS black hole search [47], which makes use of 4.7 fb −1 at 7 TeV, is also sensitive to high multiplicity final states. This search makes use of 16 search regions, each of which corresponds to different S T min and N min cuts. Here S T min ∈ [1.9, 4.1] TeV is a cut on the scalar sum over transverse energy and N min ∈ [3,7] is a cut on the total number of reconstructed objects with E T > 50 GeV. See ref. [47] for details.
In order to compare the expected performance of the CMS search to the optimized search , and the M J + / E T +N k T search (dash-dotted brown) for the R-parity conserving topologies G 1 , G 2 , and G 3 (left, top to bottom) and the corresponding RPV ones, G 5 , G 6 , and G 7 (right, top to bottom). The exclusion limits given by the ATLAS high multiplicity search [46] (dash-dotted green) and the CMS black hole search [47] (dashed black) as well as the NLO gluino production cross section (grey solid line) are also shown. The systematic uncertainty on the background is assumed to be 30%. Note that the CMS limit is rescaled by a factor of 5. strategies in Tables 4 and 5, it is necessary to extrapolate the CMS background estimates from 7 TeV to 8 TeV. Because there are no missing energy requirements, the background is completely dominated by QCD events. The absence of any intrinsic high energy scale in the background allows us to adopt the following approximate extrapolation. For each value of N min , CMS provides the expected number of background events as a function of S T , which we fit to an exponential, where α and S T are fit parameters. The number of background events as a function of S T at 8 TeV is then estimated to be These background estimates can then rescaled to 30 fb −1 and combined with the efficiencies of the benchmark signals in each of the 16 search regions to obtain the expected sensitivity of the CMS search at 8 TeV. While these background estimates are inexact, they are sufficient for demonstrating that the CMS search results in expected limits that are about two orders of magnitude weaker those obtained by a M J + / E T +N CA search (see Fig. 12). This is because for the gluino masses that are accessible at 8 TeV, the S T cuts are highly inefficient, and the absence of missing energy requirements results in large QCD backgrounds. This demonstrates that high multiplicity searches targeting black holes are not necessarily well suited for other kinds of high multiplicity signals.

DISCUSSION
Recent years have seen an impressive amount of research on a large variety of jet substructure techniques. 8 The majority of this work has focused on the development of either general purpose tools (jet grooming, top tagging, etc.) or jet substructure analyses tailored to specific search channels (e.g. the BDRS boosted Higgs search [14]). One area that has seen less work is the design of search techniques for topologies that are more complicated or whose structure is not known a priori. In this paper we have taken a step in this direction by arguing that jet substructure suggests a different approach to counting jet multiplicities that results in an effective search strategy that is sensitive to a variety of high multiplicity topologies.
The flexibility inherent in this approach raises the possibility of loosening missing energy cuts in favor of well chosen jet substructure cuts. This is of special interest for new physics scenarios in which signals exhibit little or no intrinsic missing energy, such as supersymmetric scenarios with baryonic RPV. In Sec. 5.4 we have seen that for signals with large final state multiplicities and some (though not necessarily very much) intrinsic / E T , the introduction of N CA cuts does in fact lead to lower / E T requirements. While this represents only a modest push towards the regime of (near)-vanishing / E T requirements, it is nevertheless an encouraging result given how effective / E T requirements are in reducing the huge QCD backgrounds. In fact, we find that trading / E T cuts for N CA cuts is particularly effective for the QCD background-it is the need to suppress the tt+jets background that prevents the / E T cuts from being loosened further in Table 5. We anticipate that if additional handles were introduced to combat the tt+jets background (e.g. vetoing on b-jets), then / E T cuts could be loosened even further. We have not pursued this interesting direction here, since our goal was to keep the search strategy as inclusive as possible.
One possible concern with high multiplicity searches at the LHC is their potential sensitivity to pile-up, something that becomes more pressing as the LHC pushes towards higher and higher luminosities. In this paper we have advocated the use of jet trimming to reduce the present search's sensitivity to pile-up, but something like the technique introduced in ref. [48] might also be necessary. This whole issue would need to be revisited if the search were to be performed by ATLAS or CMS. This is particularly the case because it is impossible for us to thoroughly examine pile-up effects given their sensitivity to detector effects and the fact that each collaboration has detector specific methods for mitigating pile-up effects. It is worth pointing out that one possible advantage of N CA is that it includes some built-in jet grooming by virtue of the veto it imposes on asymmetric subjet energy sharing.
In conclusion, we have seen that an effective search strategy can be developed by exploiting missing energy, a sum over fat jet masses, and a sum over fat jet subjet counts. The two subjet counting algorithms presented, N CA and N kT , yield comparable results, so that a choice between the two would need to be guided by experimental studies (with a particular focus on inherent systematic uncertainties, performance under pile-up, etc.). Other subjet counting algorithms are possible, 9 but what we would like to stress here is that, as has been seen in many other jet substructure studies, the flexibility of the fat jet approach is very powerful. In this case the potential for systematic data driven estimates of the QCD background is of particular importance. As another example of the flexibility of the fat jet approach, we refer the reader to the related work in ref. [49], which focuses on high multiplicity hadronic final states with vanishing missing energy.

Note
An implementation of n CA and n k T has been made freely available at http://fastjet.hepforge.org/contrib/.
for helpful discussions. MJ would like to thank Michael Spannowsky for interesting conversations. Special thanks to Timothy Cohen for having provided Monte Carlo data for the QCD background. SE and JW are supported by the US DOE under contract number DE-AC02-76-SF00515. SE is supported by a Stanford Graduate Fellowship. AH is supported by the US DOE under contract number DE-FG02-90ER40542. The genetic algorithm is initialized with 1000 search strategies, where each search strategy is a set of search regions. Each of these search strategies is formed as follows. First a random selection of 40 of the N cuts search regions is chosen. Each of these 40 search regions is assigned a weight proportional to the number of models it covers with E ≤ E crit . Finally, 1000 search strategies are created by sampling (without replacement) from these 40 search regions. This gives a slight preference in the initialization stage to search regions that are sensitive to more models. The exact initialization procedure is not critical for rapid convergence of the algorithm The search strategies are evaluated to see how many models they cover within the desired efficacy, and a "fitness" is assigned to them with the formula where M is the number of models covered, C is the number of search regions in the search strategy, and M max is the total number of models. This fitness function strongly penalizes search strategies that do not cover all models, followed by a penalty for having too many search regions. After evaluating the fitness of the search strategies, the least fit 50% are removed. Pairs of fit search strategies are then selected and a new search strategy is created by taking a randomly determined fraction of each search strategy's search regions. For instance, if the two selected search strategies had N 1 and N 2 search regions, then a uniform random number on the unit line segment, x, would determine that xN 1 search regions would be taken from the first search strategy and (1 − x)N 2 would be taken from the second search strategy. So if N 1 = 20 and N 2 = 30 and x = 0.20, 4 search regions would be taken from the first search strategy and 24 would be taken from the second. If duplicate signal regions are selected, the duplicate is removed, reducing the number of search regions. After creating a new search strategy, the search is mutated to guarantee that the population of search strategies has sufficient diversity. Each search region within a search strategy has a finite probability of being changed to another random search region. We use 6% for this probability known as the "mutation rate". Thus for the 16 search regions in the example, 1 change would be made on average.
If after ten consecutive generations no progress has been made, i.e. if no solution has been found that covers the entire model space, then a solution is manually created by forcing every model to be covered by some search region. This can be done by increasing the number of search regions in the search strategies until full coverage is achieved. Finally, if every model is covered and no further progress is achieved for seven generations, search strategies are scoured to see if any search regions can be removed without reducing coverage. Either way, the genetic algorithm is restarted. If no progress in reducing the number of search regions in a search strategy has been made in twenty generations, the program ends.