Dissecting jets and missing energy searches using n-body extended simplified models

Simplified Models are a useful way to characterize new physics scenarios for the LHC. Particle decays are often represented using non-renormalizable operators that involve the minimal number of fields required by symmetries. Generalizing to a wider class of decay operators allows one to model a variety of final states. This approach, which we dub the n-body extension of Simplified Models, provides a unifying treatment of the signal phase space resulting from a variety of signals. In this paper, we present the first application of this framework in the context of multijet plus missing energy searches. The main result of this work is a global performance study with the goal of identifying which set of observables yields the best discriminating power against the largest Standard Model backgrounds for a wide range of signal jet multiplicities. Our analysis compares combinations of one, two and three variables, placing emphasis on the enhanced sensitivity gain resulting from non-trivial correlations. Utilizing boosted decision trees, we compare and classify the performance of missing energy, energy scale and energy structure observables. We demonstrate that including an observable from each of these three classes is required to achieve optimal performance. This work additionally serves to establish the utility of n-body extended Simplified Models as a diagnostic for unpacking the relative merits of different search strategies, thereby motivating their application to new physics signatures beyond jets and missing energy.


Introduction
Hadron colliders provide some of the most important experimental inputs in high energy physics. At the microscopic level the colliding particles are quarks and gluons, implying that the production cross section is highest for states that either carry color or have a large interaction strength with quarks. In many beyond the Standard Model scenarios these new physics states then decay back to colored Standard Model particles, along with some dark sector objects that escape detection. The resulting experimental signature is multiple high p T jets and missing transverse energy, H T . Searches for new physics that are characterized by this final state have very high priority at the Large Hadron Collider (LHC) [1][2][3][4]. Hence, many ideas for distinguishing signal from background have been proposed [5][6][7][8]. The framework introduced in this article has been developed in order to quantitatively compare and contrast these different approaches.
It is particularly interesting to understand how the observables respond as a function of the final state parton multiplicity, which can vary between new physics models. The canonical jets + H T searches at the LHC are currently framed in terms of Simplified

JHEP08(2016)038
Production Decay Channel Final State q q q → q χ 2 partons + H T q g g → q q χ 3 partons + H T q → q χ g g g → q q χ 4 partons + H T q g g → q q Z 0 χ 5 partons + H T q → q χ t t t → t χ 6 partons + H T q g g → t t χ 7 partons + H T q → q χ g g g → q q Z 0 χ 8 partons + H T Models [9], the majority of which have been extracted from the Minimal Supersymmetric Standard Model (MSSM) [10]. In particular, these searches have been optimized for signals that are motivated by supersymmetric (SUSY) models, involving both gluinos g and squarks q (fermionic color octets and scalar color triplets respectively) which decay to a stable neutral particle, the neutralino χ, some number of light flavor q, bottom b, and top t quarks, along with the option of additional weak gauge bosons W ± , and Z 0 . The simplest and best studied decay modes are g → q q χ and q → q χ [11,12]. In typical R-parity conserving models, these particles are produced in pairs and the parton level final state involves some number of colored objects and H T . Table 1 provides a detailed picture a variety of the possible final states, including those which are currently being searched for, ignoring possible flavor-tagging.
While this suite of signal topologies covers a wide range of possible final states (not all of which have associated public results from the LHC collaborations), the relative optimizations are complicated by the fact that the different production modes do not yield the same cross section, and the presence of intermediate on-shell states can lead to additional features in the signal distributions. It is therefore difficult to contrast the variety of approaches for digging new physics out of jets + H T . In order to minimize the differences between the ways of generating these various final states, we are introducing a novel variation of the Simplified Models paradigm which we will refer to as the "n-body extension." As outlined in detail in the next section, we will be performing our analysis as a function of the final state parton multiplicity, which we achieve by varying the number of partons that result from the direct decays of "gluinos." This allows to us compare observables in JHEP08(2016)038 the same regions of phase-space, without needing to correct for the relative effects inherent to different Simplified Models, and yields a concrete and transparent assessment of the performance of a wide variety of variables and their combinations.
In order to achieve a fair comparison between the observables considered below, we use boosted decision trees (BDTs) to optimize the different search cuts applied, thereby achieving maximum signal-to-background discrimination. While BDTs are by now a widely used technique experimentally, they are not as familiar to theorists -we provide a brief technical introduction to them in appendix A. We are advocating for the use of multivariate tools here, not as an in-practice analysis strategy, but instead as a guiding principle to evaluate the relative importance of different multivariable approaches. In particular, BDTs permit the straightforward analysis of both correlated and uncorrelated variables, which in turn allows for the identification of powerful combinations.
In this study, we focus on the observables that have been used by ATLAS and CMS in their multijet plus missing energy analyses. Since our multivariate approach is not meant to be taken as a new search strategy, we neglect the possible signal and background uncertainties. Therefore, our results represent how the different variables would perform in an ideal optimistic case. We compare the discrimination power of different sets of observables to a nearly optimal benchmark analysis, in which we combine all considered variables using a single BDT. This 'aggregate' result is of course unrealistic and should be interpreted as an approximate upper limit on the possible performance.
Studying single variables alone leads to a classification scheme into three classesmissing energy-, energy scale-, or energy structure-type -along with a few hybrids which exhibit characteristics of more than one of the relevant behaviors. In general, we find that most of the information about the final states studied can be captured using multivariable analyses including at least one of each type. For most of the parameter space, combinations of standard variables such as the number of jets, the sum of jet mass (or H T ), and H T lead to near optimal performance. For the simple topologies studied here, more sophisticated variables are usually strongly correlated with one of these well-known variables. This provides justification for the canonical approaches already in place, and helps guide modifications that can be used when designing future searches.
The rest of this paper is organized as follows. Section 2 provides a detailed definition of n-body extended Simplified Models, along with some comments on their theoretical consistency. Section 3 details our approach, the variables that we consider, and the signal and background simulations. Finally, section 4 shows the results of our analysis for both compressed and non-compressed gluino-neutralino topologies. We conclude in section 5. A number of appendices are given which provide some additional details and justify some of the approximations taken in the main text.
2 The n-body extended gluino-neutralino simplified model Simplified Models are a convenient way to organize signals relevant for new physics searches at the LHC. The philosophy is to identify models involving the minimal number of new JHEP08(2016)038 particles and couplings in order to populate regions of signature space. 1 Altering the masses of the relevant states leads to kinematic differences which motivate multiple signal regions that can be designed in order to provide sensitivity across parameter space. This approach has taken hold at both ATLAS and CMS, and most of the new physics results are now cast in terms of Simplified Models.
Given a Simplified Model, there are many ways one can extend it. For example, one can add additional states and couplings which could lead to new production modes, new branching ratios, and/or new kinematic features. One of the key ideas in this paper is a new augmentation of the Simplified Model theory space, which we will refer to as the "n-body" extension. The starting point for n-body extended Simplified Models is a set of states and a Lagrangian. Take the now canonical "Gluino-Neutralino" Simplified Model, which will be the example used throughout this work. The full beyond the Standard Model new particle content is a color octet Majorana fermion g (the gluino) and a singlet Majorana fermion χ (the neutralino). The Lagrangian is given by where L decay is the Lagrangian that specifies the decays of the gluino, and D µ is the appropriate covariant derivative. 2 We will assume R-parity is conserved in this study. This parity along with gauge invariance implies that the gluino must decay via a non-renormalizable operator to some Standard Model states and a neutralino. For example, the gluino could decay via an offshell squark, see figure 1. This yields the standard g → q q χ decay channel, and the decay Lagrangian is given by the four-fermion operator where 1/Λ is the suppression scale for this operator, y is a dimensionless coupling and the superscript refers to the number of final state colored partons that will result from the decay. It is straightforward to complete this operator in the ultraviolet (UV) by introducing a squark, as illustrated in figure 1; the scale Λ is proportional to the squark mass. The quarks q in eq. (2.2) can be light flavor quarks, bottoms, or tops; all three cases have been searched for at the LHC in the jets + H T channel (with b-decay/tags when appropriate) and in channels with leptons for the decays involving tops.
The simple extension proposed in this paper, which will be of use below when exploring the variables utilized for jets + H T searches, is to introduce a larger set of possibilities that allow us to vary the number n of partons in the final state: where G µν is the SU(3) field strength with associated gauge coupling g s , Λ is a dimensionful scale, and y is a dimensionless coupling, see appendix B for a detailed discussion of these operators. Note that in practice we will ignore the angular correlations predicted by the Lorentz structure of these specific operators by assuming a flat matrix element and allowing relativistic phase space to predict the final state distribution in the parent rest frame. Note that here we have chosen the n-body extension at each higher point which adds the maximum number of quarks. It is entirely plausible that all the partons could be gluons, and the operator would just include more powers of G µν . However, such operators would either require the existence of lower order operators -these would provide the dominant gluino decay modes -or involve loop interactions. In the latter case, the additional loop suppression factors would severely restrict the parameter space available for prompt decays. This issue is discussed in more detail in appendix B. It is also possible that the n-body extension could involve Standard Model states besides quarks and gluons. We will leave exploring the implications of these additional directions in model space to future work.
Considering these effective operators allows us to focus our study on final states only, while not including information about possible intermediate particles in the gluino decay chain. Most of our results, however, are expected to also apply to topologies including intermediate on-shell particles, this comparison is made more concrete in appendix C. In particular, although some of our decay Lagrangians might not admit reasonable UV completions (in the sense that many coincidences would be required such that the decay mode of interest would actually dominate), or would lead to long-lived gluinos, they can be used to study generic configurations where the gluino cascade decays through one or more intermediate on-shell particles. Possible UV completions and gluino lifetimes for the operators considered in this study are discussed in appendix B.
For the sake of specificity, it is worth pausing to define some crucial notation. A "parton" is either a quark or gluon that is produced at the matrix element level before JHEP08(2016)038 the parton shower has been applied to a given event. We will distinguish between partons that result from the direct decays of the gluino, the "decay partons," from those that come from higher order initial state radiation (ISR) of gluons and/or quarks off the hard collision process as implemented using the MLM merging procedure [15], the "matrix element partons:" n ≡ number of total decay partons.
m ≡ maximum number of matrix element partons. This terminology will be used extensively in the discussion below.
Finally, we conclude this section by contrasting n-body extended Simplified Models with On-Shell Effective Theories (OSETs) [16]. OSETs are a class of non-Lagrangian based parameterizations for new physics scenarios. They are characterized by the masses of the on-shell particles that are involved in the process of interest, along with additional parameters that determine the size and shape of the production cross section. This framework was introduced as a suggested shift in the approach to interpreting new physics searches at the then-upcoming LHC program. The goal was to move away from relying on "full" models as the jumping off point for designing searches, the classic example being scans in the M 0 versus M 1/2 plane of the Constrained MSSM. Instead, OSETs were invented to provide a signature based signal injection. These ideas ultimately lead to the invention of Simplified Models and their subsequent adoption by both CMS and ATLAS. Additionally, the MARMOSET Monte Carlo program was developed as an approach to solve the "LHC Inverse" problem in a systematic way [16]. In a similar spirit, many related tools have been released that facilitate the straightforward reinterpretation of LHC results [17][18][19].
One of the key points of this approach was to recognize that the leading kinematic aspects of the production cross section are simply due to the behavior of the parton distribution functions in most cases of threshold production. This implies that one could reproduce the majority of observable features across a wide variety of models using only a few parameters. Since the parton luminosities fall with a very high power of the momentum fraction, a Taylor expansion of the cross section is essentially truncated at leading ordera few exceptions were identified, e.g. p-wave production, and the required modifications for their inclusion into the OSET paradigm was provided [16]. For our purposes, their work demonstrates that even though we have chosen to use "gluinos" as our parent particles, the implications of our results are expected to hold in a much wider variety of theories that are dominated by non-zero s-wave production. This is part of the justification for the comparison between the distributions provided for our 2-parton results with stop pair production (that subsequently decay yielding t t χ χ) given in appendix C. In our view, one advantage of using Simplified Models is that they are well-defined Lagrangian based theories, which implies that one can analyze the feasibility of UV completions (see appendix B), and an investigation of higher order perturbative corrections can be performed straightforwardly. Additionally, modern simulation tools can be utilized which allows us to include the impact of merging matrix elements involving different numbers of ISR partons, which are important for the modeling of signals from compressed spectra as discussed below in section 4.2.

JHEP08(2016)038
The OSET framework obviously must additionally involve some mechanism that decays the parent particles. The approach taken in [16] was to ignore spin correlations and decay all unstable new physics states using a flat matrix element integrated against the standard phase space distribution. As already discussed, this is the approach taken in this work as well. While this assumption does not provide a good approximation for all possible models of interest, it is broadly applicable to a wide variety of scenarios and is an obvious choice for the kind of study we are performing here. Certainly exceptions can be found, e.g. if there is an on-shell intermediate state and the invariant mass of the resulting decay products is used as a discriminator, see [16] for additional examples along with simple modifications to move beyond the assumption of phase space only decays. Furthermore, there are cases where the angular dependence of the decay products becomes important, see e.g. [20,21] for a discussion. However, much of this information is washed out once one boosts the decay products to the lab frame and integrates over the possible orientations of the intermediate particles. This accounts for the lack of sensitivity to these effects and further justifies the assumptions made in this work. Clearly OSETs and our n-body extended Simplified Models are complementary approaches, and the work of [16] gives many of the detailed arguments for the broad applicability of the choices made here.

Dissection toolkit
The n-body extended Gluino-Neutralino Simplified Models can be used to systematically explore a range of jets + H T final states. Search strategies for gluinos at CMS and ATLAS predominantly employ inclusive cuts in a phase space of some number of observables, which vary from analysis to analysis [1,3,[22][23][24][25][26][27][28][29][30]. Our goal is to understand the performance sensitivity of these observables for various injected signals, including the impact of correlations that are taken advantage of through different variable combinations. Developing quantitative intuition for which observables can be best used to distinguish between a given signal and background will lead to a better understanding of how to maximize coverage for a given space of signals.
There is an additional practical matter due to the fact that systematic errors are present for every aspect of an analysis, be it from theoretical uncertainties, e.g. due to working at finite order in perturbation theory, or from experimental issues that arise from a variety of sources as one goes from hits in a detector to reconstructed physics objects, e.g. jet energy scale uncertainties, isolation requirements, and so forth. When working with real data, any observable that one wants to include in an analysis must be validated, requiring suitable control regions along with the ability to make reliable computations in order to extrapolate into the signal region. This implies there will always be a trade off between including more information, and maintaining a reasonable level of systematic control. Hence, in practice the number of observables is limited. It is outside the scope of this study to quantify such systematic effects. However, when performing a multi-variable analysis it will always be desirable to optimize the sensitivity of those variables. Given that there are several analyses which use different sets of observables, if/when a putative signal is discovered, understanding the correlations between given observables will be necessary to properly JHEP08(2016)038 characterize the new physics. In this article we will generally consider combinations of up to three variables; the reason for this choice will become apparent as we go through the results.
To develop intuition for a broad set of variables, we will characterize sensitivity using curves which detail the signal efficiency versus background rejection power for a given cut in observable space, often referred to as receiver operating characteristic (ROC) curves. We first start by comparing single observables against each other; however, it is important to also consider multiple observables together since in practice a multi-dimensional space must be probed to achieve maximum sensitivity. Our quantitative analysis relies on BDTs, which are preferred for their convenience and flexibility. It is expected that while the absolute performance of the BDTs is better than that of the coarsely binned multidimensional templates often used in experimental searches. The ROC curves are then an interesting metric for comparison of variables. The effect of binning is not explored in greater detail as it is luminosity and background estimation method dependent and our goal is to derive results independent of these effects. The details of the BDT implementation are given in appendix A.

Observables
Many searches have been designed to access new physics in jets + H T . We choose to study the following suite of observables, which incorporates the predominant variables used on the 8 TeV LHC data.
• H T is defined as the scalar sum of the p T of all jets in the event whose p T > 30 GeV and |η| < 2.5. This variable is particularly powerful for topologies like ours, where a new heavy particle decays to multiple objects.
• H T is defined as the negative vector sum of the transverse momenta of all jets in the event whose p T > 30 GeV and |η| < 5.0. Then H T = H T is the scalar missing energy. For signal events, non-zero H T dominantly results from neutralinos in the final state.
• N j is defined as the number of jets in the event whose p T > 30 GeV and |η| < 2.5.
Jets are clustered using the anti-k T algorithm with a cone size R = 0.4. [31,32], where H T and H T are defined above. This variable discriminates against events where the H T comes from jet mismeasurement.
• M J ≡ m J [33], where m J is the mass of a given anti-k T (R = 1.0) jet with m J > 50 GeV. This variable is predicted to be particularly useful for large multiplicity signals where multiple objects with moderate p T can be clustered into hard fat jets [33][34][35][36][37][38].
• m eff ≡ p j T + H T is the effective mass variable often utilized in jets + H T searches performed by ATLAS, see [28] for a recent example.

JHEP08(2016)038
• Razor [7], is a two-dimensional variable m R , R 2 , used to identify final states with two visible objects j 1 and j 2 and H T . In a so-called "Razor frame", obtained by applying the longitudinal boost from the lab frame, M R is defined as The dimensionless variable R is defined as For cases where an event contains more than two jets, jets are combined into two pseudojets by finding the combination of jets that minimizes the sum of the squared masses, p 2 pseudo,1 + p 2 pseudo,2 .
• M T 2 [5,6], is used to identify signals where a pair-produced particle decays semivisibly. It is defined as where p are the possible transverse momentum assignments for the two invisible objects of an assumed mass, and m T is the usual transverse mass variable. In the case where an event contains more than two jets, the jets are combined into two pseudojets using the same criteria as for the Razor.
We will utilize two versions of this variable below, denoted M T 2 and M CMS T 2 . For both, the input neutralino test mass is taken to be zero. One makes use of the measured mass of the input pseudojets when performing the min/max calculation inherent in M T 2 . The other, M CMS T 2 , explicitly assumes that the input jets are massless and is used by the CMS collaboration in its searches. Note that the pseudojets can have significant masses -this will lead to significant differences in performance, and in fact we will find that these two implementations of the M T 2 have quite different behaviors as discussed in the next section.
• α T [8,30] is used to reject multijet events in which H T arises as a result of jet mismeasurement. It is defined as

JHEP08(2016)038
where H T and H T are defined above. A di-pseudojet system is constructed using the Razor definitions above. 3 Then ∆H T is the scalar p T difference between the two jets.
The number of b jets, n b , is also often used which greatly changes the background composition and suppresses light flavor jet backgrounds. As our study focuses primarily on kinematic properties of the event, we do not consider n b directly; however, we do consider observable performance for different backgrounds separately which effectively separates performance in n b bins, e.g. searches with multiple b-tags will be dominated by t t backgrounds.

Simulation
We simulate both signal and background events using MadGraph5 v1.5.14 [39] using CTEQ6L1 parton distribution functions [40], interfaced with Pythia v6.4 [41] for parton showering and hadronization. Basic detector simulation is performed in Delphes [42], with the default implementation of the CMS detector. For the matched signal and background samples, MadGraph and Pythia are interfaced for MLM matching [43] with the k T shower scheme [44]. All our samples are generated at a center-of-mass energy √ s = 13 TeV.
Signal simulation. We generate the following samples: for n = 2, . . . 8 denotes the total number of "decay partons" while m gives the number of "matrix element partons." When n is even, we require each gluino to decay to n 2 partons+ χ with 100% branching ratio. For n odd, we require each gluino to decay to n±1 2 partons+ H T with 50% branching ratio for each decay mode and keep only the events where the gluinos decay asymmetrically. Note that while this procedure is artificial in the case of identical particle production, it is possible to have odd numbers of final state partons in associated production, i.e, if the production channel involves multiple states, see table 1 above for examples. The quark-gluon content of the final states is determined using the operators shown in eqs. (2.4)-(2.7). In practice, we decay the gluinos in Pythia by simply specifying the final states, allowing the program to choose an appropriate color connection and decay the parent using phase space integrated against a flat decay matrix element.
In the first part of our study, we set the neutralino mass to be m χ = 1 GeV in order for the gluino decay not to be constrained by phase space; this will be referred to as the "uncompressed" signal phase space. We do not require any jets from ISR (m = 0) and consider the following gluino masses mg = {500, 1000, 1500, 2000} GeV. (3.7) The second part of our study considers "compressed" spectra where the LSP mass is 5% less than the gluino mass. Due to the limited phase space for the decays, the gluino decay
products are now expected to be soft. We therefore consider topologies where the gluinos are boosted against one or two ISR jets by generating matched events with m = 0, 1, 2 with the matching scale set to 100 GeV. Specifically, we consider the following gluino and neutralino masses: mg, m χ = 500, 475 , 1000, 950 , 1500, 1425 GeV (3.8) We will only show results for mg, m χ = 1000, 950 TeV below, but will comment on the behavior of other masses.
Background simulation. We generated matched samples of Z(→ ν ν) + jets, t t + jets, and QCD multijet events. We accommodate up to four partons in the final state, which determines the maximum number of jets for a given process. In order to efficiently populate the tails of the background distributions, we split each background into bins of the variable S * T , which is defined as the scalar sum of the p T of all generator level particles, i.e., at parton level. Following the procedure detailed in [45], we modify MadGraph to implement a cut on S * T at generator level and require each bin to satisfy where htmax i , htmin i are the edges of the i-th bin. The final overflow bin has to satisfy N overflow /10 > σ overflow × L, where N overflow is the total number of events to be generated in the overflow bin, σ overflow is the cross section in this bin, and L is the luminosity. Table 2 shows the various background categories, the number of events generated per S * T bin and how many S * T bins were generated. These events are then showered and passed through Delphes independently, before being weighted by the cross section in each bin and combined.

Results of gluino-neutralino study
Motivated primarily by trigger thresholds from LHC Run 1 analyses, we preselect our signal and background samples requiring Preselection: • H T > 500 GeV,

JHEP08(2016)038
• two or more jets above 30 GeV, • no isolated leptons above 20 GeV, where min(∆φ) is the minimum azimuthal angle between the missing energy and the two highest p T jets. This is a standard cut which is used to reduce the fake missing energy from mismeasured jets. 4 The motivation for these preselection thresholds are driven primarily by the current trigger thresholds of CMS and ATLAS analyses which are restricted to have H T in the trigger. Note this is not always the case and alternate triggers are sometimes utilized to reduce the QCD rate, although we leave this for future studies to explore. We then evaluate the performance of the variables described in section 3.1 by computing the background rejection rate for different values of the signal acceptance. In the following, we consider the signal topologies described in section 2 as well as the Z(→ ν ν) + jets, t t, and QCD multijet backgrounds. Although not shown explicitly, we find that the results for W + jets and Z(→ ν ν) + jets backgrounds are qualitatively the same. This is not surprising as the topologies are very similar. To reduce complexity and number of results, we show only the results for Z(→ ν ν) + jets backgrounds.
The distributions of the variables that are the focus of this study are shown in figures 2 and 3, after the application of the preselection cuts. The statistics are equivalent to 10 fb −1 of data at the 13 TeV LHC. The different background contributions are stacked together: Z(→ ν ν) + jets, W + jets, t t + jets, and QCD. A variety of signal distributions are also shown in the figures as empty histograms, both with compressed and uncompressed spectra along with a few choices for the number of partons n in the final state.
We validate these plots and also perform additional comparisons that have a tighter phase space requirement against public material from CMS [25]. We generally find good agreement in the relative normalization of the background contributions and overall agreement within a factor of 2 using the Delphes simulation. The largest discrepancy with ATLAS and CMS public results is the QCD H T distribution resulting from fake missing energy which has a noticeably longer tail in our Delphes samples. This can be understood as resulting from jet resolution mismodeling in our simulation. This implies that the results regarding missing energy variables in QCD could be quantitatively different than what is shown below, although the qualitative conclusions are robust.
Since the relative importance of each of these backgrounds will depend heavily on the selection cuts, we will study the performance of our variables against each background separately. This helps accommodate the application of our results to searches where cuts on variables other than the ones we consider lead to an alternate background composition. For example, in a search that requires two or more b-tagged jets, the dominant background would be t t while for a compressed spectrum search without b-tags, the dominant background would be QCD. Note that we will not provide results for a "total" background for this reason. Furthermore, since we have not included K-factors the relative cross sections JHEP08(2016)038 are not robust, and additionally the mismatch found when validating the H T QCD tails could be exacerbated by such a naive combination.
Finally, notice that figure 2 does not include the α T distribution. We find that this variable is highly correlated with min(∆φ) and H T , and thus loses much of its discriminatory power after applying the pre-selection cuts. This effect be inferred from figure 4, where after applying the H T and min(∆φ) cuts, the QCD distribution in α T looks much more signal-like. By considering all three variables, the contribution to the signal region of QCD can be greatly reduced as designed. Since our interest here is to compare observables in the same region of phase space, we have chosen to use a pre-selection which roughly conforms to the ATLAS and CMS H T /H T triggers. We leave the study of α T outside of the preselection region to future work.
In the following, we consider the background rejection power for a given set of observables as a function of the number of matrix element partons in the final state and for each of the different background processes. We fix the signal efficiency with respect to the preselection cuts to ε sig = 10%, and compute the background rejection power 1/ε bkg , again JHEP08(2016)038 where the background efficiency is computed with respect to the pre-selection cuts. A signal efficiency of 10% is typical of most searches -we checked additional signal efficiency points ε sig 25% and find that the results do not change qualitatively. For completeness, figure 5 shows the absolute selection efficiency after preselection so that it is possible to infer the implications of our results in terms of limits on signal production cross section × branching ratio. To get the absolute signal efficiency of the final cuts, multiply the pre-selection efficiency by 10%.

Massless neutralino limit
This section applies our methodology to the study of n-body signatures with a massless neutralino. Backgrounds are considered separately to isolate the essential kinematic features of each signal and background. We begin with a study of the individual variables of interest, followed by judiciously chosen combinations of two and three observables.
One variable at a time. We first evaluate the performance of each observable as a function of the total number of decay partons in the final state. The results for a 1.5 TeV gluino and for the different backgrounds are shown in figure 6. Note that we consider both Razor variables M R and R 2 separately. We also define an aggregate analysis which feeds all the variables given above to the BDT. We regard this as the "optimal" background rejection rate that is possible, and show it in each plot as a reference. Based on the behavior of these variables versus number of partons, we can already learn many valuable lessons and define the following variable categories: 5

JHEP08(2016)038
• H T -type : the missing energy variables are sensitive to the properties of the invisible states, e.g. how many neutralinos in the event, what is their mass, etc.; • E scale-type : the energy scale variables H T , M T 2 , M R , m eff are sensitive to the overall energy scale of the event, e.g. the gluino mass; • E struc-type : the energy structure variable N j : is sensitive to the structure of the visible energy, e.g. how many partons are generated in the decay; • Hybrid-type : the hybrid variables Razor R 2 , H T / √ H T , M J exhibit characteristics from multiple types depending on the number of decay partons in the event. 6 The performance of some of the variables obey trends that are independent of the background.
H T -type variables perform best at low number of partons since, for low multiplicity final states, each individual final state jet or particle is expected to have a large p T . As the number of visible objects and visible energy increases, the total energy has to be split between more and more final states.
H T -type variables therefore become progressively supplanted by E scale-type and E struc-type variables.
The relative importance of these different types becomes apparent when we consider performance across different backgrounds.
H T -type variables are more important in t t and QCD events, while in Z(→ ν ν) + jets they are less effective due to the large recoil of invisible energy already present in the background. Therefore, in Z(→ ν ν) + jets, E scale-type and E struc-type variables perform better. E struc-type variables tend to be more powerful against Z(→ ν ν) + jets compared to t t. This can be understood since in t t events, energy structure is a natural consequence of the multiple scales in the problem. It is interesting to note that for low n partons, the H T -type variables perform very well, and in the particular case of QCD and t t are near optimal.
We also provide results as a function of the gluino mass as shown in figure 7 for the different backgrounds under consideration. We find that for a low number of partons, the performance of H T -type variables improves quickly with mass, though this trend is mitigated as n partons increases. E struc-type variables do not have a strong dependence on the gluino mass as they are more sensitive to the structure of the energy. Focusing on the individual variables we can infer the following lessons: • For uncompressed spectra, M CMS T 2 tends to perform better than H T . This is unsurprising since the variable is optimized for a massless neutralino.
• H T and M T 2 perform very similarly while M R tends to perform worse in terms of background rejection; this is expected as M R is highly complementary to R 2 and will be examined further below. Meanwhile, m eff tends to do the same or slightly better than H T at low n partons where performance gains are largest for H T . However, m eff is not superior to H T itself. 6 The hybrid variables can be categorized as Razor

JHEP08(2016)038
• R 2 tends to perform like a H T -type variable at low n partons and like a E scaletype variable a high n partons although the performance overall is worse; again this is expected since the real power of Razor comes from the exploiting R 2 and M R together.
• H T / √ H T tends to perform like a H T -type variable at low n partons and like a E scaletype variable at high n partons; though it never performs better than both types.
• M J tends to perform like a E scale-type and E struc-type variable becoming more E struc-type -like at high n partons. It typically does better than both H T and N j except at the highest n partons for lower gluino masses.
Studying each variable separately shows that unsurprisingly no single variable maximizes the performance throughout all of the phase space considered here. Although the performance of observables like H T and M CMS T 2 has a weak dependence on n, variables such as H T or N j exhibit much better discriminating power, but only for some categories of signals.
Inclusive searches aimed at a large variety of signatures therefore consider a minimal set of discriminating variables that cover complementary regions of the parameter space. Building such a set requires understanding the correlations between the different variables, which cannot be captured by the previous study. In the following sections, we study the discriminating power of algorithms that take into account more than one variable.
Correlations between variables. At this point, we have explored the performance of the variables individually, and used the results to classify them into three basic categories (plus hybrid). Yet, no single variable was a clear winner for the full phase space explored for all values of n. Therefore, it is interesting to explore how complicated of an approach is required to asymptote to the "ideal" aggregate result for all signals and backgrounds. This section is devoted to exploring combinations of variables that will lead to an improved discrimination power.
In order to organizing the huge number of possibilities, we start by taking one variable from each category to generate two-or three-variable combinations. By analyzing the pairwise discriminating power, we can understand which variables are least correlated thereby leading to the best complementarity when designing a search. The results of these explorations are given in figure 8, where we show 1/ bkg as a function of n partons for a set of multivariable combinations. In order to reduce the many possible combinations, we show the selection of combinations with the most striking features. As was done above, performance is separated by Z(→ ν ν) + jets (top), t t (middle), and QCD (bottom) backgrounds. The left and right columns of the figure show different variable combinations in an attempt to minimize clutter. The single variables are given as dotted lines, the twovariable combinations as solid lines and the three-variable combinations as dashed. The aggregate, denoting a BDT combination of all variables, is shown in black long-dashed. 7 7 We note that for some comparisons, particularly against QCD, there are slight inconsistencies, e.g. adding a variable slightly decreases discrimination power. This is the result of non-zero statistical errors (which can be exacerbated by the presence of rare high weight background events) along with systematic errors associated with slight over/under training of the BDT. We do not attempt to quantify these effects, but caution the reader to keep these issues in mind so as to not over-interpret these results. The qualitative behavior shown in these plots is robust.

JHEP08(2016)038
From the figures, it is clear that no two-variable combination is optimal across all regions of phase space and all backgrounds. However, the doublet M CMS T 2 , M J does out perform all other two-variable combinations essentially across the full range -this can be understood by realizing that M J is a hybrid of E scale-type and E struc-type variables. This combination is only deficient at the very highest n partons and perhaps in the Z(→ ν ν) + jets case where the interplay between H T and N j is not fully captured in M J . For the other two-variable combinations we find the following general tends: • H T -/ E struc-type : deficient at high n partons where N j is more important; • H T -/ E scale-type : deficient at medium to high n partons where visible energy becomes more important; • E scale-/E struc-type : deficient at low n partons where missing energy variables are most dominant.
Moving on to the three-variable combinations, we see that adding N j to M CMS T 2 , M J provides nearly optimal performance for the full range of n-partons shown here. There are additional three-variable combinations which are near optimal over the full uncompressed phase space in n partons and for different backgrounds. The near optimal three-variable combinations all involve one of each type of variable. Because M CMS T 2 is the best performing of the H T -type variables for the uncompressed spectra (see figure 6), we find that triplets which include it do the best: M CMS T 2 , H T or M J , N j . Another conclusion one can draw from figure 8 is a compelling confirmation that M J outperforms H T , especially for high multiplicity final states. Ignoring correlations with other observables, these two variables are approximately proportional to each other [33]: where κ √ α s when the jet mass is the result of the QCD parton shower, as compared to κ ∼ 1 for events characterized by high multiplicity signals which manifest accidental substructure. For events in a fixed H T bin, M J signal M J background , thereby providing enhanced signal discrimination. The results of figures 6 and 8 clearly demonstrate this effect showing that M J does as well or better than H T both alone and in combinations for the three dominant backgrounds considered here. The case to use M J is especially strong since it already been demonstrated to be a useful variable in the complex environment of the LHC [37,38], and is also amenable to analytic study, e.g. [48][49][50][51]. Ideally, M J would replace H T for the wide class of jets + H T searches whose phase space is covered by n-body Simplified Models, and can be done so while maintaining the core strategy implemented by many existing approaches for these beyond the Standard Model searches. However, it is worth acknowledging that once one goes to multi-dimensional variables the improvement in replacing H T by M J is not as dramatic and would require a careful job of finding a new class of control regions, along with a reassessment of systematic errors.
Next, we turn our focus to the m eff results. Recall in figure 7, we found that m eff turned out to be slightly better performing than H T , particularly at low n partons. However, when  we add additional variables to these E scale-type variables in order to find the optimal triplet, the final performance is extremely similar. For example, the triplets m eff , H T , N j and H T , H T , N j have essentially the same discriminating power. To end this section, we will comment on the Razor variables. The combination M R , R 2 does significantly better than each variable individually. This implies that they are highly complementary, as expected by the design of the variable and now seen explicitly. This is also illustrated in figure 9 where the 2-D profile distributions of R 2 and M R are shown for the QCD background (right panel) and an uncompressed 1 TeV gluino signal with n = 4 (left panel). The optimal cut is non-rectangular in the Razor 2D plane such that a 1D projection of either R 2 or M R is not expected to be a particularly good discriminator. However, as can be seen from figure 8, the two-variable combination of M R , R 2 behaves like a H T -type missing energy-like variable which decreases in performance as a function of n parton, although Razor does display less degradation than a H T -type variable. This motivates combinations with either N j or H T to see if it would then perform as well as other three-variable choices. The better combination appears to be M R , R 2 , N jets , which is fairly close to optimal although it does not perform as well as M CMS T 2 , H T or M J , N j .

Degenerate gluino-neutralino limit
We now consider compressed topologies with a 5% splitting between the gluino and neutralino masses. With such a small splitting, if the gluinos are not boosted against additional objects, the final state jets and H T are expected to be soft and difficult to distinguish from the SM background. We therefore include topologies where the pair-produced gluinos are boosted against at least one ISR jet by producing matched signal samples. We consider the background rejection rate as a function of the number of final state partons n, for a signal efficiency of 10%, a 1 TeV gluino and a 950 GeV LSP. Figure 10 shows the individual performance of each of the variables listed in section 3.1. Unlike in the uncompressed scenario, the discriminating power of the different observables is practically independent of the number of visible gluino decay products. This result is explained by the large neutralino mass, which causes most of the gluino momentum to manifest as missing energy. Conversely, the light jets coming from the gluino decay get most of their momentum from the gluino-neutralino mass splitting and will therefore be too soft to pass the jet selec-JHEP08(2016)038 tion criteria. In this compressed scenario, all the signals can then be reduced to a single topology with a small number of hard partons accompanied by H T . Looking more closely at the individual variables in figure 10, there are some striking differences with respect to the uncompressed spectra. The missing energy-like H T -type variables all behave similarly when discriminating against QCD, but H T is more powerful than M CMS T 2 for both Z(→ ν ν) + jets and t t backgrounds. However, it is true that the most important single variables are still of H T -type or a related hybrid. For example, H T / √ H T is quite powerful against t t and R 2 distinguishes itself from the other E scale-type and E struc-type variables. It is also important to note that the H T -type variables are not very close to optimal in the case of the QCD and Z(→ ν ν) + jets backgrounds. This means that additional information from the visible energy and its structure can be utilized to additionally distinguish signal from background.
We have also checked the gluino mass dependence of the single variables and the performance is relatively independent of these variations, in contrast with the uncompressed spectra case. Specifically, the discrimination power changes by a factor of ∼ 2 when changing the mass from 500-1500 GeV while in the uncompressed spectra case, the discrimination power changes by more than 2 orders of magnitude (see figure 7 above). This is somewhat expected as the visible energy resulting from the decay of the compressed system does not drastically change as a function of mass.
As we begin to look at multi-variable combinations as shown in figure 11, we come to the same general conclusions as in the uncompressed spectra case. No two-variable combination achieves near optimal performance across the range n partons and various backgrounds. However, it is possible to essentially maximize sensitivity using three-variable combinations. Among these, H T , H T , N j and m eff , H T , N j stand out, and in particular do better than combinations involving M CMS T 2 . It is interesting that in the case of QCD, m eff yields additional discrimination power over H T within the top-performing triplets. Combinations including Razor also achieve a good performance. However, the best performing triplet involving Razor depends slightly on the background under consideration: The main lessons for the compressed study is similar to those learned in the uncompressed case. Optimal performance for all three backgrounds considered is achieved by combining variables from each of the three classes. For some backgrounds, two-variable combinations are nearly optimal but if one is interested in near-ideal discrimination across all backgrounds three variables are required.

Conclusions
The n-body extension of Simplified Models provides a class of signal injections with which one can model a wide range of possible final-state phase-space within a unified phenomenological framework. This has many applications in collider searches for beyond the Standard Model physics, and is particularly well suited for seeking out final-state topologies which require additional optimization beyond the searches that are currently being performed. The focus in this work was to utilize this tool in order to assess the discriminating power

JHEP08(2016)038
for many of the ever-growing number of variables used for searches in the classic jets + missing energy final state. This was an ideal forum to explore the utility of the n-body extended approach since the only observables in these searches stem from a single class of object: jets of visible hadronic energy.
A large number of variables were considered: H T , As was expected and shown in figure 6, no variable can do the job alone. A winning strategy derives from placing cuts on maximally uncorrelated observables in order to generate signal regions where background events are very rare. Boosted Decision Trees were used in order to access the strength of correlations between observables. Once a choice of variables was made, the BDT was trained to distinguish signal from background using only these inputs. Then the resulting machine takes events and generates an output which yields the optimal background rejection efficiency as a function of a target signal efficiency. The result is a quantitative assessment of performance.
Analyzing single variables alone led us to a classification scheme based on their trends as a function of the number of final state partons, assuming an uncompressed spectrum: "missing energy"-type, "energy scale"-type, and "energy-structure"-type. This scheme grouped the variables based on the region of phase space where each provides the best discrimination against the backgrounds. Not all of the observed behaviors were intuitive. For example, M CMS T 2 behaves like missing energy, although it is slightly better optimized for uncompressed signals as compared to compressed ones. For uncompressed signals, M J and m eff tend to be slightly better performing than H T . However, the general lesson is that an ideal search strategy requires at least one variable from each class. Differences in both the power and behavior of combinations as a function of n partons or mg are reduced when analyzing triplets that include variables of each type. As expected, the combination of classic variable types H T , H T , N j performs very well in most cases. However in some instances it is not fully optimal, and other triplets should be considered when performing searches in the future, e.g. M CMS T 2 , H T or M J , N j . While this study reveals many properties of the search variables and their correlations for a large range of jets + H T signals, we have not attempted to realistically include sources of error. In particular, the use of BDTs obscures the exact nature of the "signal region," thereby making it difficult to assess the quality of agreement between the Monte Carlo predictions and the measured backgrounds in a control region. This is a standard issue with using machine learning tools, and we are not advocating to replace the traditional cut-and-count approach. Instead, our point of view is that one can use this technology to quantitatively evaluate the performance of variables with a particular emphasis on their correlations.
There are a variety of future directions which will be interesting to explore. The nbody framework could be extended to other searches for supersymmetry, such as those for R-parity violation, as well as more directed searches involving heavy flavor tags and also electroweak production. This framework is also clearly useful for non-SUSY new physics searches as well. Along with extending the framework in theory space, it would be interesting to realistically quantify the effects of systematic and other errors on our conclusions.

JHEP08(2016)038
With the LHC on the cusp of delivering up to 100 fb −1 of data in the next few years, understanding the optimal ways to search for new physics has never been more important. We have provided a new framework for organizing and studying the collider phenomenology of a variety of beyond the Standard Model scenarios, which can be utilized to more deeply understand the breadth of results from the LHC, whatever they may be. And once a hint of new physics begins to emerge, n-body extended Simplified Models will be very useful as a signal injection. This will allow us to quantitatively unravel the properties of whatever decays are generating the anomalous signal, ultimately yielding new insights into the nature of our Universe.

A Review of boosted decision trees
Decision trees are a method of separating a parameter space into signal and background (or noise-like) regions. They have been used in particle physics for over a decade (early examples include particle identification at MiniBOONE [52,53] and the search for single top-quark production at the Tevatron [54]). Decision trees operate through a recursive partitioning of parameter space into signal and background-like regions which are determined through the use of training datasets. In that sense they represent the optimal cut-and-count discrimination between signal and background that can be performed.
Informally, a single decision tree can be imagined as a cut-flow through a series of nodes. Each node corresponds to a cut in a particular variable, with events being partitioned into different bins as they progress further down the tree. The end (or terminal) nodes of the tree correspond to signal or background-like regions, depending on whether they contain a majority of signal or background events from the training data-set used. However, single trees can be unstable, in the sense that the cuts chosen at each node are sensitive to the details of the training dataset. A more powerful a approach derives from JHEP08(2016)038 the use of a multiplicity of trees -effectively a vote by committee. Such a collection is a called a boosted decision tree. This also has the advantage that events which were misclassified by the original single tree can now be up weighted, leading to greater attention from succeeding trees. Essentially, it this series of trees collectively act to minimize a predetermined loss function. An example is a least-squares loss function for fitting an unknown multidimensional function, although often other functions can be chosen which lead to greater stability against outlying points. We now describe this more formally, before providing an illustrative example using the razor variable. A good introduction to machine learning techniques is [55] (which we have adapted the following from), while the original ideas of boosted decision trees can be found in [56,57].
Formalism. A tree is defined as a series of nodes, each of which corresponds to a cut on an observable calculated from the input data -in our case these correspond to the event observables such as M J , H T , M T 2 and so forth. More formally, a tree partitions the parameter space into a set of disjoint rectangular regions R j , which are represented by the final (terminal) nodes of the tree. Each region is associated with a constant γ j , which indicates whether that node or region of parameter space is considered signal-like or background-like. For classification into two classes these are usually taken to be {−1, 1}. Then any event which falls into the region R j is assigned value γ j . If we define the indicator function I x ∈ R j to be 1 if x ∈ R j and 0 otherwise, we can represent the a decision tree T by where the parameters of the tree are Θ = R j , γ j , and the number of regions J is a meta-parameter which is usually 4 ≤ J ≤ 8. The numerical optimization problem which must then be solved is to find the regions R J and constants γ J . These parameters are set by requiring that they minimize a loss function L over a large set of training data whose properties are already known (that is to say, whether a given event is signal or background), so that the chosen parameters arê This is a difficult problem in numerical optimization, and so approximate solutions are usually used to find the regions R j and γ j , which we will describe below. The output of a single decision tree can be quite sensitive to minor changes in the training sample. Furthermore, since the decisions at each node are only locally optimal, there is no guarantee that the globally optimal decision tree is obtained in this way. It is also possible to overfit the training data using complex trees. To avoid these issues we use boosted decision trees in our study.
Boosting starts with a group of individual 'weak learners' such as single trees whose output may be only slighter better than random guessing. Then by weighting their outputs, a much better 'strong learner' can be constructed, whose output is very well correlated with JHEP08(2016)038 the true classification of any unknown event. In our work we use the gradient boosting algorithm as implemented in the TMVA class within ROOT [58]. A boosted tree model can thus be represented as a sum of trees, We do not attempt to solve for all trees simultaneously, but rather do so in forward stagewise manner: i.e., we solve for one tree at a time, where each tree is fit to the residual of the training data and the sum of all previous trees. In other words, the parameters R jm and γ jm of the m th tree are determined by minimizinĝ where the sum is over the elements in the training dataset and (m − 1) th boosted model. For example, if we wished to fit a sum of trees to a function using a squared-error loss function, the m th tree would be the tree that best predicts the residuals y i − f m−1 (x i ), and the constant γ jm would be given by the mean of the residuals in each region R jm . Such trees can be constructed relatively quickly. For more general loss differentiable loss functions simple fast algorithms do not exist for solving eq. (A.4).
The forward stage-wise boosting strategy outlined above is very computationally greedy: it seeks to maximally minimize eq. (A.4) at each step of the process. To do this in practice, we calculate coefficients of the negative of the gradient of the loss function L at for each stage m: The approximate solution to eq. (A.4) is then given by fitting a tree to the negative gradients using a squared-error loss functioñ As noted above, these trees can be constructed quickly. On the other hand, the regions R jm which result from the above process are not necessarily the same as those R jm which solve the exact problem in eq. (A.4). Of course, the forward stage-wise boosting procedure we employ is itself also approximation to the exact result (if it could be constructed). For our classification problem of discriminating between signal from the n-body extended Simplified Models and SM backgrounds we use the binomial log-likelihood loss which is also implemented in the TMVA package in ROOT. This is known to be more robust than the common exponential loss function L(y, f (x)) = e −F (x) Y , since misclassified points and outliers effectively are assigned a linear penalty, as opposed to an exponential one.

JHEP08(2016)038
Example: razor variables. We now show a simple application of the BDTs to differentiate gluino events from QCD backgrounds using the razor variables M R and R 2 only. As shown in figure 9 and mentioned in section 4.1, combining these two variables together provides a much higher discriminating power than using either of them individually. Additionally, simple rectangular cuts in the two-dimensional parameter space would overlook some subtle features of the signal and background distributions such as the increase in the typical value of R 2 at low M R . For this simple study, we use the scikit-learn module from Python 2.7.9. We consider a training sample composed of an event mix of 10 5 unweighted QCD and g → j + χ events. We use gradient boosting to generate different numbers of decision trees with maximal depth J = 4. In order to limit overfitting, we multiply T (x i ; Θ m ) in eq. (A.4) by a "learning rate" coefficient α = 0.1. Introducing a small learning rate is a standard regularization procedure when using gradient boosting. Here, we use the Huber loss function, that is a combination of a squared-error and a least absolute deviation loss function. When applied to a given (M R , R 2 ) doublet, the final classifier will output a number 0 ≤ r ≤ 1 that is close to 0 if the event is background-like and close to 1 if the event is signal-like. Figure 12 shows color plots of r in the (M R , R 2 ) space for classifiers involving n = 5, 10, 50 and 100 trees. In regions with a large number of training events, comparing these r plots to the density profiles in figure 9 demonstrates that the BDT successfully captures the characteristic features of the signal and the background. In particular, when the number of trees is large, one can clearly see the red region departing from the y-axis for R 2 0.15. A more quantitative estimate of the performance of the BDTs can be obtained by applying the n = 100 classifier to a separate test sample of 4 × 10 5 evenly mixed signal and background events. Tagging the events with r < 0.5 as background and the events with r > 0.5 as signal, our classifier correctly identifies 95% of the background events and 80% of the signal events. A better accuracy could be obtained by increasing the size of the training sample. Also note that the partial dependence plot associated to the n = 100 classifier exhibits strange features that do not appear in the profiles shown on figure 9. With such a large numbers of trees, the classifier starts overfitting the training samples and loses part of its predictive power. This is a concrete demonstration of the issues involved in optimizing the BDT parameters discussed previously.

B Consistency of n-body decay operators
It is reasonable to wonder if the n-body operators in eqs. (2.4)-(2.7) can be realized in any complete new physics scenario. There are two issues that will be discussed: are there any regions of parameter space where a given L i would model the dominant decay mode of the gluino, and what would be the corresponding lifetime of the gluino. We begin with the L 1 and L 2 operators which can yield the dominant decay modes in models that include one extra heavy scalar particle, as shown in figure 13. A well-known example of such model is splitSUSY [59][60][61], where the squarks are decoupled. The relative importance of L 1 ∼ 1/Λ compared to L 2 ∼ 1/Λ 2 depends on the mass scale of the heavy particle. For splitSUSY with a 1 TeV gluino, L 1 starts dominating over L 2 only for squark masses heavier than 10 9 GeV [62]. Note that in this region of parameter space, the gluino tends to be JHEP08(2016)038  sufficiently long lived to warrant a different class of search strategy. 8 This example already demonstrates the point of this appendix -the L i operator should not be interpreted as a physical gluino decay mode, but rather as a topology representative of (i + 1)-body decays in general. For example, our results for this operator can be used as a qualitative proxy with which to analyze signals of the form q → q χ as shown in appendix C.

JHEP08(2016)038
Similar reasoning can be applied to the L 3 , L 4 and L 5 operators. However, it is important to note that concrete UV completions might involve taking couplings that do not obey the SUSY relations, even though we continue to call them gluons and squarks for convenience. These operators have a relatively high mass dimension, and most of the time will either give subdominant contributions to the gluino decays or will be associated with long-lived gluinos. Operators involving gluons, in particular, will either be loop-suppressed, as shown in figure 14, or will imply the existence of lower order operators without the presence of the gluon. For example, if the gluon in the L 3 operator is generated at treelevel, the gluino decays will be dominated by L 2 . Although operators other than L 2 will not describe dominant SUSY processes, they can be used to study either exotic processes with similar topologies or long decay chains involving intermediate on-shell states. As shown in figure 14, for example, the L 4 operator can have UV completions analogous to gluino cascade decays through one or more heavy squarks and/or electroweakinos. The operators studied here therefore do not apply exclusively to direct gluino decays but also to the general classes of models targeted by multijet plus / E T searches such as [3,24,25,27,28,[63][64][65][66]. Our study can also be applied to cascade decays involving top quarks or gauge bosons, such as the ones investigated in [1,67]. A more in-depth discussion of the effects of intermediate on-shell states is shown in appendix C.
Lifetime estimates. High dimensional operators such as L 4 or L 5 or operators involving gluons in the final state -and therefore loop interactions -will be associated with highly suppressed gluino decay rates. If no other decay channel is open, the gluinos could therefore be long-lived, and the search strategies studied here will be less relevant. Note however that most of our results for these operators will remain applicable to signals involving long decay chains through on-shell particles. We provide a naive estimate of the lifetime associated with each L i . The effective operator corresponding tog → k (q + q) +χ (where k counts the number of quark-anti-quark pairs), takes the form with a corresponding decay width .

(B.2)
Again, we emphasize that the y couplings might not be related to Standard Model couplings by SUSY. For operators that involve an additional gluon in the decay, L = y 2 k g s 16 π 2 Λ 3 k+1 (q q) k G µν g σ µν χ,  Figure 14. Top: illustrative Feynman diagrams corresponding to a UV completion for the fourbody decay of the gluino at loop level (the L 3 operator). Gluino decay modes involving gluons in the final state can be generated at tree-level, however the corresponding operators imply the existence of lower-order effective operators (here L 2 ). Center and Bottom: illustrative Feynman diagrams corresponding to the UV completions for the five-body decay of the gluino. The process for a gluino decay to four quarks and a neutralino (L 4 operator, center) can be used to study gluino cascade decays through new particles. The operator involving two gluons at loop-level (bottom) will be highly suppressed. Note that, although we use the SUSY notation for the names of the new particles, the relevant couplings are not necessarily described in standard SUSY models.
where we have assumed that the gluino interacts with the gluon at one-loop; as discussed above the tree-level operators with gluons will never dominate over the lower point decay without this extra state. Note that the lifetime of a particle in its rest frame is given by Even with order one couplings and for heavy gluinos, L 5 will lead to macroscopic decay lengths due to the large number of final states along with the loop suppression associated with the gluon emission. Again, due to the gluon loop factor, L 3 will be more suppressed than L 4 and will be associated with gluino decay lengths longer than a millimeter for couplings less than one or gluinos lighter than a PeV. For gluino masses of a TeV or larger, L 4 will lead to tracks shorter than a millimeter for couplings of order one, but the lifetime of the gluino will strongly increase for smaller couplings. Finally, the operators L 1 and L 2 will lead to a short-lived gluino even for weak couplings.

C Mapping onto n-body decays
In this appendix we quantify the effects of intermediate on-shell states on the kinematic distributions of the variables studied throughout this paper. We first consider a two-body decay of the form where j is either a quark or a gluon and assumed to be massless. We denote the fourmomentum of A as (E, p). There are two configurations in which the momentum of X in the lab frame can be simply estimated: • The Wide (W) scenario, with M A M X : in this case, the energy of A is evenly split between the jet and X.
• The Compressed (C) scenario, with M A − M X M A : in this case, the jet p T will be negligible and the momentum of X will be similar to the momentum of A.
Consider the decay g → q q χ, through the diagram shown on the right-hand side of figure 13. If the squark is on-shell, this decay can be decomposed into two two-body decays, with (A, X) being (g,q) in the first step and (q,χ) in the second. Studying each of these successive two-body decays within both the C and W  Table 3. Estimates for H T and H T for g → q q χ for different possible squark and neutralino mass configurations.
As shown in this table, the n-body formalism that we have adopted throughout this paper accurately describes the kinematics when one set of particles in the cascade is compressed. For contrast, when the neutralino is much lighter than the gluino, some discrepancies can appear from the presence of on-shell intermediate particles when comparing these decays to the n = 4 case. However, the typical values of H T and H T for the WC and CW scenarios -for which the highest disagreement is observed -are similar to the values associated with the gluino effective two-body decay operator L 1 when the neutralino is light. For these WC and CW mass hierarchies, the process dominating the kinematics is indeed the gluino two-body decay to a quark and a squark. Figure 15 illustrates this effect using the decay of a heavy stop to a top quark and a neutralino as an example. Although the top quark decays to three jets, the H T and H T distributions for the stop decay are very different from the ones corresponding to a gluino four-body decay, which is the naive guess for the relevant n-body decay by simply counting the number of decay partons. These distributions are however similar to the ones obtained for a gluino two-body decay, as can be inferred from the figure.
A similar reasoning applies to more complicated processes such as g → t t χ or g → q q Z χ. For a TeV-scale gluino, the masses of the top quark and the Z can be neglected to good approximation and the two processes can be compared to g → j j χ and to g → j j j χ respectively. The comparisons are shown in figure 16 for g → t t χ and in figure 17 for g → q q Z χ. In both cases, the H T and H T distributions for the on-shell processes look similar to the ones corresponding to the appropriate choice of n-body operators.
This demonstrates that although some of the higher-dimensional operators considered in this study will not lead to prompt gluino decays, most of our results can still be applied to models involving on-shell intermediate states. In particular, as shown in this section, a given cascade decay scenario can be mapped to a given effective decay operator with similar kinematic distributions for the "missing energy" and "energy scale"-type variables. This mapping, however, cannot be solely determined by the number of final state objects and will depend on the mass hierarchy between the parent particle and the intermediate states. It is also important to note that, as shown in Figs 15, 16 and 17, "energy structure" JHEP08(2016)038 H T ( top right) and N j (bottom) distributions for g → j + χ (dashed red), g → j + χ (dot-dashed green) and t → t + χ (solid blue). We have only included hadronic top decays. Here, the stop mass is 1.5 TeV and the neutralino mass is 1 GeV.  H T ( top right) and N j (bottom) distributions forg → 3j +χ (dashed red) andg → q +q + Z +χ (solid blue). We have considered only hadronic Z decays. variables such as N j will remain by definition highly sensitive to the number of final states and to the quark or gluon nature of the jets, see appendix D. However, this difference will not impact the conclusions drawn in the main body of the text.

D Comparing quark and gluon final states
This appendix addresses the differences from having quarks versus gluons as the final state decay partons. Figure 18 shows the H T , H T and N j distributions for gluino three-and five-body decays with either the mix of quarks and gluons used in the n-body operators given in eqs. (2.4)-(2.7) or final states with only gluons. As can be seen in this figure, the only variable that discriminates between quarks and gluons is the number of jets in the JHEP08(2016)038 event, since gluons are associated with higher rates for hard splittings due to the larger Casimir. Since the angles between these jets due to showering and the initial parton tend to be collinear, the discrepancy between the N j distributions should decrease for larger jet radius. The other kinematic variables that we consider only minimally distinguish between quarks and gluons.

E Comparing shapes of various signals
In this appendix we compare kinematics of the n-body Simplified Models to show trends as a function of n-partons and the gluino mass. We show only three variables, one of each type presented in section 4.1, H T , H T , and N j . We have checked that the trends for variables within the same classification are qualitatively similar.   H T (left), H T (middle), and N j (right) distributions for various gluino mass hypotheses. The gluinos are always restricted to decay to either one parton and a massless neutralino (solid) or four partons and a massless neutralino (dashed). Three different mass hypotheses are shown: mg=500 GeV (red), 1000 GeV (green), and 1500 GeV (blue). kinematic distribution for a 1 TeV gluino decaying to different numbers of partons for an uncompressed and compressed mass spectrum respectively. We find that the similarity observed in the signal distributions for the compressed mass spectrum, see in figure 20, is a generic feature of compressed signals. Figure 21 compares kinematic distributions for various gluino mass hypotheses with the gluino restricted to decay to either one parton and a massless neutralino or four partons and a massless neutralino.
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.