Six Top Messages of New Physics at the LHC

Six top signatures provide a novel probe of new physics. We discuss production of six top quarks as the decay products of a pair of top partners in the setting of a composite Higgs model, and argue that the six top signal may generically provide one of the first final states to show a discrepancy. We construct an analysis based on quantities such as $H_T$ and the numbers of jets which are tagged as boosted tops, $W$s, or containing $b$-tags, and show that the LHC with 3~ab$^{-1}$ can discover top partners with masses up to around 2.5 TeV in the six top signature.


Introduction
The Large Hadron Collider (LHC), with its unparalleled energy and high luminosity, will definitively explore physics at the TeV scale. The discovery of the Higgs boson at the LHC is a triumph of the Standard Model (SM). However, the naturalness problem associated with the self-energy of the Higgs particle suggests that new physics is likely to appear around the TeV scale [1][2][3][4][5][6][7]. Various new physics models addressing this problem have been proposed, such as Supersymmetry (SUSY), little Higgs and composite Higgs models. A deeper investigation of the naturalness problem may reveal new details of the physics underlying electroweak symmetry breaking (EWSB) and could also provide evidence of new physics.
Besides the Higgs, the top quark is central to arguments concerning naturalness, since it has the largest mass of the SM fermions, and hence the largest coupling to the Higgs. For this reason, partners of the top quark are ubiquitous in models of new physics at the weak scale, and their production often results in multi-top signatures at the LHC, leading to many interesting phenomena. The four top final state has been previously investigated [8][9][10][11][12] and is starting to become visible in experimental analyses [13,14]. However, final states with even more tops occur naturally under simple assumptions, and provide a spectacular collider signature and a complementary way to search for new physics.
In this paper, we systematically investigate the phenomenology of six-top final states in a simplified model inspired by a composite Higgs scenario. We estimate the sensitivity of the LHC to six-top final states for channels with different numbers of charged leptons, and obtain the upper limit on the top partner branching ratio into $tt\bar{t}$ in the case that no signal is observed with 3 ab$^{-1}$ of integrated luminosity. We also discuss the extraction of the top partner mass. It should be stressed that six-top final states occur in many other models of new physics, and our general analysis framework can be applied to those cases with simple adjustments.
The paper is organized as follows. In Section 2, we introduce a simplified composite Higgs model which inspires our analysis, and in Section 3 we discuss general features of the six top signature and current LHC constraints. The analysis strategy for LHC data is described in Section 4. We reserve Section 5 for our conclusions.

Six Tops from a General Composite Higgs Model
Generally, composite Higgs models with a simple UV completion (such as SU(4)/Sp(4) [15][16][17] or the isomorphic coset space SO(6)/SO(5) [18][19][20] and SU(4)×SU(4)/SU(4) [21][22][23]) contain a singlet scalar pseudo-Nambu-Goldstone boson (pNGB) field s corresponding to a broken $U(1)_s$ global symmetry. This pNGB can decay into di-bosons through Wess-Zumino-Witten (WZW) terms via fermion loops. In theories with partial compositeness, s can also decay into fermion pairs through the elementary-composite mixing terms between the SM fermions and the composite top partners $t'$. Since the decay into di-bosons is effectively a loop-level process, and the large top mass implies in such theories that the top partners predominantly mix with the SM top, s generically decays into a top pair with a branching ratio (BR) very close to 100%. The same large mixing generically implies that, provided the mass of the s is not too large, the top partners themselves decay into s and top with a significant BR. As a result, a single top partner typically undergoes the decay chain

$$t' \to t\,s \to t\,t\,\bar{t}, \qquad (2.1)$$

and an event originating from pair production of the top partners results in a six top final state (see the left panel of Figure 1):

$$pp \to t'\,\bar{t}' \to t\,s\;\bar{t}\,s \to t\,t\bar{t}\;\bar{t}\,t\bar{t}.$$
We work with an effective Lagrangian capturing the essential features of the interactions between top partners and s. Requiring that the singlet s couples renormalizably to the top and its partner, the vector-like top partners must either be electroweak singlets ($T = t'$) or doublets ($\psi = (t', b')$) with hypercharge $Y = 1/6$. In the first (singlet) case, the effective Lagrangian reads [...]. The doublet case is described by [...]. Here H is the SM Higgs doublet field, $D_\mu$ is the appropriate covariant derivative, $m_{t'}$ and $m_s$ are the masses of the top partner and s, respectively, and $\lambda_i$ are coupling constants. We work in the limit where the coupling $\lambda$ is much larger than $\lambda_{1,2}$ or the electroweak coupling, such that the top partner decays predominantly into top and s with almost 100% BR, but is small enough that the width of the top partner remains relatively narrow. In this limit, the relevant parameters are the top partner and scalar masses, with mild dependence on the strength of the interactions. In the more general case where the top partners have appreciable decays into other channels, our results can be rescaled with the corresponding BR and continue to apply.

Top Partner Pair Production and Signatures
For modest mixing, the dominant top partner production mechanism at the LHC is $t'\bar{t}'$ pair production through the strong interaction, whose rate depends only on the partner mass and the strong coupling. The rate at the LHC operating at $\sqrt{s} = 14$ TeV as a function of the top partner mass is shown in the right panel of Figure 1.
As with other multi-top final states, it is convenient to classify six top final states based on the decay modes of the W bosons. Leptonic decay modes allow for up to six very energetic charged leptons ($\ell = e, \mu$) in the final state. In Table 1, we list the channels containing up to three isolated charged leptons along with their corresponding branching ratios and the primary SM backgrounds leading to topologies similar to a six top final state. Final states with four or more charged leptons are not considered, as the BR for these channels is highly suppressed. While several of these channels have previously been analyzed at the LHC [24][25][26][27][28][29], the focus was on a different production mechanism, and the analyses were thus not optimized to extract a six top final state. A six top final state also allows for new, previously unanalyzed signatures such as three same-sign charged leptons.
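As a rough parton-level cross-check of how a six-top sample is shared among the lepton channels, the sketch below treats the six W decays as independent and counts charge combinations. The leptonic fraction `BR_LEP` (W → eν or µν, leptonic τ decays ignored) is an assumed input rather than a number taken from this analysis, and no detector effects or fake leptons are included, so the output is not expected to reproduce Table 1 exactly.

```python
from math import comb

BR_LEP = 0.213        # assumed BR(W -> e nu) + BR(W -> mu nu); leptonic taus ignored
N_PLUS = N_MINUS = 3  # six tops (3 t + 3 tbar) give three W+ and three W-


def multiplicity_fraction(n_lep, br=BR_LEP, n_w=6):
    """Probability that exactly n_lep of the n_w W bosons decay leptonically."""
    return comb(n_w, n_lep) * br**n_lep * (1.0 - br) ** (n_w - n_lep)


def same_sign_fraction(n_lep):
    """Fraction of n_lep-lepton events in which all leptons share one charge."""
    all_plus = comb(N_PLUS, n_lep)
    all_minus = comb(N_MINUS, n_lep)
    return (all_plus + all_minus) / comb(N_PLUS + N_MINUS, n_lep)


def channel_fractions(br=BR_LEP):
    """Parton-level fractions for the five analysis channels."""
    f1, f2, f3 = (multiplicity_fraction(n, br) for n in (1, 2, 3))
    ss2, ss3 = same_sign_fraction(2), same_sign_fraction(3)
    return {
        "1-lepton": f1,
        "2-os": f2 * (1.0 - ss2),
        "2-ss": f2 * ss2,
        "3-ms": f3 * (1.0 - ss3),
        "3-ss": f3 * ss3,
    }


if __name__ == "__main__":
    for name, frac in channel_fractions().items():
        print(f"{name:9s} {frac:.4f}")
```

Under these assumptions, 40% of dilepton events are same-sign and 10% of trilepton events contain three same-sign leptons, purely from counting which of the three W+ and three W− decay leptonically.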
In addition to channels with various numbers of leptons, there are several other generic features which commonly appear in the six top signature, including:
• Boosted top jets, which may appear as fat jets in the detector;
• A high multiplicity of bottom-flavored and/or light jets.

Table 1. Analysis channels arising from six-top final states with the corresponding branching fractions, organized according to the number of leptons in the final state. The third column lists the event fractions for $m_{t'} = 1500$ GeV, including possible migration between channels due to mis-identification and detector effects. Note that the lepton fake rate from jets, of around 1%, results in additional leptons because of the large jet multiplicity in each event. The dominant SM backgrounds are listed in the last column.

Figure 2. Left: [...] [24], and the green line corresponds to Ref. [25]. Right: Upper limit on the s-t-t coupling strength as a function of $m_s$ from LHC searches for top pair production through a scalar resonance [30].

Current Constraints
Most searches for top partners at the LHC have considered missing transverse momentum signatures (based on SUSY searches [31][32][33]), which occur in theories in which the top partner is connected to a dark matter candidate. These searches exclude scalar top partners with masses up to 900-1000 GeV, depending on the mass of the dark matter candidate. We evaluate the constraints from visible signatures using CheckMATE [34]; the results are shown in the $m_{t'}$-$m_s$ plane in the left panel of Figure 2. The most stringent constraints come from multi-lepton (red line) [24] and multi-top-quark (green line) [25] searches. These constraints require $\sigma(pp \to t'\bar{t}') < 28.63$ fb at the 95% C.L. for $\sqrt{s} = 13$ TeV, which excludes top partner masses up to nearly 1 TeV.
There is also the possibility of producing the s directly through gluon fusion, which results in a $t\bar{t}$ final state whose invariant mass is resonantly enhanced at $m_s$. In the right panel of Figure 2, we show the observed upper limit on the s-t-t coupling strength as a function of the s mass, derived from the 8 TeV LHC search for resonant top pair production [30]. Note that we present only the constraints from the 8 TeV analysis. New 13 TeV searches [35] will improve the sensitivity, but a detailed reanalysis of the 13 TeV result in our scheme is beyond the scope of this paper, and we leave it for future work.

Identifying Six Top Events at the LHC
We divide our analysis into channels with 1, 2 or 3 isolated leptons (1-, 2-, 3-$\ell$) in the final state. The 2- and 3-lepton channels are further divided according to the charges of the isolated leptons. Hence in total, we have five different channels: 1-lepton, 2 opposite-sign leptons (2-os), 2 same-sign leptons (2-ss), 3 mixed-sign leptons (3-ms) and 3 same-sign leptons (3-ss). These channels are orthogonal by construction, so that a direct combination is straightforward.

Simulation and Event Reconstruction
We simulate signal and background events for the LHC running at $\sqrt{s} = 14$ TeV. Events are generated at the parton level via the MadGraph5 package [36], using CTEQ6L parton distribution functions (PDFs) [37]. Resonances are decayed either via MadSpin [38] for top quarks and W bosons, or PYTHIA8 [39] for the top partners. Parton level events are then passed to PYTHIA8 for initial state radiation, showering and hadronization. The detector reconstruction is simulated by Delphes [40] using the default CMS configuration with modified lepton isolation and b-tagging efficiency (described below). Selection cuts are imposed through the ROOT framework via the PyROOT interface, with FastJet [41] providing further jet reconstruction and clustering analysis.
The signal process is generated as $pp \to t'\bar{t}'$ for the set of top partner masses $m_{t'}$ = 1.0, 1.3, 1.5, 1.8, 2.0 and 2.5 TeV. As mentioned above, PYTHIA8 decays the top partners into top quarks via $t' \to ts \to tt\bar{t}$, with an assumed 100% branching ratio. This procedure loses information about spin correlations, and thus we do not explore related observables in this analysis. For each choice of $m_{t'}$, we fix the singlet mass to be $m_s = m_{t'} - 500$ GeV. While this choice is not general, our analysis does not rely on any selection tied to it, and so we expect the derived efficiencies to be roughly independent of $m_s$. However, near the kinematic endpoints $m_{t'} \approx m_s + m_t$ or $m_s \approx 2m_t$ the top quarks are unusually soft, which could affect the number of top quarks or W bosons reconstructed as fat jets. We minimize the impact by restricting ourselves to softer requirements on the corresponding variables, but it would be worthwhile to explore this region of parameter space in more detail.
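To make the distance of the benchmark points from the kinematic endpoints explicit, the following sketch tabulates the singlet mass and the energy released in each step of the decay chain. The top-quark mass value `M_TOP` is an assumed input, and the dictionary keys are illustrative names only.

```python
M_TOP = 172.5     # GeV, assumed top-quark mass
MASS_GAP = 500.0  # GeV, m_t' - m_s as fixed in the text


def benchmark_points(partner_masses_tev=(1.0, 1.3, 1.5, 1.8, 2.0, 2.5)):
    """Return the (m_t', m_s) benchmarks with the energy release of each decay."""
    points = []
    for m_tev in partner_masses_tev:
        m_tp = 1000.0 * m_tev      # top partner mass in GeV
        m_s = m_tp - MASS_GAP      # singlet mass in GeV
        points.append({
            "m_tp": m_tp,
            "m_s": m_s,
            "q_partner": m_tp - m_s - M_TOP,  # released in t' -> t s
            "q_scalar": m_s - 2.0 * M_TOP,    # released in s -> t tbar
        })
    return points


if __name__ == "__main__":
    for p in benchmark_points():
        print(p)
```

With this mass splitting the first decay always releases the same energy, while the margin above the $s \to t\bar{t}$ threshold is smallest for the lightest benchmark, which is where the soft-top caveat above would matter most.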
The background processes are generated with a cut of $H_T > 1.5$ TeV imposed at the generator level to improve reconstruction efficiency. Even with this selection, we are computationally limited to processes with at most five final state particles, and we restrict ourselves to sufficiently inclusive quantities in our analysis that this limitation is unlikely to be important. We incorporate the possibility of "lepton charge flip" manually according to the prescription in Ref. [42]. After the detector simulation, physics-level objects are reconstructed in both signal and background processes as follows:
• Leptons are required to be isolated according to the prescription in Ref. [43];
• Jets are reconstructed using the anti-$k_T$ algorithm [44] with $R = 0.4$ and $p_T > 30$ GeV;
• Fat jets are reconstructed using anti-$k_T$ with $R = 1.0$ and $p_T > 200$ GeV;
• Jets are bottom-tagged according to the DeepFlavor performance shown in Ref. [45], using the 70% tagging efficiency working point;
• Tops are tagged using a convolutional neural network (CNN), described in Appendix A, at the 50% benchmark operating point.
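For readers unfamiliar with the clustering step, the following is a minimal pure-Python illustration of the anti-$k_T$ distance measure and merging loop. It is not the FastJet interface used in the analysis: the function names are hypothetical, four-momenta are simply added when merging (an assumed recombination scheme), and the $O(N^3)$ loop is only suitable for small examples.

```python
import math


def massless(pt, eta, phi):
    """Build a massless (px, py, pz, E) four-vector from pT, eta and phi."""
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(eta), pt * math.cosh(eta))


def pt2(p):
    """Squared transverse momentum."""
    return p[0] ** 2 + p[1] ** 2


def delta_r2(a, b):
    """Squared distance in the rapidity-azimuth plane."""
    ya = 0.5 * math.log((a[3] + a[2]) / (a[3] - a[2]))
    yb = 0.5 * math.log((b[3] + b[2]) / (b[3] - b[2]))
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (ya - yb) ** 2 + dphi ** 2


def anti_kt(particles, radius=0.4, pt_min=30.0):
    """Cluster four-vectors; return jets with pT > pt_min, hardest first."""
    pseudojets = [tuple(p) for p in particles]
    jets = []
    while pseudojets:
        # Beam distances d_iB = 1 / pT_i^2: start from the smallest one.
        best = min(range(len(pseudojets)), key=lambda i: 1.0 / pt2(pseudojets[i]))
        best_d, partner = 1.0 / pt2(pseudojets[best]), None
        # Pair distances d_ij = min(1/pT_i^2, 1/pT_j^2) * dR_ij^2 / R^2.
        for i in range(len(pseudojets)):
            for j in range(i + 1, len(pseudojets)):
                a, b = pseudojets[i], pseudojets[j]
                d_ij = min(1.0 / pt2(a), 1.0 / pt2(b)) * delta_r2(a, b) / radius ** 2
                if d_ij < best_d:
                    best_d, best, partner = d_ij, i, j
        if partner is None:
            jets.append(pseudojets.pop(best))      # promote to a final jet
        else:
            b = pseudojets.pop(partner)            # partner > best, so pop it first
            a = pseudojets.pop(best)
            pseudojets.append(tuple(x + y for x, y in zip(a, b)))
    jets = [j for j in jets if math.sqrt(pt2(j)) > pt_min]
    return sorted(jets, key=pt2, reverse=True)
```

Calling `anti_kt(parts, radius=1.0, pt_min=200.0)` on the same inputs would correspond to the fat-jet definition quoted in the list above.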
These reconstructed objects are fed into the selection described below to assess how well the signal may be extracted from the background. The distributions of $H_T$, $n_{fj}$ (number of fat jets), $n_{tfj}$ (number of top-tagged jets) and $n_b$ (number of b-tagged jets) for the SM background and the signal (with two choices of top partner mass, 1.5 TeV (red line) and 2.5 TeV (orange line)) are shown in Figure 3 for the 3 mixed-sign lepton channel. The figure clearly shows that $H_T$ for the signal is typically larger than for the backgrounds and increases with the top partner mass $m_{t'}$. The same behavior appears in the distributions of $n_{fj}$ and $n_{tfj}$, since more strongly boosted tops are more easily reconstructed as fat jets and then identified as top jets. The $n_b$ distribution is almost independent of $m_{t'}$, since it is controlled mainly by the true number of b-jets in the event, and we model the b-tagging efficiency as a constant (70%, as described above) throughout the central region.

Event Selection and Sensitivity
We sort our events into five channels based on the number (and charge) of the leptons they contain, as described above. The event fractions for each channel, including detector effects, are listed in the third column of Table 1. Note that we also include a 1% lepton fake rate from jets, which results in more leptons than expected from the branching fractions alone because of the large jet multiplicity in the events. For channels with two or [...] After the Pre-Cuts, for $m_{t'} = 1.5$ TeV, the signal in the 1- and 2-lepton channels is typically 10-100 times smaller than the sum of the backgrounds, while in the other channels the signal is comparable to or even larger than the backgrounds. We further optimize the significance of the top partner signal by imposing the following requirements (Cut I):
• The number of fat jets $n_{fj} \geq 3$;
• The number of top-tagged fat jets $n_{tfj} \geq 1$;
• The number of b-tagged jets $n_b \geq 5$.
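The Cut I requirements can be expressed as a simple per-event predicate, sketched below. The `EventSummary` record and its field names are hypothetical stand-ins for whatever the reconstruction step provides, and $H_T$ is taken here as the scalar sum of jet and lepton $p_T$, which is an assumption about the exact definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventSummary:
    """Hypothetical per-event record; field names are illustrative only."""
    jet_pts: List[float]      # pT of reconstructed jets, in GeV
    lepton_pts: List[float]   # pT of isolated leptons, in GeV
    n_fat_jets: int           # n_fj
    n_top_tagged: int         # n_tfj
    n_b_tagged: int           # n_b


def h_t(event):
    """Scalar pT sum of visible objects (jets plus leptons, by assumption)."""
    return sum(event.jet_pts) + sum(event.lepton_pts)


def passes_cut_one(event):
    """Cut I: n_fj >= 3, n_tfj >= 1 and n_b >= 5."""
    return (event.n_fat_jets >= 3
            and event.n_top_tagged >= 1
            and event.n_b_tagged >= 5)


def cut_efficiency(events):
    """Fraction of events surviving Cut I (0.0 for an empty sample)."""
    events = list(events)
    if not events:
        return 0.0
    return sum(passes_cut_one(e) for e in events) / len(events)
```

Multiplying a channel's cross section after the Pre-Cuts by `cut_efficiency` for that sample gives the next entry of a cut flow.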
It is likely that the number of untagged jets, $n_j$, is also a useful discriminant. However, because the simulations are limited to five final state particles, $n_j$ may not be modeled well, and we do not consider it here. Including it in a more sophisticated analysis would likely improve the sensitivity. For each channel, the cross section of the signal (for $m_{t'} = 1.5$ TeV) and the corresponding backgrounds after each set of cuts, and the statistical significance of that channel (assuming 3 ab$^{-1}$ of integrated luminosity), are summarized in Table 2. We find that the single best channel is the one requiring two same-sign charged leptons, which balances signal rate against background rejection.

Table 2. Cut flow for $m_{t'} = 1.5$ TeV in all five channels with different numbers of leptons. The significance of each channel with 3 ab$^{-1}$ of integrated luminosity and the combined significance are listed in the last column. Note that for the 3-ss channel we do not apply Cut I: the event rate is already extremely low, and further selection would decrease the sensitivity.
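As an illustration of how cross sections after cuts translate into a per-channel and combined significance, the sketch below uses the Asimov approximation for a counting experiment and adds channels in quadrature. Both choices are assumptions: the text does not specify which significance definition or combination procedure is used, and systematic uncertainties are ignored.

```python
import math

LUMI_FB = 3000.0  # 3 ab^-1 expressed in fb^-1


def expected_events(cross_section_fb, lumi_fb=LUMI_FB):
    """Expected event count for a cross section given in fb."""
    return cross_section_fb * lumi_fb


def asimov_significance(s, b):
    """Z = sqrt(2[(s+b) ln(1+s/b) - s]); assumed definition, statistics only."""
    if b <= 0.0:
        raise ValueError("background expectation must be positive")
    if s <= 0.0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def combined_significance(channel_yields):
    """Quadrature sum over orthogonal channels; channel_yields is [(s, b), ...]."""
    return math.sqrt(sum(asimov_significance(s, b) ** 2 for s, b in channel_yields))
```

For $s \ll b$ the expression reduces to the familiar $s/\sqrt{b}$, while for low-background channels such as 3-ss it avoids overstating the significance.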
For each value of $m_{t'}$, we repeat this procedure with the same set of cuts. In each case, assuming that the top partners are pair produced exclusively through the strong interaction, the sensitivity maps into a bound on the branching ratio for $t' \to ts \to tt\bar{t}$. In Figure 4, we show the limit on this branching ratio as a function of $m_{t'}$ from 1000 GeV to 2500 GeV. As $m_{t'}$ approaches 2500 GeV, the upper limit on the branching ratio approaches 1, implying that higher masses will only be accessible if there is an additional mechanism responsible for producing $t'\bar{t}'$ beyond the strong interaction.
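The mapping from a cross-section limit to a branching-ratio limit can be sketched as follows, under the assumption that only events in which both top partners decay to three tops contribute, so that the six-top rate scales as BR$^2$. The function and argument names are hypothetical, and no numbers from Figure 4 are reproduced here.

```python
import math


def br_upper_limit(sigma_excluded_fb, sigma_full_br_fb):
    """Upper limit on BR(t' -> t s -> t t tbar), assuming the rate scales as BR^2.

    sigma_excluded_fb: largest signal cross section after cuts still allowed.
    sigma_full_br_fb:  predicted signal cross section after cuts for BR = 1.
    """
    if sigma_full_br_fb <= 0.0:
        raise ValueError("predicted cross section must be positive")
    if sigma_excluded_fb < 0.0:
        raise ValueError("excluded cross section cannot be negative")
    # A limit weaker than the BR = 1 prediction gives no constraint.
    return min(1.0, math.sqrt(sigma_excluded_fb / sigma_full_br_fb))
```

Because the production cross section falls steeply with $m_{t'}$, the ratio inside the square root grows with mass, which is why the limit saturates at 1 for the heaviest partners.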

Reconstructing m t
If an excess is detected, it would be desirable to establish that the signal originates from top partner pair production, and to determine the $t'$ mass. Direct reconstruction as an invariant mass is challenging, since the leptonic top decays produce undetected neutrinos, which result in missing momentum, and the decay products of six top quarks lead to a large combinatorial ambiguity.
In order to improve the sensitivity to the mass, another CNN is trained to predict the probability that a set of events originates from a particular value of $m_{t'}$. This CNN has a structure similar to the one described in Appendix A. However, instead of the data associated with one particular jet, the whole $p_T$ distribution in the calorimeter for the event, converted into a "tensor image", is used as the input of the CNN. Using the whole $p_T$ distribution of an event captures the following two features:
• The $H_T$ distribution, where $H_T$ is the sum of the $p_T$ of all visible particles, which increases with $m_{t'}$;
• The dispersion, which describes the spread of the $p_T$ distribution over the whole space and decreases with $m_{t'}$.
We show the output distribution for the 1.5 TeV classifier when fed simulated events with a variety of values of $m_{t'}$ in the left panel of Figure 5. For simplicity, we neglect the background in this assessment; while this is not a good approximation for all of the channels, it well approximates the channels with the largest sensitivity (such as 2-ss). We leave a more realistic analysis for future work.
Based on the distributions shown in the left panel of Figure 5, a binned likelihood is constructed, and its negative log-likelihood is shown in the middle panel of Figure 5. For comparison, the result obtained from the $H_T$ distribution alone is also presented, illustrating the increase in sensitivity achieved by the CNN. A more detailed analysis for the 1.5 TeV case is shown in the right panel of Figure 5, and an O(100) GeV determination of the top partner mass can be achieved.
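A minimal sketch of such a template scan is given below. It assumes a shape-only (multinomial) binned likelihood in which each mass hypothesis supplies normalized bin probabilities for the classifier output; the exact likelihood used for Figure 5 is not specified in the text, and the function names and the probability floor are illustrative choices.

```python
import math


def binned_nll(observed_counts, template_probs, floor=1e-9):
    """Shape-only negative log-likelihood, -sum_i n_i ln p_i (constants dropped)."""
    if len(observed_counts) != len(template_probs):
        raise ValueError("histograms must have the same binning")
    # The floor protects against empty template bins.
    return -sum(n * math.log(max(p, floor))
                for n, p in zip(observed_counts, template_probs))


def scan_masses(observed_counts, templates):
    """Return the best-fit mass and the NLL difference to it for every hypothesis.

    templates maps a mass hypothesis (GeV) to normalized bin probabilities.
    """
    nll = {m: binned_nll(observed_counts, probs) for m, probs in templates.items()}
    best = min(nll, key=nll.get)
    return best, {m: value - nll[best] for m, value in nll.items()}
```

Interpolating the NLL difference between neighbouring mass hypotheses and reading off where it rises by 0.5 would give an approximate 68% interval, which is one way to quantify the mass resolution.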

Conclusions
Events containing six top quarks are within the grasp of the LHC Run 3, and provide a fascinating laboratory to search for physics beyond the Standard Model. We have explored a simplified model which arises as the low energy limit of compelling theories of a composite Higgs, and in which top partners decay into three top quarks with a large branching ratio. We have constructed inclusive observables which are able to tease the signal out of the otherwise large Standard Model background, and find that top partner masses up to around 2.5 TeV are accessible with ∼ 3 ab$^{-1}$, as can be seen from Figure 4.
Further, the distribution of the final state particles also provides information about the mass of the top partner. A CNN-based method is used to investigate how well one can determine the top partner mass, with the whole $p_T$ distribution over the calorimeter used as the input to the CNN. As shown in Figure 5, an O(100) GeV determination of the top partner mass can be achieved for the 1.5 TeV case.

A Boosted Jet Tagging
Our jet classification is based on a Convolutional Neural Network (CNN) which combines calorimeter and tracker information for each fat jet to assign the probabilities that the jet originates from a top, W boson or light parton. For recent work on related strategies, see Refs. [6][46][47][48][49][50].
The training and testing samples are generated through the same procedure as for the signal and background events, simulating the processes $pp \to XX$ with X = j, t and W. After reconstructing the fat jets using the anti-$k_T$ algorithm with $R = 1.0$ and $p_T > 200$ GeV, each of them is converted into a "tensor image". A square region in the $(\eta, \phi)$ plane of size 1.0 × 1.0 is constructed, centered on the jet axis, and divided into 50 × 50 equal-sized pixels. For each of the track and tower classes (from Delphes), each pixel records the total incident $p_T$ and the multiplicity. This results in a four channel image with dimensions 50 × 50 × 4.
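The pixelization described above can be sketched in a few lines of plain Python. The channel ordering (track $p_T$, tower $p_T$, track multiplicity, tower multiplicity) and the input format of `(eta, phi, pt, kind)` tuples are assumptions made for illustration; the actual pipeline reads these quantities from the Delphes output.

```python
import math

N_PIX = 50
HALF_WIDTH = 0.5                    # a 1.0 x 1.0 window centred on the jet axis
CHANNEL = {"track": 0, "tower": 1}  # assumed order: pT channels 0-1, counts 2-3


def jet_image(jet_eta, jet_phi, constituents):
    """Fill a 50 x 50 x 4 nested-list image from (eta, phi, pt, kind) tuples."""
    image = [[[0.0] * 4 for _ in range(N_PIX)] for _ in range(N_PIX)]
    pixel = 2.0 * HALF_WIDTH / N_PIX
    for eta, phi, pt, kind in constituents:
        d_eta = eta - jet_eta
        # Wrap the azimuthal difference into [-pi, pi).
        d_phi = (phi - jet_phi + math.pi) % (2.0 * math.pi) - math.pi
        if abs(d_eta) >= HALF_WIDTH or abs(d_phi) >= HALF_WIDTH:
            continue  # outside the image window
        i = min(N_PIX - 1, int((d_eta + HALF_WIDTH) / pixel))
        j = min(N_PIX - 1, int((d_phi + HALF_WIDTH) / pixel))
        c = CHANNEL[kind]
        image[i][j][c] += pt         # summed pT for this class
        image[i][j][c + 2] += 1.0    # multiplicity for this class
    return image
```

A nested list of this shape can be converted to a tensor and permuted to channel-first order before being passed to a convolutional network.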
The tensor image serves as the input to the CNN, which is constructed using the PyTorch framework. The CNN consists of the following elements:
• Four convolutional layers with a Rectified Linear Unit (ReLU) activation function;
• Two max-pooling layers;
• A classification block, including two linear layers with a dropout probability of 50% and ReLU activation function;
• A final linear layer classifying the jet images into the different categories.
In each sample, jets are divided into three bins according to their $p_T$: 200 GeV < $p_T^{\rm jet}$ < 400 GeV, 400 GeV < $p_T^{\rm jet}$ < 800 GeV and $p_T^{\rm jet}$ > 800 GeV, and the CNN is trained separately for each $p_T$ bin. The tagging performance is characterized by the Receiver Operating Characteristic (ROC) curve. For each pair of jet classes $j_1$ and $j_2$ (tagging $j_1$ against $j_2$), the ROC curve (see Figure 6) shows the "tagging efficiency" (the probability of correctly tagging a jet of class $j_1$ as $j_1$) on the horizontal axis, and 1 − "mistagging rate" (where the mistagging rate is the probability of incorrectly tagging a jet of class $j_2$ as $j_1$) on the vertical axis.
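The construction of such a ROC curve and the choice of a working point can be sketched as follows, assuming that the classifier returns a single score per jet and that a jet is tagged as $j_1$ when its score is at or above a threshold. The function names and the score lists are hypothetical.

```python
def roc_points(signal_scores, background_scores):
    """Return (efficiency, 1 - mistag rate, threshold) for each candidate threshold.

    signal_scores:     classifier scores of true class-j1 jets.
    background_scores: classifier scores of true class-j2 jets.
    Thresholds are scanned from tightest to loosest.
    """
    thresholds = sorted(set(signal_scores) | set(background_scores), reverse=True)
    points = []
    for thr in thresholds:
        eff = sum(s >= thr for s in signal_scores) / len(signal_scores)
        mistag = sum(b >= thr for b in background_scores) / len(background_scores)
        points.append((eff, 1.0 - mistag, thr))
    return points


def working_point(signal_scores, background_scores, target_eff=0.5):
    """Tightest threshold whose tagging efficiency reaches target_eff."""
    for eff, rejection, thr in roc_points(signal_scores, background_scores):
        if eff >= target_eff:
            return {"threshold": thr, "efficiency": eff,
                    "mistag_rate": 1.0 - rejection}
    raise ValueError("target efficiency cannot be reached")
```

Running `working_point` separately on the score samples of each $p_T$ bin, with `target_eff` set to 0.5 or 0.8, mirrors the two benchmark points discussed for Figure 6.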
In Figure 6, the left panels show the ROC curves for tagging a top quark against a W boson and a light jet, while the right panels show the ROC curves for tagging a W boson against a top quark and a light jet. The top, middle and bottom panels correspond to the $p_T$ bins [200, 400] GeV, [400, 800] GeV and [800, ∞) GeV, respectively. As expected, higher-$p_T$ tops and W bosons are identified much more efficiently. Two benchmark working points corresponding to 50% and 80% efficiency for top tagging are marked on each curve in Figure 6, and the corresponding mistagging rates are listed in the legend of each panel. In practice, the 50% working point is used to tag the top jets.