Boosted Event Topologies from TeV Scale Light Quark Composite Partners

We propose a new search strategy for quark partners which decay into a boosted Higgs and a light quark. As an example, we consider phenomenologically viable right-handed up-type quark partners of mass $\sim 1$ TeV in composite pseudo-Nambu-Goldstone-boson Higgs models within the context of flavorful naturalness. Our results show that $S/B>1$ and a signal significance of $\sim 7\sigma$ are achievable at the $\sqrt{s} = 14$ TeV LHC with 35 fb$^{-1}$ of integrated luminosity, sufficient to claim discovery of a new particle. A combination of a multi-dimensional boosted Higgs tagging technique, kinematics of pair produced heavy objects and $b$-tagging serves to efficiently diminish the large QCD backgrounds while maintaining adequate levels of signal efficiency. We present the analysis in the context of effective field theory, such that our results can be applied to any future search for pair produced vector-like quarks with decay modes to Higgs and a light jet.


I. INTRODUCTION
The Large Hadron Collider (LHC) has begun to explore the electroweak symmetry breaking (EWSB) scale. With a successful completion of Run I, highlighted by the discovery of the Higgs boson [1,2], the Standard Model (SM) is now complete. The Higgs boson accounts for the EWSB, generates masses of fermions, provides an explanation for the short range of the weak force, and unitarizes the W-boson scattering cross section. However, within the SM there is no explanation for why the Higgs boson mass itself is O(100 GeV). The naive expectation from perturbation theory is that the Higgs mass should be close to the ultra-violet (UV) scale of the theory, due to the large couplings of the Higgs to the top quark (i.e. the hierarchy problem). There is a priori no physical principle which prevents the Higgs mass from being finely tuned, although it is extremely uncommon to encounter such finely tuned quantities in nature. The latter prompted much of the theoretical work in the past decades to seek the explanation for the hierarchy problem within the scope of the "naturalness" paradigm.
There are two common "natural" solutions to the hierarchy problem. The first is to introduce additional symmetries to protect the Higgs mass from large corrections. The second is to model the Higgs boson as a composite object [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18], such that the Higgs mass becomes irrelevant above some dynamically generated compositeness scale, analogous to the pion mass in Quantum Chromodynamics (QCD). From the low energy effective theory point of view, both mechanisms introduce additional degrees of freedom (i.e. top partners) to the SM$^{1}$, which cancel the top loop induced quadratic divergences in the Higgs mass. The top partners can be scalars, as in the case of supersymmetry, or fermions, as in the case of composite Higgs models. Together, the two mechanisms provide a "litmus test" for the naturalness paradigm.
The LHC is finally able to put naturalness to a meaningful test, where most of the experimental effort has been focused on searches for top partners [21,22]. The fact that no super-partners have been observed at the LHC is already pushing the supersymmetric models into a tuned regime. However, as the bounds on the scalar top partner mass increase, there have been several attempts to relax the bounds on the top partners via compressed/stealth spectrum, R-parity violation, Dirac gauginos, split families, etc. [23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38]. Composite Higgs models are in a similar situation, although the bounds on the spin 1/2 partners in such models are somewhat milder compared to the already existing bounds from LEP and Tevatron constraints on the oblique parameters [39,40]. With the increased center of mass energy, Run II of the LHC will soon be able to cover the interesting region of parameter space of composite top partners [41].
An interesting avenue to bypass existing bounds is to employ non-trivial flavor structure for top partners$^{2}$, where a large mixing is allowed between the right-handed (RH) top and RH charm partners. The basic idea comes from a simple observation that scalar top partners (i.e. stops) need not be mass eigenstates in order to cancel the large SM loop corrections to the Higgs mass. Instead, a stop flavor eigenstate made up of stop-like and scharm-like mass eigenstates can serve the same purpose [35,36]. An analogous approach has recently been applied to composite Higgs models for light non-degenerate composite quarks [42]. The analysis focused on the Minimal Composite Higgs model (MCHM) [43] based on the coset structure SO(5)/SO(4), in which the Higgs doublet was realized as a pseudo-Goldstone boson.
Implementing non-degenerate composite quarks into composite Higgs models without conflict with the existing bounds from flavor physics and electroweak (EW) precision observables is a non-trivial task. However, Ref. [44] showed that flavor alignment allows models with non-degenerate light generation partners to satisfy the constraints from flavor physics observables$^{3}$. In addition, models with custodial parity [45,46] have been shown to be consistent with the constraints from EW precision tests [47,48]. Collider implications for such scenarios have also been studied in Refs. [49,50].
Ref. [42] studied the implications of non-degenerate composite partners of the first two generation quarks for LHC phenomenology and derived the LHC bounds on fermionic resonances in the SO(4) fourplet representations. In particular, Ref. [42] showed that, without assuming degenerate compositeness parameters, the fourplet RH up-quark partners have to be heavier than ∼ 2 TeV or the degree of compositeness of the RH up quark has to be very small. In the latter case, a lower mass bound of ∼ 530 GeV still applies. At the same time, the fourplet RH charm quark component can be mostly composite and its partners can be as light as 600 GeV even with a large degree of right-handed compositeness.
Contrary to fourplet partners, SO(4) singlet partners are barely constrained by the LHC Run I searches. Ref. [51] recently obtained the first non-trivial bound on SO(4) singlet partners utilizing the $h \to \gamma\gamma$ results from ATLAS [52]. However, the bound (i.e. the RH up-type partner mass $M_{U_h} > 310$ GeV) is very mild as the experimental searches were not designed to search for Higgs bosons arising from composite light quark partner decays.
The main focus of this paper is to design a dedicated search for singlet partners of light quarks, and to study the potential of such searches to discover the quark partners at Run II of the LHC. For the purpose of illustration, we study right-handed up-type quark partners, which are QCD pair-produced and decay dominantly into a Higgs boson and an up-type quark. We design the analysis in an effective theory framework, such that, although motivated by composite quark partner searches, our results can be applied to any heavy vector-like quark model in which the vector-like quark has a decay channel into a Higgs and a light quark.
We focus on the potential of LHC Run II to probe light quark partners of mass ∼ 1 TeV, where the decays of light quark partners typically result in boosted Higgs bosons. In order to increase the signal rate, we consider only the decays of the Higgs boson to a $b\bar{b}$ pair. Seemingly complicated, such final states are particularly interesting, as traditional event reconstruction techniques fail. Due to the large degree of collimation of Higgs decay products, methods of Higgs tagging via "jet substructure" need to be employed [53]. In addition, the boosted di-Higgs event topology accompanied by two light jets offers a myriad of handles on large SM backgrounds. As we will show in the following sections, a combination of kinematic constraints of pair produced heavy particles, boosted Higgs tagging and double b-tagging is able to achieve a signal to background ratio S/B > 1 for light quark partner masses of 1 TeV. The same analysis shows that a signal significance of ∼ 7σ can be achieved with 35 fb$^{-1}$ of integrated luminosity, sufficient to claim a discovery.
For the purpose of boosted Higgs tagging, we use the Template Overlap Method (TOM) [54][55][56][57]. We propose a new form of overlap analysis which utilizes both Higgs template tagging and top template tagging in order to optimize the rejection of SM backgrounds while maintaining sufficient signal efficiency. The "multi-dimensional" TOM tagger compares the likelihood that a boosted jet is a Higgs to the likelihood that a boosted jet is a top quark, whereby a Higgs tag requires that a jet is sufficiently Higgs-like and not top-like. Furthermore, we find that requiring at least one b-tag in each of the Higgs tagged jets significantly improves signal purity, especially with respect to large multi-jet backgrounds.
We organized the paper in three sections. Sec. II summarizes the theoretical framework of MCHM with partially composite RH up-type quark partners and introduces the effective model of the light up-type quark partners. In Sec. II we also discuss the diagonalization of mass matrices, calculation of the couplings in the mass eigenbasis and other relevant parameters which enter the effective parametrization used throughout the paper. Sec. III deals with a phenomenological study of LHC Run II searches for up-type quark partners. We propose and discuss in detail a set of observables which can be used to efficiently detect and measure the partners at 1 TeV mass scales, as well as present results on S/B and signal significance using our cutflow proposal. We conclude in Sec. IV. A brief discussion of models in which the quark partner is not dominantly RH can be found in the Appendix.

II. PARTIALLY COMPOSITE LIGHT QUARK PARTNERS
In this article we focus on the MCHM based on the coset structure SO(5)/SO(4). We follow the conventions and notation of Ref. [42] based on the Callan-Coleman-Wess-Zumino (CCWZ) formalism [58,59]. The Higgs multiplet is nonlinearly realized as the Goldstone boson multiplet of the $SO(5) \times U(1)_X \to SO(4) \times U(1)_X \sim SU(2)_L \times SU(2)_R \times U(1)_X$ breaking. Gauging $SU(2)_L$ and $Y = T^3_R + X$ assigns the correct $SU(2) \times U(1)_Y$ quantum numbers to the Higgs multiplet, which is parameterized by the Goldstone boson matrix. In unitary gauge, it reads [42,60] [...] where $\langle h \rangle$ is the vacuum expectation value of the non-linearly realized Higgs field, which is related to the Standard Model vacuum expectation value by [...] 246 GeV.
In composite Higgs models, the Higgs transforms non-linearly under the global spontaneously broken symmetry group, while elementary fermions transform linearly. Yukawa-type interactions of purely elementary quarks (and leptons) with the Higgs are hence forbidden. However, the strongly coupled sector is expected to contain QCD charged fermionic resonances (i.e. "quark partners") at or below a scale $\Lambda \sim 4\pi f$ which can have Yukawa-type couplings with elementary quarks and the Goldstone boson matrix (which contains the Higgs). Electroweak symmetry breaking then yields mass mixing terms between the composite quark partners and the elementary quarks such that the lightest quark mass eigenstates (which are identified with the SM-like quarks) are partially composite. The mass spectrum and couplings of the SM-like quarks and their heavy partners to electroweak gauge bosons and the Higgs depend on the SO(5) representations in which the elementary quarks and the heavy partner quarks are embedded. For concreteness, here we focus on one minimal embedding.
The elementary left-handed and right-handed quarks are embedded into incomplete $\mathbf{5}$ representations of SO(5) with a $U(1)_X$ charge of 2/3 for $q^U_L$ and $-1/3$ for $q^D_L$. The lightest composite quark partner resonances are assumed to be in the $\mathbf{5}$ of SO(5) as well, with $U(1)_X$ charge of 2/3 for $\psi^U$ and $-1/3$ for $\psi^D$.
Using the CCWZ prescription we can construct the fermion Lagrangian of the model, which reads [...] with [...] where $e_\mu$ and $d^i_\mu$ are the CCWZ connections (cf. Appendix A of Ref. [42] for the explicit expressions), $M^{U,D}_{1,4}$ and $c^{U,D}_{L,R}$ are matrices in flavor space, and where the pre-Yukawa couplings $y^{U,D}_{L,R}$ are matrices in flavor space. Typically, the composite sector is assumed to be flavor-blind in order to avoid constraints from flavor changing neutral currents (cf. e.g. Ref. [48]). In such a setup, the flavor structure only enters via the pre-Yukawa couplings, and the partners of the different SM quark flavors are mass degenerate, up to Yukawa-suppressed corrections. However, as has been pointed out in Ref. [61], partners are allowed to be non-degenerate within models of flavor alignment [62,63]. In this article we allow for non-degenerate quark partner masses $M^{U,D}_{1,4}$ and treat them as free parameters.
LHC Run I has already established various constraints on the different quark partners:$^{4}$
• The top partner multiplet $Q^U_3$ contains a charge 5/3 particle $X^T_{5/3}$ as the lightest member with a mass $M_4$. Its decay channel $X^T_{5/3} \to W^+ t$ yields a same-sign dilepton signal which has not yet been observed. This results in a lower mass bound of $M^{U_3}_4 > 800$ GeV established by CMS [64].
• The singlet top partner $\tilde{T} \equiv \tilde{U}_3$ (as well as the charge 2/3 partners in the $Q^U_3$ multiplet) has decay channels into $tZ$, $th$, and $Wb$. CMS established a lower bound on the mass of a charge 2/3 partner of 687-782 GeV [21], with the strongest bound applying if $\tilde{T} \to tZ$ is the dominating decay. The analogous ATLAS bounds are ∼ 350-810 GeV [65].
• Bounds on partners in the multiplets $Q^U_{1,2}$ have been studied in detail in Ref. [42], where a bound of $M^{U_{1,2}}_4 > 530$ GeV for QCD pair produced partners was established, which also applies to partners in the $Q^D_{1,2}$ multiplets. These bounds on light quark partners are weaker than the bounds on third generation quark partners. Third generation partners decay into electroweak gauge bosons (or a Higgs) and a third generation quark, leading to final states which can be efficiently "tagged" at the LHC and hence allow one to reduce or eliminate the numerous SM backgrounds. On the other hand, partners of light quarks decay into light quark flavors which are significantly more difficult to distinguish from the SM background channels.
• So far, the least constrained partners are the light quark singlet partners $\tilde{U}_{1,2}$ and $\tilde{D}_{1,2}$. The dominant decay mode into $hj$ leads to a (potentially large) di-Higgs signature which has not been searched for at LHC Run I.$^{6}$ The only constraint we are aware of has been obtained in Ref. [51], where the absence of $h \to \gamma\gamma$ decays with high $p_T^{\gamma\gamma}$ has been used to establish a bound of $M_1 > 310$ GeV.
In this article, we study the discovery reach for the most weakly constrained and therefore potentially lightest quark partner at LHC Run II: a light-quark SO(4) singlet partner. Focusing on the singlet partner, the model defined in Eq. (5) can be simplified. For simplicity, we take the limit $M_4 \gg M_1$, and discuss the model for the up-partner only. Note that the phenomenology of d, s, c partners is analogous.$^{7}$ Under these simplifying assumptions, the Lagrangian of the up-quark sector following from Eq. (5) is [51] [...] Expanding around the vacuum expectation value $\langle h \rangle$ yields the effective quark mass terms [...]
Note that the effective mass terms $m_L$ and $m_R$ arise from the left- and right-handed pre-Yukawa mass terms which have inherently different symmetry properties. The $y_L$ coupling links a fundamental fourplet to a composite SO(4) singlet while the $y_R$ coupling links a fundamental singlet to a composite fourplet. Therefore, $y_L$ and $y_R$ are independent parameters which are not required to be of the same order of magnitude by naturalness. For simplicity, we choose $y_R \gg y_L$ here, and discuss consequences of the opposite limit $y_R \ll y_L$ in Appendix A. For $y_R \geq y_L$, the mixing mass terms have a hierarchy $m_R \gg m_L$. The eigenvalues of the squared mass matrix are [...] where the lighter eigenvalue $M_{u_l}$ is to be identified with $m_u$, implying $|m_L m_R| / M_1^2 \ll 1$.
$^{4}$ All bounds quoted refer to QCD pair production and subsequent decay of the quark partners. This production channel only depends on the mass of the quark partner and is therefore rather model-independent. The various partners can also be singly produced via electroweak interactions. The mass bounds from such channels can be more stringent in some part of the parameter space (cf. e.g. [42,60]) but the production cross section for these processes depends on the model parameters $y^{U,D}_{L,R}$, $c^{U,D}_{L,R}$ such that these constraints can be alleviated.
$^{5}$ Again, the bounds are strongest when the branching ratio into $Zb$ is large. However, a recent CMS study [68] focused on the all-hadronic channel $pp \to B\bar{B} \to hb\,h\bar{b} \to b\bar{b}b\,b\bar{b}\bar{b}$ and showed that limits are improved when making use of jet-substructure techniques. Assuming a 100% branching ratio of $B \to hb$, Ref. [68] obtained a lower bound on the mass of 846 GeV.
The bi-unitary transformation which diagonalizes the mass matrix is a rotation by $\varphi_{L,R}$ on the left- and right-handed up-quarks [...] where [...]
The couplings of the mass eigenstates to the Z bosons follow from rewriting [...] in the mass eigenbasis $(u_l, U_h)$. Note that the couplings arising from the $U(1)_X$ gauge couplings are universal. A rotation into the mass eigenbasis of these terms does not induce any "mixed" interactions of the Z to $u_l$ and $U_h$ and leaves the Z couplings to right-handed light quarks unaltered. Mixing in the left-handed sector induces non-universality of the light quark couplings to the Z, but the correction to the left-handed coupling is of order [...], such that corrections to the hadronic width of the Z are negligible$^{8}$. The "mixed" coupling of the Z to $u_l$ and $U_h$ in the left-handed sector is [...]
$^{6}$ ATLAS [69] and CMS [70] published results on di-Higgs signals which result from the decay of a heavy resonance (KK-graviton or, respectively, a heavy Higgs), but these searches do not apply to the di-Higgs signal considered here, as the sum of the invariant mass of the decay products does not form a resonance in our case.
$^{7}$ In this article we focus on parameter independent bounds which arise from QCD pair production of quark partners. For (parameter dependent) single production, the quark flavor affects the production cross section (cf. [51]).
$^{8}$ For d, s, c partners, the analogous corrections are $10^{-6}$, $10^{-4}$, $10^{-3}$ such that no bounds apply as long as $y_R \geq y_L$.
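The relation between the mixing terms and the two mass eigenvalues can be illustrated numerically. The sketch below is not the expression used in the text: it assumes a hypothetical 2×2 mass matrix in the basis (u, U) with no direct elementary mass term, off-diagonal entries $m_L$ and $m_R$, and a partner mass $M_1$, and it computes the singular values that a bi-unitary rotation would produce. The function name and all numerical inputs are placeholders.

```python
import math

def mass_eigenvalues(m_L, m_R, M_1):
    """Singular values of the assumed mass matrix [[0, m_L], [m_R, M_1]] (GeV).

    Assumption: this matrix form is illustrative and is not taken from the text.
    """
    trace = m_L**2 + m_R**2 + M_1**2      # trace of M^T M
    det = (m_L * m_R) ** 2                # determinant of M^T M
    heavy_sq = 0.5 * (trace + math.sqrt(trace**2 - 4.0 * det))
    m_heavy = math.sqrt(heavy_sq)
    # The product of the two singular values equals |det M| = |m_L * m_R|;
    # using it avoids the numerical cancellation in the small root.
    m_light = abs(m_L * m_R) / m_heavy
    return m_light, m_heavy

# Placeholder inputs in GeV, chosen only to display the hierarchy.
m_light, m_heavy = mass_eigenvalues(m_L=1.0e-3, m_R=200.0, M_1=1000.0)
print(f"light = {m_light:.3e} GeV, heavy = {m_heavy:.1f} GeV")
```

Under this assumed form the light mass is $|m_L m_R|$ divided by the heavy mass, so identifying it with a light quark mass forces $|m_L m_R| / M_1^2$ to be tiny, in line with the statement above.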
Analogous to the neutral current, the mass mixing in the left-handed sector also induces negligible corrections to the $Wud$ vertex and a "mixed" coupling between the W, $U_h$, and d: [...]
The Higgs couplings to the quark mass eigenstates follow from expanding Eq. (8) to first order in [...] and subsequent rotation into the mass eigenbasis. In the gauge eigenbasis the Yukawa terms read [...] with [...] Rotating into the mass eigenbasis, the mixing Yukawa interactions are [...]
In the regime $y_L \ll y_R$ considered here, the mixing couplings to h, W, Z which are proportional to $y_L$ can be neglected, and the model is described by the simple effective action [...] The Lagrangian in Eq. (20) and the definition of the effective coupling of Eq. (19) are valid for up-type quark partners. The analogous calculation for down-type partners yields the same Lagrangian with the charge factors 2/3 being replaced by $-1/3$, as directly follows from the $U(1)_X$ charge assignments.
The phenomenology of this model is particularly simple:
• The partner state $U_h$ carries color charge and can therefore be produced via QCD pair production.$^{9}$
• The dominant decay channel for the quark partner is $U_h \to uh$.$^{10}$
This model hence predicts $pp \to U_h \bar{U}_h \to hhjj$ as a distinct signature at the LHC. In the following sections, we will explore the prospects for discovery of such signals at the LHC Run II, with the focus on partner masses of ∼ 1 TeV.

III. SEARCHING FOR LIGHT QUARK PARTNERS AT THE LHC RUN II
In the benchmark model we consider, the singlet partner $U_h$ decays exclusively into a Higgs and an up-type quark. The topology of signal events is characterized by a pair of boosted Higgs bosons (if the singlet partner is sufficiently heavy) accompanied by two light jets. We further require that the Higgs decay into $b\bar{b}$ in order to avoid a reduction of signal cross section due to small branching ratios of the Higgs to other SM final states. Due to the boosted Higgs topology, the final state $b\bar{b}$ pairs are expected to be collimated into a cone of roughly $2m_h/p_T$, where $p_T$ is the transverse momentum of the decaying Higgs.
Here we consider only pair production of $U_h$ partners at a $\sqrt{s} = 14$ TeV pp collider (see Fig. 1), where the $U_h$ pairs are produced via QCD interactions. Hence, the production cross section is rather model independent, depending solely on $M_{U_h}$. The dominant background channels to the all-hadronic final states in our signal events are $t\bar{t}$ + jets, $b\bar{b}$ + jets, and light multi-jet channels.
$^{9}$ For a large value of $\lambda^{\rm mix}_R$ (relative to $g_s$) and depending on the partner quark flavor, additional production channels exist which have been discussed in Ref. [51]. Here, however, we focus on the parameter independent QCD pair production.
$^{10}$ Decays into $Zu$ and $Wd$ are suppressed in the regime $y_L \ll y_R$ which is described by the effective Lagrangian Eq. (20). The decays are only present in the regime $y_L \gtrsim y_R$, with branching ratios $\Gamma_{U_h \to hu} : \Gamma_{U_h \to Zu} : \Gamma_{U_h \to Wd}$ of 1 : 1 : 2 in the limit $y_L \gg y_R$. For a detailed discussion cf. Appendix A.
The scope of our current effort is to study the ability of various jet observables to suppress the aforementioned background channels and enhance the signal for $U_h$ partners of mass O(1 TeV). To our knowledge, such searches for fermionic light quark partners in the fully hadronic channels have not been studied in the past. As we are interested here in a "proof of concept" type of study, we will only consider signal and background events in a pileup-free environment.

A. Data Generation and Pre-Selection Cuts
We generate all events using leading order MadGraph 5 [71] at a $\sqrt{s} = 14$ TeV pp collider, assuming a CTEQ6L [72] set of parton distribution functions. At the hard process level, we require that all final state partons pass cuts of $p_T > 15$ GeV, $|\eta| < 5$. Next, we shower the events with PYTHIA 6 [73] using the MLM-matching scheme [74] with xqcut > 20 GeV and qcut > 30 GeV. We match the multi-jet events up to four jets, while the $t\bar{t}$ and $b\bar{b}$ samples are matched up to two extra jets. We cluster all showered events with the fastjet [75] implementation of the anti-$k_T$ algorithm [76]. In order to perform the analysis with a manageable number of events in the background channels (i.e. $\sim 10^6$), we impose a generator level cut on $H_T$, the scalar sum of all final state parton transverse momenta. The motivation for the generator level $H_T$ cut comes from the fact that pair produced light quark partner events contain two objects of mass ∼ 1 TeV, implying that the signal will be characterized by $H_T$ of roughly 2 TeV. In order to avoid biasing the background data with too high an $H_T$ cut, we hence require $H_T > 1.6$ TeV on all generated backgrounds.
We summarize the cross sections for the signal parameter point of $M_{U_h} = 1$ TeV and the most dominant backgrounds in Table I. For completeness, we show the $U_h$ pair production cross section as a function of $M_{U_h}$ in Fig. ??, where we assume $Br(U_h \to hu) = 1$ and the branching ratio of Higgs to a pair of b quarks is included. Notice that the total production cross section for partner masses above 1.3 TeV goes into the sub-femtobarn region, which will be challenging to probe at Run II of the LHC with 35 fb$^{-1}$ of integrated luminosity. A closer look at the numerical values of the signal and background cross sections suggests that a total improvement in S/B of $O(10^5)$ is desired to reach S/B ∼ 1. For that purpose, we will introduce a new cut scheme in Section III D, which exploits the characteristic topology and kinematic features of the signal events.
Table I: Cross sections for the $U_h \bar{U}_h$ pair production (assuming $M_{U_h} = 1$ TeV) and backgrounds (assuming $H_T > 1600$ GeV), at 14 TeV LHC. We normalize the "$t\bar{t}$ + 0,1,2 jets" to the NNLO + NNLL result of Ref. [77], while for the rest of the backgrounds we use a conservative estimate for the NLO K-factor of 2.0.
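As a minimal sketch of the generator level $H_T$ requirement described above, the snippet below sums the transverse momenta of final state partons and applies the 1.6 TeV threshold. The input format (a list of $(p_x, p_y)$ pairs in GeV) and the function names are illustrative assumptions, not the interface of any event generator.

```python
import math

HT_MIN_GEV = 1600.0  # threshold quoted in the text

def scalar_ht(partons):
    """Scalar sum of transverse momenta; partons is a list of (px, py) in GeV."""
    return sum(math.hypot(px, py) for px, py in partons)

def passes_ht_cut(partons, ht_min=HT_MIN_GEV):
    """True if the event satisfies H_T > ht_min."""
    return scalar_ht(partons) > ht_min

# Toy events: two partons with pT = 500 and 1000 GeV, then one extra 200 GeV parton.
event_a = [(300.0, 400.0), (-600.0, -800.0)]
event_b = event_a + [(0.0, 200.0)]
print(scalar_ht(event_a), passes_ht_cut(event_a))
print(scalar_ht(event_b), passes_ht_cut(event_b))
```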

B. Tagging of Boosted Higgs Jets
The decay products of a boosted Higgs are collimated into a cone of $R \sim 2m_h/p_T$, where $p_T$ is the transverse momentum of the Higgs boson. Since we consider light quark partners of mass ∼ 1 TeV, the resulting Higgs bosons will have $p_T \sim 500$ GeV, and hence will decay into a cone of roughly $R \sim 0.5$. Clustering the decay products of a boosted Higgs into a large cone (e.g. R = 0.7) will typically result in a single "fat jet" of mass $\sim m_h$. However, traditional jet observables such as jet $p_T$ and m are inadequate to efficiently distinguish between Higgs, top and QCD "fat jets", and a further consideration of Higgs "jet substructure" is needed to reduce the enormous QCD backgrounds. Many methods designed to tag the characteristic "two prong" substructure of the hadronically decaying Higgs exist in the literature [53,54,56,78,79]. Here we will use the TemplateTagger v.1.0 [80] implementation of the Template Overlap Method [54][55][56][57].
The Template Overlap algorithm for boosted jet tagging attempts to match a parton level model (template) for a boosted jet decay (i.e. the $b\bar{b}$ system with the constraint of $(p_1 + p_2)^2 = m_h^2$) to the energy distribution of a boosted jet. The procedure is performed by minimizing the difference between the calorimeter energy depositions within small angular regions around the template partons and the parton energies, over the allowed phase space of the template four-momenta. Refs. [54][55][56] studied the use of TOM to tag boosted Higgs decays in the context of the Standard Model. To our knowledge, our current effort is the first attempt to utilize TOM for boosted Higgs studies in a BSM scenario.
An attractive feature of TOM is a relatively weak susceptibility to pileup contamination [57]. The overlap analysis is affected only by the calorimeter depositions which land in angular regions of typically r ∼ 0.1 from the template partons. The rest of the jet energy distribution does not contribute to the estimates of the likelihood that a particular template matches the jet energy distribution. As pileup contamination scales as $R^2$, where R is the jet cone, the effects of pileup on the TOM analysis will be of order a few percent, compared to (say) the $p_T$ of a typical fat jet of R ∼ 1.0.
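The cone size estimate above is simple enough to check directly. The sketch below evaluates $R \sim 2m_h/p_T$ and the transverse momentum above which the decay products are expected to fit inside a given jet radius. It assumes $m_h = 125$ GeV, a value not stated explicitly in this section, and the function names are illustrative.

```python
M_HIGGS_GEV = 125.0  # assumed Higgs mass

def decay_cone(pt_gev, mass_gev=M_HIGGS_GEV):
    """Approximate angular size R ~ 2 m / pT of a two-body decay."""
    return 2.0 * mass_gev / pt_gev

def min_pt_for_cone(radius, mass_gev=M_HIGGS_GEV):
    """Transverse momentum at which the decay cone equals the given radius."""
    return 2.0 * mass_gev / radius

print(decay_cone(500.0))                 # cone size for pT = 500 GeV
print(round(min_pt_for_cone(0.7), 1))    # pT needed to fit inside R = 0.7
```

With these inputs a 500 GeV Higgs gives R = 0.5, and the decay fits inside an R = 0.7 fat jet for $p_T$ above roughly 357 GeV.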
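The "few percent" estimate can be made explicit with a rough area argument. The following is a back-of-the-envelope sketch, assuming that pileup contamination is proportional to the catchment area and that a template with $N$ partons samples $N$ non-overlapping subcones of radius $r \sim 0.1$:

```latex
\frac{A_{\rm template}}{A_{\rm jet}} \sim \frac{N\, r^{2}}{R^{2}}
  = \frac{2 \times (0.1)^{2}}{(1.0)^{2}} = 0.02
  \quad (N = 2,\ \text{Higgs template}),
\qquad
\frac{3 \times (0.1)^{2}}{(1.0)^{2}} = 0.03
  \quad (N = 3,\ \text{top template}).
```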
Ideally, in order to maximize the information extracted from jet substructure, one would perform TOM analysis for all heavy Standard Model decays on each candidate fat jet. Such analysis would result in a vector of overlap scores [...] where i = W, Z, h, t. Various correlations within the multi-dimensional overlap space could then be exploited to fully maximize the ability of TOM to tag the desired heavy particles. The full multi-dimensional TOM analysis is beyond the scope of our current effort and we find it sufficient to use only a combination of two body Higgs and three body top template analysis (in order to further suppress the large $t\bar{t}$ background)$^{11}$. As the three prong decay of a boosted top is a more complex object than the typical two prong decay of a boosted Higgs, it is possible for a top fat jet to pass the two-body Higgs template tagging procedure. On the other hand, it is difficult for a Higgs to appear as a fake top [56]. We hence require all Higgs candidate jets to pass the requirement
$Ov^h_2 > 0.4$, $Ov^t_3 < 0.4$. (25)
As we will show in the following sections, the combined requirement on $Ov^h_2$ and $Ov^t_3$ is very efficient at reducing the $t\bar{t}$ fake rate.
For the purpose of this analysis, we generate 17 sets of both two body Higgs and three body top templates at fixed $p_T$, starting from $p_T = 425$ GeV in steps of 50 GeV, while we use a template resolution parameter $\sigma = p_T/3$ and scale the template subcones according to the rule of Ref. [56].
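The template grid quoted above can be enumerated in a few lines. The sketch below lists the 17 fixed-$p_T$ values and picks the one closest to a given jet $p_T$. The nearest-value lookup is an assumption made for illustration; the text does not specify how a template set is assigned to a jet, and the function names are hypothetical.

```python
def template_pt_grid(start_gev=425.0, step_gev=50.0, n_sets=17):
    """Fixed template pT values (GeV) as quoted in the text."""
    return [start_gev + i * step_gev for i in range(n_sets)]

def nearest_template_pt(jet_pt_gev, grid):
    """Assumed lookup rule: use the template set closest in pT to the jet."""
    return min(grid, key=lambda pt: abs(pt - jet_pt_gev))

grid = template_pt_grid()
print(grid[0], grid[-1], len(grid))
print(nearest_template_pt(510.0, grid))
```

The grid therefore spans 425 GeV to 1225 GeV, which covers the $p_T \sim 500$ GeV Higgs bosons expected from 1 TeV partners.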

C. b-tagging
The signal final states we consider contain four b-jets from two Higgses, which can be extremely useful in disentangling the signal events from the background channels. However, requiring four b-tags in a boosted configuration comes at a severe cost of the signal efficiency as even in the optimistic scenario of a single b-tag efficiency of 75%, b-tagging four jets alone would cut out about 70% of the signal events. Instead, here we will consider two b-tags, and require that they are contained within the two Higgs candidate jets.
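The efficiency argument above can be checked with elementary probability. The sketch below compares the probability of tagging all four b-jets with the probability of obtaining at least one tag in each Higgs jet. The second number assumes that each fat jet contains two independently taggable b-jets, which is a simplification introduced here for illustration.

```python
def prob_all_tagged(eff, n_jets):
    """Probability that all n_jets independent b-jets are tagged."""
    return eff ** n_jets

def prob_one_tag_per_higgs(eff, b_per_higgs=2, n_higgs=2):
    """Probability of at least one tag in each Higgs jet (independent b-jets assumed)."""
    at_least_one = 1.0 - (1.0 - eff) ** b_per_higgs
    return at_least_one ** n_higgs

EFF_B = 0.75  # single b-tag efficiency quoted in the text as the optimistic scenario

p_four = prob_all_tagged(EFF_B, 4)
p_double = prob_one_tag_per_higgs(EFF_B)
print(f"four tags: {p_four:.3f}, one tag per Higgs jet: {p_double:.3f}")
```

Four tags retain about 32% of the signal, i.e. a loss of roughly 70% as stated, while the looser requirement retains close to 88% under the stated assumption.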
A full analysis of b-tagging requires a detailed detector study which is beyond the scope of our work. Here we adopt a simplified, semi-realistic b-tagging procedure, whereby we assign to each r = 0.4 jet a b-tag if there is a parton level b or c quark within ∆r = 0.4 from the jet axis. We then weight each event by the benchmark b-tagging efficiencies: [...] where $\epsilon_{b,c,j}$ are the efficiencies that a b, c or a light jet will be tagged as a b-jet. For a Higgs fat jet to be b-tagged, we then require that a b-tagged r = 0.4 jet lands within ∆R = 0.7 from the fat jet axis. Furthermore, we take special care of the fact that more than one b-jet might land inside the fat jet and reweight the b-tagging efficiencies according to the rule of Table II.
Table II (columns: b-tag scores of a fat jet; efficiency values): Efficiency that a Higgs fat jet will be b-tagged assuming that it contains a specific number of light, c or b jets within ∆R = 0.7 from the jet axis. $\epsilon_j$, $\epsilon_c$ and $\epsilon_b$ are b-tagging efficiencies for light, c and b jets respectively. We neglect the possibilities beyond three proper b-tagged jets within a fat jet as they occur at too low a rate to be significant.
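The parton matching step of the simplified b-tagging procedure can be sketched as follows. The snippet labels an r = 0.4 jet by the heaviest matched parton flavor within ∆r = 0.4 and returns a per-jet weight. The data layout, the function names, the precedence of b over c, and the c and light-jet efficiencies are hypothetical placeholders; only the 0.4 matching radius and the 75% b efficiency (quoted earlier as an optimistic scenario) come from the text.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the (eta, phi) plane with phi wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def jet_flavor(jet_eta, jet_phi, partons, max_dr=0.4):
    """partons: list of (abs_pdg_id, eta, phi); PDG id 5 is b, 4 is c."""
    matched = {pid for pid, eta, phi in partons
               if delta_r(jet_eta, jet_phi, eta, phi) < max_dr}
    if 5 in matched:
        return "b"
    if 4 in matched:
        return "c"
    return "j"

# Placeholder efficiencies: "c" and "j" are NOT the benchmark values of the text.
EFFICIENCY = {"b": 0.75, "c": 0.18, "j": 0.01}

def tag_weight(flavor, efficiency=EFFICIENCY):
    """Weight applied to an event for a jet of the given matched flavor."""
    return efficiency[flavor]

flavor = jet_flavor(0.0, 3.1, [(5, 0.1, -3.1), (4, 1.5, 0.0)])
print(flavor, tag_weight(flavor))
```

The example jet sits near φ = π and the b parton near φ = −π, so the match only succeeds because the azimuthal difference is wrapped.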

D. Event Selection and Reconstruction of the U h Pair
We proceed to discuss in detail the cut scheme we propose for the all-hadronic searches for pair produced $U_h$ partners. For the convenience of the reader, we outline the event selection in Table III, while a detailed description and definition of the observables can be found in the following text. We begin by requiring at least four anti-$k_T$, R = 0.7 jets with $p_T^{R=0.7} > 300$ GeV, $|y^{R=0.7}| < 2.5$.
The requirement on the presence of four fat jets pre-selects signal event candidates, as we expect two pairs of boosted Higgs-light jets to appear in the final state$^{12}$. In order to determine which of the four jets are the Higgs candidates, we select the two highest $p_T$ fat jets which satisfy the TOM requirement of Section III B. The requirement on peak template overlap is designed to select the two Higgs candidate jets in the event, while ensuring that the jets are not fake tops. If fewer than two fat jets pass the overlap requirement, the event is rejected.
The overlap selections in Eq. (25) deserve more attention. Figure 3 illustrates how utilizing multi-dimensional TOM analysis (i.e. $Ov^h_2$ and $Ov^t_3$) can help in reducing the background contamination of signal events. If we consider only $Ov^h_2$ (dashed line), a significant fraction of $t\bar{t}$ would pass any reasonable overlap cut. However, in a two dimensional distribution, it is clear that many of the $t\bar{t}$ events which obtain a high $Ov^h_2$ also obtain a high $Ov^t_3$ score. Contrary to $t\bar{t}$ events, the signal events almost never get tagged with a high $Ov^t_3$ score, as it is difficult for a proper Higgs fat jet to fake a top. Hence, an upper cut on $Ov^t_3$ (solid line) efficiently eliminates a significant fraction of $t\bar{t}$ events, at a minor cost of signal efficiency. Note that the peak at $Ov^h_2 \approx Ov^t_3 \approx 0$ in the signal distributions corresponds to events where the hardest/second hardest fat jet is likely a light jet.
Figure 4 illustrates the effects of Ov cuts on the mass distribution of the two highest $p_T$ jets. Note that the intrinsic mass filtering property of TOM can be clearly seen in the results. The mass resolution of the Higgs fat jets improves upon the cut on the overlap, while the contributions from both high mass and low mass background regions are significantly diminished.
In addition to jet substructure requirements for Higgs tagging, we require both Higgs candidate jets to contain at least one $b$-tagged $r = 0.4$ jet within the fat jet, as prescribed in Section III C.
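The Higgs candidate selection described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code: the `FatJet` container and its field names are hypothetical, the overlap scores are assumed to have been computed beforehand by a template overlap implementation, the thresholds are those quoted in Table III, and the rapidity requirement is assumed to have been applied upstream.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FatJet:
    """Hypothetical container for an R = 0.7 fat jet (field names are assumptions)."""
    pt: float      # transverse momentum in GeV
    ov_h2: float   # peak two-body Higgs template overlap
    ov_t3: float   # peak three-body top template overlap
    n_btags: int   # number of b-tagged r = 0.4 jets inside the fat jet


def select_higgs_candidates(
    jets: List[FatJet],
    pt_min: float = 300.0,
    ov_h2_min: float = 0.4,
    ov_t3_max: float = 0.4,
) -> Optional[Tuple[FatJet, FatJet]]:
    """Return the two Higgs candidate jets, or None if the event is rejected."""
    # Pre-selection: at least four hard fat jets in the event.
    fat = [j for j in jets if j.pt > pt_min]
    if len(fat) < 4:
        return None

    # Two-dimensional overlap cut: Higgs-like and not top-like.
    tagged = [j for j in fat if j.ov_h2 > ov_h2_min and j.ov_t3 < ov_t3_max]
    if len(tagged) < 2:
        return None

    # The two highest-pT tagged jets are the Higgs candidates.
    tagged.sort(key=lambda j: j.pt, reverse=True)
    h1, h2 = tagged[0], tagged[1]

    # Both candidates must contain at least one b-tagged r = 0.4 jet.
    if h1.n_btags < 1 or h2.n_btags < 1:
        return None
    return h1, h2
```

Applying the upper cut on the top overlap before ranking in transverse momentum means that a hard top-like jet cannot displace a softer genuine Higgs candidate.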
In order to pick out the light jets, we re-cluster each event with $r = 0.4$ (also necessary for $b$-tagging) and select the two highest $p_T$ jets which pass the requirement of $p_T > 25$ GeV and $\Delta R_{uh} > 1.1$, where $\Delta R_{uh}$ stands for the plain distance in $(\eta, \phi)$ between the $r = 0.4$ jet (i.e. the up type quark) and each of the Higgs candidate fat jets. We declare these jets to be the u quark candidates.

Table III: Summary of the Event Selection Cut Scheme.
Basic Cuts: Demand at least four fat jets ($R = 0.7$) with $p_T > 300$ GeV, $|\eta| < 2.5$. Declare the two highest $p_T$ fat jets satisfying $Ov_2^h > 0.4$ and $Ov_3^t < 0.4$ to be Higgs candidate jets. At least one $b$-tag on both Higgs candidate jets. Select the two highest $p_T$ light jets ($r = 0.4$) with $p_T > 25$ GeV to be the u quark candidates.
Complex Cuts: Cuts on the mass asymmetries $\Delta_h$ and $\Delta_{U_h}$ defined in the text.
Footnote 12: Selecting four $R = 0.7$ fat jets also simplifies the TOM jet substructure analysis.
Since we expect two Higgs fat jets in the final state, a comparison between the masses of the two hardest fat jets which pass the overlap criteria provides a useful handle on the background channels. In order to exploit this feature, we construct a mass asymmetry $\Delta_h$ from $m_{h_{1,2}}$, the masses of the two Higgs candidate jets. Figure 5 (left panel) shows the distribution of $\Delta_h$ for signal events and relevant backgrounds. Even after the overlap selections, the background distributions are significantly wider than the signal. Hence, in order to further suppress the background channels, we impose a cut on $\Delta_h$.
Upon identifying the u and Higgs jets, we proceed with the reconstruction of the $U_h$ partner. The signal events are characterized by a distinct "2 fat jet 2 light jet" topology, a final state which represents somewhat of a combinatorial challenge (for each fat jet, two combinations with a light jet are possible). In order to find the correct Higgs-light jet pairs, we construct the four different invariant mass combinations $(p^h_i + p^u_j)^2$, $i, j = 1, 2$, where $p^h_i$ are the four momenta of the two $R = 0.7$ jets which pass the Higgs tagging requirements and $p^u_j$ are the four momenta of the two hardest $r = 0.4$ jets isolated from the Higgs jets by $\Delta R_{uh} > 1.1$. A correct Higgs-light jet pairing then minimizes the value of the mass asymmetry $\Delta_{U_h}$ between the two reconstructed partner candidates. Consequently, we take the configuration of Higgs-light jet pairs which minimizes $\Delta_{U_h}$ to construct $m_{U_{h_{1,2}}}$, the masses of the two $U_h$ partners in the event.
Figure 6 shows the reconstructed invariant mass distribution of the $U_h$ pair (assuming $M_{U_h} = 1$ TeV) and the background distributions. The signal events show a prominent peak at the correct partner mass for both $U_h$ partners in the event, while the background distributions are smeared over a wide range of mass values. The results of Figure 6 illustrate well the degree to which our proposal is able to resolve the mass of the $U_h$ partners.
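The pairing procedure described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the analysis implementation: four momenta are represented as hypothetical `(E, px, py, pz)` tuples in GeV, and the normalization of the asymmetry, $|m_1 - m_2|/(m_1 + m_2)$, is an assumption chosen for illustration. Any definition that vanishes for two equal masses would serve the same purpose.

```python
import math
from typing import Tuple

FourVector = Tuple[float, float, float, float]  # (E, px, py, pz) in GeV


def inv_mass(p: FourVector, q: FourVector) -> float:
    """Invariant mass of the system p + q."""
    e, px, py, pz = (a + b for a, b in zip(p, q))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against small negative rounding errors


def asymmetry(m1: float, m2: float) -> float:
    """Assumed normalization of the mass asymmetry; zero for equal masses."""
    return abs(m1 - m2) / (m1 + m2)


def pair_partners(
    h1: FourVector, h2: FourVector, u1: FourVector, u2: FourVector
) -> Tuple[float, float, float]:
    """Choose the Higgs-light jet pairing with the smallest mass asymmetry.

    Returns (mass of partner 1, mass of partner 2, minimal asymmetry).
    """
    # The four invariant masses form two possible pairings.
    pairings = [
        (inv_mass(h1, u1), inv_mass(h2, u2)),
        (inv_mass(h1, u2), inv_mass(h2, u1)),
    ]
    m1, m2 = min(pairings, key=lambda m: asymmetry(*m))
    return m1, m2, asymmetry(m1, m2)
```

Because the partners are pair produced with equal mass, the correct pairing tends to give an asymmetry near zero, so the same quantity resolves the combinatorics and discriminates against backgrounds.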
The value of $\Delta_{U_h}$ represents the minimum of a mass asymmetry between the two reconstructed objects and hence utilizes the fact that the $U_h$ partners are pair produced. In addition to allowing us to overcome the combinatorial issues when reconstructing the $U_h$ partners, $\Delta_{U_h}$ provides another handle on the background channels. Because the $U_h$ partners are pair produced, we expect the value of $\Delta_{U_h}$ to peak at 0 for signal events, while we expect the background channels to be characterized by wider distributions of $\Delta_{U_h}$, as there is no kinematic feature in the background channels which would lead to a reconstruction of two same mass resonances. Figure 5 (right panel) shows $\Delta_{U_h}$ distributions for signal and relevant backgrounds. As in the case of $\Delta_h$, the background distributions of $\Delta_{U_h}$ are significantly broader compared to the signal, hence providing another unique handle on the background channels. In order to exploit this feature, we impose a cut on $\Delta_{U_h}$.
In this section we investigate the ability of our cutflow proposal to detect ∼ 1 TeV light quark partners which decay to Higgs-light jet pairs at Run II of the LHC.
Our results show that boosted jet techniques combined with fat jet $b$-tagging and kinematic constraints of pair produced heavy particles can achieve $S/B > 1$ with signal significance of ∼ 7σ at 35 fb$^{-1}$, assuming light quark partners of $M_{U_h} = 1$ TeV. The significance we obtain is sufficient to claim a discovery of 1 TeV light quark partners. In addition, we find that probing masses higher than 1 TeV will require more luminosity and will be challenging at Run II of the LHC. However, even with 35 fb$^{-1}$, signal significance of more than 3σ is achievable for $M_{U_h} = 1.2$ TeV, enough to rule out the model point.
Requiring that there exist four fat jets with $p_T > 300$ GeV in an event, together with our boosted Higgs tagging procedure, results in an improvement of $S/B$ by roughly a factor of 70-100 at ∼ 20% signal efficiency relative to the pre-selection cuts. Additional cuts on mass asymmetries improve $S/B$ by roughly a factor of 3 in total.
The greatest improvement in both $S/B$ and $S/\sqrt{B}$ comes from fat jet $b$-tagging, where we find an enhancement by a factor of ∼ 500-600 in $S/B$ and 15-20 in signal significance. The improvement is largely due to the enormous suppression double fat-jet $b$-tagging exerts on the multi-jet and $b\bar{b}$ backgrounds, at a signal efficiency of ∼ 50%. The high rejection power of $b$-tagging can be understood well from results presented in Figure 7. The signal events almost always contain at least one $b$ quark in each of the fat jets which pass the boosted Higgs tagging criteria. Conversely, almost no multi-jet and $b\bar{b}$ events contain two "Higgs like" fat jets with each of the tagged heavy boosted objects containing a $b$-jet. The only background channel which seems to contain a significant fraction of events with both fat jets containing a proper $b$-tag is Standard Model $t\bar{t}$. Still, we find that only about 10% of the $t\bar{t}$ events survive the double $b$-tagging criteria.
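The two figures of merit quoted in this section, $S/B$ and $S/\sqrt{B}$, and their dependence on integrated luminosity can be made explicit with a small Python sketch. The function names are hypothetical, and the luminosity scaling assumes a purely statistical estimate in which signal and background yields both grow linearly with luminosity, so that $S/\sqrt{B}$ grows as its square root. Systematic uncertainties are ignored.

```python
import math


def s_over_b(n_signal: float, n_background: float) -> float:
    """Signal-to-background ratio; independent of luminosity."""
    return n_signal / n_background


def significance(n_signal: float, n_background: float) -> float:
    """Simple S / sqrt(B) estimate of the signal significance."""
    return n_signal / math.sqrt(n_background)


def lumi_for_significance(z_ref: float, lumi_ref: float, z_target: float) -> float:
    """Luminosity needed to reach z_target, given z_ref at lumi_ref.

    Assumes S and B scale linearly with luminosity, so Z scales as sqrt(L).
    """
    return lumi_ref * (z_target / z_ref) ** 2


# Illustration: a significance of about 7 at 35 fb^-1 would correspond to
# about 5 at roughly 18 fb^-1 under this naive scaling.
lumi_5sigma = lumi_for_significance(z_ref=7.0, lumi_ref=35.0, z_target=5.0)
```

Under the same assumption, a cut that raises $S/B$ by a given factor at a signal efficiency $\varepsilon$ changes $S/\sqrt{B}$ by that factor times $\sqrt{\varepsilon}$ divided by the square root of the $S/B$ gain, which is why a large gain in $S/B$ translates into a more modest gain in significance.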

IV. CONCLUSIONS
We studied the LHC Run II discovery potential for the light quark partners in composite Higgs models. As an example, we considered a simplified model based on the SO(5)/SO(4) coset structure containing one up-type quark in the decoupling limit. Of particular interest were pair produced up-type quark partners of mass ∼ 1 TeV which then decay into two boosted Higgses (which we take to decay further hadronically) and two hard jets, a final state which cannot be efficiently tagged and reconstructed by "traditional" jet techniques. We proposed a new event cut scheme, designed to exploit the characteristic features of the pair produced $U_h$ event topology. We found that a combination of $b$-tagging, jet substructure, and kinematic cuts resulting from the fact that quark partners are pair produced allows us to suppress the large QCD backgrounds to a degree where $S/B > 1$ and $S/\sqrt{B} \sim 7$ is possible for quark partners of mass 1 TeV with 35 fb$^{-1}$ of integrated luminosity. Our results show that the LHC Run II could achieve sufficient sensitivity to light quark partners of mass 1 TeV to claim discovery. Probing masses higher than 1 TeV using our proposed cut scheme will be difficult at Run II of the LHC, yet with 35 fb$^{-1}$ we find that a signal significance of more than 3σ is achievable for $M_{U_h} = 1.2$ TeV, sufficient to rule out the model point.
The event selection procedure we propose begins by requiring the presence of four fat jets (i.e. $R = 0.7$), two of which are tagged as Higgs candidates. We perform Higgs tagging by considering a combination of the Higgs two body peak overlap, $Ov_2^h$, and the top three body overlap, $Ov_3^t$, where we require a lower cut on $Ov_2^h$ and an upper cut on $Ov_3^t$. The two-dimensional overlap analysis allows us to suppress the QCD backgrounds, including $t\bar{t}$, to a much better degree compared to the analysis utilizing only $Ov_2^h$. In addition to jet substructure tagging, we also require the two Higgs candidate jets to be $b$-tagged and the mass difference between the Higgs jets to be small. Kinematics of heavy pair produced quark partners offer an additional handle on the background channels, and we require that the mass difference between the reconstructed $U_h$ partners also be small. The greatest improvement in the signal significance comes from $b$-tagging, as requiring two Higgs fat jets to be $b$-tagged diminishes the enormous multi-jet background.
Our study represents a "proof of principle" that successful searches for TeV scale light quark partners decaying to $hj$ are possible at Run II of the LHC. Further work is necessary to study the effects of pileup contamination on the results of the analysis. Yet, it is likely that pileup effects will be manageable, even at ∼ 50 interactions per bunch crossing. The TOM analysis of boosted jets is weakly susceptible to pileup at 50 interactions per bunch crossing [57], as long as the fat jet $p_T$ is corrected so that the appropriate template bin is used in the analysis. Alternatively, many issues with determining the jet $p_T$ in a high pileup environment could be bypassed by analyzing each jet with template sets at a range of transverse momenta. Effects of pileup on jet mass do not represent an issue for our event selection proposal, as the combination of $Ov_2^h$ and $Ov_3^t$ selections serves as an excellent intrinsic mass filter. Furthermore, recent experimental studies of Ref. [81] suggest that effects of pileup on $b$-tagging at LHC Run II will be under control.
Future analyses using our event selection could also benefit from a detailed detector simulation.
In Section II we discussed a partially composite light quark partner model of a minimal composite Higgs model in which the elementary quarks as well as the composite partner quarks are embedded into a $\mathbf{5}$ of SO(5), and in which the SO(4) singlet mass scale $M_1$ of one of the partners of the light quarks u, d, s, c is lower than the remaining partners' mass scales, such that the model can be described by the effective Lagrangian Eq. (8). In addition, we assumed dominance of the right-handed pre-Yukawa coupling of this quark partner, i.e. $y_R \gg y_L$. In this case, the quark partner state decays dominantly into $hj$ and is described by the very simple effective Lagrangian Eq. (20), which we used for our further studies of the $hhjj$ signature at LHC Run II.
In the case of general $y_L$, $y_R$, the quark partner mass eigenstate has couplings to the Z, W, and Higgs bosons as given in Eqs. (14,15,19), which depend on the mixing angles $\varphi_{L,R}$ in the left- and right-handed sector. As the light SM quark mass is to be identified with $M_{u_l}$ given in Eq. (10), the product of the mixing angles is tiny, the couplings in Eqs. (14,15,19) are small (unless an extreme hierarchy between $y_L$ and $y_R$ is chosen), and their effect on $U_h$ production processes is negligible. However, changing the left- and right-handed mixing angles modifies the $U_h$ branching ratios.
The "mixing" couplings Eqs. (14,15,19) imply decay channels of the $U_h$ into $Zu$, $Wd$, and $hu$ with partial decay widths [51], where the kinematic functions $\Gamma_{W,Z,h}$ are equal to 1 up to small corrections, and we used the expressions for the couplings Eqs. (14,15,19), mixing angles Eq. (12), as well as 246 GeV $\equiv v = f \sin(\langle h \rangle / f) \equiv f \sin(\epsilon)$. Thus, the Higgs decay channel dominates in the limit $y_R \gg y_L$, where $U_h$ decays through the right-handed channel, while for $y_L \cos(\epsilon) \gg \sqrt{2}\, y_R \sin(\epsilon)$ decays through the left-handed channel dominate, which leads to branching ratios $\Gamma_{U_h \to Wd} : \Gamma_{U_h \to Zu} : \Gamma_{U_h \to hu}$ of ∼ 2 : ∼ 1 : ∼ 1. In the latter parameter regime, the discovery and exclusion reach of the model purely in the $hhjj$ channel (as discussed in this article) is substantially reduced because the cross-section of this channel is reduced by a factor of ∼ 16 (the branching ratio into $hu$ drops to ∼ 1/4 for each of the two pair produced partners). However, decays of the $U_h$ into $Wd$ and $Zu$ imply a variety of final states