Search for electroweak production of a vector-like quark decaying to a top quark and a Higgs boson using boosted topologies in fully hadronic final states

A search is performed for electroweak production of a vector-like top quark partner T of charge 2/3 in association with a standard model top or bottom quark, using 2.3 fb⁻¹ of proton-proton collision data at √s = 13 TeV collected by the CMS experiment at the CERN LHC. The search targets T quarks decaying to a top quark and a Higgs boson in fully hadronic final states. For a T quark with mass above 1 TeV the daughter top quark and Higgs boson are highly Lorentz-boosted and can each appear as a single hadronic jet. Jet substructure and b tagging techniques are used to identify the top quark and Higgs boson jets, and to suppress the standard model backgrounds. An excess of events is searched for in the T quark candidate mass distribution in the data, which is found to be consistent with the expected backgrounds. Upper limits at 95% confidence level are set on the product of the single T quark production cross sections and the branching fraction B(T → tH), and these vary between 0.31 and 0.93 pb for T quark masses in the range 1000-1800 GeV. This is the first search for single electroweak production of a vector-like T quark in fully hadronic final states.


Introduction
We report on a search for a vector-like top quark partner (T) of charge 2/3. The T quark appears in many extensions of the standard model (SM) and usually mixes with the SM top quark [1-6]. Vector-like quarks (VLQs) like the T could have a role in regularizing the SM Higgs boson (H) mass, thus offering a solution to the hierarchy problem [7,8]. VLQs can be produced in pairs via the strong interaction, or singly in association with the SM top or bottom quarks via the electroweak interaction. The electroweak couplings of the T quarks to the SM third-generation quarks are highly model dependent. These couplings determine the rates of the single T quark production modes, shown in Fig. 1. The expected decay channels of a T quark coupling to the SM top or bottom quarks are T → bW, T → tZ, and T → tH [9]. Probing such processes could shed light on the mixing of VLQs with the SM third-generation quarks.
The VLQs are non-chiral particles, i.e., their left-handed (LH) and right-handed (RH) components are part of the same multiplet under a weak isospin symmetry transformation. As a consequence, their masses are not restricted by their Yukawa couplings to the Higgs field; hence these particles are not ruled out by constraints from measurements of the production and decay rates of the Higgs boson [10].
Searches for pair production of T quarks have been conducted by the ATLAS and CMS collaborations at the CERN LHC using proton-proton (pp) collision data at √s = 8 TeV, and limits placed on the mass between 720 and 950 GeV, depending on the decay mode [11,12]. A search for single production of T quarks decaying to Wb was conducted by the ATLAS collaboration using pp collision data at √s = 8 TeV, and a limit on the T quark mass was set at 950 GeV [13]. For high VLQ masses, the pair-production cross section rapidly decreases as the phase space for producing two massive particles is limited. Above the TeV range, single production via the electroweak process is expected to dominate over pair production [14], and is thus the focus of this search. In this Letter we present a search for a singly produced T quark using pp collision data collected at √s = 13 TeV with the CMS experiment in 2015, and corresponding to an integrated luminosity of 2.3 fb⁻¹. The production processes considered are pp → Tbq and pp → Ttq, as shown in Fig. 1. We consider the decay mode T → tH with the top quark decaying fully hadronically (t → bW → bqq) and the SM Higgs boson decaying to bb. For a SM Higgs boson with a mass close to 125 GeV [15], the decay branching fraction is B(H → bb) = 58% [16]. Recently, a companion CMS analysis has searched for a singly produced T quark with T → tH using this 13 TeV data set in a leptonic final state [17], and set limits on the product of the T quark cross section and the branching fraction B(T → tH) in the mass range 1000-1800 GeV.
The pp → Tbq with T → tH channel contains seven outgoing partons while the pp → Ttq with T → tH channel contains nine, from the hard scattering process. These partons subsequently hadronize to produce jets. For T quarks with a mass above 1 TeV, the decay products of the top quark and the Higgs boson are highly Lorentz-boosted and collimated, producing two hadronic jets. The accompanying jets are softer. Thus, the signature of a massive T quark would be the presence of highly boosted jets with masses corresponding to those of the top quark and the Higgs boson, and an overall large hadronic activity in the event. The T quark candidates are reconstructed using the top quark and Higgs boson jets. In the T quark candidate mass distribution, a localized excess of events above the SM background is expected in the presence of a signal. This is the first search for single electroweak production of a vector-like T quark in fully hadronic final states. Jet substructure and b tagging techniques are employed to identify the highly Lorentz-boosted top quark and Higgs boson arising from the decay of a TeV scale resonance. The search in these final states exploits the ability of jet substructure techniques to reconstruct hadronically decaying SM particles in a challenging fully hadronic environment.

Signal and background modeling
The single T quark production cross section and the branching fraction B(T → tH) are highly model dependent. The Simplest Simplified Model (SSM) framework [18] is used to model the signal events. In this framework, the coupling factors $c^{bW}_{L/R}$ and $c^{tZ}_{L/R}$ determine the strengths of the charged and neutral current interactions, as shown in Fig. 1 left and right, respectively, up to a factor of the electroweak coupling constant $g_W$. Signal events for the processes pp → Tbq and pp → Ttq are generated for LH or RH interactions, with each of the corresponding LH or RH coupling factors set to unity, while the other is set to zero. Events are generated using the tree-level Monte Carlo (MC) event generator MADGRAPH 5.1.3.30 [19] for T quark masses from 1000 to 1800 GeV, in steps of 100 GeV. The signal widths are set to 10 GeV for all masses. The NNPDF3.0 [20] parton distribution function (PDF) set is used.
The main SM background processes are tt+jets and multijet production through the strong interaction. A smaller contribution comes from W+jets events. Events with a single top quark and a W boson (tW) are found to make a negligible contribution to the overall background composition. The tt+jets, W+jets, and tW background events are estimated using MC simulations. As it is difficult to accurately simulate multijet production, the contributions from these processes are estimated from data. All other SM processes have a negligible contribution to the background.
PDFs, the quantum chromodynamics (QCD) factorization and renormalization scales, and the strong coupling constant.
The multijet samples are generated using MADGRAPH 5.1.3.30 with up to four partons included in the matrix element calculation, and are used only to optimize the event selection and validate the background estimation procedure.
The samples generated using the MADGRAPH 5.1.3.30 or POWHEG v2 programs are interfaced with PYTHIA 8.212 [29] for showering and hadronization, using the underlying event tune CUETP8M1 [30], and with the MLM matching scheme [31] to match the additional partons from the hard process with those simulated using the parton shower algorithm. In all simulations, the mass of the Higgs boson is set to 125 GeV, while the top quark mass is set to 172.5 GeV.
Additional pp interactions (pileup) in concurrence with the hard interaction are simulated by overlaying low p T QCD interactions, using the PYTHIA 8.212 MC generator and a total inelastic pp cross section of 69 mb [32]. The distribution of the number of pileup events in the simulated samples is reweighted to match the distribution observed in the data. The generated signal events are processed using a GEANT4-based [33,34] simulation of the CMS detector.
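The pileup reweighting step can be illustrated with a minimal Python sketch, assuming binned distributions of the number of pileup interactions in data and in simulation. The function name `pileup_weights` and the list-based inputs are hypothetical and are not taken from the analysis software.

```python
def pileup_weights(data_counts, mc_counts):
    """Per-bin weights that make the simulated pileup distribution match data.

    Both inputs are lists of event counts per pileup bin (hypothetical inputs).
    Each distribution is normalized to unit area before taking the ratio.
    """
    data_total = sum(data_counts)
    mc_total = sum(mc_counts)
    weights = []
    for d, m in zip(data_counts, mc_counts):
        if m > 0:
            weights.append((d / data_total) / (m / mc_total))
        else:
            # No simulated events in this bin: nothing to reweight.
            weights.append(0.0)
    return weights
```

Applying the weight of its pileup bin to each simulated event reproduces the data shape while leaving the total simulated yield unchanged, provided every populated data bin is also populated in the simulation.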

The CMS detector and event reconstruction
The CMS detector, its coordinate system, and its kinematic variables are detailed in Ref. [35]. The detector consists of a superconducting solenoid of 6 m internal diameter at its core, providing a magnetic field of 3.8 T. Within the field volume are housed a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each of which is divided into a barrel and two endcap sections. The tracker extends from −2.5 to +2.5 in pseudorapidity η while the ECAL and the HCAL extend up to |η| = 3. Extensive forward calorimetry, up to |η| = 5, complements the coverage provided by the barrel and endcap detectors. Muons are measured in gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid, covering a region of |η| < 2.4.
Events are selected using a two-stage trigger system, requiring the presence of hadronic jets in the detector. The level-1 trigger selects events with jets, reconstructed from energy deposits in the ECAL and the HCAL, for further processing by the high-level trigger (HLT). The HLT reconstructs jets with p T > 40 GeV and |η| < 3 that are clustered using the particle flow (PF) algorithm [36,37] described below. The scalar sum of the jet p T (H T ) is required to be greater than 800 GeV for the event to be selected by the HLT for further processing.
Charged particle tracks are used to reconstruct the interaction vertices. The vertex with the highest sum of the p 2 T of clusters of associated tracks is chosen as the primary vertex. The PF event reconstruction algorithm reconstructs and identifies stable particles in the detector using an optimized combination of information from all subdetectors. The PF candidates are used to reconstruct jets using the anti-k T algorithm [38] implemented using the FASTJET package [39,40]. Charged particles not originating from the primary vertex are omitted in the jet clustering. The jet momentum is the vector sum of the momenta of all particles clustered in the jet.
The jet energy scale is determined from a detailed simulation of the CMS detector. The estimated pileup contribution to the jet energy is subtracted using an event-by-event jet area based correction [41,42]. Further corrections are applied to account for the detector response to hadrons as a function of the jet p T and η. Additional corrections are then applied to the data to account for any remaining differences with the simulations in the jet energy measurement.
From simulations, the average jet momentum is found to be within 5% of the true momentum over the whole range of detector acceptance. The jet energy resolution varies from 15-20% at 30 GeV to 5% at 1 TeV [43].
Two non-exclusive jet collections are reconstructed, one by clustering the PF candidates using the anti-k T distance parameter, in the η-φ plane, of 0.4 (AK4 jets), and the other using a distance parameter of 0.8 (AK8 jets). The former is used to calculate the H T , while the latter is used to reconstruct Lorentz-boosted top quark and Higgs boson jets. Jets are required to pass a standard set of quality criteria to reject detector and electronics noise misidentified as jets [44].

Event selection
Events passing the jet-based trigger are further required to have at least four AK4 jets with p T > 30 GeV and |η| < 5, and at least one AK8 jet with p T > 300 GeV and |η| < 2.4. We further require that the H T of such an event, where the sum of the p T is taken of all selected AK4 jets, is greater than 1100 GeV. These together constitute the preselection criteria for further processing of an event. The trigger efficiency of events passing the preselection criteria is found to be 100% with negligible uncertainty.
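The preselection above can be written as a short Python sketch. The `Jet` class and the function name `passes_preselection` are hypothetical; only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Jet:
    pt: float   # transverse momentum in GeV
    eta: float  # pseudorapidity


def passes_preselection(ak4_jets, ak8_jets):
    """Apply the preselection thresholds quoted in the text."""
    selected_ak4 = [j for j in ak4_jets if j.pt > 30.0 and abs(j.eta) < 5.0]
    selected_ak8 = [j for j in ak8_jets if j.pt > 300.0 and abs(j.eta) < 2.4]
    # H_T is the scalar sum of the p_T of the selected AK4 jets.
    ht = sum(j.pt for j in selected_ak4)
    return len(selected_ak4) >= 4 and len(selected_ak8) >= 1 and ht > 1100.0
```

For example, an event with AK4 jets of 500, 400, 200, and 50 GeV has H_T = 1150 GeV and passes if it also contains a central AK8 jet above 300 GeV.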
Jet grooming techniques [45] are applied to AK8 jets to identify hadronic decays of Lorentz-boosted massive particles like H → bb or t → bqq. The pruning [46] and soft-drop [47,48] grooming algorithms are employed to remove soft contributions to the jet energy from the underlying event and pileup, and to reveal subjets coming from the hadronization of the hard partons arising from the massive particle decay. The mass of the jet is thus closer to that of the massive parent particle after grooming, and the subjets can be associated with its decay products. The groomed AK8 jets are required to pass further selection criteria to be identified either as Higgs boson-tagged (H-tagged), or top quark-tagged (t-tagged) jets. Values chosen for the selection parameters associated with the jet pruning and soft-drop algorithms, as well as with the N-subjettiness algorithm described below, are based on detailed studies of their performance in a sample of semileptonic tt+jets events with t-tagged or W boson-tagged jets, as described in Refs. [49,50].
The pruning parameters used are z cut = 0.1 and D cut = 0.5, while the soft-drop parameters are set to z cut = 0.1 and β = 0. Both the pruning and the soft-drop algorithms are applied to the same set of AK8 jets, yielding the pruned and soft-drop masses, respectively. The H-tagged jets require a pruned mass between 105-135 GeV. The t-tagged jets require a soft-drop mass within 110-210 GeV. The soft-drop subjets are further used for b tagging, for both the H-tagged and t-tagged jets. The combination of pruned mass for H tagging [51] and soft-drop mass for t tagging [52] was found to give the best rejection of pileup events and other backgrounds.
Besides the pruning and the soft-drop algorithms, the N-subjettiness algorithm [53], based on the computation of the inclusive jet shape variables τ N , is used. These variables quantify "lobes" of energy flow inside a jet [53]. A jet compatible with two substructures would have values of the ratio τ 2 /τ 1 much less than unity, as in a boosted H → bb decay. Likewise, a jet from a boosted t → bqq decay, with three substructures, would have the value of τ 3 /τ 2 much less than one. In contrast, jets with no substructure would exhibit larger values for both τ 2 /τ 1 and τ 3 /τ 2 . Thus these variables provide good discrimination against multijet backgrounds. The requirements on the ratios τ 2 /τ 1 < 0.6 and τ 3 /τ 2 < 0.54 are used for H and t tagging, respectively.
The soft-drop subjets are b-tagged to further suppress backgrounds. The combined secondary vertex b tagging algorithm (CSVv2) identifies subjets containing B hadrons using a combination of track and secondary vertex related variables [54]. For H tagging, the CSVv2 discriminator threshold is chosen to give a mistag rate of 10% for subjets from light flavored quarks and gluons, and a signal efficiency of 40-70%, depending on the subjet p T [55]. Both of the subjets are required to pass the b tagging requirement. Boosted jets with both subjets failing the b tagging criteria but otherwise satisfying the H tagging criteria ("anti-H-tagged") are used to define a control region for background estimation. For t tagging, one subjet is required to have a CSVv2 discriminator value that exceeds a more stringent threshold, to give an overall mistag rate of about 0.1% [50].
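The tagging requirements described so far (groomed mass windows, N-subjettiness ratios, and subjet b tagging) can be collected into a minimal Python sketch. The `GroomedJet` container and the function names are hypothetical. The CSVv2 thresholds are not quoted in the text, so the b tagging decisions enter as precomputed booleans, and "one subjet" for t tagging is read here as "at least one", which is an assumption. The jet p T requirements are not included.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GroomedJet:
    pruned_mass: float                        # GeV
    softdrop_mass: float                      # GeV
    tau1: float
    tau2: float
    tau3: float
    subjets_pass_h_btag: Tuple[bool, bool]    # H-tagging b-tag working point
    subjets_pass_t_btag: Tuple[bool, bool]    # tighter t-tagging working point


def _ratio(num, den):
    # Treat an undefined ratio as failing any "less than" requirement.
    return num / den if den > 0 else float("inf")


def _h_mass_and_shape(jet):
    return (105.0 <= jet.pruned_mass <= 135.0
            and _ratio(jet.tau2, jet.tau1) < 0.6)


def is_h_tagged(jet):
    return _h_mass_and_shape(jet) and all(jet.subjets_pass_h_btag)


def is_anti_h_tagged(jet):
    # Same mass and shape requirements, but both subjets fail b tagging.
    return _h_mass_and_shape(jet) and not any(jet.subjets_pass_h_btag)


def is_t_tagged(jet):
    return (110.0 <= jet.softdrop_mass <= 210.0
            and _ratio(jet.tau3, jet.tau2) < 0.54
            and any(jet.subjets_pass_t_btag))
```

Note that a jet with exactly one b-tagged subjet is neither H-tagged nor anti-H-tagged in this sketch, which follows from the definitions in the text.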
Jet energy scale corrections are applied to the H-tagged jet mass to obtain a better agreement with the Higgs boson mass. The H-tagged jet mass resolution in the simulations is degraded to match the observed W jet mass resolution in the data in a sample of tt+jets events with boosted hadronically decaying W bosons. The W jets are tagged in the same way as H-tagged jets, except that the pruned mass is required to be within the range 65-105 GeV, and the subjets are not b-tagged. The W jets are also used to obtain the ratio of the N-subjettiness selection efficiencies between the data and the simulations, which is applied to the simulated H jets as a scale factor. A simulation-based correction factor is applied to account for the difference in the jet shower profile of W → qq and H → bb decays. The b tagging efficiency scale factors, measured on a sample of jets with subjets required to contain a muon to enrich them in B hadron flavor [55], are likewise applied. The t jet tagging efficiency scale factor is obtained from boosted hadronically decaying top quarks where the decay products of the daughter W boson and the b quarks are merged and clustered as one AK8 jet [49,50].
It is observed that the MC simulations of the background do not model well the jet p T and H T distributions after the preselection [17]. The data/MC ratio of the H T distribution is described within statistical uncertainties by a 2-parameter linear fit with a significant negative slope parameter. The H T distributions of background components obtained from MC simulations are reweighted using the results of this fit. Cross-checks are performed in different control regions confirming the validity of this correction factor. The correction factor is applied to the predicted background, with a small impact on the background T quark candidate mass distribution. The H T reweighting has a negligible effect on the signal and is considered only as a systematic uncertainty.
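As a rough illustration of the H T reweighting, the sketch below fits a straight line to a binned data/MC ratio and evaluates the resulting weight. The text specifies only a 2-parameter linear fit; the unweighted least-squares fit used here, the function names, and the inputs are all assumptions, and a real fit would normally account for the statistical uncertainty in each bin.

```python
def fit_line(xs, ys):
    """Unweighted least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ht_weight(ht, intercept, slope):
    """Weight applied to a simulated background event with the given H_T."""
    return intercept + slope * ht
```

A negative slope, as reported in the text, means that simulated events at high H T are weighted down relative to those at low H T.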
The H-tagged and t-tagged jets are required to have p T values greater than 300 and 400 GeV, respectively. An AK8 jet that is simultaneously H-tagged and t-tagged is assigned to the latter category, although this occurs in less than 1% of the events. Furthermore, the H-tagged and t-tagged jets must have a separation in the η-φ plane, ∆R(H, t) > 2.0. These selection criteria define the signal region. The highest p T H-tagged and t-tagged jets satisfying the above requirements in each signal event are paired to form the T quark candidate, where the T quark mass, M(T), is taken to be the invariant mass of the dijet system. The search is performed by looking for a localized excess in the M(T) distribution above the SM background. The simulated reconstructed M(T) distributions for a few representative masses are shown in Fig. 2. The estimated mass resolution of the T quark candidates is about 5% for all simulated T quark masses. Table 1 gives some representative signal efficiencies for different T quark masses, for the pp → Tbq and pp → Ttq processes with LH couplings. The effective integrated luminosities of the simulated signal samples are much larger than the integrated luminosity corresponding to the data; hence the statistical uncertainties in the efficiencies are negligible. The efficiencies for the RH couplings are very similar to those for the LH couplings of the corresponding models.
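The T quark candidate reconstruction can be sketched in Python as follows: the two jets are converted to four-vectors, the ∆R separation is checked, and the invariant mass of the pair is returned. Jets are represented as hypothetical `(pt, eta, phi, mass)` tuples, and the function names are illustrative only.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Return (E, px, py, pz) for a jet given in collider coordinates."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def delta_r(eta1, phi1, eta2, phi2):
    """Separation in the eta-phi plane, with the phi difference wrapped."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def candidate_mass(h_jet, t_jet, min_delta_r=2.0):
    """Invariant mass M(T) of the H-tagged and t-tagged jet pair.

    Returns None if the jets are not separated by more than min_delta_r.
    """
    if delta_r(h_jet[1], h_jet[2], t_jet[1], t_jet[2]) <= min_delta_r:
        return None
    e1, px1, py1, pz1 = four_vector(*h_jet)
    e2, px2, py2, pz2 = four_vector(*t_jet)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e * e - px * px - py * py - pz * pz
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))
```

In a full selection, the highest p T H-tagged and t-tagged jets passing the p T thresholds would be passed to `candidate_mass`.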

Background estimation
The main backgrounds in the signal region are tt+jets, multijets, and, to a lesser extent, W+jets. The tW background is negligible, with none of the simulated events passing the full event selection, from a sample whose corresponding integrated luminosity is much larger than that of the data sample. All backgrounds except multijets are estimated using simulations.
The multijet background is estimated from the data by using four selection regions A, B, C, and D. Events in region A are required to have at least one anti-H-tagged jet and no H-tagged or t-tagged jets, while those in region B are required to have at least one anti-H-tagged jet and at least one t-tagged jet, and zero H-tagged jets. Events in region C should have at least one H-tagged jet and zero t-tagged jets. Region D is the signal region and contains events with at least one H-tagged and one t-tagged jet, as defined in the previous section. The tt+jets, W+jets, and the tW backgrounds all contribute to the A, B, and C regions.
The independence of the two variables that span the A, B, C, and D regions, i.e., the H tagging or anti-H-tagging, and the t tagging criteria, was validated using simulations. Since the two variables are uncorrelated, the numbers of events $N_A$, $N_B$, $N_C$, and $N_D$ in the corresponding regions should follow the relation $N_A/N_B = N_C/N_D$. Thus, the number of background events in the signal region D is determined by the numbers of events in the three control regions:
$$ N_D = N_B \, \frac{N_C}{N_A} $$
The ABCD method is also used to obtain the background M(T) distribution for the signal region. The anti-H-tagged and t-tagged jets are paired to reconstruct the M(T) shape in the control region B. When multiplied by the ratio $N_C/N_A$, this gives the background M(T) shape in the signal region. A validation of the procedure is performed using simulations. The compatibility of the shapes in the B and D regions is verified using simulated QCD multijet samples. Moreover, the shapes of the data and simulation distributions in region B are found to be consistent, and thus the ABCD method is also expected to correctly predict the multijet background in the signal region D from the data in regions A, B, and C.
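A minimal Python sketch of the ABCD extrapolation follows. It takes the event counts in regions A, B, and C together with a binned M(T) histogram from region B, and returns the predicted yield and shape in region D using the transfer factor N_C/N_A. The function name and the list-based histogram are hypothetical.

```python
def abcd_prediction(n_a, n_b, n_c, shape_b):
    """Predict the region D yield and M(T) shape from regions A, B, and C.

    shape_b is the binned M(T) distribution in region B (list of bin contents).
    """
    if n_a <= 0:
        raise ValueError("region A must contain events")
    transfer = n_c / n_a                     # ratio N_C / N_A
    n_d = n_b * transfer                     # N_D = N_B * N_C / N_A
    shape_d = [transfer * content for content in shape_b]
    return n_d, shape_d
```

When the bin contents of `shape_b` sum to `n_b`, the predicted shape integrates to the predicted yield by construction.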
Since only the multijet background is estimated using the ABCD method, the simulated tt+jets, W+jets, and tW backgrounds are subtracted from the data in each of the A, B, and C regions to obtain the predicted multijet background in data for that region. The resulting numbers of events in the control regions are given in Table 2. The ratio $N_C/N_A$ is found to be (7.4 ± 0.1) × 10⁻². The total estimated background from all sources is given in Table 3, along with the number of observed events in the data. Since the backgrounds estimated using MC simulations are subtracted from the data in the control regions to estimate the multijets component of the background, the associated systematic uncertainties are anticorrelated between the simulated backgrounds and the multijets background. Hence the uncertainty in the total background is less than what one would obtain if the uncertainties in the individual backgrounds were added in quadrature. The H T and M(T) distributions in the data, estimated backgrounds, and the simulated signal are shown in Fig. 3. The overall level of agreement between the observed number of events and the background from the SM processes is within the estimated uncertainties (discussed in Section 6).
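The anticorrelation described above can be made concrete with a small Python sketch. Simulated backgrounds are subtracted from the data in regions A, B, and C before the ABCD extrapolation, and a common scale factor on the simulation stands in for a systematic shift. The function name, the dictionary layout, and the single `scale` parameter are illustrative assumptions and do not represent the analysis implementation.

```python
def total_background(data, sim, scale=1.0):
    """Return (multijet_D, simulated_D, total_D).

    data: observed counts in regions 'A', 'B', 'C'.
    sim:  simulated (non-multijet) counts in regions 'A', 'B', 'C', 'D'.
    scale: common factor applied to the simulation, mimicking a systematic shift.
    """
    # Multijet yield in each control region = data minus scaled simulation.
    multijet = {r: data[r] - scale * sim[r] for r in "ABC"}
    multijet_d = multijet["B"] * multijet["C"] / multijet["A"]
    simulated_d = scale * sim["D"]
    return multijet_d, simulated_d, multijet_d + simulated_d
```

Scaling the simulation up increases the simulated yield in region D but decreases the multijet prediction, because more is subtracted in the control regions. The shift in the total is therefore smaller than the shift in the simulated component alone.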

Systematic uncertainties
There are two types of systematic uncertainties in the signal and background predictions: those that affect only the total rate, and those that affect the rate and the M(T) distribution. Among the former are the integrated luminosity uncertainty of 2.7% [56], the pileup reweighting uncertainty of 5% in the total inelastic pp collision cross section, the cross section uncertainties in the simulated background predictions, and the uncertainties of 1-3% from the choice of the PDF set, estimated using the PDF4LHC procedure [57]. The scale factor uncertainty due to the N-subjettiness selection for H tagging is 12.5%, and affects only the total event rate.
The jet energy and mass correction and resolution uncertainties affect the shapes of the M(T) distributions for both the simulated signal and background processes. The jet energy scale uncertainty is 1-2% and the jet energy resolution uncertainty is about 1%, while the jet mass correction uncertainty is 10%. The H T -reweighting has an uncertainty of 1-3% for the A-D regions used in the background estimation.
The subjet b tagging and the t tagging scale factor uncertainties also affect the M(T) shape. The t tagging scale factor uncertainty is the largest at about 15-30% over the entire p T range. The subjet b tagging scale factor systematic uncertainties are 2-5% for subjets from b quarks; they are a factor of two larger for c quarks, and about 10% for light quark and gluon subjets. As discussed in Section 5, the systematic uncertainties in the estimated multijets background are anticorrelated with those for the simulated tt+jets, W+jets, and tW backgrounds.

Results
We set limits on the product of the signal cross sections and the branching fraction B(T → tH) for the T quark produced in association with a top or a bottom quark through electroweak interactions. A binned likelihood fit to the data with the shapes of the M(T) candidate distributions for the background and the signal is made to obtain the 95% confidence level (CL) upper limit on the signal. The systematic uncertainties, treated as nuisance parameters in the likelihood function, are marginalized following a Bayesian approach [58,59].
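To illustrate the idea of a Bayesian upper limit, the sketch below treats a much simpler case than the binned shape fit used in the analysis: a single-bin counting experiment with a known background, a flat prior on the signal yield, and no nuisance parameters. All of these simplifications, and the function name, are assumptions made for illustration only.

```python
import math


def bayesian_upper_limit(n_obs, b, cl=0.95, steps=20000):
    """Upper limit on the signal yield s for a Poisson counting experiment.

    Uses a flat prior on s >= 0 and a fixed, known background b.
    The posterior is integrated numerically with the trapezoidal rule.
    """
    s_max = n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 20.0
    ds = s_max / steps
    grid = [i * ds for i in range(steps + 1)]

    def log_likelihood(s):
        mu = s + b
        if n_obs == 0:
            return -mu
        if mu <= 0.0:
            return float("-inf")
        return n_obs * math.log(mu) - mu

    logs = [log_likelihood(s) for s in grid]
    peak = max(logs)
    posterior = [math.exp(v - peak) for v in logs]  # unnormalized

    total = sum(0.5 * (posterior[i] + posterior[i + 1]) * ds
                for i in range(steps))
    running = 0.0
    for i in range(steps):
        running += 0.5 * (posterior[i] + posterior[i + 1]) * ds
        if running >= cl * total:
            return grid[i + 1]
    return s_max
```

With zero observed events, the posterior is a pure exponential in s and the 95% limit approaches -ln(0.05), independent of the background. Dividing such a yield limit by the signal efficiency and the integrated luminosity would convert it into a limit on the cross section times branching fraction.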
The expected and observed limits are shown in Fig. 4 for different T quark masses, and with LH and RH couplings of the T quark to the third-generation SM quarks. The limits are listed in Table 4. The cross section limits are derived with a signal sample simulated using the narrow width of 10 GeV. Studies on samples generated using larger widths have established that the reconstructed M(T) distributions do not change significantly compared to the narrow width approximation for T quarks having a width of up to 10% of their masses. The signal selection efficiency is estimated to decrease by about 7% for a T quark with a width of 10%, which is well within the uncertainties of the measurement. Hence, the measured limits on the cross sections are valid within uncertainties for a T quark of width of up to 10%. The SSM does not predict a RH singlet or a LH doublet, and thus theoretical curves are not shown for the upper right and the lower left plots of Fig. 4. However, it should be noted that such couplings may still be possible in a non-minimal model. Furthermore, the observed limits on the cross sections correspond to values of the coupling factors that are larger than those associated with narrow resonances in the SSM. For a resonance width of 10% of the mass, which is the largest value for which the quoted limits are valid, the expected couplings lie between 0.6-0.3 for a T quark of mass between 1000-1800 GeV.

Summary
A search for a vector-like top quark partner T in the single production mode is performed using proton-proton collision events at √s = 13 TeV collected by the CMS experiment in 2015. The T quarks are assumed to couple only to the standard model third-generation quarks. The decay channel studied is T → tH, with hadronic top quark decay and H → bb. Boosted H and t tagging techniques are used to identify the Higgs boson and the top quark decays in the final state, and the invariant mass of the two gives the T quark candidate mass. The background is mostly due to the standard model tt+jets, with some contribution from multijet and W+jets processes. No significant excess of data above the background is observed in the T quark candidate mass distribution. The 95% confidence level upper limits on the product of the signal cross sections and the branching fraction B(T → tH) are set using Bayesian statistics. These vary between 0.31 and 0.93 pb for a T quark of mass ranging from 1000 to 1800 GeV, in the pp → Tbq and pp → Ttq production channels with left-handed and right-handed couplings to the standard model third-generation quarks. In the mass range considered for this analysis, the search sensitivity is essentially the same as that using leptonic final states [17]. The use of boosted techniques has led to an extension of the search region beyond those of previous analyses. This is the first time fully hadronic final states have been exploited in the search for single electroweak production of vector-like quarks at a hadron collider.
Figure caption (partially recovered): ... [18,60], which predicts the existence of a left-handed and right-handed coupling for a singlet and doublet T quark, respectively. The benchmark coupling parameter values of $c^{bW}_L = 0.5$ and $c^{tZ}_R = 0.5$ are chosen for the comparison.