Search for standard model production of four top quarks in the lepton + jets channel in pp collisions at sqrt(s) = 8 TeV

A search is presented for standard model (SM) production of four top quarks (t t-bar t t-bar) in pp collisions in the lepton + jets channel. The data correspond to an integrated luminosity of 19.6 inverse femtobarns recorded at a centre-of-mass energy of 8 TeV with the CMS detector at the CERN LHC. The expected cross section for SM four top quark production is approximately 1 fb. A combination of kinematic reconstruction and multivariate techniques is used to distinguish between the small signal and large background. The data are consistent with expectations of the SM, and an upper limit of 32 fb is set at a 95% confidence level on the cross section for producing four top quarks in the SM, where a limit of 32 +/- 17 fb is expected.


Introduction
Since its discovery in 1995 at the Fermilab Tevatron [1,2], the top quark has been studied primarily using events containing top quark-antiquark pairs (tt) and events containing a single top quark. With the larger centre-of-mass energy and luminosity of the CERN LHC, the study of rarer processes involving top quarks becomes possible. One such process is the production of four top quarks (tttt). In the standard model (SM), tttt production proceeds via gluon-gluon fusion or quark-antiquark annihilation. Feynman diagrams contributing to this process at leading order (LO) are shown in Fig. 1. The cross section for SM tttt production at the LHC is predicted, at LO, to be extremely small: σ SM tttt ≈ 1 fb at √s = 8 TeV [3]. Next-to-leading-order (NLO) corrections increase the cross section by as much as 30% [4]. The main background is due to tt production, a process that has a cross section more than five orders of magnitude larger [5] and is one of the reasons that a tttt signal has not yet been observed. As the data used in this paper correspond to an integrated luminosity of 19.6 fb⁻¹, ≈20 tttt events are expected in the data. Because of the very large tt background, the direct observation of events leading to a measurement of σ tttt is unlikely. However, in many models beyond the SM (BSM) involving massive coloured bosons, Higgs boson or top quark compositeness, or extra dimensions, σ tttt is enhanced [4,6-11]. In some supersymmetric extensions of the SM, tttt final states can also be produced via cascade decays of coloured supersymmetric particles such as squarks and gluinos [12]. In certain regions of BSM parameter space, these final states have kinematics similar to those of SM tttt production. In such cases, reinterpretation of an upper limit on SM production of tttt has the potential to constrain BSM theories. Moreover, in direct searches for these BSM signatures, SM production of tttt can be a background. Hence, experimental constraints on σ tttt have the potential to enhance the discovery reach of such searches.
This paper presents a search for SM production of tttt in events that contain a single lepton (ℓ) and multiple jets. Signal events are sought in final states with a single muon (µ) or a single electron (e), in what are termed µ + jets and e + jets channels. The muon or electron originates either from the direct decay of a W boson or from the leptonic decay of a τ lepton in t → bW, W → τν τ, τ → ℓν ℓ ν τ. The chosen final state has a larger branching ratio (≈41%) than the zero-lepton (≈30%), two-lepton (≈22%), or three- and four-lepton (≈6%) final states, when only muons and electrons are considered as final-state leptons. Kinematic reconstruction techniques and multivariate analyses (MVA) are used to discriminate the tttt signal from the tt background.
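The quoted branching ratios can be cross-checked with a short binomial estimate. The sketch below assumes approximate W boson and τ lepton branching fractions that are not taken from this paper, and treats each of the four top quark decays as independent; the function name is illustrative only.

```python
from math import comb

# Assumed, approximate branching fractions (not quoted in the paper).
BR_W_TO_E_OR_MU = 2 * 0.108         # W -> e nu plus W -> mu nu
BR_W_TO_TAU = 0.1125                # W -> tau nu
BR_TAU_TO_E_OR_MU = 0.178 + 0.174   # tau -> e nu nu plus tau -> mu nu nu

# Probability that a single top quark decay yields a final-state e or mu,
# either directly from the W boson or via a leptonically decaying tau.
p_lep = BR_W_TO_E_OR_MU + BR_W_TO_TAU * BR_TAU_TO_E_OR_MU

def lepton_multiplicity_fractions(n_tops=4, p=p_lep):
    """Binomial probability that k of the n_tops decays give an e or mu."""
    return [comb(n_tops, k) * p**k * (1 - p) ** (n_tops - k)
            for k in range(n_tops + 1)]

fractions = lepton_multiplicity_fractions()
for k, fraction in enumerate(fractions):
    print(f"{k} leptons: {100 * fraction:.1f}%")

# Expected signal yield before any selection: N = sigma x integrated luminosity.
expected_events = 1.0 * 19.6  # (1 fb) x (19.6 fb^-1)
```

Under these assumptions the one-lepton fraction comes out largest, close to the ≈41% quoted above, and the yield estimate reproduces the ≈20 expected events.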

Data and simulation
The data are collected using triggers based on the presence of a muon candidate with p T > 24 GeV, or an electron candidate with p T > 27 GeV. The signal process is modelled at LO using the MADGRAPH (v5.1.3.30) Monte Carlo (MC) generator [14]. The tt background is modelled at LO with up to three additional partons in the zero-lepton, one-lepton and two-lepton final states, also using MADGRAPH, as is the production of tt + electroweak (EW) bosons and the production of EW bosons with additional partons. Single top quarks and any additional partons produced in the t- and s-channels, and in association with a W boson, are modelled at NLO using POWHEG (v1.0 r1380) [15-19]. The tt + Higgs boson and diboson (WW, WZ and ZZ) processes are modelled at LO using PYTHIA 6.426 [20]. The decays of τ leptons are simulated with TAUOLA (v27.121.5) [21]. The simulation of additional initial-state and final-state radiation, and the fragmentation and hadronisation of quarks and gluons are performed using PYTHIA with the Z2* tune [22]. The (LO) CTEQ6L1 [23] set of parton distribution functions (PDF) is used with the MADGRAPH and PYTHIA samples while the (NLO) CTEQ6M set is used with the POWHEG samples. The generated events are passed through a full simulation of the CMS detector, based on the GEANT4 package [24]. Next-to-leading-order or, when available, next-to-next-to-leading-order (NNLO) cross sections are used to normalise the predictions.
For all samples of simulated events, multiple minimum bias events generated with PYTHIA are added to simulate the presence of additional proton-proton interactions in a single or neighbouring proton bunches (pileup). To refine the simulation, the simulated events are weighted to reproduce the distribution in the number of reconstructed vertices observed in data.
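The reweighting described above can be illustrated with a minimal sketch. It assumes that each simulated event receives a weight equal to the ratio of the normalised vertex-multiplicity distributions in data and simulation; the function name and inputs are hypothetical, and the actual procedure used in the analysis may differ in detail.

```python
from collections import Counter

def pileup_weights(nvtx_data, nvtx_mc):
    """Per-event weights that reshape the simulated vertex multiplicity to data.

    nvtx_data, nvtx_mc: lists of reconstructed-vertex counts, one per event.
    """
    data_counts = Counter(nvtx_data)
    mc_counts = Counter(nvtx_mc)
    n_data, n_mc = len(nvtx_data), len(nvtx_mc)
    # Ratio of normalised distributions for every multiplicity seen in simulation.
    ratio = {
        n: (data_counts[n] / n_data) / (mc_counts[n] / n_mc)
        for n in mc_counts
    }
    return [ratio[n] for n in nvtx_mc]

weights = pileup_weights([1, 1, 2, 2, 2, 3], [1, 2, 3, 3])
```

After weighting, the simulated distribution in this toy input has the same shape as the data: multiplicities 1, 2 and 3 carry fractions 1/3, 1/2 and 1/6.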
The cross section ratio σ ttbb /σ ttjj is measured by CMS to be 2.2 ± 0.3 (stat) ± 0.5 (syst)%, where the ttjj process is defined as the production of a tt pair with any two additional jets (j) [25]. As the MADGRAPH tt simulation used in this analysis predicts σ ttbb /σ ttjj to be 1.2%, the simulation is corrected to the ratio of cross sections measured in Ref. [25]. To illustrate the composition of this dominant background, the sample is divided into two subcategories. The categories correspond to the production of tt with additional light quarks or gluons, "tt + ll/gg", and to the production of tt with additional charm or bottom quarks, "tt + cc/bb". The samples of electroweak bosons with additional partons and of single top quarks are grouped into a category termed "EW", while samples of tt + electroweak bosons and of tt decaying into final states containing zero or two leptons are grouped into a category termed "tt other".

Event reconstruction and selections
Events are reconstructed using a particle-flow (PF) algorithm [26,27]. This proceeds by reconstructing and identifying each final-state particle using an optimised combination of all subdetector information. Each event is required to have at least one reconstructed vertex. The primary vertex is chosen as the vertex with the largest value of Σp T² of the tracks associated with that vertex. Additional selection criteria are applied to each event to reject events with features consistent with arising from detector noise and beam-gas interactions.
The energy of electrons is determined from a combination of the track momentum at the primary vertex, the corresponding ECAL energy cluster, and the energy sum of the reconstructed bremsstrahlung photons associated with the track. The energy of muons is obtained from the corresponding track momentum obtained in a combined fit to information from the inner silicon trackers and outer muon detectors. The energy of charged hadrons is determined from a combination of the track momentum and the corresponding ECAL and HCAL energies, corrected for the suppression of small signals, and calibrated for the non-linear response of the calorimeters. Finally, the energy of neutral hadrons is obtained from the corresponding calibrated ECAL and HCAL energies. As charged leptons originating from top quark decays are typically isolated from other particles, a variable (I rel) is constructed to select lepton candidates based on their isolation. It is defined as the scalar sum of the p T values of the particles reconstructed within an angle ∆R of the axis of the momentum of the lepton candidate, excluding the lepton candidate, divided by the p T of the lepton candidate. The ∆R is defined as ∆R = √((∆η)² + (∆φ)²), where ∆η and ∆φ are the differences in pseudorapidity and azimuthal angles between the lepton candidate and any other track or energy deposition. A muon candidate is rejected if I rel ≥ 0.12 for ∆R = 0.4, and an electron candidate is rejected if I rel ≥ 0.1 for ∆R = 0.3.
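The isolation variable can be written out as a short sketch. The dictionary keys (`pt`, `eta`, `phi`) and function names below are hypothetical, the list of particles is assumed to exclude the lepton candidate itself, and "within ∆R" is taken here as a strict inequality.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation, with the azimuthal difference wrapped into [-pi, pi)."""
    deta = eta1 - eta2
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(deta, dphi)

def relative_isolation(lepton, particles, cone):
    """Scalar pT sum of particles inside the cone, divided by the lepton pT."""
    sum_pt = sum(
        p["pt"] for p in particles
        if delta_r(lepton["eta"], lepton["phi"], p["eta"], p["phi"]) < cone
    )
    return sum_pt / lepton["pt"]

# (cone size, rejection threshold) per lepton flavour, as quoted in the text.
ISOLATION_CUTS = {"muon": (0.4, 0.12), "electron": (0.3, 0.10)}

def is_isolated(lepton, particles, flavour):
    cone, threshold = ISOLATION_CUTS[flavour]
    return relative_isolation(lepton, particles, cone) < threshold
```

For example, a 40 GeV muon with a single 2 GeV particle at ∆R = 0.1 has I rel = 0.05 and is kept, whereas the same muon with a 6 GeV particle at that distance has I rel = 0.15 and is rejected.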
Jets are clustered from the reconstructed particles using the infrared- and collinear-safe anti-k T algorithm [28], with distance parameter R = 0.5, as implemented in the FASTJET package [29]. The jet momentum is defined by the vectorial sum of the momenta of all of the particles in each jet, and is found in the simulation to be within 5% to 10% of the true jet momentum, for the entire p T spectrum of interest and detector acceptance [30]. Corrections to the jet energy scale (JES) and the jet energy resolution (JER) are obtained from the simulation and through in situ measurements of the energy balance of exclusive dijet and photon + jet events. An offset correction is applied to take into account the extra energy clustered into the jets from pileup. Muons, electrons, and charged hadrons originating from pileup interactions are not included in the jet reconstruction. Missing transverse energy (E miss T) is defined as the magnitude of the vectorial sum of the p T of all the selected jets and leptons in the event. Charged hadrons originating from pileup interactions are also not included in the reconstruction of E miss T. Jets are classified as b quark jets through their probability of originating from the hadronisation of bottom quarks, using the combined secondary vertex (CSV) b tagging algorithm, which combines information from the significance of the track impact parameter, the jet kinematics, and the presence of a secondary vertex within the jet [31].
To preferentially select tttt events while suppressing backgrounds, events that pass the muon or electron trigger are required to pass baseline selections corresponding to the µ + jets or e + jets channels. The selections comprise a series of criteria applied to the objects in the offline-reconstructed event. The selections require the presence of one well-identified and isolated muon or electron [32,33] with p T > 30 GeV and with |η| < 2.1 for muons or |η| < 2.5 for electrons. Jets are required to have p T > 30 GeV and |η| < 2.5. All events are required to have at least six selected jets (N jets ≥ 6). For a jet to be b tagged, it must pass a requirement of the CSV algorithm [31] that provides a misidentification rate of ≈1% for light quark and gluon jets, and corresponds to b tagging efficiencies of 40-75%, depending on jet p T and η. Events are required to have at least two b-tagged jets (N btags ≥ 2). The requirements on N jets and N btags strongly suppress background events arising from vector boson + jets and single top quark production. The H T of an event is defined as the scalar sum of the p T of all the selected jets in the event. Events are also required to have H T > 400 GeV and E miss T > 30 GeV, which removes the residual background arising from multijet processes. Small corrections are applied to the simulated events to account for the differences between efficiencies in data and simulation for the above lepton and b tagging requirements.
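The baseline requirements can be summarised as a single predicate. This is only a sketch: the event layout (`lepton`, `jets`, `met`, `btagged`) is a hypothetical structure, the lepton is assumed to be already identified and isolated, and the trigger decision and efficiency corrections are not modelled.

```python
def passes_baseline(event):
    """Apply the lepton, jet, b tag, H_T and missing-E_T requirements."""
    lepton = event["lepton"]
    max_abs_eta = 2.1 if lepton["flavour"] == "muon" else 2.5
    if not (lepton["pt"] > 30.0 and abs(lepton["eta"]) < max_abs_eta):
        return False

    # Selected jets: pT > 30 GeV and |eta| < 2.5.
    jets = [j for j in event["jets"] if j["pt"] > 30.0 and abs(j["eta"]) < 2.5]
    n_btags = sum(1 for j in jets if j["btagged"])
    ht = sum(j["pt"] for j in jets)  # scalar pT sum of the selected jets

    return (
        len(jets) >= 6
        and n_btags >= 2
        and ht > 400.0
        and event["met"] > 30.0
    )
```

Jets failing the p T or η requirement are dropped before N jets, N btags and H T are computed, so each of these quantities refers to selected jets only.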

Event classification with an MVA algorithm
To obtain the greatest possible discrimination between tttt events and the dominant tt background, variables sensitive to the different processes are exploited in a multivariate (MVA) discriminant. The selected variables are grouped into three categories based on the underlying physical characteristics that they exploit: content of top quarks, jet activity, and b quark jet content.

Multiplicity of top quarks
The presence of multiple jet-decaying top quarks in tttt events can be exploited to distinguish such events from tt background, which contains only a single jet-decaying top quark. The challenge in the kinematic reconstruction of such top quarks in an event containing many jets is to find correct selections of three jets that arise from any single top quark when many incorrect three-jet combinations are possible. Such correctly selected combinations are referred to as "correct trijets", while combinations containing one or more jets not originating from the same top quark are referred to as "incorrect trijets". The large number of incorrect trijets in signal and background events motivates the use of MVA methods as in Ref. [34] to distinguish between correct and incorrect trijets by combining information from a set of input variables.
Within the trijet, the dijet system originating from W boson decay is attributed to the two jets with smallest ∆R separation. The invariant mass of this dijet and the invariant mass of the trijet are used as input variables in the MVA. For correct trijets, these variables have respective values close to that of the W boson and of the top quark. The azimuthal separations between the trijet and dijet systems and between the trijet and the jet not selected for the dijet system are also used. These variables typically have smaller values for correct trijets. The ratio of the magnitude of the vectorial p T sum to the scalar p T sum of the trijet system has typically larger values for correct trijets and is included as an input variable. Finally, the discriminant of the CSV b tagging algorithm for the jet not selected in the dijet provides, for correct trijets, values expected for b quark jets and is therefore included. These variables are combined in a boosted decision tree algorithm (BDT trijet) using the TMVA package [35]. Of the simulated tt events that pass the baseline selections, approximately 61% have the trijet with the largest BDT trijet discriminant value for the t → trijet decay.
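The six trijet input variables listed above can be computed as in the following sketch. The jet representation (dictionaries with `pt`, `eta`, `phi`, `csv`) and all function names are hypothetical, and jets are treated as massless for simplicity, which is an assumption not made in the text. The BDT training itself is not shown.

```python
import math
from itertools import combinations

def four_vector(jet):
    """(E, px, py, pz) of a jet, treated as massless (an assumption)."""
    pt, eta, phi = jet["pt"], jet["eta"], jet["phi"]
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return (math.sqrt(px * px + py * py + pz * pz), px, py, pz)

def add(*vectors):
    return tuple(sum(components) for components in zip(*vectors))

def mass(v):
    e, px, py, pz = v
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def delta_phi(phi1, phi2):
    return abs((phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi)

def delta_r(a, b):
    return math.hypot(a["eta"] - b["eta"], delta_phi(a["phi"], b["phi"]))

def trijet_inputs(jets):
    """Input variables for one candidate trijet (a list of exactly three jets)."""
    # The W boson candidate is the pair of jets with the smallest Delta R.
    i, j = min(combinations(range(3), 2),
               key=lambda ij: delta_r(jets[ij[0]], jets[ij[1]]))
    k = 3 - i - j  # index of the remaining jet, the b quark jet candidate
    vectors = [four_vector(jet) for jet in jets]
    dijet = add(vectors[i], vectors[j])
    trijet = add(*vectors)
    phi_dijet = math.atan2(dijet[2], dijet[1])
    phi_trijet = math.atan2(trijet[2], trijet[1])
    return {
        "m_dijet": mass(dijet),
        "m_trijet": mass(trijet),
        "dphi_trijet_dijet": delta_phi(phi_trijet, phi_dijet),
        "dphi_trijet_bjet": delta_phi(phi_trijet, jets[k]["phi"]),
        "pt_ratio": math.hypot(trijet[1], trijet[2]) / sum(j_["pt"] for j_ in jets),
        "csv_bjet": jets[k]["csv"],
    }
```

In an event with N selected jets, such a function would be evaluated for every three-jet combination, and the resulting variables passed to the trained classifier to rank the candidates.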
Following the baseline selection, it should be possible to reconstruct multiple all-jet decays of top quarks in tttt events, but not in tt events. The distribution of the BDT trijet discriminant for the highest-scoring trijet has a similar form for tttt and tt events. However, the second-highest ranking trijet frequently reflects correct trijets in tttt events and incorrect trijets in tt events. Hence, the trijet with the largest value of BDT trijet discriminant is removed from the event, and the BDT trijet discriminant of the trijet of the remaining jets with highest value (BDT trijet2) is used to distinguish between tttt and tt events. Distributions in this variable, in N jets and in N btags in data and simulation are shown in Fig. 2. The uncertainty in "scale" in the legend refers to the changes produced through changes of factors of two and one half in the factorisation and renormalisation scales of the calculation, as discussed in Section 6. The successful reconstruction of a third all-jet decay of a top quark is unlikely, and is only possible in the small fraction of events containing at least nine jets. Hence it provides negligible additional discriminating power and is not used.
The reduced event (RE) is constructed by subtracting the jets contained in the trijet with the highest BDT trijet ranking. In tt events, the RE will typically contain only jets arising from t → bℓν decays of the top quark, initial- and final-state radiation, the underlying event, and pileup interactions. Conversely, a tttt RE can contain up to two all-jet top quark decays, and as a result numerous energetic jets. Two variables based on the RE are (i) H RE T, i.e., the H T of the RE and (ii) M RE, i.e., the invariant mass of the system comprising all the jets in the RE.
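A minimal sketch of the two reduced-event variables follows. The jet layout and the function name are hypothetical, the indices of the highest-ranked trijet are assumed to be known already, and jets are treated as massless for simplicity.

```python
import math

def reduced_event_variables(jets, best_trijet_indices):
    """Return (H_T^RE, M^RE) after removing the highest-ranked trijet."""
    removed = set(best_trijet_indices)
    reduced = [jet for i, jet in enumerate(jets) if i not in removed]

    ht_re = sum(jet["pt"] for jet in reduced)  # scalar pT sum of the RE jets

    # Sum the four-vectors of the remaining jets (massless-jet assumption).
    e = px = py = pz = 0.0
    for jet in reduced:
        jpx = jet["pt"] * math.cos(jet["phi"])
        jpy = jet["pt"] * math.sin(jet["phi"])
        jpz = jet["pt"] * math.sinh(jet["eta"])
        e += math.sqrt(jpx * jpx + jpy * jpy + jpz * jpz)
        px, py, pz = px + jpx, py + jpy, pz + jpz

    m_re = math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))
    return ht_re, m_re
```

For instance, if the RE consists of two central, back-to-back jets of 40 and 60 GeV, H RE T is 100 GeV and M RE is √9600 ≈ 98 GeV.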

Jet activity
Because tttt events can contain up to ten hard jets from top quark decays, while tt events contain up to four, the following variables based on jet activity of the event possess discrimination power: (i) N jets, (ii) H b T, (iii) H T /H p, (iv) H ratio T, (v) p T5, and (vi) p T6. The H b T variable is defined to be the H T of the b-tagged jets. In the H T /H p ratio, H p is the scalar sum of the total momenta of the selected jets. The ratio of the H T of the four leading jets to the H T of the other jets is defined as H ratio T. The p T5 and p T6 variables represent, respectively, the p T values of jets of 5th and 6th largest p T. All these variables are used in the discriminant described in Section 5.4.
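The six jet-activity variables can be computed directly from the selected jets, as in the sketch below. The jet layout and dictionary keys are hypothetical, the total momentum of a jet is taken as p = p T cosh η, and at least six selected jets are assumed, as guaranteed by the baseline selection.

```python
import math

def jet_activity_variables(jets):
    """Jet-activity inputs for an event with at least six selected jets."""
    ordered = sorted(jets, key=lambda jet: jet["pt"], reverse=True)
    ht = sum(jet["pt"] for jet in ordered)
    # H_p: scalar sum of total momenta, with |p| = pT * cosh(eta).
    hp = sum(jet["pt"] * math.cosh(jet["eta"]) for jet in ordered)
    ht_lead4 = sum(jet["pt"] for jet in ordered[:4])
    ht_rest = sum(jet["pt"] for jet in ordered[4:])
    return {
        "n_jets": len(ordered),
        "ht_b": sum(jet["pt"] for jet in ordered if jet["btagged"]),
        "ht_over_hp": ht / hp,
        "ht_ratio": ht_lead4 / ht_rest,
        "pt5": ordered[4]["pt"],
        "pt6": ordered[5]["pt"],
    }
```

Since cosh η ≥ 1, the ratio H T /H p lies between 0 and 1 and approaches 1 for events whose jets are all central.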

Multiplicity of bottom quarks
The analysis assumes that the top quark decays with the SM branching ratio of B(t → bW) = 1. Hence, tttt events contain four bottom quarks from top quark decays whereas tt events contain only two bottom quarks from top quark decays. Therefore the multiplicity of b-tagged jets is a potential source of discriminating power, and is also used in the discriminant discussed in Section 5.4.

Event-level BDT
The ten variables described in Sections 5.1, 5.2, and 5.3 are combined using a second, event-level BDT (BDT event). To maximise sensitivity, the events are divided into three categories corresponding to N jets = 6, 7, and > 7, where the N jets = 6 category is used as a sideband region to constrain the tt background in the calculation of limits on tttt production. In Fig. 3, distributions in the BDT event discriminant are shown in data and in MC for each of these categories.

Systematic uncertainties and limits on tttt production
The systematic uncertainties considered in this analysis are separated into two categories: (i) those that affect the normalisations of the BDT event discriminant distributions of both signal and backgrounds and (ii) those that affect the form of the distributions of just the backgrounds. The normalisations are affected by the uncertainty in integrated luminosity of the data and the theoretical cross sections of the signal and background processes. An uncertainty in the integrated luminosity of 2.6% is included [36]. The uncertainty in the tt cross section is expected to dominate, and is taken from Ref. [5] [...] affect the form of the distributions of the BDT event discriminant. As tt is the dominant background, systematic effects on the form of the distributions are considered only for tt events. The impact of contributions from higher-order corrections in the tt simulation is quantified by comparing alternative tt samples that are generated with the renormalisation and factorisation scales simultaneously changed up and down by a factor of two relative to the nominal tt sample. The matching of partons originating from the matrix element to the jets from the parton showers is performed according to the MLM prescription [37]. The uncertainty arising from this prescription is estimated by changing the minimum k T measure between partons by factors of 0.5 and 2.0 and the jet matching threshold by factors of 0.75 and 1.5. To evaluate the uncertainty due to imperfect knowledge of the JES, JER, b tagging, and lepton-identification efficiencies, and the cross section for minimum-bias production used in the pileup-reweighting procedure in simulation, the input value of each parameter is changed by ±1 standard deviation of its uncertainty. A systematic uncertainty due to the imperfect knowledge of the contribution from the ttbb component in tt events is also estimated. As mentioned previously, a correction is applied to the tt simulation to reproduce the observed [25] ratio of σ ttbb to σ ttjj. The systematic uncertainty associated with the imperfect knowledge of this ratio is estimated by changing this correction by ±50%.
Figure 2: The distribution in the BDT trijet2 discriminant for the µ + jets and e + jets channels in (a) and (b), respectively, and the same for N jets in (c) and (d), and for N btags in (e) and (f). The ratios plotted at the bottom of each panel reflect the percent differences between data and MC events. The hatched areas show the changes in the calculated predictions produced by factors of two and one half changes in the factorisation and renormalisation scales in the tt simulation.
Figure 3: The distribution in the BDT event discriminant for data and simulation in events with N jets = 6 for the µ + jets and e + jets channels in (a) and (b), respectively, and the same in events with N jets = 7 in (c) and (d), and in events with N jets > 7 in (e) and (f). The ratios plotted at the bottom of each panel reflect the percent differences between data and MC events. The hatched areas reflect the changes in the calculated predictions produced by factors of two and one half changes in the factorisation and renormalisation scales (see Section 6).
No significant excess of events to represent SM tttt production is observed above the background prediction. Therefore, an upper limit on σ tttt is set by performing a simultaneous maximum likelihood fit to the distributions in the BDT event discriminant for signal and background in the six event categories described in Section 5. The systematic uncertainties in the normalisation and the form of the distributions of the discriminant are accommodated by incorporating nuisance parameters into the fit. The contributions of the nuisance parameters to the likelihood function are modelled using log-normal functions for normalisations and Gaussian functions for forms of the distributions. The functions have widths that correspond to the ±1 standard deviation changes of the systematic sources described above. Statistical uncertainties in the simulation are taken into account by applying a "lightweight" version of the Beeston and Barlow method [38] where one nuisance parameter is associated with the estimate of the total simulation and the statistical uncertainty in each bin. The best-fit values of the nuisance parameters show only statistically insignificant deviations from their input values. In particular, the best-fit value of the parameter corresponding to the ttbb correction is consistent with the result obtained in Ref. [25].
The modified frequentist CL s approach [39,40] using the asymptotic approximation is adopted to determine the upper limit using the ROOSTATS package [41,42]. The limit calculated at a 95% confidence level (CL) on the production cross section σ tttt is 32 fb, where a limit of 32 ± 17 fb is expected. These limits are approximately 25 × σ SM tttt.
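The idea behind the CL s criterion can be illustrated with a toy single-bin counting experiment. This sketch is not the procedure used in the analysis: it ignores the binned BDT event distributions, the nuisance parameters and the asymptotic approximation, and simply uses the observed event count as the test statistic. All names are hypothetical.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n + 1))

def cls(n_obs, s, b):
    """CL_s = CL_{s+b} / CL_b for a counting experiment without systematics."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def upper_limit(n_obs, b, cl=0.95, s_max=100.0, tol=1e-7):
    """Signal yield s at which CL_s drops to 1 - cl, found by bisection."""
    lo, hi = 0.0, s_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, b) > 1.0 - cl:
            lo = mid  # signal hypothesis not yet excluded: move up
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With zero observed events the limit in this toy is −ln(0.05) ≈ 3.0 signal events regardless of the expected background, which is the characteristic protection of CL s against excluding signals to which the experiment has little sensitivity. A limit on the signal yield converts to a cross section limit after dividing by the integrated luminosity and the signal efficiency.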

Summary
A search for events containing four top quarks was performed using data collected with the CMS detector in lepton + jets final states at √s = 8 TeV, corresponding to an integrated luminosity of 19.6 fb⁻¹. The analysis had three stages. First, a baseline selection was used to select signal events while suppressing backgrounds. Second, to further discriminate between signal and background, an event classification scheme based on a BDT algorithm was defined to exploit differences in the multiplicity of top quarks, jet activity, and the multiplicity of bottom quarks. Third, a simultaneous maximum likelihood fit of the BDT event discriminant distributions was performed, from which an upper limit on σ tttt of 32 fb was calculated at a 95% CL, where a limit of 32 ± 17 fb was expected. These limits are approximately 25 × σ SM tttt. This result raises the prospect of the direct observation of SM tttt in future CMS data at the higher centre-of-mass energies of 13 and 14 TeV, where σ SM tttt is predicted to be ≈9 fb and 15 fb, respectively [3,14]. Furthermore, this result has the potential to constrain BSM theories producing tttt final states with kinematics similar to the SM process, and enhance the discovery reach of BSM searches where SM production of tttt constitutes a possible background.

Figure 1: Leading-order Feynman diagrams for tttt production in the SM from gluon-gluon fusion (left) and quark-antiquark annihilation (right).