1 Introduction

Top-quark pairs (\(\mathrm {t}\bar {\mathrm {t}}\)) are copiously produced at the Large Hadron Collider (LHC) primarily through gluon–gluon fusion. The measurements of the \(\mathrm {t}\bar {\mathrm {t}}\) production cross section and branching fractions are important tests of the standard model (SM), since the top quark is expected to play a special role in various extensions of the SM due to its high mass (see for example [1, 2]).

The branching fraction of a top-quark decay to a W boson and a b quark is close to 100 % in the SM. Therefore, the final states from the top-quark decays are given by the decay mode of the W bosons. In this Letter top-quark decays in the “hadronic τ+jets” final state are studied. One W boson decays into a hadronically decaying τ lepton (τ h) and a neutrino with a branching fraction of 0.1125×0.647 [3] and the other one decays to a quark-antiquark pair with a branching fraction of 0.676 [3]. Thus, 9.8 % of the \(\mathrm {t}\bar {\mathrm {t}}\) pairs produced are expected to lead to this final state.
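The 9.8 % figure can be checked with a short calculation. The sketch below uses only the branching fractions quoted above; the factor of two, which is not spelled out in the text, accounts for the fact that either of the two W bosons can be the one decaying to τ h, and is needed to reproduce the quoted value. The constant and function names are illustrative.

```python
# Branching fractions quoted in the text (Ref. [3]).
BR_W_TO_TAU_NU = 0.1125      # W -> tau nu
BR_TAU_TO_HADRONS = 0.647    # tau -> hadrons + nu
BR_W_TO_QQ = 0.676           # W -> quark-antiquark pair


def tau_h_plus_jets_fraction():
    """Fraction of ttbar pairs decaying to the hadronic tau+jets final state."""
    one_assignment = BR_W_TO_TAU_NU * BR_TAU_TO_HADRONS * BR_W_TO_QQ
    # Either W boson may decay to the tau, hence the factor of two.
    return 2.0 * one_assignment


fraction = tau_h_plus_jets_fraction()
print(f"tau_h+jets fraction: {100.0 * fraction:.1f} %")
```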

The branching fraction of \(\mathrm {t}\bar {\mathrm {t}}\) to τ h+jets final states is expected to be the largest one among those with τ leptons in the final state. The existence of charged Higgs bosons could give rise to an enhanced cross section in this channel. The top quark would decay via \(\mathrm{t}\to\mathrm{H}^{+}\mathrm{b}\) and the charged Higgs boson subsequently decays to a τ lepton via \(\mathrm{H}^{+}\to\tau^{+}\nu_{\tau}\). The present status of the charged Higgs boson search in \(\mathrm {t}\bar {\mathrm {t}}\) final states with the Compact Muon Solenoid (CMS) detector is described in Ref. [4].

In this Letter we present a measurement of the \(\mathrm {t}\bar {\mathrm {t}}\) production cross section in the τ h+jets final state in proton–proton collisions at \(\sqrt{s}=7\ \mbox{TeV}\) using data collected by the CMS experiment. It is the first such measurement performed using the CMS detector and complements the measurement performed in the τ h+lepton channel [5]. The \(\mathrm {t}\bar {\mathrm {t}}\) production cross section in the τ h+jets final state has previously been measured in proton-antiproton collisions at \(\sqrt{s}=1.96\ \mbox{TeV}\) at the Tevatron [6, 7] and more recently in proton–proton collisions at \(\sqrt{s}=7\ \mbox{TeV}\) using the ATLAS detector [8]. All measurements referenced above have been found to be in agreement with the SM expectations.

2 The CMS detector

The central feature of the CMS apparatus is a superconducting cylindrical solenoid 6 m in diameter, which provides an axial magnetic field of 3.8 T. Within the field volume are the silicon pixel and strip trackers, the crystal electromagnetic (ECAL) and brass/scintillator hadronic (HCAL) calorimeters which provide identification of charged, electromagnetic and hadronic particles up to pseudorapidities of |η|<2.5 (trackers) and |η|<3.0 (calorimeters). The pseudorapidity is defined as η=−ln[tan(θ/2)], where θ is the polar angle measured with respect to the positive z axis of the right-handed coordinate system used by the CMS experiment. The x axis points towards the center of the LHC ring, the y axis is directed upward along the vertical and the z axis corresponds to the anticlockwise-beam direction. In addition the CMS detector has extensive forward calorimetry. Muons are measured in gas detectors embedded in the steel return yoke outside the solenoid. The excellent tracker impact parameter resolution of ≈15 μm and transverse momentum (p T) resolution of ≈1.5 % for 100 GeV particles support a robust identification of τ h and jets arising from b quark hadronization. A detailed description of the CMS detector can be found elsewhere [9].
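As an illustration of the pseudorapidity definition, the following sketch converts between the polar angle θ and η and evaluates the polar angle at the tracker acceptance edge, |η|=2.5. The function names are illustrative and not part of any CMS software.

```python
import math


def pseudorapidity(theta):
    """eta = -ln[tan(theta/2)] for a polar angle theta in radians (0 < theta < pi)."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta):
    """Inverse of the definition: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


theta_tracker_edge = polar_angle(2.5)
print(f"|eta| = 2.5 <-> theta = {math.degrees(theta_tracker_edge):.2f} degrees")
```

A particle emitted perpendicular to the beam (θ=π/2) has η=0, and η grows without bound as θ approaches the beam axis.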

3 Event simulation

Monte Carlo (MC) simulation is used to determine the signal efficiency as well as the contribution from electroweak and \(\mathrm {t}\bar {\mathrm {t}}\) background processes (i.e. contributions from the fully hadronic, lepton+jets, τ h+lepton and τ h τ h channels). The \(\mathrm {t}\bar {\mathrm {t}}\) signal and background events as well as the W/Z+jets events are simulated using the MadGraph (v.5.1.1.0) [10] generator with the CTEQ6L1 [11] parton distribution functions (PDFs). The simulation of parton showering, fragmentation, hadronization and decays of short-lived particles, except τ leptons, is performed by pythia (v.6.424) [12]. Tau lepton decays are simulated using tauola (v.2.75) [13]. Single top-quark events are generated using powheg (r1380) [14] interfaced to pythia and tauola. The top-quark mass is set to 172.5 GeV, and the approximate next-to-next-to-leading-order (NNLO) \(\mathrm {t}\bar {\mathrm {t}}\) production cross section of 164±10 pb is calculated using the MSTW2008 next-to-next-to-leading-log PDFs [15]. Simulated events are weighted to reflect the distribution of the number of multiple interactions (pileup) observed in data. Data-to-simulation b-tagging efficiency scale factors are applied to correct for the differences between data and simulation.

4 Dataset

The total integrated luminosity of the dataset analyzed is 3.9 fb−1. A multijet trigger, in which one of the jets is required to be identified as a hadronically decaying τ lepton, was designed to record \(\mathrm {p} \mathrm {p} \to \mathrm {t}\bar {\mathrm {t}} \rightarrow \tau _{\mathrm {h}} +\text{jets}\) events. It consists of two consecutively applied filters, referred to as jet and τ filters. The jet filter requires the presence of four central jets reconstructed in the calorimeter (|η|<2.5, p T>40 GeV), referred to as calorimeter jets. The τ filter requires the presence of one isolated particle-flow [16] τ candidate (|η|<2.5, p T>40 GeV, at least one track with p T>5 GeV), matched to one of four trigger jets. Due to the increasing rate of the recorded events with the rising instantaneous luminosity, the thresholds on the jets and τ lepton were raised to p T>45 GeV during the later part of the data taking period. About 80 % of the data were recorded with that more restrictive trigger configuration. The overall \(\mathrm {p} \mathrm {p} \to \mathrm {t}\bar {\mathrm {t}} \rightarrow \tau _{\mathrm {h}} +\text{jets}\) trigger efficiency is small, approximately 1 %, with respect to all generated \(\mathrm {t}\bar {\mathrm {t}} \to \tau _{\mathrm {h}} +\text{jets}\) events. The small efficiency is due to the high p T threshold on the hadronically decaying τ lepton.

5 Event selection

The object reconstruction relies on the particle-flow technique. The event selection is based on the presence of at least four particle-flow jets, reconstructed with the anti-k T clustering algorithm [17, 18] with a distance parameter of R=0.5, and on the presence of one particle-flow τ candidate reconstructed with the hadron-plus-strip (HPS) identification algorithm [19]. The HPS algorithm exploits the ability of the particle-flow to reconstruct resonances in the τ decay. It considers candidates with one or three charged hadrons and up to two neutral pions, with a net charge of ±1e.

The τ candidates are required to be isolated: the sum of the transverse energies of the additional charged hadrons and photons (τ decay products excluded) reconstructed in an isolation cone of \(\Delta R=\sqrt{(\Delta\eta)^{2}+(\Delta\phi)^{2}}=0.5\) (where ϕ is the azimuthal angle in radians) around the τ candidate should be less than 1 GeV. The τ reconstruction efficiency is estimated to be approximately 44 % for τ h candidates with p T>20 GeV, |η|<2.3, selected in genuine \(\mathrm{Z}\to\tau^{+}\tau^{-}\) events, with a corresponding misidentification efficiency for jets of 0.5 % [19].
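The isolation requirement can be sketched as follows. This is a minimal illustration, assuming each particle is given as a simple tuple; the data layout and function names are hypothetical and do not correspond to the actual particle-flow software. The azimuthal difference is wrapped into [−π, π] so that particles on either side of ϕ=±π are treated as close.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into the interval [-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    return dphi - 2.0 * math.pi if dphi > math.pi else dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt((delta eta)^2 + (delta phi)^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def is_isolated(tau, others, cone=0.5, max_sum_et=1.0):
    """tau: (eta, phi); others: list of (et_gev, eta, phi) for charged hadrons
    and photons that are not tau decay products."""
    sum_et = sum(et for et, eta, phi in others
                 if delta_r(tau[0], tau[1], eta, phi) < cone)
    return sum_et < max_sum_et
```

For example, a τ candidate at (η, ϕ)=(0.0, 3.1) with a single 0.6 GeV photon at (0.1, −3.1) is isolated, since the photon lies inside the cone but the sum stays below 1 GeV.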

Furthermore, τ candidates are required to pass discriminators against muons and electrons. The discrimination against electrons relies on a boosted decision tree that combines variables that characterise the presence of neutral particles reconstructed in the τ decay (e.g., number of constituents, cluster shapes, energy fractions), as well as the presence of a charged hadron and electromagnetic particles (e.g., energy fractions, electron–pion discriminator). To suppress the contamination from muons, the leading track of the τ h candidate is vetoed if identified as a muon in the muon detectors. In addition, a single-charged τ h candidate should not be identified as a minimum ionising particle: the ratio of the sum of the energy deposits in the ECAL and HCAL calorimeters associated to the τ h candidate over the leading track momentum of the τ h candidate should be larger than 0.2.

Three jets are required to have p T>45 GeV, |η|<2.4 and the τ h candidate p T>45 GeV, |η|<2.3. The offline jets and offline τ h candidates are explicitly matched to the jets and τ h candidates used by the trigger. The presence of an additional jet with p T>20 GeV (τ h candidate excluded) is required.

Since two b-jets from the top-quark decays are expected in the final state, at least one jet is required to be identified as a b-jet using the medium working point of the jet probability algorithm [20]. At a misidentification probability for light-flavored jets of 1 %, a b-tagging efficiency of 60 % is achieved for this working point of the tagging algorithm.

A veto on the presence of loosely isolated electrons and muons is applied to further prevent the misidentification of genuine electrons and muons as τ h candidates. The isolation requirement is defined as I/p T<0.15, where I is the sum of the transverse energy deposits in the ECAL and HCAL calorimeters and p T is the scalar value of the track momenta in a cone with ΔR=0.3 centered on the lepton direction, excluding the lepton p T.

The momentum imbalance, \(\boldsymbol {p}_{\mathrm {T}} ^{\text{miss}}\), is defined as the opposite of the vectorial sum of the particle transverse momenta, using all particles reconstructed by the particle-flow algorithm. The transverse missing energy, \(E_{\mathrm {T}}^{\text {miss}}\), is defined as the magnitude of this quantity and is required to be greater than 20 GeV to reject the multijet background and to achieve a good separation for the input variables used in the artificial neural network described in Sect. 6. Events which pass this set of criteria constitute the preselected sample, from which the yield is extracted.
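The definition of the momentum imbalance translates directly into a few lines of code. The sketch below assumes each reconstructed particle is described only by its transverse momentum and azimuthal angle; the function names and the input layout are illustrative.

```python
import math


def missing_et(particles):
    """particles: iterable of (pt_gev, phi). Returns (met, met_phi), i.e. the
    magnitude and direction of minus the vector sum of transverse momenta."""
    particles = list(particles)
    px = -sum(pt * math.cos(phi) for pt, phi in particles)
    py = -sum(pt * math.sin(phi) for pt, phi in particles)
    return math.hypot(px, py), math.atan2(py, px)


def passes_met_requirement(particles, threshold_gev=20.0):
    """Preselection requirement on the missing transverse energy."""
    met, _ = missing_et(particles)
    return met > threshold_gev
```

Two back-to-back particles with 50 GeV and 25 GeV give a missing transverse energy of 25 GeV pointing along the softer particle, whereas a perfectly balanced pair fails the 20 GeV requirement.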

The trigger efficiencies have been measured in data, determining separately the efficiency of a single jet and a single τ h to pass the trigger requirements. The single-jet efficiency has been measured in events containing four particle-flow jets in the central region, three of them matched to the trigger jets. The fourth jet is used as a probe jet and the single-jet efficiency is computed with respect to its match to the fourth trigger jet. The efficiency of a single particle-flow jet with p T≈45 GeV to pass the single-jet requirement of the trigger is 70±1 % (54±1 % for the more restrictive trigger). The jet trigger plateau is reached above ≈120 GeV due to the different energy scale of particle-flow jets and calorimeter jets.

The τ h trigger efficiency has been measured in the events that satisfy the jet filter requirement and that contain a reconstructed τ h candidate matched to one of the four trigger jets. The τ h trigger plateau is reached for p T>45 GeV (respectively p T>50 GeV for the more stringent trigger) yielding an efficiency of 90±1 % (92±1 %). The trigger efficiency is modeled in simulation by multiplying the trigger efficiencies obtained for the three most energetic central jets and the trigger efficiency obtained for the τ h candidate.
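The factorized trigger model described above amounts to a per-event weight. The sketch below is an illustration only: in the analysis the per-object efficiencies would be read from the measured turn-on curves as a function of p T, whereas here the quoted values at threshold (70 % per jet, 90 % for the τ h) are inserted by hand. The function name is hypothetical.

```python
def event_trigger_weight(jet_efficiencies, tau_efficiency):
    """Product of the single-jet efficiencies of the three most energetic
    central jets and the tau_h-leg efficiency."""
    if len(jet_efficiencies) != 3:
        raise ValueError("expected efficiencies for exactly three jets")
    weight = tau_efficiency
    for efficiency in jet_efficiencies:
        weight *= efficiency
    return weight


# Illustrative event: three jets right at the 45 GeV threshold, tau_h on plateau.
weight = event_trigger_weight([0.70, 0.70, 0.70], 0.90)
print(f"trigger weight: {weight:.3f}")
```

Because the single-jet efficiency enters three times, events with jets near threshold are strongly suppressed, which is consistent with the small overall trigger efficiency noted in Sect. 4.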

6 Background estimation

The largest background for this analysis comes from high-multiplicity multijet events where one of the jets is misidentified as a τ h, and represents approximately 90 % of the expected background. While control samples in data are used to evaluate the multijet background, the estimation of the other contributions from \(\mathrm {t}\bar {\mathrm {t}}\) backgrounds and electroweak processes, such as single top-quark production and W/Z+jets events, relies on MC simulation. Given the low signal-to-background ratio expected after preselection, an artificial neural network (ANN) is used to discriminate signal and background.

6.1 Multijet background

The multijet background is estimated from data by using the same selection as the preselected sample except that a veto is applied on the presence of a b-tagged jet. From simulated events, we expect the resulting sample, referred to as multijet sample, to contain less than 0.6 % of \(\mathrm {t}\bar {\mathrm {t}} \rightarrow \tau _{\mathrm {h}} +\text{jets}\) events, less than 0.3 % of \(\mathrm {t}\bar {\mathrm {t}}\) background events and less than 2.0 % of W+jets and Z+jets events. Therefore the multijet sample provides a good representation of the multijet background and is used to train the ANN.

To account for the kinematic bias of the b-tag veto in the multijet sample, as the b-tagging efficiency depends on the jet momenta, the selected multijet events in data are weighted by the misidentification probability to select at least one b-jet in the event. This assumes that the jets are predominantly light flavored:

with j≠i, where P j stands for the misidentification probability of a light-flavored jet and has been measured for different p T and η bins in control samples in data [20].
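The weighting formula itself is not reproduced in this text. As a hedged illustration, the sketch below uses the standard expression for the probability that at least one of several independent jets is mistagged, w = 1 − ∏(1 − P j). Any correct expression for "at least one tag" under the independence assumption is algebraically equal to this form, but the exact way it is written in the original may differ. The function name and the 1 % example value (the light-flavor misidentification probability quoted for the medium working point in Sect. 5) are used purely for illustration; the analysis uses p T- and η-dependent values.

```python
def multijet_event_weight(mistag_probabilities):
    """Probability that at least one jet is mistagged as a b-jet, assuming
    all jets are light flavored and are tagged independently."""
    prob_no_tag = 1.0
    for p_j in mistag_probabilities:
        prob_no_tag *= (1.0 - p_j)
    return 1.0 - prob_no_tag


# Illustrative event with four jets, each with a 1 % misidentification probability.
example_weight = multijet_event_weight([0.01, 0.01, 0.01, 0.01])
print(f"event weight: {example_weight:.4f}")
```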

6.2 Artificial neural network

The following seven variables are used to build an artificial neural network: the scalar sum of the transverse momenta of all the selected jets and the τ h, H T, the aplanarity, the τ h charge multiplied by the absolute value of the pseudorapidity of the τ h candidate, q(τ h)⋅|η(τ h)|, the missing transverse energy, \(E_{\mathrm {T}}^{\text {miss}}\), the azimuthal angle between the τ h candidate and the missing transverse energy direction, \(\Delta\phi( \tau _{\mathrm {h}} , \boldsymbol {p}_{\mathrm {T}} ^{\text{miss}})\), the invariant mass of the system of all the selected jets and the τ h candidate, M(jets,τ h), and the χ 2 returned by a kinematic fit constraining the hadronically decaying W boson and top-quark masses to m W=80.4 GeV and m top=172.5 GeV. The aplanarity, \(A=\frac{3}{2}\lambda_{1}\), is used to describe the spherical topology of the top-quark decay products: λ 1 is the smallest eigenvalue of the momentum tensor \(M^{\alpha\beta}=\sum_{i}p^{\alpha}_{i}p^{\beta}_{i}/\sum_{i}|\vec{p_{i}}|^{2}\), where i runs over the number of jets and the τ h candidate and α,β=1,2,3 specify the three spatial components of the momentum. The τ h charge multiplied by the absolute value of the pseudorapidity of the τ h candidate, q(τ h)⋅|η(τ h)|, is used to account for the charge-symmetric nature of \(\mathrm {t}\bar {\mathrm {t}}\) events in contrast to W+jets events produced in proton–proton collisions. The τ h charge is defined as the sum of the charges of the charged hadrons selected by the HPS algorithm. The training is performed using simulated \(\mathrm {t}\bar {\mathrm {t}} \to \tau _{\mathrm {h}} +\text{jets}\) events passing the preselected sample criteria and events from the multijet sample.
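Three of the input variables, H T, q(τ h)⋅|η(τ h)| and the aplanarity, can be computed from the object momenta alone. The sketch below is a self-contained illustration, assuming momenta are given as (px, py, pz) tuples; all function names are hypothetical. The smallest eigenvalue of the symmetric 3×3 momentum tensor is obtained with the standard closed-form trigonometric solution, chosen here only to avoid external dependencies.

```python
import math


def scalar_ht(transverse_momenta):
    """H_T: scalar sum of the transverse momenta of the jets and the tau_h."""
    return sum(transverse_momenta)


def charge_times_abs_eta(charge, eta):
    """q(tau_h) * |eta(tau_h)|."""
    return charge * abs(eta)


def momentum_tensor(momenta):
    """M^{ab} = sum_i p_i^a p_i^b / sum_i |p_i|^2 for momenta (px, py, pz)."""
    norm = sum(px * px + py * py + pz * pz for px, py, pz in momenta)
    tensor = [[0.0] * 3 for _ in range(3)]
    for p in momenta:
        for a in range(3):
            for b in range(3):
                tensor[a][b] += p[a] * p[b] / norm
    return tensor


def smallest_eigenvalue_sym3(m):
    """Smallest eigenvalue of a real symmetric 3x3 matrix (trigonometric method)."""
    off = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p2 = sum((m[i][i] - q) ** 2 for i in range(3)) + 2.0 * off
    if p2 == 0.0:
        return q  # matrix is a multiple of the identity
    p = math.sqrt(p2 / 6.0)
    b = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against rounding
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)


def aplanarity(momenta):
    """A = (3/2) * lambda_1, with lambda_1 the smallest eigenvalue."""
    return 1.5 * smallest_eigenvalue_sym3(momentum_tensor(momenta))
```

With this normalization the tensor has unit trace, so A ranges from 0 for a planar event to 0.5 for a perfectly isotropic one.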

6.3 Signal yield extraction

To minimise the statistical uncertainty of the cross section measurement, we fit the entire ANN output, D NN, distribution rather than counting events above a given value. The extraction of the yield is performed via a two-component binned negative log-likelihood fit of the shapes of expected signal and multijet background, derived, respectively, from simulation and the multijet sample, to the data. The shapes for the \(\mathrm {t}\bar {\mathrm {t}}\) background and the electroweak processes, and their normalizations are fixed to the expectation from simulation. Table 1 summarizes the contribution of the various processes. The number of signal events among the 3050 selected events is 383±29. The fit uncertainty is given for the number of signal and multijet events, whereas for the remaining backgrounds the statistical uncertainty is due to the limited size of the simulated samples. The systematic uncertainties correspond to those described in Sect. 7.1.
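The structure of the two-component fit can be illustrated with a toy example. The sketch below is not the fitting code used in the analysis: it minimizes the binned Poisson negative log-likelihood with a simple expectation-maximization iteration, chosen here because it needs no external minimizer. The templates, bin contents and function names are invented for illustration. The signal and multijet shapes are normalized to unit area, and the remaining backgrounds enter as a fixed per-bin expectation.

```python
import math


def expected_counts(n_sig, n_mj, sig_shape, mj_shape, fixed):
    """Per-bin expectation: floating signal and multijet plus fixed backgrounds."""
    return [n_sig * s + n_mj * b + f
            for s, b, f in zip(sig_shape, mj_shape, fixed)]


def negative_log_likelihood(data, mu):
    """Binned Poisson NLL, dropping the data-only constant term."""
    return sum(m - d * math.log(m) for d, m in zip(data, mu))


def fit_yields(data, sig_shape, mj_shape, fixed, iterations=1000):
    """EM iteration whose fixed point satisfies dNLL/dn = 0 for both yields.
    Requires unit-normalized shapes and a positive expectation in every bin."""
    free = max(sum(data) - sum(fixed), 1.0)
    n_sig = n_mj = 0.5 * free
    for _ in range(iterations):
        mu = expected_counts(n_sig, n_mj, sig_shape, mj_shape, fixed)
        n_sig, n_mj = (
            n_sig * sum(d * s / m for d, s, m in zip(data, sig_shape, mu)),
            n_mj * sum(d * b / m for d, b, m in zip(data, mj_shape, mu)),
        )
    return n_sig, n_mj


# Toy templates (invented): signal peaks at high D_NN, multijet at low D_NN.
SIG_SHAPE = [0.05, 0.10, 0.15, 0.30, 0.40]
MJ_SHAPE = [0.40, 0.30, 0.15, 0.10, 0.05]
FIXED_BKG = [2.0, 2.0, 2.0, 2.0, 2.0]
toy_data = expected_counts(100.0, 900.0, SIG_SHAPE, MJ_SHAPE, FIXED_BKG)
fitted_sig, fitted_mj = fit_yields(toy_data, SIG_SHAPE, MJ_SHAPE, FIXED_BKG)
```

Because the toy data are set exactly to the expectation for 100 signal and 900 multijet events, the fit should return those yields; with real data the fitted values fluctuate and their uncertainties follow from the curvature of the likelihood.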

Table 1 Estimated number of signal and multijet events after a fit to the ANN output distribution, and expected contributions of the electroweak processes and \(\mathrm {t}\bar {\mathrm {t}}\) backgrounds from MC simulation

Figure 1 shows the fitted ANN output distribution. Figure 2 shows the distribution of M 3, defined as the invariant mass of the three-jet system with highest transverse momentum in an enriched signal region, D NN>0.5. The selected jets are deemed to originate from the hadronically decaying top quark.
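The M 3 definition can be sketched as a search over all three-jet combinations. In this illustration jets are plain four-vectors (E, px, py, pz) in GeV; the function names are hypothetical.

```python
import math
from itertools import combinations


def sum_four_vectors(vectors):
    """Component-wise sum of (E, px, py, pz) four-vectors."""
    return tuple(sum(component) for component in zip(*vectors))


def m3(jets):
    """Invariant mass of the three-jet combination whose vector-summed
    transverse momentum is largest."""
    if len(jets) < 3:
        raise ValueError("M3 needs at least three jets")
    best = max(
        (sum_four_vectors(triplet) for triplet in combinations(jets, 3)),
        key=lambda v: math.hypot(v[1], v[2]),
    )
    energy, px, py, pz = best
    mass_squared = energy ** 2 - px ** 2 - py ** 2 - pz ** 2
    return math.sqrt(max(mass_squared, 0.0))  # guard against rounding
```

For signal events the selected triplet tends to reconstruct the hadronically decaying top quark, so M 3 is expected to peak near the top-quark mass.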

Fig. 1 Distribution of the artificial neural network output variable after a fit of the signal and multijet processes to the data. Other background shapes and normalizations are fixed to the expectations from simulation. The hatched area shows the combined statistical and systematic uncertainty on the sum of the signal and background predictions. The ratio of the data distribution to the sum of expected background and fitted signal distributions is shown at the bottom of the figure

Fig. 2 Distribution of the reconstructed M 3 variable after a fit of the signal and multijet processes to the data, after requiring the ANN output value to be greater than 0.5. The hatched area shows the combined statistical and systematic uncertainty on the sum of the signal and background predictions. The ratio of the data distribution to the sum of expected background and fitted signal distributions is shown at the bottom of the figure

7 Cross section measurement

7.1 Systematic uncertainties

The main sources of systematic uncertainties are those due to uncertainty in the jet energy scale (JES), the τ h energy scale, the τ h identification, the trigger efficiency and in the \(E_{\mathrm {T}}^{\text {miss}}\) measurement. The uncertainty in the cross section measurement is obtained by combining the uncertainty in the signal acceptance and in the fitted number of signal events. The systematic uncertainties in the fitted number of signal events are estimated, when relevant, by iterating the fit on the ANN output in order to take into account possible shape variations of the ANN input variables.

The uncertainties in the cross sections for the different simulated background processes are estimated from theoretical calculations [15, 21]. The uncertainty coming from the top-quark mass is evaluated considering two simulated samples where the nominal top-quark mass of 172.5 GeV has been shifted by ±6 GeV. Scaling this uncertainty to the measured top-quark mass uncertainty of 1.1 GeV provides a 3 % relative uncertainty in the measured cross section. The dependence of the selection on the renormalization and factorization scales is estimated by varying these scales simultaneously by a factor of 0.5 and 2.0 from their default value equal to the hard-scattering scale Q, with \(Q^{2}=m^{2}_{\text{top}}+\sum p_{\mathrm {T}} ^{2}\), where m top denotes the top-quark mass and \(\sum p_{\mathrm {T}} ^{2}\) the sum of the squared transverse momenta of all final state partons. The measured relative uncertainty for the \(\mathrm {t}\bar {\mathrm {t}}\) processes is estimated to be 2 %.
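The scaling of the top-quark mass uncertainty can be written out explicitly. The sketch below assumes that the measured cross section responds linearly to the top-quark mass over the ±6 GeV range, which the text does not state explicitly. The variation observed for the full ±6 GeV shift is not quoted, so the example only inverts the quoted 3 % to show the size of the variation implied under that assumption. Function names are illustrative.

```python
def scale_mass_uncertainty(variation_at_shift, sample_shift_gev=6.0,
                           mass_uncertainty_gev=1.1):
    """Scale a relative variation observed for a +-6 GeV mass shift down to the
    measured top-quark mass uncertainty, assuming a linear dependence."""
    return variation_at_shift * mass_uncertainty_gev / sample_shift_gev


def implied_variation_at_shift(scaled_uncertainty, sample_shift_gev=6.0,
                               mass_uncertainty_gev=1.1):
    """Inverse of scale_mass_uncertainty."""
    return scaled_uncertainty * sample_shift_gev / mass_uncertainty_gev


# Variation for the full +-6 GeV shift implied by the quoted 3 % (linear assumption).
implied = implied_variation_at_shift(0.03)
print(f"implied variation for a 6 GeV shift: {100.0 * implied:.0f} %")
```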

The thresholds used to associate the matrix elements to the parton showers are varied from the default of 20 GeV down to 10 GeV and up to 40 GeV. The measured relative uncertainty for the \(\mathrm {t}\bar {\mathrm {t}}\) processes is estimated to be 3 %. The uncertainty in the signal acceptance due to the choice of PDFs is estimated using the 2×11 reference PDFs associated to CTEQ6L1. The uncertainty in the number of fitted signal events due to the choice of PDFs is determined by iterating the fit on the ANN output distribution. For this purpose, simulated events are used with the reference PDFs (out of the 11 available) that lead to the maximal upward (respectively maximal downward) variation.

The uncertainty induced by the statistical uncertainty of the trigger turn-on is computed using the uncertainties on the trigger turn-on curves versus the transverse momenta of the particle-flow jets and particle-flow τ h. An additional ±5 % uncertainty is assigned to the τ h trigger efficiency measurement, since the data used to estimate the τ-leg efficiency consist mainly of jets misidentified as τ h candidates. This uncertainty is derived in comparison to the trigger efficiency obtained for τ h candidates in genuine \(\mathrm{Z}\to\tau^{+}\tau^{-}\) events using similar trigger conditions.

The pileup uncertainty is estimated by varying the number of pileup interactions measured in data according to the theoretical uncertainty of the minimum bias inelastic cross section of ±8 %.

The uncertainty on the τ h energy scale is estimated by varying the τ h energy by ±3 % [19]. The uncertainties are propagated to the trigger efficiency weights. The uncertainty due to the τ h identification efficiency is estimated to be 6 % [19].

The uncertainty due to the JES is estimated by rescaling the jet energy up or down by the uncertainties corresponding to one standard deviation. For the jet energy resolution (JER) the distribution of the jet energy has been smeared by one standard deviation. The corrections are propagated to the \(E_{\mathrm {T}}^{\text {miss}}\) measurement and to the trigger-efficiency measurement. The energy of the particles that are not clustered into jets is varied by ±10 %, leading to an additional uncertainty in the \(E_{\mathrm {T}}^{\text {miss}}\).

The uncertainty due to applying b-tagging data-to-simulation scale factors for b, c and light-flavored jets to the simulated events is estimated by shifting the value of the applied scale factors by the uncertainty corresponding to one standard deviation [22]. The uncertainty in the reweighting method applied to the multijet data sample is estimated to be 5 %.

The uncertainty in the luminosity measurement is estimated to be 2.2 % [23].

Table 2 summarizes the uncertainties entering the cross section measurement, split into systematic and statistical ones. The statistical uncertainty includes the D NN fit uncertainty, the statistical uncertainty of the trigger turn-ons, as well as the uncertainty due to the limited size of the simulated samples.

Table 2 Relative uncertainties in the cross section measurement

7.2 Measured cross section and branching fraction

The measurement of the \(\mathrm {t}\bar {\mathrm {t}}\) cross section in the τ h+jets channel is performed using the following expression:

$$\sigma_{ \mathrm {t}\bar {\mathrm {t}} }=\frac{N-N_B}{A_{\text{tot}}\cdot \mathcal{B} \cdot \int \mathcal{L}\, \mathrm {d} {}t} $$

where N is the number of observed candidate events, N B is the estimate of the background, \(\int\mathcal{L}\, \mathrm {d} {}t\) is the integrated luminosity, A tot is the total acceptance, which contains the trigger efficiency and the efficiency of the offline event selection and \(\mathcal{B}\) is the branching fraction of the τ h+jets channel.

Taking into account the systematic and statistical uncertainties reported in Table 2 and the evaluated acceptance, A tot=0.0066±0.0001 (stat.)±0.0010 (syst.), the cross section is

$$\sigma_{ \mathrm {t}\bar {\mathrm {t}} } = 152\pm12\,(\mbox {stat.})\pm32 \,\text {(syst.)} \pm3 \,\text {(lum.)} \ \mbox{pb}. $$
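The central value can be cross-checked from numbers quoted in the text. The sketch below takes N−N B as the fitted signal yield of 383 events, uses the 9.8 % branching fraction from Sect. 1 rounded to 0.098, and converts 3.9 fb−1 to 3900 pb−1. These input choices are assumptions about which rounded values to insert, so agreement is expected only at the level of the rounding. Variable names are illustrative.

```python
N_SIGNAL = 383.0            # fitted number of signal events (N - N_B)
N_SIGNAL_FIT_UNC = 29.0     # fit uncertainty on the signal yield
ACCEPTANCE = 0.0066         # A_tot
BRANCHING_FRACTION = 0.098  # tau_h+jets branching fraction (9.8 %)
LUMI_PB = 3.9e3             # 3.9 fb^-1 expressed in pb^-1


def cross_section_pb(n_signal, acceptance, branching_fraction, lumi_pb):
    """sigma = (N - N_B) / (A_tot * B * integrated luminosity)."""
    return n_signal / (acceptance * branching_fraction * lumi_pb)


sigma = cross_section_pb(N_SIGNAL, ACCEPTANCE, BRANCHING_FRACTION, LUMI_PB)
# Propagate only the fit uncertainty on the yield (relative error is preserved).
sigma_fit_unc = sigma * N_SIGNAL_FIT_UNC / N_SIGNAL
print(f"sigma = {sigma:.0f} +- {sigma_fit_unc:.1f} (fit) pb")
```

The propagated fit uncertainty comes out slightly below the quoted statistical uncertainty of 12 pb, as expected, since the latter also includes the trigger turn-on statistics and the limited size of the simulated samples.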

Using the number of fitted signal events and the theoretical \(\mathrm {t}\bar {\mathrm {t}}\) production cross section, the branching fraction of the τ h+jets channel is

$$\mathcal{B} = 0.091\pm0.007\,(\mbox {stat.})\pm0.020 \,\text {(syst.)} \pm0.002 \,\text {(lum.)} . $$

The theoretical uncertainty on the \(\mathrm {t}\bar {\mathrm {t}}\) cross section is included in the systematic uncertainties.
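The branching fraction follows from inverting the cross section expression, with the theoretical cross section of 164 pb in place of the measured one. The sketch below repeats the calculation with the rounded inputs quoted in the text, so only approximate agreement with the published value is expected. Variable names are illustrative.

```python
N_SIGNAL = 383.0         # fitted number of signal events
N_SIGNAL_FIT_UNC = 29.0  # fit uncertainty on the signal yield
ACCEPTANCE = 0.0066      # A_tot
SIGMA_THEORY_PB = 164.0  # approximate NNLO ttbar cross section
LUMI_PB = 3.9e3          # 3.9 fb^-1 expressed in pb^-1


def branching_fraction(n_signal, acceptance, sigma_pb, lumi_pb):
    """B = (N - N_B) / (A_tot * sigma * integrated luminosity)."""
    return n_signal / (acceptance * sigma_pb * lumi_pb)


b_measured = branching_fraction(N_SIGNAL, ACCEPTANCE, SIGMA_THEORY_PB, LUMI_PB)
b_fit_unc = b_measured * N_SIGNAL_FIT_UNC / N_SIGNAL
print(f"B = {b_measured:.3f} +- {b_fit_unc:.3f} (fit)")
```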

8 Summary

Top-quark pairs in the τ h+jets final state have been selected in a data sample from proton–proton collisions at \(\sqrt{s}=7\) TeV, corresponding to an integrated luminosity of 3.9 fb−1. Events were recorded by a dedicated multijet plus τ h trigger, where events are selected with a moderate amount of \(E_{\mathrm {T}}^{\text {miss}}\) and four jets, at least one of which is b-tagged. The multijet background is discriminated against using an artificial neural network technique. The result, \(\sigma_{ \mathrm {t}\bar {\mathrm {t}} } = 152\pm12\,(\mbox {stat.})\pm32 \,\text {(syst.)} \pm3 \,\text {(lum.)} \ \mbox{pb}\), is consistent with CMS measurements performed in other \(\mathrm {t}\bar {\mathrm {t}}\) final states [5, 24–26], as well as with the theoretical NNLO value of 164±10 pb. The measured process is the dominant background to a charged Higgs search, where a significant deviation from the SM expectations would indicate the presence of new phenomena.