Measurement of single top-quark production in association with a $W$ boson in the single-lepton channel at $\sqrt{s} = 8$ TeV with the ATLAS detector

The production cross-section of a top quark in association with a $W$ boson is measured using proton-proton collisions at $\sqrt{s} = 8$ TeV. The dataset corresponds to an integrated luminosity of 20.2 fb$^{-1}$, and was collected in 2012 by the ATLAS detector at the Large Hadron Collider at CERN. The analysis is performed in the single-lepton channel. Events are selected by requiring one isolated lepton (electron or muon) and at least three jets. A neural network is trained to separate the $tW$ signal from the dominant $t\bar{t}$ background. The cross-section is extracted from a binned profile maximum-likelihood fit to a two-dimensional discriminant built from the neural-network output and the invariant mass of the hadronically decaying $W$ boson. The measured cross-section is $\sigma_{tW} = 26 \pm 7$ pb, in good agreement with the Standard Model expectation.


Introduction
Single top quarks are produced in proton-proton collisions via the weak interaction. At leading order (LO), three different channels, which depend on the virtuality of the $W$ boson involved, are defined: $t$-channel, $s$-channel, or top-quark production in association with a $W$ boson, called $tW$ production. These processes, for which example Feynman diagrams are shown in Figure 1, involve a $Wtb$ vertex at LO in the Standard Model (SM). Calculations involving $tW$ production beyond LO have to include quantum interference with $t\bar{t}$ production. Measurements of single-top-quark cross-sections are used to study the properties of this vertex, as they are directly sensitive to the Cabibbo-Kobayashi-Maskawa (CKM) matrix element $|V_{tb}|$. Deviations from the cross-sections predicted by the SM can originate from single top quarks produced with similar kinematics in the decays of unknown heavy particles predicted by physics beyond the Standard Model. If the masses of these particles are beyond the reach of direct searches, they might be revealed through their effects on the effective $Wtb$ coupling [1]. Using measurements in all three channels of single-top-quark production, physics beyond the SM can be probed systematically in the context of Effective Field Theory [2]. As each of the single-top-quark processes can be sensitive to different sources of new physics, it is also important to study each channel separately. In addition, the SM production of $tW$ is an important background in direct searches for particles beyond the SM [3,4]. At the Large Hadron Collider (LHC), evidence for the $tW$ production process was found by the ATLAS [5] and CMS [6] Collaborations at $\sqrt{s} = 7$ TeV, and the process was observed by both experiments [7,8] at $\sqrt{s} = 8$ TeV. The $tW$ cross-section has also been measured with 13 TeV collision data, inclusively by the CMS Collaboration [9] as well as inclusively and differentially by the ATLAS Collaboration [10-12]. These measurements were performed in final states with two leptons, and the measured
cross-sections agree with the theoretical expectations. This paper presents evidence for $tW$ production in final states with a single lepton using proton-proton ($pp$) collisions at $\sqrt{s} = 8$ TeV. This topology features a $W$ boson in addition to a top quark, which decays mainly into another $W$ boson and a $b$-quark, leading to a $W^{+}W^{-}b$ state. In the single-lepton channel, one of the $W$ bosons decays leptonically ($W_{L}$) while the other one decays hadronically ($W_{H}$). Therefore, the experimental signature of event candidates is characterised by one isolated charged lepton (electron or muon), large missing transverse momentum ($E_{T}^{\text{miss}}$), and three jets with high transverse momentum ($p_{T}$), one of which contains a $b$-hadron and is labelled as a $b$-tagged jet, $j_{B}$. In contrast to the dilepton analyses, the event signature contains only one neutrino, which originates from the leptonic $W$-boson decay. Hence, both the $W$-boson and the top-quark kinematics can be reconstructed and used to separate the signal from background. The main backgrounds are $W$+jets and $t\bar{t}$ events; the latter poses a major challenge in this measurement because of its similar kinematics and a ten times larger cross-section compared to the $tW$ signal. An artificial neural network is trained to separate the signal from the $t\bar{t}$ background. The cross-section is extracted using a binned profile maximum-likelihood fit to a two-dimensional discriminant. This measurement, performed with $tW$ single-lepton events, constitutes a cross-check of the previous results published in the dilepton channel.

ATLAS detector
The ATLAS experiment [13] at the LHC is a multipurpose particle detector with a forward-backward symmetric cylindrical geometry and a near $4\pi$ coverage in solid angle.¹ It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic and hadron calorimeters, and a muon spectrometer (MS). The ID provides charged-particle tracking in the pseudorapidity range $|\eta| < 2.5$. It consists of silicon pixel, silicon microstrip, and transition-radiation tracking detectors. Lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic (EM) energy measurements with high granularity. An iron/scintillator-tile hadron calorimeter covers the central pseudorapidity range ($|\eta| < 1.7$). The endcap ($1.5 < |\eta| < 3.2$) and forward ($3.1 < |\eta| < 4.9$) regions are instrumented with LAr calorimeters for measurements of both EM and hadronic energy. The MS surrounds the calorimeters and includes a system of precision tracking chambers ($|\eta| < 2.7$) and fast detectors for triggering ($|\eta| < 2.4$). The magnet system for the MS consists of three large air-core toroidal magnets with eight superconducting coils. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Collisions producing interesting events are selected for storage with the trigger system [14]. For the data taken at $\sqrt{s} = 8$ TeV, a three-level trigger system was used to select events. The first-level trigger is implemented in hardware and uses a subset of the detector information. It reduced the accepted rate to at most 75 kHz. This was followed by two software-based trigger levels that together reduced the accepted event rate to 400 Hz on average, depending on the data-taking conditions.

Data and simulated event samples
The data considered in this analysis are from $pp$ collisions at $\sqrt{s} = 8$ TeV and were taken with stable LHC beams and the ATLAS detector fully operational, corresponding to an integrated luminosity of 20.2 fb$^{-1}$.
Monte Carlo (MC) samples were produced using the full ATLAS detector simulation [15] implemented in GEANT4 [16]. In addition, alternative MC samples, used to train the neural network and evaluate systematic uncertainties, were produced using ATLFAST-II [15], which provides a faster calorimeter simulation making use of parameterised showers to compute the energy deposited by the particles. Pile-up (additional $pp$ interactions in the same or nearby bunch crossing) was modelled by overlaying simulated minimum-bias events generated with PYTHIA 8 [17]. Weights were assigned to the simulated events, such that the distribution of the number of $pp$ interactions per bunch crossing in the simulation matches the corresponding distribution in the data, which has an average of 21 [18].

¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the $z$-axis along the beam pipe. The $x$-axis points from the IP to the centre of the LHC ring, and the $y$-axis points upwards. Cylindrical coordinates ($r$, $\phi$) are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis. The pseudorapidity is defined in terms of the polar angle $\theta$ as $\eta = -\ln\tan(\theta/2)$. Angular distance is measured in units of $\Delta R \equiv \sqrt{(\Delta\eta)^{2} + (\Delta\phi)^{2}}$.

The $tW$ signal events were simulated using the next-to-leading-order (NLO) POWHEG method [19-21] implemented in the POWHEG-BOX (v1.0) generator (revision 2192) [22] with the CT10 parton distribution function (PDF) set [23] in the matrix-element calculation. The mass and width of the top quark were set to $m_{t} = 172.5$ GeV and $\Gamma_{t} = 1.32$ GeV, respectively. The top quark was assumed to decay exclusively into $Wb$. The parton shower, hadronisation and underlying event were simulated using PYTHIA 6 (v6.426) [24] with the LO CTEQ6L1 PDF set [25] and a corresponding set of tuned parameters called the Perugia 2011 (P2011C) tune [26]. The factorisation scale, $\mu_{\text{f}}$, and renormalisation scale, $\mu_{\text{r}}$, were set to $m_{t}$. Calculations involving $tW$ production beyond LO included quantum
interference with $t\bar{t}$ production. Double-counting of the contributions was avoided by using either the diagram-removal (DR) or the diagram-subtraction (DS) scheme [27,28]. In the DR scheme, diagrams with a second on-shell top-quark propagator are removed from the amplitude, while in the DS scheme, a subtraction term cancels out the $t\bar{t}$ contribution to the cross-section when the top-quark propagator becomes on shell. Nominal MC samples were generated using the DR scheme. For the evaluation of systematic uncertainties, alternative samples were generated using the DS scheme, or using POWHEG-BOX or MC@NLO (v4.06) [29], each interfaced with HERWIG (v6.520) [30]. For the HERWIG samples, the AUET2 tune [31] with the CT10 PDF was used and the underlying event was generated with JIMMY (v4.31) [32]. In addition, PYTHIA 6 (v6.427) samples with variations of $\mu_{\text{r}}$ and $\mu_{\text{f}}$ and the radiation tunes were used. The SM $tW$ cross-section prediction at NLO including next-to-next-to-leading-logarithm (NNLL) soft-gluon corrections [33,34] was calculated as $\sigma_{\text{th.}}(8\,\text{TeV}) = 22.4 \pm 0.6\,(\text{scale}) \pm 1.4\,(\text{PDF})$ pb, assuming a top-quark mass, $m_{t}$, of 172.5 GeV. The first uncertainty accounts for renormalisation and factorisation scale variations (from $m_{t}/2$ to $2m_{t}$) and the second term covers the uncertainty in the parton distribution functions, evaluated using the MSTW2008 PDF set [35] at next-to-next-to-leading order (NNLO).
The $t\bar{t}$ sample was generated with POWHEG-BOX (v1.1) interfaced with PYTHIA 6 (v6.427) [36]. In the POWHEG-BOX event generator, the CT10 PDFs were used, while the CTEQ6L1 PDFs were used for PYTHIA. The $h_{\text{damp}}$ parameter, which effectively regulates the high-$p_{T}$ gluon radiation, was set to $m_{t}$. The predicted $t\bar{t}$ production cross-section, $\sigma_{t\bar{t}}(8\,\text{TeV}) = 252.9\,^{+6.4}_{-8.6}\,(\text{scale}) \pm 11.7\,(\text{PDF}+\alpha_{\text{S}})$ pb, was calculated with the Top++ 2.0 program to NNLO in perturbative QCD, including soft-gluon resummation to NNLL [37]. The first uncertainty comes from the sum in quadrature of the effects of independently varying $\mu_{\text{r}}$ and $\mu_{\text{f}}$. The uncertainty associated with variations in the PDFs and the strong coupling constant, $\alpha_{\text{S}}$, was evaluated following the PDF4LHC NLO prescription [38,39], which defines the central value as the midpoint of the uncertainty envelope of three PDF sets: MSTW2008 NNLO [35], CT10 NNLO [40] and NNPDF2.3 5f FFN [41]. The same procedures as for the $tW$ samples were employed to determine the uncertainties due to the NLO matching method and the parton shower and hadronisation. Samples to evaluate the scale uncertainties were produced in a similar way, varying $\mu_{\text{r}}$ and $\mu_{\text{f}}$ together with the Perugia tune, but also adding variations in the $h_{\text{damp}}$ parameter (for the up-variation, $h_{\text{damp}}$ was changed to $2m_{t}$, while for the down-variation it was kept at $m_{t}$).
The other single-top-quark production processes, $s$-channel and $t$-channel, were also generated with POWHEG-BOX (v1.1) coupled to PYTHIA 6 (v6.426), using the same PDF sets as described for the other top-quark processes above. The predicted cross-sections at $\sqrt{s} = 8$ TeV were calculated at NLO plus NNLL as $5.6 \pm 0.2$ pb for the $s$-channel [42,43] and $87.8\,^{+3.4}_{-1.9}$ pb for the $t$-channel [44,45] process.
The multi-leg LO generator SHERPA (v1.4.1) [46-48], together with the CT10 PDF sets, was used to simulate vector-boson production in association with jets. SHERPA was used to generate the hard process as well as the parton shower and the modelling of the underlying event. Double-counting between the inclusive $V$ + parton samples (with $V = W$ or $Z$) and samples with associated heavy-quark pair production was avoided consistently by using massive $c$- and $b$-quarks in the shower. The predicted NNLO $W$+jets cross-section with the $W$ boson decaying leptonically was calculated as $\sigma(pp \to \ell^{\pm}\nu_{\ell}) = 36.3 \pm 1.9$ nb [49]. For $Z$+jets, the cross-section was calculated at NNLO in QCD for leptonic $Z$ decays as $\sigma(pp \to \ell^{+}\ell^{-}) = 3.72 \pm 0.19$ nb [49]. The ATLFAST-II simulation was used to generate these samples with sufficient statistics. For cross-checks of the $W$+jets modelling, an alternative sample generated with ALPGEN (v2.14) [50] with up to five additional partons, PYTHIA 6 (v6.426) and the CTEQ6L1 PDFs was used. Diboson samples ($WW$/$WZ$/$ZZ$ + jets) were generated with HERWIG (v6.520) at LO in QCD using the CTEQ6L1 PDF. The theoretical NLO cross-section for events with one lepton is $29.4 \pm 1.5$ pb [51].
Multijet events are selected in the analysis when they contain jets or photons misidentified as leptons, or contain non-prompt leptons from hadron decays (both referred to as 'fake' leptons). This background was estimated directly from data using the matrix method [52], which exploits differences in lepton identification and isolation properties between prompt and fake leptons. The data were processed with a second, 'loose' set of lepton selection criteria. The resulting sample was then corrected for efficiency differences between the two sets of cuts, and the contamination from events containing prompt leptons was subtracted. The efficiencies, lepton selection criteria, and uncertainties applied in this analysis are the same as in Ref. [52].
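The two-equation system underlying the matrix method can be sketched as follows. The function name and efficiency values are illustrative placeholders; the real- and fake-lepton efficiencies used in the analysis are measured in data [52] and depend on the lepton kinematics:

```python
def fake_yield_tight(n_loose, n_tight, eff_real, eff_fake):
    """Estimate the fake-lepton yield passing the tight selection.

    Solves the 2x2 matrix-method system
        n_loose = n_real + n_fake
        n_tight = eff_real * n_real + eff_fake * n_fake
    for the fake component, then propagates it to the tight sample.
    """
    n_fake_loose = (eff_real * n_loose - n_tight) / (eff_real - eff_fake)
    return eff_fake * n_fake_loose

# Toy numbers: 1000 loose events, 800 tight events,
# real-lepton efficiency 0.9, fake-lepton efficiency 0.2.
n_fake = fake_yield_tight(1000.0, 800.0, 0.9, 0.2)  # about 28.6 events
```

In the full method the efficiencies are parameterised as functions of the lepton kinematics, so the extraction is applied per event with weights rather than on integrated yields.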

Object definitions
Primary vertex (PV) candidates in the interaction region are reconstructed from at least five tracks with transverse momentum $p_{T} > 400$ MeV. The candidate with the highest $\sum p_{T}^{2}$ over all associated tracks is chosen as the hard-collision PV [53].
Muon candidates are reconstructed by matching segments or tracks in the MS with tracks found in the ID [54]. The candidates must have $p_{T} > 25$ GeV and be in the pseudorapidity range $|\eta| < 2.5$. The longitudinal impact parameter of the track relative to the hard-collision PV, $|z_{0}^{\text{vtx}}|$, is required to be smaller than 2 mm. In order to reject non-prompt muons, an isolation criterion is applied. The isolation variable is defined as the scalar sum of the transverse momenta of all tracks with $p_{T} > 1$ GeV (excluding the muon track) within a cone of size $\Delta R = 10\,\text{GeV}/p_{T}(\mu)$ around the muon's direction. It is required to be less than 5% of the muon $p_{T}$. The selection efficiency after this requirement is measured to be about 97% in $Z \to \mu^{+}\mu^{-}$ events.
Electron candidates are reconstructed from energy deposits (clusters) in the EM calorimeter which match a well-reconstructed track in the ID [55]. Requirements on the transverse and longitudinal impact parameters of $|d_{0}^{\text{vtx}}| < 1$ mm and $|z_{0}^{\text{vtx}}| < 2$ mm, respectively, are applied. Electron candidates must have transverse energy $E_{T} > 25$ GeV and $|\eta_{\text{cluster}}| < 2.47$, where $\eta_{\text{cluster}}$ denotes the pseudorapidity of the cluster. Clusters in the calorimeter barrel-endcap transition region, $1.37 < |\eta| < 1.52$, are excluded. An isolation requirement based on the transverse energy deposited in a cone of size $\Delta R = 0.2$ around the direction of the electron and on the $p_{T}$ sum of the tracks in a cone of $\Delta R = 0.3$ around the same direction is applied. This requirement is chosen to give a nearly uniform selection efficiency of 85% in $E_{T}$ and $\eta$, as measured in $Z \to e^{+}e^{-}$ events. Electron candidates that share the ID track with a reconstructed muon candidate are vetoed.
Jets are reconstructed using the anti-$k_{t}$ algorithm [56,57] with a radius parameter of $R = 0.4$, using topological clusters [58], calibrated with the Local Cluster Weighting method [59], as input to the jet finding. The jet energy is further corrected by subtracting the contribution from pile-up events and applying an MC-based and a data-based calibration. The jet vertex fraction (JVF) [60] variable is used to identify the primary vertex from which the jet originated. The JVF criterion suppresses pile-up jets with $p_{T} < 50$ GeV and $|\eta| < 2.4$. To avoid possible overlap between jets and electrons, jets that are closer than $\Delta R = 0.2$ to an electron are removed. Afterwards, remaining electron candidates overlapping with jets within a distance of $\Delta R = 0.4$ are rejected. Finally, muons overlapping with jets within $\Delta R = 0.4$ are removed.
The identification of jets originating from the hadronisation of a $b$-quark ($b$-tagging) is based on various algorithms exploiting the long lifetime, high mass and high decay multiplicity of $b$-hadrons, as well as the properties of the $b$-quark fragmentation. The outputs of these algorithms are combined in a neural-network classifier to maximise the $b$-tagging performance [61]. The choice of $b$-tagging working point represents a trade-off between the efficiency for identifying $b$-jets and the rejection of other jets. The working point chosen for this analysis corresponds to a $b$-tagging efficiency of 70%. The corresponding $c$-quark-jet rejection factor is about 5 and the light-quark-jet rejection factor is about 120. These efficiencies and rejection factors were obtained using simulated $t\bar{t}$ events. The tagging efficiencies in the simulation are corrected to match the efficiencies measured in data [61].
The missing transverse momentum of the event, defined as the momentum imbalance in the plane transverse to the beam axis, is primarily due to neutrinos that escape detection. It is calculated as the negative vector sum of the transverse momenta of the reconstructed electrons, muons, jets and the clusters that are not associated with any of the previous objects (the 'soft term') [62]. Its magnitude is denoted $E_{T}^{\text{miss}}$.
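As a minimal sketch, the negative vector sum can be written as follows; the `(pt, phi)` pairs are toy stand-ins for the calibrated physics objects and soft-term clusters:

```python
import math

def missing_et(objects):
    """Return (magnitude, phi) of the negative vector sum of the
    transverse momenta of all (pt, phi) objects in the event."""
    px = -sum(pt * math.cos(phi) for pt, phi in objects)
    py = -sum(pt * math.sin(phi) for pt, phi in objects)
    return math.hypot(px, py), math.atan2(py, px)

# A single visible object of pT = 40 GeV implies 40 GeV of missing
# transverse momentum pointing in the opposite direction.
met, met_phi = missing_et([(40.0, 0.0)])
```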

Event selection
Events are required to have a hard-collision primary vertex. They also have to pass a single-lepton trigger requirement [14,63] and contain at least one electron or muon candidate with $p_{T} > 30$ GeV matched to the lepton that fired the trigger. The electron trigger requires an electron candidate, formed by an EM calorimeter cluster matched with a track, either with $E_{T} > 60$ GeV or with $E_{T} > 24$ GeV and additional isolation requirements. The muon trigger requires a muon candidate, defined as a reconstructed track in the muon spectrometer, either with $p_{T} > 36$ GeV or with $p_{T} > 24$ GeV and isolation requirements. If there is another lepton candidate with a transverse momentum above 25 GeV, the event is rejected. This lepton veto guarantees orthogonality with respect to the dilepton analysis. The contribution from leptonically decaying $\tau$-leptons is included. In the following, the electron or muon candidate is referred to as the lepton.
Events identified as containing jets from cosmic rays or beam-induced backgrounds, or due to noise hot spots in the calorimeter, are removed. Only jets with $p_{T} > 30$ GeV and $|\eta| < 2.4$ are considered in the analysis. Additionally, a requirement of $E_{T}^{\text{miss}} > 30$ GeV is applied, and the transverse mass³ of the leptonically decaying $W$ boson must satisfy $m_{T}(W_{L}) > 50$ GeV.
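The transverse-mass requirement can be illustrated with a short computation of the standard formula $m_{T} = \sqrt{2\,p_{T}(\ell)\,E_{T}^{\text{miss}}\,(1 - \cos\Delta\phi)}$; the numbers below are toy values:

```python
import math

def transverse_mass(pt_lep, met, dphi):
    """Transverse mass of the leptonically decaying W boson from the
    lepton pT, the missing transverse momentum and their azimuthal
    separation."""
    return math.sqrt(2.0 * pt_lep * met * (1.0 - math.cos(dphi)))

# A back-to-back lepton and ETmiss of 40 GeV each give m_T = 80 GeV,
# near the W-boson Jacobian peak; such an event passes the 50 GeV cut.
mt = transverse_mass(40.0, 40.0, math.pi)  # 80.0
```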
In order to perform the measurement and validate the result, selected events are divided into different categories based on the jet and $b$-tagged-jet multiplicities. The region with three jets, of which one is $b$-tagged (3j1b), is called the signal region and is used to extract the $tW$ cross-section. The region with four jets, two of them $b$-tagged (4j2b), contains a very pure sample of $t\bar{t}$ events and is used as the $t\bar{t}$ validation region to check the modelling of this background. Table 1 shows the expected and the observed numbers of events in the signal region after the event selection. All backgrounds except fake leptons, which are estimated using data-driven methods, are normalised to their expected cross-sections. The $tW$ events constitute about 5% of the total number of events. The major backgrounds are $t\bar{t}$ production with about 58%, and $W$+jets production with about 28% of the total number of events. The $W$+jets events are subdivided into heavy flavour (HF), where a $W$ boson is produced in association with $b$- or $c$-jets, and light flavour (LF). The total numbers of expected events agree within a few percent with the observed numbers of events.

³ The transverse mass is calculated using the momentum of the lepton associated with the $W$ boson, $E_{T}^{\text{miss}}$ and the azimuthal angle between the two: $m_{T}(W) = \sqrt{2\,p_{T}(\ell)\,E_{T}^{\text{miss}}\,[1 - \cos\Delta\phi(\ell, E_{T}^{\text{miss}})]}$.

Separation of signal from background
Differences between signal and background event kinematics are exploited to better separate them. The $t\bar{t}$ background is inherently difficult to distinguish from the signal, motivating the use of an artificial neural network (NN) implemented in the NeuroBayes framework [64,65]. Detailed information about how the NN is used in single-top-quark analyses can be found in Ref. [66]. The NN input variables are selected such that they contribute significantly to the statistical separation power between signal and background, while avoiding variables that would lead to an increase of the expected uncertainty in the signal cross-section.
The observable $m(W_{H})$ (Figure 2) provides very good separation of the signal from the background, but is strongly affected by uncertainties in the reconstructed jet energies as well as uncertainties in the $b$-tagging in $t\bar{t}$ events. For this reason, $m(W_{H})$ is not used in the NN; instead, a two-dimensional discriminant is constructed from $m(W_{H})$ and the response of the NN. The two-dimensional discriminant, explained in the following subsections, allows the nuisance parameters affecting $m(W_{H})$ to be partially constrained.

Invariant mass of the hadronically decaying W boson
The variable $m(W_{H})$ is computed from the four-momenta of the two selected untagged jets. For the signal and the $t\bar{t}$ background, the distribution of $m(W_{H})$ exhibits a peak near the mass of the $W$ boson, as shown in Figure 2(a). The peak results from events where the two untagged jets are correctly matched to the hadronically decaying $W$ boson. This is less likely to happen for $t\bar{t}$ events than for $tW$ events due to the higher $b$-jet multiplicity and the limited $b$-tagging efficiency. On the other hand, the $W$+jets background does not feature such a peak, since the $W$ boson must decay leptonically for the events to pass the selection.

Neural network
The NN is trained using simulated events with the two reconstructed untagged jets matched within $\Delta R < 0.35$ to the generator-level jets originating from a $W$-boson decay in the MC simulation and having a reconstructed mass of $65\,\text{GeV} < m(W_{H}) < 92.5\,\text{GeV}$. As events are required to contain a lepton, only $tW$, $t\bar{t}$ and diboson events can have a pair of jets matched to the hadronic $W$-boson decay. Given that the contribution from diboson production is very small, the background sample used for the training consists entirely of $t\bar{t}$ events. Following the training procedure mentioned before, the following four variables (ordered by significance) are selected as input for the NN:

• the transverse momentum of the $tW$ system, $p_{T}(W_{H}W_{L}j_{B})$, divided by the sum of the objects' transverse momenta, where the four-momentum of $W_{L}$ is the sum of the four-momenta of the electron or muon and the neutrino, and the four-momentum of the neutrino is determined using $E_{T}^{\text{miss}}$ from the solution of a quadratic equation.⁴ The use of this ratio, instead of the transverse momentum of the $tW$ system, decreases the background contribution in the signal-like region of the NN response and results in a gain of sensitivity;

• the invariant mass of the reconstructed $tW$ system, $m(W_{L}W_{H}j_{B})$;

• the absolute value of the difference between the pseudorapidities of the lepton and the untagged jet leading in $p_{T}$, $|\Delta\eta(\ell, j_{1})|$;

• the absolute value of the pseudorapidity of the lepton, $|\eta(\ell)|$.
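The quadratic equation for the neutrino longitudinal momentum, obtained from the $W$-mass constraint, can be sketched as follows. This is a simplified stand-alone version: when no real solution exists it returns only the real part, whereas the analysis instead minimally rescales $E_{T}^{\text{miss}}$; all inputs are toy values:

```python
import math

M_W = 80.4  # W-boson mass in GeV (approximate)

def neutrino_pz(lep, met_x, met_y):
    """Solve m_W^2 = (p_lep + p_nu)^2 for the neutrino pz, treating
    the lepton as massless and (met_x, met_y) as the neutrino pT.
    Of two real solutions, the one with the smaller |pz| is kept."""
    px, py, pz, e = lep
    pt2 = px * px + py * py
    mu = 0.5 * M_W**2 + px * met_x + py * met_y
    a = mu * pz / pt2
    disc = a * a - (e * e * (met_x**2 + met_y**2) - mu * mu) / pt2
    if disc < 0.0:  # m_T > m_W: no real solution in this toy version
        return a
    root = math.sqrt(disc)
    return min(a - root, a + root, key=abs)

# Toy event: lepton along x, missing momentum along y, 40 GeV each.
pz_nu = neutrino_pz((40.0, 0.0, 0.0, 40.0), 0.0, 40.0)
```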
Figure 3 compares the data with the prediction for the NN input variables. For all variables, the simulation provides a good description of the data.
The distribution of the NN response is subdivided into eight bins, with the edges placed approximately at the 12.5% quantiles of a 50:50 mixture of $tW$ and $t\bar{t}$ events. Figure 4(a) shows the shape of the NN response for the $tW$ and $t\bar{t}$ processes and Figure 4(b) presents the comparison between data and Monte Carlo simulation.
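The quantile-based binning can be sketched as follows; the toy response values stand in for the NN outputs of equal-sized $tW$ and $t\bar{t}$ samples:

```python
def quantile_edges(signal, background, n_bins=8):
    """Bin edges at the i/n_bins quantiles (i = 1..n_bins-1) of the
    pooled signal-plus-background sample, so that the mixture is
    split into n_bins bins of roughly equal population."""
    mixture = sorted(signal + background)
    return [mixture[i * len(mixture) // n_bins] for i in range(1, n_bins)]

sig = [0.1 * i for i in range(1, 9)]   # toy signal-like NN responses
bkg = [0.05 * i for i in range(1, 9)]  # toy background-like responses
edges = quantile_edges(sig, bkg)       # 7 interior edges for 8 bins
```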

Two-dimensional discriminant
For the two-dimensional discriminant, $m(W_{H})$ is used on the abscissa and the NN response on the ordinate. Outside of the aforementioned $m(W_{H})$ range from 65 GeV to 92.5 GeV, the bins corresponding to different values of the NN response are merged, i.e. the NN response is ignored. The two-dimensional distribution is presented in Figure 5.
The bins are then rearranged on a one-dimensional axis in column-major order. The resulting one-dimensional distribution is presented in Figure 6, together with a comparison of the shapes. The first three bins and the last ten bins correspond directly to the bins of $m(W_{H})$ below 65 GeV and above 92.5 GeV, respectively. In between are four blocks of eight bins, corresponding to the NN output in slices of $m(W_{H})$. Inside each of the blocks, the $tW$-to-$t\bar{t}$ ratio increases significantly from left to right.
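The rearrangement can be sketched as follows; the bin contents are toy values, with the side $m(W_{H})$ columns already merged over the NN axis as described above:

```python
def unroll(side_low, central, side_high):
    """Concatenate the merged low-mass bins, the central columns in
    column-major order (all NN bins of one m(W_H) slice, then the
    next slice), and the merged high-mass bins."""
    out = list(side_low)
    for column in central:
        out.extend(column)
    out.extend(side_high)
    return out

low = [5.0, 6.0, 7.0]                # 3 bins below 65 GeV
cen = [[1.0] * 8 for _ in range(4)]  # 4 mass slices x 8 NN bins
high = [2.0] * 10                    # 10 bins above 92.5 GeV
bins = unroll(low, cen, high)        # 3 + 4*8 + 10 = 45 bins in total
```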

Systematic uncertainties
Uncertainties in the jet reconstruction arise from the jet energy scale (JES), jet energy resolution (JER), the JVF requirement and the jet reconstruction efficiency. The effect of the uncertainty in the JES [59] is evaluated by varying the reconstructed energies of the jets in the simulated samples. It is split into multiple components, taking into account the uncertainty in the calorimeter response, the detector simulation, the choice of MC event generator, the subtraction of pile-up, and differences in the detector response for jets initiated by a gluon, a light-flavour quark, or a $b$-quark. In a similar way, the JER uncertainty is represented using several components, which account for the uncertainty in different $p_{T}$ and $\eta$ regions of the detector, the difference between data and MC simulation, as well as the noise contribution in the forward detector region [59]. The uncertainty in the jet reconstruction efficiency is estimated by randomly removing simulated jets from the events according to the jet reconstruction inefficiency measured with dijet events [67]. The JVF uncertainty is evaluated by varying the JVF criterion [60].

⁴ There are two solutions if $m_{T}(\ell\nu) < m_{W}$ and no real-valued solutions if $m_{T}(\ell\nu) > m_{W}$. In the first case, the ambiguity is resolved by picking the solution with the smaller $|p_{z}(\nu)|$. The latter case occurs if the measured $E_{T}^{\text{miss}}$ is too large, and is resolved by adjusting $E_{T}^{\text{miss}}$ until a real-valued solution is found, under the constraint that the adjustment be minimal in the transverse plane.
The scale factors used to correct the $b$-tagging efficiency in simulation to the efficiency in data are varied separately for $b$-jets, $c$-jets and light-flavour jets. Independent sources of uncertainty affecting the $b$-jet tagging efficiency and the $c$-jet mis-tagging efficiency are considered depending on the jet kinematics; e.g. the variation for $b$-quark jets is subdivided into six components. Uncertainties associated with the lepton selection arise from the trigger, reconstruction, identification, isolation, and lepton momentum scale and resolution [54,68,69].
All systematic uncertainties in the reconstruction of jets and leptons are propagated to the uncertainty in $E_{T}^{\text{miss}}$. In addition, dedicated uncertainties are assigned to the soft term of the $E_{T}^{\text{miss}}$, which accounts for energy deposits in the calorimeter that are not matched to high-$p_{T}$ physics objects [62].
The uncertainty in the integrated luminosity for the dataset used in this analysis is 1.9%. It is derived following the methodology detailed in Ref. [18]. This systematic uncertainty is applied to all contributions determined from the MC simulation.
Uncertainties stemming from theoretical models are evaluated using alternative MC samples for the $tW$ and $t\bar{t}$ processes. The renormalisation and factorisation scales are varied simultaneously in the matrix element and in the parton shower together with the amount of QCD radiation: the variation of both $\mu_{\text{r}}$ and $\mu_{\text{f}}$ by a factor of 0.5 is combined with the Perugia 2012radHi tune, while the variation of the scale parameters by a factor of 2.0 is combined with the Perugia 2012radLo tune [26]. This (radiation) uncertainty is considered uncorrelated between the $tW$ and $t\bar{t}$ processes. The NLO matrix-element generator uncertainty is estimated by comparing two NLO matching methods: POWHEG-BOX and MC@NLO, both interfaced with HERWIG. The parton-shower, hadronisation and underlying-event systematic uncertainties are computed by comparing POWHEG-BOX with either PYTHIA or HERWIG. These are treated as fully correlated between the $tW$ and $t\bar{t}$ processes. The uncertainty due to the treatment of the interference effects between the $tW$ and $t\bar{t}$ processes is evaluated by using the $tW$ DS scheme instead of the DR scheme, both generated using POWHEG-BOX with PYTHIA. The effect of the PDF uncertainties on the acceptance is taken into account for both the $tW$ signal and the $t\bar{t}$ background and treated as uncorrelated between the processes, following the studies in Ref. [70].
The uncertainties in the theoretical cross-section calculations are process dependent and vary from about 4% for the single-top $s$- and $t$-channel processes to 6% for $t\bar{t}$ (see Section 3). In addition, there are large uncertainties in the $Z$/$W$+jets production cross-sections: for every jet, an additional uncertainty of 24% is assumed [71]. The resulting uncertainty in the normalisation of $W$/$Z$-boson production in association with three jets is 42%, and the rate of $W$-boson events with heavy-flavour jets is allowed to vary by an additional 20%.
The modelling of the $W$+jets background was cross-checked using ALPGEN with PYTHIA. The shape of the $W$+jets background was found to be consistent with the nominal SHERPA prediction. Hence, no dedicated systematic uncertainty is assigned to the choice of generator, in order to avoid double-counting of the statistical uncertainty of the prediction (model statistics).

Figure 6: (a) Shape distribution of the reconstructed discriminant in the $tW$ signal (3j1b) region rearranged onto a one-dimensional distribution. The distribution for each process is normalised to unity. (b) Pre-fit distributions of the discriminant in the $tW$ signal (3j1b) region. Small backgrounds are subsumed under 'Other'. The simulated distributions are normalised to their theoretical cross-sections. The dashed uncertainty band includes statistical and systematic uncertainties. The lower panel shows the ratio of the observed and the predicted number of events in each bin. The first three bins and the last ten bins correspond directly to (non-uniform) bins of $m(W_{H})$. In between are four blocks of eight bins, corresponding to the NN output in slices of $m(W_{H})$. Inside each of the blocks, the numbers of events are scaled by a factor of four for better visibility.
Uncertainties related to the modelling of the fake-lepton background take into account the choice of control region for the determination of the fake- and real-lepton efficiencies, the choice of parameterisation, and the normalisation of the prompt-lepton backgrounds in the determination of the efficiencies [52].
The uncertainty due to the limited size of the simulated samples and of the fake-lepton background (model statistics) is estimated through the procedure detailed in Refs. [72,73]: for every bin of the discriminant, an independent parameter is assigned which describes the variation of the predicted event rate, constrained by its statistical uncertainty.

Statistical analysis
A binned profile maximum-likelihood fit to the discriminant in the signal region is used to determine the $tW$ cross-section. The likelihood function is defined as a product of Poisson probability terms over all bins of the discriminant in the signal region and of Gaussian penalty terms, where $n_{i}$ ($\nu_{i}$) is the observed (expected) number of events in each bin $i$ of the discriminant. The expected number of events depends on the signal-strength parameter, $\mu$, which is a multiplicative factor on the predicted signal cross-section. Nuisance parameters (NPs), $\theta_{j}$, are used to encode the effects of the systematic uncertainties in the expected number of events. The Gaussian penalty terms model the external constraints on these parameters. The estimated parameters, denoted by $\hat{\mu}$ and $\hat{\vec{\theta}}$, are obtained by maximising $L(\mu, \vec{\theta}\,; \vec{n})$.
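A minimal sketch of such a likelihood is given below, written as a negative log-likelihood with one Poisson term per bin plus unit-Gaussian penalties and with parameter-independent constants dropped. It is illustrative only: the full HistFactory model additionally parameterises how each $\theta_{j}$ morphs the templates entering $\nu_{i}$:

```python
import math

def nll(n_obs, nu_exp, thetas):
    """Negative log-likelihood: -log prod_i Pois(n_i | nu_i) plus
    -log prod_j Gauss(theta_j | 0, 1), up to additive constants."""
    val = 0.0
    for n, nu in zip(n_obs, nu_exp):
        val += nu - n * math.log(nu)   # Poisson term (log n! dropped)
    for theta in thetas:
        val += 0.5 * theta * theta     # unit-Gaussian penalty
    return val

# The Poisson part is minimised when each expectation matches the
# observation, and any pulled nuisance parameter costs theta^2 / 2.
```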
The likelihood function is composed and evaluated with the HistFactory program [74], part of the RooStats framework [75]. The minimisation is performed with the Minuit package [76], using Minos to compute the error estimates.
The statistical significance, $Z$, of the result is estimated by comparing the likelihood values of two hypotheses. The background-only hypothesis is that there is no signal in the data (or equivalently, $\mu = 0$). The signal-plus-background hypothesis is that the signal exists with the signal strength obtained from the fit to data. With the asymptotic approximation [77], the significance is calculated using a test statistic based on the profile likelihood ratio, $q_0 = -2 \ln \left[ L(\mu = 0, \hat{\vec{\theta}}_{\mu=0}) / L(\hat{\mu}, \hat{\vec{\theta}}) \right]$, where $\hat{\vec{\theta}}_{\mu=0}$ denotes the estimates of the nuisance parameters that maximise the likelihood function under the background-only hypothesis. The expected significance is calculated by replacing the observed data in the likelihood function with the Asimov dataset for the nominal signal-plus-background hypothesis ($\mu = 1$, $\vec{\theta} = \hat{\vec{\theta}}$).
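The asymptotic relation $Z = \sqrt{q_0}$ can be illustrated with a one-bin toy (a sketch only, under assumed numbers; the conditional fit profiles the nuisance parameter with $\mu$ fixed to zero, the unconditional fit floats both):

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import norm, poisson

# Toy significance calculation: one-bin counting experiment with one
# constrained nuisance parameter. All numbers are hypothetical.
s, b, n = 30.0, 100.0, 140.0   # assumed signal, background, observed
db = 0.05                      # assumed 5% background uncertainty

def nll(mu, theta):
    nu = mu * s + b * (1.0 + db * theta)
    return -(poisson.logpmf(n, nu) + norm.logpdf(theta))

# Unconditional fit: minimise over (mu, theta).
fit = minimize(lambda p: nll(*p), x0=[1.0, 0.0], method="Nelder-Mead")
# Conditional (background-only) fit: mu fixed to 0, profile theta.
fit0 = minimize_scalar(lambda t: nll(0.0, t), bounds=(-5.0, 5.0),
                       method="bounded")
q0 = 2.0 * (fit0.fun - fit.fun)     # profile likelihood ratio statistic
Z = float(np.sqrt(max(q0, 0.0)))    # asymptotic significance
```

Because the nuisance parameter is profiled in both fits, the background uncertainty degrades the significance relative to a pure counting experiment.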

Cross-section measurement
The $tW$ cross-section is extracted from the fit to data in the signal region. Given the Standard Model prediction, the extracted signal strength is expected to be $\hat{\mu} = 1.00 \pm 0.35$. The measured value is $\hat{\mu} = 1.16 \pm 0.31$, corresponding to an observed cross-section of $\sigma_{tW}^{\text{obs}} = 26 \pm 7$ pb, which is consistent with the Standard Model prediction. The observed (expected) significance is 4.5 (3.9) standard deviations.
The (post-fit) impact of each systematic uncertainty on the measured signal strength is estimated by means of conditional fits, i.e. the fit is repeated while keeping the corresponding nuisance parameter fixed at the $\pm 1$ standard deviation ($\sigma$) values of its post-fit error interval. The resulting change in the estimate of the signal strength quantifies the impact of the uncertainty. For each nuisance parameter, the $+1\sigma$ and $-1\sigma$ variations are found to be symmetric about the best-fit value to a very good approximation. Table 2 shows the impacts of the systematic uncertainties on the observed fit result, where the impacts of uncertainties with similar sources have been added in quadrature. The dominant uncertainties are due to the amount of QCD radiation in signal events and the $t\bar{t}$ background, the JES and $b$-tagging, and the model statistics, including the limited size of the MC samples. Some nuisance parameters are constrained by the data. For example, the normalisation uncertainty for $W$+jets events is reduced from 45% to 8%, because the assigned initial uncertainty is large and this background can be separated well from $tW$ and $t\bar{t}$ events. By design of the discriminant, combinations of nuisance parameters that shift the peak in the $m(W_\text{had})$ distribution are constrained, primarily the JES and the choice of renormalisation scale together with the amount of QCD radiation in the signal and $t\bar{t}$ background. The nuisance parameter for the NLO matching for $tW$ and $t\bar{t}$ is also constrained: the choice of MC@NLO is not supported by the data, reducing the impact of this choice from 9% pre-fit to 3% post-fit.
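The conditional-fit procedure can be sketched as follows (an illustrative toy, not the analysis code; templates and counts are hypothetical, and for simplicity the parameter is shifted by its unit pre-fit uncertainty rather than the post-fit error interval used above):

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import norm, poisson

# Toy nuisance-parameter impact: refit the signal strength with the
# nuisance parameter fixed at a shifted value and take the change in
# mu-hat. All numbers are hypothetical.
sig = np.array([5.0, 15.0, 30.0])
bkg = np.array([100.0, 80.0, 40.0])
n_obs = np.array([110, 100, 75])

def nll(mu, theta):
    nu = mu * sig + bkg * (1.0 + 0.10 * theta)  # assumed 10% norm. effect
    return -(poisson.logpmf(n_obs, nu).sum() + norm.logpdf(theta))

# Nominal (unconditional) fit.
fit = minimize(lambda p: nll(*p), x0=[1.0, 0.0], method="Nelder-Mead")
mu_hat, theta_hat = fit.x

# Conditional fits: theta fixed at theta_hat +/- 1, only mu refitted.
impacts = []
for shift in (+1.0, -1.0):
    cond = minimize_scalar(lambda m: nll(m, theta_hat + shift),
                           bounds=(0.0, 5.0), method="bounded")
    impacts.append(cond.x - mu_hat)  # Delta(mu-hat) for this shift
```

Shifting the background normalisation up forces $\hat{\mu}$ down and vice versa, so the two impacts have opposite signs, as observed for the approximately symmetric variations in the measurement.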
A few nuisance parameters are pulled away from the pre-fit expectation. For the parameter associated with the choice of parton-shower generator, a blend of Pythia and Herwig gives the best description of the data, while the nominal Pythia prediction is disfavoured at the two-sigma level. The $b$-tagging parameter with the largest effect on the overall $b$-tagging efficiency is pulled by about one sigma, corresponding to a decrease of about 1% to 2% in the $b$-tagging efficiency compared to the pre-fit expectation. Given that the $b$-tagging calibration partially relies on dijet events [61], which constitute a different environment regarding the production mechanism of the $b$-jets, the pull is reasonable.
Table 3 shows the post-fit event yields of each process. The uncertainties in the yields are computed taking the correlations between nuisance parameters and processes into account. The post-fit estimates are well within the uncertainties of the pre-fit expectation (Table 1), while most of their uncertainties are reduced. The normalisation uncertainty for $W$ + HF jets changes from almost 50% to about 10%.
Figure 7 shows the post-fit distributions of the NN input variables, the NN output response and the $m(W_\text{had})$ distribution in the signal region. The post-fit plots use the parameter estimates obtained in the fit of the discriminant, including their uncertainties, and demonstrate a good description of the data.
Figure 8(a) shows that the data are well described by the model in the signal region. Figure 8(b) provides the strongest support for the validity of the fit result by comparing the expected and observed distributions in the $t\bar{t}$ validation region. It shows that the uncertainty due to the extrapolation from the signal region is small, and therefore provides a stringent test that the main background is well modelled. The lower panels show the ratio of the observed and the predicted number of events in each bin. The first three bins and the last ten bins correspond directly to (non-uniform) bins of $m(W_\text{had})$. In between are four blocks of eight bins, corresponding to the NN output in slices of $m(W_\text{had})$. Inside each of the blocks, the numbers of events are scaled by a factor of four (a factor of two in the 4j2b region) for better visibility.

Conclusion
The inclusive cross-section for the production of a single top quark in association with a $W$ boson in the single-lepton channel is measured using an integrated luminosity of 20.2 fb$^{-1}$ of $\sqrt{s} = 8$ TeV proton-proton collision data collected by the ATLAS detector at the LHC in 2012. A neural network is used to separate the signal from the $t\bar{t}$ background. A two-dimensional discriminant, built from the neural-network response and the mass of the hadronically decaying $W$ boson, is used to extract the cross-section. Evidence for $tW$ production in the single-lepton channel is obtained with an observed (expected) significance of 4.5 (3.9) standard deviations. The measured cross-section is $\sigma_{tW} = 26 \pm 7$ pb, which is consistent with the SM expectation of $\sigma_{tW}^{\text{th.}} = 22.4 \pm 1.5$ pb.

Figure 2: (a) Shape of the reconstructed $m(W_\text{had})$ distribution for the signal and the most important backgrounds in the signal (3j1b) region. The distribution for each process is normalised to unity. (b) Pre-fit $m(W_\text{had})$ distribution in the 3j1b region. Small backgrounds are subsumed under 'Other'. The simulated distributions are normalised to their theoretical cross-sections. The dashed uncertainty band includes statistical and systematic uncertainties. The lower panel shows the ratio of the observed and the predicted number of events in each bin. The last bin includes the overflow events.

Figure 3: Pre-fit distributions of the NN input variables in the $tW$ signal (3j1b) region with 65 GeV $\leq m(W_\text{had}) \leq$ 92.5 GeV. Small backgrounds are subsumed under 'Other'. The simulated distributions are normalised to their theoretical cross-sections. The dashed uncertainty band includes statistical and systematic uncertainties. The last bin includes the overflow events. The lower panels show the ratio of the observed and the predicted number of events in each bin.

Figure 5: Predicted distribution of the two-dimensional discriminant in the signal (3j1b) region. The proportions of the coloured areas reflect the expected composition in terms of $tW$, $t\bar{t}$, $W$+jets and other processes. The numbers correspond to the bin order when projecting the discriminant onto one axis, as in Figure 6. The last bin on the horizontal axis includes the overflow events.

Figure 7: (a), (b), (c), (d) Post-fit distributions of the NN input variables, (e) the NN discriminant and (f) $m(W_\text{had})$ in the signal region. Small backgrounds are subsumed under 'Other'. The dashed uncertainty band includes statistical and systematic uncertainties. The last bin includes the overflow events, except for (e). The lower panels show the ratio of the observed and the predicted number of events in each bin.

Figure 8: Post-fit distributions of the discriminant in the (a) signal region and (b) validation region. Small backgrounds are subsumed under 'Other'. The dashed uncertainty band includes statistical and systematic uncertainties. The lower panels show the ratio of the observed and the predicted number of events in each bin. The first three bins and the last ten bins correspond directly to (non-uniform) bins of $m(W_\text{had})$. In between are four blocks of eight bins, corresponding to the NN output in slices of $m(W_\text{had})$. Inside each of the blocks, the numbers of events are scaled by a factor of four (a factor of two in the 4j2b region) for better visibility.

Table 1: Expected signal and background yields and the observed number of events in the signal (3j1b) region. The cross-section for $tW$ production is taken to be the theory prediction. The uncertainties include statistical and systematic uncertainties.

Table 2: List of the systematic uncertainties considered in the analysis and their relative impact on the observed signal strength, evaluated as described in the text. The 'model statistics' uncertainty is dominated by the $W$+jets background.

Table 3: Post-fit signal and background yields and the observed number of events in the signal region and the $t\bar{t}$ validation region. The uncertainties include statistical plus all systematic uncertainties (cf. Section 7).