Search for third-generation leptoquarks and scalar bottom quarks in pp collisions at $\sqrt{s} = 7$ TeV

Results are presented from a search for third-generation leptoquarks and scalar bottom quarks in a sample of proton-proton collisions at $\sqrt{s} = 7$ TeV collected by the CMS experiment at the LHC, corresponding to an integrated luminosity of 4.7 fb$^{-1}$. A scenario where the new particles are pair produced and each decays to a b quark plus a tau neutrino or neutralino is considered. The number of observed events is found to be in agreement with the standard model prediction. Upper limits are set at 95% confidence level on the production cross sections. Leptoquarks with masses below $\sim$450 GeV are excluded. Upper limits in the mass plane of the scalar quark and neutralino are set such that scalar bottom quark masses up to 410 GeV are excluded for neutralino masses of 50 GeV.


Introduction
Many theoretical extensions of the standard model (SM) predict the existence of color-triplet scalar or vector bosons, called leptoquarks (LQ), that have fractional electric charge and both lepton and baryon quantum numbers. These theories include grand unified theories [1], composite models [2,3], technicolor schemes [4-6], and superstring-inspired E$_6$ models [7]. We follow the usual assumption that there are three generations of LQs, each of which couples only to the corresponding generation of SM particles, to avoid violating the known experimental constraints on flavor-changing neutral currents [8]. Leptoquarks would be produced at the Large Hadron Collider (LHC) in pairs predominantly through gg fusion and $q\bar{q}$ annihilation, and the contributions from lepton t-channel exchange are suppressed by the leptoquark Yukawa couplings. A leptoquark decays to a charged lepton and a quark with a branching fraction $\beta$, usually considered as a free parameter of the model, or to a neutrino and a quark with branching fraction $1 - \beta$. For scalar LQs, the production cross section is determined by the ordinary color coupling between an LQ and a gluon, which is model independent.
Numerous theories of particle physics beyond the SM address the gauge hierarchy problem and other shortcomings of the SM by introducing a new symmetry that relates fermions and bosons, called "supersymmetry" (SUSY) [9]. Supersymmetric models introduce a new discrete symmetry, R-parity, and all SM particles have $R_p = +1$ while all superpartners have $R_p = -1$. Imposing R-parity conservation prohibits baryon and lepton number violating couplings, which could otherwise lead to rapid proton decay. In models with R-parity conservation, SUSY particles are produced in pairs, and the lightest SUSY particle (LSP) is stable. In some models the LSP is the electrically neutral and weakly interacting neutralino ($\tilde{\chi}^0_1$), which provides a dark matter candidate [10]. The left- and right-handed SM quarks have scalar partners ($\tilde{q}_L$ and $\tilde{q}_R$) that can mix to form scalar quarks (squarks) with mass eigenstates $\tilde{q}_{1,2}$. Since the mixing is proportional to the corresponding SM fermion masses, the effects can be enhanced for the third-generation squarks, yielding sbottom ($\tilde{b}_{1,2}$) and stop ($\tilde{t}_{1,2}$) mass eigenstates with large mass splitting. The lighter mass eigenstate ($\tilde{b}_1$ or $\tilde{t}_1$) could be lighter than any other charged SUSY particle [11]. Therefore, if sufficiently light, $\tilde{b}_1$ squarks could be produced at the LHC either directly or through decays of gluinos (the supersymmetric partners of gluons). In most SUSY models, a $\tilde{b}_1$ is expected to decay predominantly into a bottom quark and $\tilde{\chi}^0_1$, so that the final state consists of b jets and a sizable imbalance in transverse energy ($\not\!\!E_T$), defined as the magnitude of the vector opposite to the sum of the transverse momenta of all detected particles.
In this paper we present results of a search for pair-produced scalar third-generation leptoquarks (LQ$_3$) with an electric charge of ±1/3 and for $\tilde{b}_1$ squarks. Each of the LQ$_3$ ($\tilde{b}_1$) particles decays into a b quark and $\nu_\tau$ ($\tilde{\chi}^0_1$). In each case, signal events are characterized by two high-transverse-momentum ($p_T$) b jets accompanied by large $\not\!\!E_T$. The resulting final state, consisting of jets, $\not\!\!E_T$, and no charged leptons, does not allow a full reconstruction of the decay chain, because of the lack of knowledge of the individual momenta of the weakly interacting particles.
Previous searches performed by the CDF and D0 collaborations at the Tevatron have excluded LQ$_3 \to \nu_\tau b$ masses below 247 GeV, and set limits on the production of $\tilde{b}_1$ squarks for a range of values in the $\tilde{b}_1$-$\tilde{\chi}^0_1$ mass plane that extend up to $m(\tilde{b}_1) = 200$ GeV for $m(\tilde{\chi}^0_1) = 110$ GeV [12,13]. A search performed by the CMS collaboration has excluded the existence of a scalar LQ$_3$ with an electric charge of ±2/3 or ±4/3 and with mass below 525 GeV, assuming 100% branching fraction to a b quark and a $\tau$ lepton [14]. A search performed by the ATLAS collaboration excluded the production of $\tilde{b}_1$ with masses up to 390 GeV, for $\tilde{\chi}^0_1$ masses below 60 GeV [15].
The main SM backgrounds in this search are $t\bar{t}$+jets, heavy-flavor (HF) multijet production, and W or Z accompanied by HF production. In the case of multijet events and W/Z decays to hadrons, the $\not\!\!E_T$ is due to neutrinos in HF semileptonic decays, and to effects of jet energy resolution and mismeasurements. In the case of W/Z decays to leptons, genuine $\not\!\!E_T$ results from the escaping neutrinos when the charged lepton (e or µ) goes undetected, or from $\tau$ decays.

The CMS apparatus
A detailed description of the Compact Muon Solenoid (CMS) detector can be found elsewhere [16]. The central feature of the CMS detector is the superconducting solenoid magnet, of 6 m internal diameter, providing a magnetic field of 3.8 T. The silicon pixel and strip tracker, the lead-tungstate crystal electromagnetic calorimeter (ECAL), and the brass/scintillator hadron calorimeter (HCAL) are contained within the solenoid. Muons are detected in gas-ionization chambers embedded in the steel return yoke. The ECAL has a typical energy resolution of 1-2% for electrons and photons above 100 GeV. The HCAL, combined with the ECAL, measures the jet energy with a resolution $\Delta E/E \approx 100\%/\sqrt{E/\mathrm{GeV}} \oplus 5\%$.
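As a rough numerical illustration of the quoted jet energy resolution, the sketch below evaluates the stochastic and constant terms added in quadrature (the usual meaning of the $\oplus$ symbol). The function name and default arguments are illustrative choices, not part of the analysis software.

```python
import math

def jet_energy_resolution(energy_gev, stochastic=1.0, constant=0.05):
    """Fractional resolution dE/E: stochastic/sqrt(E) added in quadrature to a constant term.

    The defaults encode the 100%/sqrt(E/GeV) and 5% terms quoted in the text.
    """
    if energy_gev <= 0:
        raise ValueError("energy must be positive")
    return math.hypot(stochastic / math.sqrt(energy_gev), constant)

# Print the approximate resolution for a few jet energies.
for energy in (50.0, 100.0, 400.0):
    print(f"E = {energy:6.1f} GeV  ->  dE/E ~ {jet_energy_resolution(energy):.3f}")
```

The stochastic term dominates at low energy, while the constant term sets the floor at high energy.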
CMS uses a right-handed coordinate system, with the origin located at the nominal collision point, the x axis pointing towards the center of the LHC ring, the y axis pointing up (perpendicular to the plane of the LHC ring), and the z axis along the counterclockwise-beam direction. The azimuthal angle $\phi$ is measured with respect to the x axis in the x-y plane and the polar angle $\theta$ is defined with respect to the z axis. The pseudorapidity is defined as $\eta = -\ln[\tan(\theta/2)]$.
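The pseudorapidity definition can be checked numerically with a short sketch. The helper names are hypothetical. The second function uses the equivalent form $\eta = \operatorname{arcsinh}(p_z/p_T)$, which follows from the definition for a momentum vector with nonzero transverse component.

```python
import math

def pseudorapidity(theta):
    """eta = -ln[tan(theta/2)] for a polar angle theta in radians, 0 < theta < pi."""
    if not 0.0 < theta < math.pi:
        raise ValueError("theta must lie strictly between 0 and pi")
    return -math.log(math.tan(theta / 2.0))

def pseudorapidity_from_momentum(px, py, pz):
    """Equivalent form eta = asinh(pz / pT); requires pT > 0."""
    pt = math.hypot(px, py)
    if pt == 0.0:
        raise ValueError("pseudorapidity is undefined along the beam axis")
    return math.asinh(pz / pt)

print(pseudorapidity(math.pi / 2))                   # close to 0: perpendicular to the beam
print(pseudorapidity_from_momentum(1.0, 0.0, 1.0))   # theta = 45 degrees
```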

Razor variables
Although the signal considered in this analysis consists of two high-$p_T$ b jets and $\not\!\!E_T$, additional jets may be produced by initial- or final-state radiation (ISR/FSR). We study the effect of such radiation with Monte Carlo (MC) simulation samples. To reduce the systematic uncertainty due to the imperfect simulation of ISR/FSR, we force every event into a dijet topology by combining all the jets in the event into two "pseudojets", following the "razor" methodology and variables [17,18]. The pseudojets are constructed as a sum of the four-momenta of their constituent jets. After considering all possible partitions of the jets into two pseudojets, the combination that minimizes the sum in quadrature of the pseudojet masses is selected.
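The partitioning step can be sketched as a brute-force search over all ways of splitting the jets into two non-empty groups. This is a minimal illustration, not the experiment's implementation. Jets are assumed to be given as (E, px, py, pz) tuples, and the function names are hypothetical. Minimizing the sum of squared masses is equivalent to minimizing the sum in quadrature.

```python
from itertools import combinations

def add_four_vectors(vectors):
    """Component-wise sum of (E, px, py, pz) tuples."""
    return tuple(sum(components) for components in zip(*vectors))

def mass_squared(p):
    e, px, py, pz = p
    return e * e - px * px - py * py - pz * pz

def build_pseudojets(jets):
    """Return the two pseudojet four-vectors minimizing m1^2 + m2^2."""
    n = len(jets)
    if n < 2:
        raise ValueError("need at least two jets")
    best = None
    # Jet 0 always goes into the first group, so each partition is visited once.
    for k in range(n - 1):  # number of extra jets in the first group
        for extra in combinations(range(1, n), k):
            first = {0, *extra}
            a = add_four_vectors([jets[i] for i in first])
            b = add_four_vectors([jets[i] for i in range(n) if i not in first])
            score = mass_squared(a) + mass_squared(b)
            if best is None or score < best[0]:
                best = (score, a, b)
    return best[1], best[2]
```

The search visits $2^{n-1} - 1$ partitions, which is affordable for the jet multiplicities typical of such events.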
The razor methodology provides an inclusive technique to search for production of heavy particles, each decaying to a visible system of particles and a weakly interacting particle. As an example, let us consider the pair production of two massive particles, denoted S, each decaying to a b quark and a neutral weakly interacting particle, $\chi$, as $S \to b\chi$. In the respective rest frame of each particle S, the decay products have a unique momentum p resulting from the two-body decay of S, given by:
$$p = \frac{m_S^2 - m_\chi^2}{2 m_S},$$
where the mass of the b quark is neglected in this expression. This characteristic momentum, which is denoted $M_\Delta$ and is referred to as the "momentum scale", is the same in each decay instance, and can be used to distinguish this particular signal from SM backgrounds in the same final states. The razor mass, $M_R$, is an event-by-event estimator of this scale calculated through a series of approximations, motivated by physics, meant to estimate the rest frames of the respective particles S [17,18], and is defined as:
$$M_R \equiv \sqrt{(p_1 + p_2)^2 - (p_z^1 + p_z^2)^2},$$
where $p_i$ ($p_z^i$) is the absolute value (the longitudinal component) of the i-th pseudojet momentum. An average transverse mass $M_T^R$ can be defined as:
$$M_T^R \equiv \sqrt{\frac{\not\!\!E_T\,(p_T^1 + p_T^2) - \vec{\not\!\!E}_T \cdot (\vec{p}_T^{\,1} + \vec{p}_T^{\,2})}{2}},$$
where $\vec{p}_T^{\,i}$ is the transverse momentum of the i-th pseudojet, and whose maximum value for signal events equals $M_\Delta$. The dimensionless variable R is then defined as:
$$R \equiv \frac{M_T^R}{M_R}.$$
For the signatures examined in this analysis, the value of $M_R$ can have different interpretations.
In the case of LQ$_3$ pair production, the LQ$_3$ corresponds to the particle S from the above example, while $\chi$ is a neutrino. As a result, the characteristic scale $M_\Delta$ is an estimator of the LQ$_3$ mass. Similarly, for $\tilde{b}_1$ pair production, S refers to a $\tilde{b}_1$ while $\chi$ is the LSP, generally a massive neutralino. In this case, $M_\Delta$ corresponds to the mass difference between the $\tilde{b}_1$ and the LSP.
As follows from the definitions above, $M_T^R$ is expected to have a kinematic endpoint at the mass of the new heavy particle, in a similar fashion to the transverse mass having an edge at the particle mass (such as $M_T$ in $W \to \ell\nu$ events). Therefore, the R variable is a measure of how well the missing transverse momentum is aligned with respect to the visible momentum. If the missing momentum is completely back-to-back to the visible momentum, R will be close to one. On the other hand, if the momenta of the two neutrinos or $\tilde{\chi}^0_1$ largely cancel each other, R will be small. The distribution of R for signal events will peak around 0.5, while for QCD multijet events it peaks at zero. These properties of R and $M_R$ motivate the kinematic requirements for the signal selection and background reduction, which are discussed below. Some differences between the kinematic distributions (such as the transverse momenta of b jets) for LQ$_3$ production and $\tilde{b}_1$ production may arise if the mass of the $\tilde{\chi}^0_1$ is substantial or even almost degenerate with the mass of the $\tilde{b}_1$. For a fixed $\tilde{b}_1$ mass, $M_\Delta$ decreases as the $\tilde{\chi}^0_1$ mass increases. In the case of an almost degenerate $\tilde{\chi}^0_1$ and $\tilde{b}_1$, $\not\!\!E_T$ is relatively small and the jets are soft, resulting in an $M_R$ distribution shifted towards lower values, thus reducing the momentum of the $\tilde{b}_1$ decay products and the sensitivity of the search.
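To make the razor variables concrete, the sketch below computes $M_R$, $M_T^R$, and R from two pseudojet momenta and the missing transverse momentum vector. It assumes the commonly used razor definitions, $M_R = \sqrt{(p_1 + p_2)^2 - (p_z^1 + p_z^2)^2}$ and $M_T^R = \sqrt{[\not\!\!E_T (p_T^1 + p_T^2) - \vec{\not\!\!E}_T \cdot (\vec{p}_T^{\,1} + \vec{p}_T^{\,2})]/2}$. The function name and input layout are illustrative.

```python
import math

def razor_variables(j1, j2, met):
    """Return (M_R, M_T^R, R).

    j1, j2: pseudojet three-momenta (px, py, pz) in GeV.
    met:    missing transverse momentum vector (mex, mey) in GeV.
    """
    p1 = math.sqrt(sum(c * c for c in j1))
    p2 = math.sqrt(sum(c * c for c in j2))
    m_r = math.sqrt((p1 + p2) ** 2 - (j1[2] + j2[2]) ** 2)
    if m_r == 0.0:
        raise ValueError("M_R vanishes: pseudojets collinear with the beam axis")
    pt1 = math.hypot(j1[0], j1[1])
    pt2 = math.hypot(j2[0], j2[1])
    met_mag = math.hypot(met[0], met[1])
    dot = met[0] * (j1[0] + j2[0]) + met[1] * (j1[1] + j2[1])
    # Guard against tiny negative values caused by floating-point rounding.
    m_t_r = math.sqrt(max(0.0, (met_mag * (pt1 + pt2) - dot) / 2.0))
    return m_r, m_t_r, m_t_r / m_r

# Toy event: the missing momentum balances two unequal back-to-back pseudojets.
m_r, m_t_r, r = razor_variables((100.0, 0.0, 0.0), (-60.0, 0.0, 0.0), (-40.0, 0.0))
print(m_r, m_t_r, r * r)
```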

Data samples, triggers, and event selection
The analysis is designed using MC samples generated with PYTHIA (version 6.424) [19] and MADGRAPH (version 5.1.1.0) [20], and processed with a detailed simulation of the CMS detector response based on GEANT4 [21]. Events with QCD multijets, top quarks, and electroweak bosons are generated with MADGRAPH interfaced with PYTHIA tune Z2 [22] for parton showering, hadronization, and the underlying event description. Signal samples for LQ$_3$ masses from 200 to 650 GeV, in steps of 50 GeV, are generated with PYTHIA tune D6T [23,24]. The $\tilde{b}_1$ pair production signal samples are generated with the PYTHIA generator and processed with a detailed fast simulation of the CMS detector response [25]. The scalar bottom quark signal samples are generated with $\tilde{b}_1$ masses from 100 GeV to 550 GeV in steps of 25 GeV, and $\tilde{\chi}^0_1$ masses from 50 GeV to 500 GeV in steps of 25 GeV. The $\tilde{b}_1$ samples are generated with the assumption that the mass peak can be described by a Breit-Wigner shape [19], but this assumption becomes imprecise when the sparticles are close to degenerate. Samples where the difference between the $\tilde{b}_1$ mass and the $\tilde{\chi}^0_1$ mass is less than 50 GeV are therefore not generated. The simulated events are reweighted so that the distribution of the number of overlapping pp interactions per beam crossing ("pileup") in the simulation matches that observed in data.
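The pileup reweighting mentioned above can be illustrated with a simple ratio of normalized histograms. This is only a sketch under the assumption that both distributions are binned in the number of interactions per crossing. The histogram contents and function name are hypothetical.

```python
def pileup_weights(data_hist, mc_hist):
    """Per-bin weights w = (data fraction) / (simulation fraction).

    Bins that are empty in simulation get weight 0, since no simulated
    event can populate them.
    """
    if len(data_hist) != len(mc_hist):
        raise ValueError("histograms must have the same binning")
    data_total = float(sum(data_hist))
    mc_total = float(sum(mc_hist))
    weights = []
    for d, m in zip(data_hist, mc_hist):
        weights.append((d / data_total) / (m / mc_total) if m > 0 else 0.0)
    return weights

# Hypothetical three-bin example.
weights = pileup_weights([10, 30, 60], [25, 50, 25])
print(weights)
```

Applying each weight to the simulated events in the corresponding bin reproduces the shape of the data distribution.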
Events used in this search are collected by a set of online triggers. The first level (L1) of the CMS trigger system, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events in a fixed time interval of less than 4 µs. The High Level Trigger (HLT) processor farm further decreases the event rate from around 100 kHz to around 300 Hz, before data storage. We employ three categories of triggers for this search: (i) hadronic razor triggers with moderate/tight requirements on R and $M_R$; (ii) muon razor triggers with looser requirements on R and $M_R$ and at least one muon in the central part of the detector with $p_T > 10$ GeV; and (iii) electron razor triggers with R and $M_R$ requirements similar to those for muon razor triggers, and at least one electron with $p_T > 10$ GeV satisfying loose isolation criteria. Events collected with the muon and electron razor triggers are used to provide control regions for background studies, since the potential signal contribution in these events is negligible. The search for the presence of a new physics signal is performed in the events collected with the hadronic razor triggers.
All events are required to have at least one good reconstructed interaction vertex [26]. Events containing calorimeter noise, or large $\not\!\!E_T$ due to instrumental effects (such as beam halo or jets near non-functioning channels in the ECAL), are removed from the analysis [27]. The jets in the event, which are required to have $|\eta| < 3.0$, are reconstructed from the calorimeter energy deposits using the infrared-safe anti-$k_T$ algorithm [28] with a distance parameter of 0.5, and are corrected for the non-uniformity of the calorimeter response in energy and $\eta$ using corrections derived from Monte Carlo and observed data [29]. The $\not\!\!E_T$ is reconstructed using the particle-flow algorithm, which identifies and reconstructs individually the particles produced in the collision, namely charged hadrons, photons, neutral hadrons, electrons, and muons [30].

Muon and electron identification and selection
We select muon and electron candidates using a cut-based approach similar to the selection process used for the measurement of the inclusive W and Z cross section [31].
We use the "tight" and "loose" muon identification criteria, and all muons are required to have $p_T > 20$ GeV. For loose muons, we require that the muon candidate has at least 10 hits in the inner tracker. For tight muons we require in addition that the following selections are met:
• at least one hit in the pixel detector.
In addition, the tight muons satisfy a lepton isolation requirement $I_{\mathrm{comb}}$ obtained by summing the $p_T$ of tracks and the energies of calorimetric energy deposits in a cone of $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} < 0.3$ around the lepton candidate, excluding the candidate's $p_T$. We require the combined isolation to be less than 15% of the muon $p_T$.
When the isolation requirements [31] are applied to the electron or tight muon candidates, the combined isolation $I_{\mathrm{comb}}$ is corrected for pileup dependence using the average energy density $\rho$ from other proton-proton collisions in the same beam crossing, calculated for each event [32].
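The cone-based isolation can be sketched as follows. This is a simplified illustration: tracks and calorimeter deposits are treated as a single list of hypothetical objects with `pt`, `eta`, and `phi` fields, the lepton itself is assumed to be absent from that list, and the $\rho$-based pileup correction is not included.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    d = phi1 - phi2
    while d > math.pi:
        d -= 2.0 * math.pi
    while d <= -math.pi:
        d += 2.0 * math.pi
    return d

def delta_r(eta1, phi1, eta2, phi2):
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def relative_isolation(lepton, deposits, cone=0.3):
    """Sum of deposit pT inside the cone around the lepton, divided by the lepton pT."""
    total = sum(
        d["pt"]
        for d in deposits
        if delta_r(lepton["eta"], lepton["phi"], d["eta"], d["phi"]) < cone
    )
    return total / lepton["pt"]

muon = {"pt": 40.0, "eta": 0.0, "phi": 3.1}
nearby = [
    {"pt": 2.0, "eta": 0.1, "phi": -3.1},   # inside the cone once phi is wrapped
    {"pt": 10.0, "eta": 1.0, "phi": 3.1},   # outside the cone
    {"pt": 3.0, "eta": -0.2, "phi": 3.0},   # inside the cone
]
iso = relative_isolation(muon, nearby)
print(iso, iso < 0.15)  # the 15% threshold is the one quoted for tight muons
```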

Identification of b jets
Jets originating from a b quark are identified ("tagged") by the TCHE algorithm [33]. Selecting events with b-tagged jets reduces the background from QCD multijet events where mismeasured light-flavor jets cause large apparent $\not\!\!E_T$. In the TCHE algorithm a jet is considered as b tagged if there are at least two high-quality tracks within the jet, each with a three-dimensional impact parameter (IP) significance $\mathrm{IP}/\sigma_{\mathrm{IP}}$ larger than a given threshold ("operating point"). In this analysis we use the "medium" operating point [33]. The b-tagging efficiency ($\epsilon_b$) and mistag rate ($R_b$) have been measured up to $p_T = 670$ GeV and in the $p_T$ range 80-120 GeV are found to be $\epsilon_b = 0.69 \pm 0.01$ and $R_b = 0.0286 \pm 0.0003$. In the following we refer to the sample with two jets tagged by the medium TCHE tagger as the "2b-tagged" sample. A scale factor (per jet) of $0.95 \pm 0.02$ is applied to the MC simulation samples to account for the observed differences in the b-tagging efficiency between the simulation and data [33].
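The tagging decision described above reduces to counting tracks above a significance threshold. A minimal sketch follows. The function names are hypothetical, track-quality requirements are assumed to have been applied upstream, and the threshold value 3.3 in the usage lines is an assumed illustrative number that is not taken from the text.

```python
def is_b_tagged(ip_significances, threshold):
    """True if at least two tracks have IP significance above the threshold."""
    return sum(1 for s in ip_significances if s > threshold) >= 2

def count_b_tags(jets, threshold):
    """Number of tagged jets; each jet is a list of track IP significances."""
    return sum(1 for tracks in jets if is_b_tagged(tracks, threshold))

# Hypothetical event with three jets.
event = [[5.1, 4.0, 0.3], [6.2, 1.1], [3.9, 3.4, 3.5]]
print(count_b_tags(event, threshold=3.3))  # threshold value is an assumption
```

Equivalently, the discriminator is the second-highest IP significance among the tracks in the jet.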

Search strategy
Candidate signal events in this search contain a pair of b jets, large $\not\!\!E_T$, and no isolated leptons. The main backgrounds that contribute to this final state originate from $t\bar{t}$+jets, HF multijets, and W/Z+HF jets events. Diboson production is included in the total background estimation, but its contribution is small. Significant $\not\!\!E_T$ in multijet events derives from b quarks decaying semileptonically or from jet energies being severely mismeasured. Apart from the multijet background, the remaining backgrounds originate from processes with both genuine $\not\!\!E_T$ due to energetic neutrinos and undetected charged leptons from vector boson decays.
Data sets collected with the razor triggers are examined for the presence of a well-identified electron or muon, as described in Section 4.1. Based on the presence or absence of such a lepton, the event is categorized into one of the three disjoint event samples (boxes) referred to as the electron (ELE), muon (MU), and hadronic (HAD) boxes.
These requirements define the inclusive baseline selection:
• MU box: events collected with muon razor triggers and containing one loose muon with $p_T > 20$ GeV, $M_R > 400$ GeV, and $R^2 > 0.14$.
• ELE box: events collected with electron razor triggers and containing one loose electron with $p_T > 20$ GeV, $M_R > 400$ GeV, and $R^2 > 0.14$.
• HAD box: events collected with hadronic razor triggers and not satisfying any other box requirements, and with $M_R > 400$ GeV and $R^2 > 0.2$.
We also require that there are at least two jets above 60 GeV in each event, to ensure that the trigger is fully efficient for our selected events. In order to study and estimate the background contributions in the HAD box, we treat muons and electrons in the MU and ELE boxes as neutrinos, i.e. the lepton four-vector is used to recalculate the $\not\!\!E_T$ vector and the R variable is recomputed. This procedure generates the kinematic properties of the background events in the HAD box, using events from the MU and ELE boxes that, because of the presence of the leptons, are free of the signals relevant to this analysis.
The distributions of the discriminating variables R and $M_R$ for the main backgrounds (heavy-flavor multijets and $t\bar{t}$) are estimated from observed data. Events in the MU box are used to extract the probability density functions (PDFs) describing the behavior of the R and $M_R$ shapes for each process of interest. For the W/Z+HF-jets and diboson backgrounds we use heavy-flavor-enriched MADGRAPH simulation samples to get the shape prediction. The procedure to extract the background shapes is described in detail in Section 6, and the samples used are summarized in Table 1.
To predict the SM background normalizations in the signal region we adopt the following strategy. The events in the ELE and HAD boxes are split into two exclusive categories:
• sideband: events with $400 < M_R < 600$ GeV and $0.2 < R^2 < 0.25$;
• high $R^2$: events with $M_R > 400$ GeV and $R^2 > 0.25$.
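The kinematic boundaries quoted in this section can be collected into a small classification sketch. The numerical cuts are those given in the text. The function names and the string labels are illustrative, and trigger and lepton requirements are assumed to have been applied beforehand.

```python
BASELINE_R2_MIN = {"MU": 0.14, "ELE": 0.14, "HAD": 0.2}
M_R_MIN = 400.0  # GeV

def passes_baseline(box, m_r, r2):
    """Baseline razor requirements for a given box (MU, ELE, or HAD)."""
    return m_r > M_R_MIN and r2 > BASELINE_R2_MIN[box]

def classify_region(m_r, r2):
    """Return 'sideband', 'high_R2', or None for events in the ELE or HAD box."""
    if m_r <= M_R_MIN:
        return None
    if r2 > 0.25:
        return "high_R2"
    if 0.2 < r2 < 0.25 and m_r < 600.0:
        return "sideband"
    return None

for m_r, r2 in [(500.0, 0.22), (700.0, 0.22), (450.0, 0.30), (350.0, 0.30)]:
    print(m_r, r2, classify_region(m_r, r2))
```

The two categories are exclusive by construction, since they occupy disjoint ranges of $R^2$.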
The 2b-tagged high-$R^2$ events in the HAD box define the signal search region. The normalizations of the SM backgrounds in the signal region are obtained through a two-step procedure:
• the SM processes are normalized according to their theoretical cross sections, except for $t\bar{t}$, where the measured CMS cross section [34] is used;
• the total background prediction in the high-$R^2$ region is multiplied by a scale factor ($f_{R^2}$) to correct for imperfect knowledge of the multijet production cross section.
The scale factor is derived from events in the sideband, and is defined as $f_{R^2} = N_{\mathrm{exp}}/N_{\mathrm{obs}}$, where $N_{\mathrm{exp}}$ is obtained using the background PDFs normalized to their individual cross sections, and $N_{\mathrm{obs}}$ is the number of observed events.
In order to avoid potential bias in the search, before analyzing the events in the HAD box signal region, we test our understanding of the SM background estimation procedure in control regions, using the MU and ELE boxes. This is done by comparing the background shapes derived from the MU box to the observed data in the ELE box (removing the leptons from the reconstruction to emulate $\not\!\!E_T$ in each case). To ensure that both the shapes and normalizations of the background components describe the observed events, the procedure to be used in the HAD box (see Table 1 below) is first employed and tested in the ELE box (Section 6.5). Events in the ELE sideband are used to obtain the scale factor $f_{R^2,\mathrm{ELE}}$, which is used to test the background prediction in the high-$R^2$ region of the ELE box. Once the procedure is validated in the ELE box, $f_{R^2,\mathrm{HAD}}$ is derived from events in the sideband of the HAD box, and is used to predict the normalization of the backgrounds in the signal region.

Background estimation
In both simulation and observed data, the distributions of SM background events have been shown to have a simple exponential dependence on the razor variables R and $M_R$ over a large fraction of the $R^2$-$M_R$ plane [17,18]. The shape of the $M_R$ tail is well described by two exponentials with slope parameters $S_i$ (i = 1, 2), where each $S_i$ depends linearly on the $R^2$ selection threshold ($R^2_{\min}$):
$$S_i = A_i + B_i R^2_{\min}.$$
We construct a simultaneous fit across different R bins, where the $M_R$ distribution is fitted for each value of the $R^2$ threshold to extract the $A_i$ and $B_i$ parameters. The simultaneous fit allows one to fully exploit the correlations between the fit parameters and therefore (i) to get a better estimate of the uncertainty in the $A_i$ and $B_i$ parameters, and (ii) to ensure that the PDF obtained from the fit can be used in regions with various $R^2$ thresholds. The functional form used in the fit for a fixed value of the R threshold is:
$$F(M_R) = e^{-S_1 M_R} + f\, e^{-S_2 M_R},$$
where f, the relative amplitude of the second exponential, is extracted from the fit. The values of the shape parameters that maximize the likelihood in the fits, along with the corresponding covariance matrix, are used to define the background model and the uncertainty associated with it. Therefore, if a pure sample of a given process is selected, the PDF describing the behavior of the R and $M_R$ shapes of that process can be extracted.
The fits are performed using the ROOFIT toolkit [35]. The background PDFs are then used to generate pseudoexperiments, to evaluate the effects of systematic uncertainties on the event yields, as described below in Section 6.4.
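As a standalone illustration of the two-exponential model (not the ROOFIT implementation used in the analysis), the sketch below evaluates a normalized PDF of the assumed form $e^{-S_1 M_R} + f\,e^{-S_2 M_R}$ with $S_i = A_i + B_i R^2_{\min}$, together with the negative log-likelihood that an unbinned fit would minimize. All parameter values are hypothetical and chosen only so that both slopes are positive.

```python
import math

def slopes(params, r2_min):
    """S_i = A_i + B_i * R2_min for the two exponential components."""
    s1 = params["a1"] + params["b1"] * r2_min
    s2 = params["a2"] + params["b2"] * r2_min
    return s1, s2

def two_exp_pdf(m_r, r2_min, params, m_min=400.0):
    """Two-exponential shape normalized to unit area on [m_min, infinity)."""
    if m_r < m_min:
        return 0.0
    s1, s2 = slopes(params, r2_min)
    f = params["f"]
    # Analytic integral of exp(-s1*m) + f*exp(-s2*m) from m_min to infinity.
    norm = math.exp(-s1 * m_min) / s1 + f * math.exp(-s2 * m_min) / s2
    return (math.exp(-s1 * m_r) + f * math.exp(-s2 * m_r)) / norm

def negative_log_likelihood(m_r_values, r2_min, params, m_min=400.0):
    """Unbinned NLL; every value is expected to satisfy m_r >= m_min."""
    return -sum(math.log(two_exp_pdf(m, r2_min, params, m_min)) for m in m_r_values)

# Hypothetical shape parameters (units of 1/GeV for a_i and b_i).
pars = {"a1": 0.005, "b1": 0.02, "a2": 0.002, "b2": 0.01, "f": 0.1}
print(slopes(pars, 0.25))
print(negative_log_likelihood([450.0, 520.0, 800.0], 0.25, pars))
```

Because the slopes are tied to $R^2_{\min}$ through $A_i$ and $B_i$, the same parameter set describes the $M_R$ shape for any threshold, which is what makes a simultaneous fit across $R^2$ bins possible.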

The W/Z+jets background
Owing to the lack of a high-purity data sample enriched in events with W/Z+two heavy-flavor jets, we estimate the shape of the W/Z+jets background using MC simulated events. A selection of events in the observed data whose jets fail to be b-tagged could provide a sample enriched in W+light-flavor jets. However, because of the dependence of the b-tagging efficiency on the jet $p_T$ [33], the PDF extracted from these events does not provide a sufficiently accurate model for W/Z+b jets events. Therefore, we estimate the shape of the W/Z+jets background using simulated events generated with the MADGRAPH event generator interfaced with PYTHIA, which were found to give an adequate description of CMS observed data [36,37]. Residual deficiencies of this MC simulation-based background modeling are accounted for in the extraction of the $t\bar{t}$ background estimate from observed data, as described in Section 6.2. The overall normalization of this background is determined using the observed events in the sideband region of the HAD box.
We perform an unbinned fit of the W/Z+jets $M_R$ distribution in simulated events passing the MU box selections with 2b-tagged events, using the sum of two separate exponential terms, as shown in Eq. (5). The fit allows us to obtain a parametric description of the background that is later used in the derivation of the remaining backgrounds, and it also permits the extrapolation of the prediction into the region of higher R and $M_R$ values. The fit is performed in the region $M_R > 400$ GeV and is binned in values of $R^2$ as shown in Fig. 1. The fit to the simulated data, which provides a good description of the $M_R$ distribution, is used as the PDF to estimate the W/Z+b jets background in the signal box.

$t\bar{t}$+jets background estimation
We estimate the $t\bar{t}$ background from the MU box, using 2b-tagged events in collision data (Section 4.2) and requiring the presence of a muon passing the tight identification requirements (Section 4.1). Based on comparisons with the MC simulation, approximately 90% of the events in this sample are $t\bar{t}$. We find empirically from MC simulation studies that the shape of the $M_R$ distribution in both the tightly selected MU box and in the HAD box is very similar, as can be seen in Fig. 2. We therefore use the shape derived from the 2b-tagged sample to predict the $t\bar{t}$ background in the signal region. Additionally, because of a non-negligible contribution of W/Z+HF events in this sample, the imperfections in the W/Z+jets background modeling in the simulation are absorbed into the $t\bar{t}$ background prediction. In order to derive the $t\bar{t}$ shape, we constrain the W/Z+jets shape to that obtained from the MC simulation (Section 6.1). We find that a two-exponential function provides a good fit to the observed data in the MU box, as shown in Fig. 3.

Multijet background
The remaining backgrounds that contribute significantly to the interesting region of high $R^2$ originate from heavy-flavor-enriched multijet production. We use events with a loose muon in the MU box to derive the multijet background PDF. According to the MC simulation, this sample is composed of 45% top events, 5% W/Z+b jets events, and 50% multijet events. We proceed to perform the fits, for which the contributions from W/Z+b jets and $t\bar{t}$ backgrounds are fixed to the PDFs described in Sections 6.1 and 6.2. Based on simulation studies it is found that the parameters of the second component in the fit function ($A_2$ and $B_2$ in Eq. (5)) are nearly identical for the multijet and the $t\bar{t}$+jets background processes. In order to better constrain the multijet fit, the parameters of the second component are set equal to those from the observed events for $t\bar{t}$+jets, while the parameters of the first component of the multijet PDF are left free. The results of the fit in the 2b-tagged MU box are displayed in Fig. 4, where we find good agreement between the fit results and observed data.

Systematic uncertainties
For the backgrounds estimated from observed events, the uncertainty in the total yield arises from the uncertainties (statistical and systematic) in the fit parameters in Eq. (5). We estimate these uncertainties by varying the $R^2$ threshold values (by ±5%), thus arriving at a new set of $A_i$ and $B_i$ parameters describing the background PDF. The maximum difference observed between the experimental data and the simulated data in the MU box with tight and loose muon selections is then used as the uncertainty in the shape parameters. This procedure results in a 10% uncertainty in the $A_i$ values, and 40% in the $B_i$ values. We also tested the stability of the fits by varying the initial parameters used to start the fit by ±50%, and found that this variation results in stable solutions, returning the same central values for the $A_i$ and $B_i$ parameters.
We generate an ensemble of pseudoexperiments, based on the fit results in the MU box. From each pseudoexperiment a new set of values for the parameters is then obtained, with the corresponding uncertainties, and we use the associated PDF results to predict the background yield. The ensemble of pseudoexperiments thus provides a distribution of the expected background yield in the signal regions, with its corresponding uncertainty. This procedure allows us to correctly propagate the systematic uncertainty in the background shape into the prediction of the background. To account for the normalization uncertainty we propagate the uncertainty in the $f_{R^2}$ introduced in Section 5 to the prediction of background yields in the signal region from control samples in observed events.
The effects of the jet energy scale (JES) and jet energy resolution (JER) uncertainties on the W/Z+jets background estimate and the signal model yields from simulation are taken into account. These effects are evaluated by repeating the extraction of all background PDFs by first varying the JES/JER by plus or minus one standard deviation in the W/Z+jets background model, and recalculating the $\not\!\!E_T$ and R. These variations correspond to uncertainties as large as 3% in the selection efficiency. We then re-derive the background model PDFs from observed data in the MU box, using the newly obtained W/Z+HF jets model. The new set of PDFs with their corresponding covariance matrices then serves as an alternative background model.
We apply a scale factor of about 0.95, which is weakly dependent on jet $p_T$, to account for an observed difference in tagging efficiency between data and simulation. The uncertainty in the scale factor varies from 0.03 to 0.05 for jets with $p_T$ from 30 to 670 GeV, and is 0.10 for b jets with $p_T > 670$ GeV. These uncertainties are measured using a dijet sample with high b-jet purity, as detailed in Ref. [33].
The uncertainty in the $\tilde{b}_1$ acceptance due to uncertainties in the parton distribution functions is calculated using the recommendation from the PDF4LHC group [38]. The parton distribution function and $\alpha_s$ variations at next-to-leading order (NLO) in the MSTW2008 [39], CTEQ6.6 [40], and NNPDF2.0 [41] sets were taken into account, and their impact on the signal cross sections was compared with the calculation with CTEQ6L1 [42] that was used in the simulation of the signal samples. From these three sets we evaluate an upper and a lower bound on the signal efficiency for each pair of assumed $\tilde{b}_1$ and $\tilde{\chi}^0_1$ masses, and half of the difference between the two bounds is used as an estimate of the uncertainty. The theoretical cross section of LQ$_3$ production has been calculated using CTEQ6L1 and CTEQ6M [42] at NLO, and the uncertainty in the prediction of the cross section was estimated by repeating the calculation using the NLO MRST2002 parametrization [43]. This uncertainty was found to vary from 3.5 to 25% for leptoquarks in the mass range considered in this analysis [44].
The systematic uncertainty in the luminosity measurement is taken to be 2.2% [45], and is correlated among all signal channels and the background estimates that are derived from simulations. The uncertainty in the trigger efficiency is estimated using a set of prescaled razor triggers with low thresholds, and is found to be 2% for events in the HAD box, and 3% for events in the MU and ELE boxes.

ELE control region
In order to check that our background shape modeling indeed predicts the observed data adequately, we use the PDFs obtained in the steps described above (Sections 6.1-6.3) in an orthogonal sample in the 2b-tagged ELE box with a tight electron selection, i.e. the sample with a well-identified electron, which is then treated as a neutrino. This signal-depleted sample provides an independent cross-check of our background modeling, and covers the same region in R and $M_R$ as the HAD box. Additionally, based on MC simulation studies, the composition of the tight ELE sample in observed events is similar to that of the HAD sample, consisting of approximately 85% $t\bar{t}$, 5% W/Z+HF jets, and 10% multijet events. For comparison, the HAD sample is expected to contain approximately 70%, 5%, and 25% of the respective backgrounds.
Using the background model PDFs obtained from the fits, we derive the distribution of the expected shapes in the ELE box using pseudoexperiments. In order to correctly account for correlations and uncertainties in the parameters describing the background model, the shape parameters used to generate each pseudoexperiment data set are sampled from the covariance matrix returned by the fit. The actual number of events in each data set is then drawn from a Poisson distribution centered on the yield returned by the covariance-matrix sampling. For each pseudoexperiment data set, the number of events in the sideband and in the high-$R^2$ region is found. We then obtain the scale factor $f_{R^2,\mathrm{ELE}} = 0.87 \pm 0.14$ from the sideband region, which is used to predict the overall yield of background events in the high-$R^2$ region of the ELE box.
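The pseudoexperiment procedure can be sketched with the Python standard library alone. This is a toy illustration under stated assumptions: the parameters are drawn from a multivariate Gaussian defined by the fitted values and covariance matrix, the mapping from parameters to expected yield (`yield_fn`) is a hypothetical stand-in for integrating the background PDF over a region, and the Poisson sampler uses Knuth's multiplication method, which is only suitable for modest expected yields.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular L with L * L^T = cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low

def sample_parameters(mean, cov, rng):
    """Draw one correlated Gaussian parameter set."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]

def poisson(lam, rng):
    """Knuth's method: count uniform draws until their product falls below exp(-lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def run_pseudoexperiments(mean, cov, yield_fn, n_toys, seed=1):
    """Return the list of Poisson-fluctuated event counts, one per pseudoexperiment."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_toys):
        params = sample_parameters(mean, cov, rng)
        expected = max(0.0, yield_fn(params))  # negative yields are unphysical
        counts.append(poisson(expected, rng))
    return counts

# Hypothetical one-parameter model whose parameter is the expected yield itself.
toys = run_pseudoexperiments([50.0], [[25.0]], lambda p: p[0], n_toys=2000, seed=7)
print(sum(toys) / len(toys))
```

The spread of the resulting counts combines the parameter uncertainty from the covariance matrix with the Poisson fluctuation of the yield.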
The comparison of the predicted M R distribution with the observed events in the ELE box is shown in Figure 5, and the background model is found to predict the observed data adequately. We also test our ability to correctly predict the yields of SM backgrounds using the scale factor mentioned above. The results are summarized in Table 2. The total background yield in the sideband is normalized to the number of observed data events in the sideband, in order to derive the scale factor f R 2 , ELE , as described in Section 5. The uncertainties in the background yields shown here represent systematic uncertainties that are estimated by varying the parameters A i and B i , as described in Section 6.4. As can be seen in this comparison, the f R 2 , ELE obtained from the sideband allows one to predict the overall normalization of the 2b-tagged sample.
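The normalization step can be illustrated with a short sketch. It assumes that the scale factor is the ratio of observed to predicted sideband yields and that the uncertainties are uncorrelated and combine in quadrature. The function names are hypothetical, and the sketch is not taken from the analysis itself.

```python
import math


def scale_factor(n_obs_sideband, n_pred_sideband, sigma_pred_sideband=0.0):
    """Return (f, sigma_f) with f = observed / predicted sideband yield.

    Assumes a Poisson uncertainty on the observed count and an uncorrelated
    uncertainty on the prediction.
    """
    f = n_obs_sideband / n_pred_sideband
    rel = math.sqrt(
        1.0 / n_obs_sideband + (sigma_pred_sideband / n_pred_sideband) ** 2
    )
    return f, f * rel


def scaled_yield(n_pred_signal, sigma_pred_signal, f, sigma_f):
    """Scale a predicted signal-like yield by f and propagate both uncertainties."""
    n = f * n_pred_signal
    rel = math.sqrt((sigma_f / f) ** 2 + (sigma_pred_signal / n_pred_signal) ** 2)
    return n, n * rel
```

With the quoted f R 2 , ELE = 0.87 ± 0.14 and an invented prediction of 50 ± 5 events, `scaled_yield(50.0, 5.0, 0.87, 0.14)` gives a central value of 43.5 events, with the relative uncertainty dominated by the scale factor.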
Table 2: Comparison of the yields in the ELE box. The sideband here refers to 2b-tagged events in the ELE box with 400 < M R < 600 GeV and 0.2 < R 2 < 0.25, while "signal-like" refers to 2b-tagged events with M R > 400 GeV and R 2 > 0.25. The scale factor derived in the sideband ( f R 2 , ELE = 0.87 ± 0.14) is used to normalize the background yield in the signal-like region (third column), and the uncertainty in f R 2 , ELE is propagated into the total background yield.

We perform another check to test whether the R 2 dependence is well described by our background model. This check is needed since in the final signal region we have several signal boxes, each optimized for different signal mass hypotheses. In order to increase the sensitivity for higher masses, a tighter selection on R 2 is imposed to reduce the backgrounds further, while keeping the signal efficiency high. In order to ensure that our background model adequately describes observed data with higher R 2 thresholds, we perform the same procedure in the ELE box. The results are summarized in Table 3. Here, we use the same f R 2 , ELE derived from the sideband. As can be seen from these results, this model correctly predicts the total yields for higher R 2 boxes.

Results
We search for LQ 3 and b 1 signals in the HAD box data sample using the background PDFs obtained from the MU box (Sections 6.1-6.3). The predicted background yields and their uncertainties are summarized in Table 4. The total background yield in the 2b-tagged sideband is normalized to the number of observed data events in the sideband, in order to derive the scale factor f R 2 , HAD = 1.10 ± 0.13, as described in Section 5. The distributions of R and M R observed in the 2b-tagged HAD box are compared to the background prediction in Fig. 6. As seen in Fig. 6 and Table 4, both the number of observed events and the shapes of the R and M R distributions are in agreement with the expected SM backgrounds. Therefore, we proceed to define two signal regions, to enhance the sensitivity for different LQ 3 masses. The regions are optimized to provide the lowest expected cross section limits, by varying the thresholds on R and M R . We find that M R > 400 GeV provides the best sensitivity for all masses, and for LQ 3 masses below 350 GeV the optimal selection is R 2 > 0.25, while for higher masses R 2 > 0.42 provides the best sensitivity. Because of the high value assumed for the χ 0 1 mass in the b 1 search, the inclusive selection of M R > 400 GeV and R 2 > 0.25 is found to provide the optimal sensitivity in the mass range considered in this analysis.
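The mass-dependent LQ 3 selection described above can be written as a small helper. This is only an illustrative sketch: the function name is hypothetical, and assigning a mass of exactly 350 GeV to the tighter selection is an assumption, since the text specifies only "below 350 GeV" and "higher masses".

```python
def passes_lq3_signal_selection(m_r, r2, lq_mass):
    """Return True if an event enters the signal region for a given LQ3 mass.

    m_r and lq_mass are in GeV; r2 is the squared razor ratio R^2.
    Thresholds follow the text: M_R > 400 GeV for all masses, R^2 > 0.25
    below 350 GeV and R^2 > 0.42 otherwise (boundary handling is assumed).
    """
    r2_threshold = 0.25 if lq_mass < 350.0 else 0.42
    return m_r > 400.0 and r2 > r2_threshold
```

For example, an event with M R = 450 GeV and R 2 = 0.30 would enter the signal region for a 300 GeV hypothesis but not for a 400 GeV one.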
Table 5 shows the comparison of the expected background yields in these signal boxes with the observed event counts, which agree with the expectations. Table 6 shows the efficiency of these selections for several LQ 3 mass hypotheses, based on MC simulation. Efficiencies for the b 1 signal are shown in Fig. 7. Typical efficiencies range from a few percent up to ∼12 percent for b 1 masses between 200 and 500 GeV and small χ 0 1 mass. The efficiency drops when the mass of the b 1 squark is close to the mass of χ 0 1 , since the resulting b jets are softer in these scenarios.

Table 5: Expected and observed yields in the 2b-tagged HAD box for various R 2 selections and a fixed M R > 400 GeV requirement. The quoted uncertainties in the expected number of events include statistical and systematic uncertainties, and the uncertainty from f R 2 , HAD . The left three columns show inclusive yields above the R 2 threshold, while the right three columns show the yields in bins of R 2 .

White lines show the iso-efficiency contours for 1, 5, and 10% signal efficiency, respectively.

The statistical model for the observed number of events is a Poisson distribution with the expected value equal to the sum of the signal and expected backgrounds. Log-normal priors for the nuisance parameters are used to model the systematic uncertainties listed in Section 6.4.
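Written out for a single counting bin, such a model takes the following schematic form. The notation is introduced here for illustration and is an assumption, not the paper's own: n is the observed count, μ is the signal strength, s and b are the nominal signal and background yields, θ j are unit-Gaussian nuisance parameters, and each κ equals one plus the relative size of the corresponding systematic uncertainty, so that κ raised to the power θ j is log-normally distributed.

```latex
\mathcal{L}(n \mid \mu, \boldsymbol{\theta})
  = \frac{\lambda^{n} e^{-\lambda}}{n!}
    \prod_{j} \frac{1}{\sqrt{2\pi}}\, e^{-\theta_j^{2}/2},
\qquad
\lambda(\mu, \boldsymbol{\theta})
  = \mu\, s \prod_{j} \kappa_{s,j}^{\theta_j}
  + b \prod_{j} \kappa_{b,j}^{\theta_j}
```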
A 95% CL upper limit is set on the potential signal cross section, as summarized in Table 7.
The modified frequentist construction CL s [46,47] is used for the limit calculation. These limits are interpreted in terms of limits on the LQ 3 pair production cross section, as shown in Fig. 8. The upper limits are compared to the NLO prediction of the LQ pair production cross section [44], and we set a 95% CL exclusion on LQ masses smaller than 450 GeV (expected 470 GeV), assuming β = 0. We also present the 95% CL limit on β as a function of LQ 3 mass, as shown on the right side of Fig. 8.
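For orientation, the CL s construction can be sketched for a single-bin counting experiment. This is a deliberate simplification and not the procedure of the analysis: it uses the observed count itself as the test statistic, omits all nuisance parameters, and uses illustrative function names. The conversion to a cross section assumes that the signal yield equals cross section × efficiency × integrated luminosity.

```python
import math


def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def cls(n_obs, signal, background):
    """CLs = CL_{s+b} / CL_b with the event count as test statistic."""
    return poisson_cdf(n_obs, signal + background) / poisson_cdf(n_obs, background)


def signal_yield_limit(n_obs, background, cl=0.95, tol=1e-6):
    """Smallest signal yield excluded at the given CL, found by bisection."""
    lo, hi = 0.0, 1.0
    # Expand the bracket until the upper edge is excluded.
    while cls(n_obs, hi, background) > 1.0 - cl:
        hi *= 2.0
    # CLs decreases monotonically with the signal yield, so bisect.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, background) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cross_section_limit(yield_limit, efficiency, luminosity):
    """Convert a yield limit into a cross section limit (units of 1/luminosity)."""
    return yield_limit / (efficiency * luminosity)
```

With zero observed events the sketch returns a yield limit of about 3.0 events regardless of the expected background, which is a known property of CL s. If the luminosity is given in fb −1 , the cross section limit is in fb.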
Figure 8: (Left) The expected and observed upper limit at 95% CL on the LQ 3 pair production cross section as a function of the LQ 3 mass, assuming β = 0. The systematic uncertainties reported in Section 6.4 are included in the calculation. The vertical greyed region is excluded by the current D0 limit [12] in the same channel. The theory curve and its band represent, respectively, the theoretical LQ 3 pair production cross section and the uncertainties due to the choice of parton distribution functions and renormalization/factorization scales [44]. (Right) Minimum β for a 95% CL exclusion of the LQ 3 hypothesis as a function of LQ 3 mass. The observed (expected) exclusion curve is obtained using the observed (expected) upper limit and the central value of the theoretical LQ 3 pair production cross section. The band around the observed exclusion curve is obtained by considering the observed upper limit while taking into account the uncertainties in the theoretical cross section. The grey region is excluded by the current D0 limits [12] in the same channel.

The results of the analysis are interpreted in the context of the simplified supersymmetry model spectra (SMS) [48][49][50]. In SMS, a limited set of hypothetical particles and decay chains are introduced to produce a given topological signature, such as the E T / plus b jets final state considered in this analysis. We consider an SMS scenario where all supersymmetric particles are set to have a very large mass, except for the b 1 and χ 0 1 . The pairs of scalar bottom quarks produced through strong interactions are kinematically allowed to decay only into a b quark and a χ 0 1 . The observed and expected 95% CL upper limits in the b1 − χ 0 1 mass plane are shown in Fig. 9, where the b 1 pair production cross section is calculated at the NLO and next-to-leading-logarithm (NLL) order [51][52][53][54][55][56]. Since M R depends on the squared difference of the masses of b 1 and χ 0 1 , at b 1 masses around 400-450 GeV and low χ 0 1 masses the exclusion limit is almost independent of the χ 0 1 mass.

The signal acceptance in the region with small mass splitting between the b 1 and χ 0 1 is particularly susceptible to uncertainties associated with initial-state radiation (ISR). The impact of ISR is estimated by comparing the results of the acceptance calculation using PYTHIA with the "power shower" and with moderate ISR settings [19]. If the acceptance varies by more than 25% for a particular choice of b 1 and χ 0 1 masses, then no limit is set for those mass parameters. This procedure results in reduced sensitivity in the region of m( b 1 ) < 300 GeV and 80 < m( χ 0 1 ) < 130 GeV, and thus an inability to exclude some of the models in this parameter range.
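The ISR criterion amounts to a simple mask over the mass grid, sketched below. The definition of the variation as a relative change with respect to the nominal acceptance is an assumption, as are the function names and the dictionary layout. The sketch does not reproduce any acceptance values from the analysis.

```python
def limit_allowed(acc_nominal, acc_varied, max_relative_change=0.25):
    """Return True if the acceptance is stable enough under the ISR variation.

    The relative change is measured with respect to the nominal acceptance
    (an assumed convention).
    """
    if acc_nominal <= 0.0:
        return False
    return abs(acc_varied - acc_nominal) / acc_nominal <= max_relative_change


def mask_mass_grid(acceptances, max_relative_change=0.25):
    """Select the mass points for which a limit would be set.

    acceptances maps (m_b1, m_chi) in GeV to (acc_nominal, acc_varied).
    """
    return {
        point
        for point, (nominal, varied) in acceptances.items()
        if limit_allowed(nominal, varied, max_relative_change)
    }
```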

Summary
A search has been performed for third-generation scalar leptoquarks and for scalar bottom quarks in the all-hadronic channel with a signature of large E T / and b-tagged jets. This search is based on a data sample collected in pp collisions at √ s = 7 TeV and corresponding to an integrated luminosity of 4.7 fb −1 . The number of observed events is in agreement with the predictions for the SM backgrounds. We set an upper limit on the LQ 3 pair production cross section, excluding a scalar LQ 3 with mass below 450 GeV, assuming a 100% branching fraction of the LQ 3 to b quarks and tau neutrinos. We set 95% confidence level upper limits in the b1 − χ 0 1 mass plane such that for neutralino masses of 50 GeV, scalar bottom masses up to 410 GeV are excluded. These results represent the most stringent limits on LQ 3 masses and extend limits on

Figure 1: M R distributions for different values of the R 2 threshold for events passing the MU box selections in the W/Z+jets MC simulation. The results of the fits (lines) are overlaid with the M R distributions from the MC simulation (markers).

Figure 2: The M R distributions (left) in tt MC simulated events selected with the tight MU, tight ELE, or HAD requirements, and (right) the ratio of the number of events selected with the HAD or tight MU selections, as a function of M R .

Figure 4: (Left) The result of the fit of the M R distributions (lines) compared to the MU box observed data for events with R 2 > 0.14; individual contributions of backgrounds are not stacked. (Right) The M R distributions for different values of the R 2 threshold in 2b-tagged events of the MU box with a loose muon; the results of the fits (lines) are overlaid with the observed distributions (markers).

Figure 5: The M R distribution for observed data in the 2b-tagged ELE box for events with R 2 > 0.25 compared to the prediction. The background model derived from the MU box is used to predict the M R shapes of the background processes. The individual contributions are not stacked.

Figure 6: Comparison of the background prediction with the data observed in the 2b-tagged sample in the HAD signal box for the M R (left) and R (right) distributions. The expected contributions from LQ 3 and b 1 signal events with various mass hypotheses are also shown.

Figure 9: The expected and observed 95% CL exclusion limits for the b 1 pair production SMS model. The red dashed contour shows the 95% CL exclusion limits based on the NLO+NLL cross section. The red dotted contours represent the theoretical uncertainties from the variation of parton distribution functions, and renormalization and factorization scales. The corresponding expected limits are shown with the black dashed contour. The shaded yellow contours represent the uncertainties in the SM background estimates, as reported in Section 6.4.

Table 1: Summary of samples used in the search, with a short description of their specific purpose. Events in all samples are required to have M R > 400 GeV and to include two b-tagged jets. The selections on R 2 listed in the table are applied after recalculating E T / and R for events in which charged leptons are treated as neutrinos. The definitions of muons (µ) and electrons (e) are discussed in Section 4.1.
< R 2 < 0.25 veto leptons M R < 600, sideband to extract f R 2 , HAD
HAD R 2 > 0.25 veto leptons signal box, search for signal

Table 3: Expected and observed yields in the 2b-tagged ELE box for various R 2 selections and a fixed requirement M R > 400 GeV. The quoted uncertainties in the expected number of events include statistical and systematic uncertainties, and the uncertainty in the scale factor f R 2 , ELE .

Table 4: Comparison of the yields in the 2b-tagged (signal region) samples in the HAD box. The uncertainties include the systematic uncertainty in the background shapes (Section 6.4) and statistical uncertainties. The uncertainty in the total yield after scaling also includes the jet energy scale uncertainty. The scale factor derived in the sideband ( f R 2 , HAD = 1.10 ± 0.13) is used to normalize the background yield in the signal-like region. The uncertainty in f R 2 , HAD is propagated and included in the quoted uncertainty in the expected background yields.

Table 6: Summary of the expected LQ 3 signal yields and efficiency in the signal region, for 4.7 fb −1 of observed data, in events with M R > 400 GeV. For LQ 3 masses below 350 GeV, R 2 > 0.25 is required, while for heavier masses we require events to pass R 2 > 0.42. All uncertainties are statistical only.

Table 7: Observed and expected 95% CL upper limits on the LQ 3 pair-production cross section as a function of the LQ 3 mass.