Measurement of b -hadron pair production with the ATLAS detector in proton-proton collisions at √ s = 8 TeV The ATLAS Collaboration

A measurement of b -hadron pair production is presented, based on a data set corresponding to an integrated luminosity of 11.4 fb − 1 of proton–proton collisions recorded at √ s = 8 TeV with the ATLAS detector at the LHC. Events are selected in which a b -hadron is reconstructed in a decay channel containing J /ψ → µµ , and a second b -hadron is reconstructed in a decay channel containing a muon. Results are presented in a ﬁducial volume deﬁned by kinematic requirements on three muons based on those used in the analysis. The ﬁducial cross section is measured to be 17 . 7 ± 0 . 1(stat.) ± 2 . 0(syst.) nb. A number of normalised di ﬀ erential cross sections are also measured, and compared to predictions from the P ythia 8, H erwig++ , M ad G raph 5_ a MC@NLO + P ythia 8 and S herpa event generators, providing new constraints on heavy ﬂavour production.


Introduction
• the separation between the J/ψ and the third muon in the azimuth-rapidity plane, ∆R(J/ψ, µ), 3 measured inclusively and split into p T (J/ψ, µ) < 20 GeV (low p T ) and p T (J/ψ, µ) ≥ 20 GeV (high p T ) regions; • the separation in rapidity between the J/ψ and the third muon, ∆y(J/ψ, µ); • the magnitude of the average rapidity of the J/ψ and the third muon, y boost ; • the mass of the three-muon system, m(J/ψ, µ); • the ratio of the transverse momentum of the three-muon system to the invariant mass of the threemuon system, p T /m, and its inverse, m/p T .
Results are presented as the total cross section and normalised differential cross sections, all within a fiducial region defined in Section 3.3.

ATLAS detector
The ATLAS detector [30] consists of an inner tracking system, referred to as the inner detector, surrounded by a superconducting solenoid, electromagnetic and hadronic calorimeters and an external muon spectrometer. Charged-particle tracks in the pseudorapidity range |η| < 2.5 are reconstructed with the inner detector, which is immersed in a 2 T axial field provided by the solenoid. The inner detector consists of pixel and microstrip semiconductor detectors, as well as a straw-tube transition radiation tracker. The solenoid is surrounded by sampling calorimeters, which span the pseudorapidity range up to |η| = 4.9.
High-granularity liquid-argon (LAr) electromagnetic calorimeters are present up to |η| = 3.2. Hadronic calorimeters with scintillator tiles as active material cover |η| < 1.74, while LAr technology is used for hadronic calorimetry from |η| = 1.5 to |η| = 4.9. Outside the calorimeter system, air-core toroids provide a magnetic field for the muon spectrometer. Three stations of precision drift tubes and cathode-strip chambers provide measurements of muon tracks in the region |η| < 2.7. Resistive-plate and thin-gap chambers provide muon triggering capability up to |η| = 2.4.
3 Data set, event selection and simulation

Data set and event selection
This analysis uses proton-proton collision data recorded by the ATLAS experiment during 2012 at a centre-of-mass energy √ s = 8 TeV. All events considered were recorded while the detector and trigger systems were functional and satisfied data quality requirements. Events are selected using a dimuon trigger where the muons are required to have opposite charge, be consistent with originating from the same production vertex, have p T (µ) > 4 GeV and |η(µ)| < 2.4 and satisfy a loose dimuon mass selection, 2.5 < m(µ + , µ − ) < 4.3 GeV. The trigger was prescaled for various periods throughout 2012, and the integrated luminosity of the resulting data set is 11.4 fb −1 .
Muon candidates are reconstructed by combining inner detector tracks with tracks in the muon spectrometer. Candidates are required to have |η| < 2.5 and p T > 6 GeV. Additional requirements are placed on the track quality, requiring a minimum number of hits in the different layers of each of the inner detector subcomponents. The J/ψ candidates are formed by selecting pairs of muons with opposite charge that have been identified to originate from a common vertex. These muons must also have |η| < 2.3 to lie within the trigger acceptance, and are required to match the direction of the corresponding trigger-level candidates. The invariant mass of the J/ψ candidate is required to be in a window around the Particle Data Group (PDG) average of the J/ψ mass of 3.097 GeV [31], m(J/ψ PDG ), specifically in the range 2.6-3.5 GeV. If more than one dimuon J/ψ candidate satisfies the requirements above, the one with a reconstructed mass closest to the PDG mass is chosen.
A third-muon candidate is also required in the event. If there are more than three muons in an event the highest-p T muon that is not used in the J/ψ candidate construction is defined to be the third muon.

Simulation
Inclusive b-hadron pair production is simulated with the Pythia8.186 [32] Monte Carlo event generator, based on a 2 → 2 matrix element calculation matched to a parton shower. The CTEQ6L1 [33] parton distribution function (PDF) set is used, along with the AU2 [34] tuned parameter settings. The b-quarks are treated as massless in the PDF set and the matrix element calculation, but the mass is reinstated during the parton shower. Events are filtered based on the presence of J/ψ(→ µµ) produced in the decay of a b-hadron, requiring two muons with p T > 6 GeV. 4 The simulated collisions are overlaid with additional simulated minimum-bias collisions, to emulate the effect of multiple proton-proton interactions occurring during the same (in-time) or a nearby (out-of-time) bunch crossing, an effect called "pile-up". These additional collisions are produced using Pythia8 with the A2 [35] tuned parameter settings and the MSTW2008 [36] PDF set. The simulated events are then passed through a GEANT4 [37] simulation of the full ATLAS detector [38], and reconstructed using the same software as the real data.
In addition, an inclusive pp → bb sample is simulated with the Herwig++ [39] event generator using the CTEQ6L1 PDF set with UE-EE5 [40] parameter settings tuned for the underlying-event modelling. Again, this prediction includes a 2 → 2 matrix element calculation matched to a parton shower, but in this case, the b-quarks are considered massive in both the matrix element calculation and parton shower.
For both predictions, all 2 → 2 QCD processes are included and b-quarks can be produced either in the matrix element or in the subsequent parton shower phase.

Fiducial volume definition
Cross sections are defined in terms of observables at the particle level, defined in terms of particles with an average lifetime greater than 10 mm/c. The signal is defined by two final-state muons that originated from the decay of a J/ψ, which itself is a descendant of a b-hadron (including feed-down decays), and a third muon which originated from a different b-hadron (including cascade decays). Muons are "dressed" by adding the four-momentum of photons (excluding photons produced in the decays of hadrons) that are close to the muon, defined by ∆R η (µ, γ) < 0.1, 5 to the muon four-momentum. The resulting dressed 4 All J/ψ are decayed to µµ. In events where two J/ψ are present, the event is weighted by BR(J/ψ → µµ) to correct for this. 5  muons are required to have p T > 6 GeV. The two muons from the J/ψ are required to have |η| < 2.3 and the third muon is required to have |η| < 2.5, matching the trigger and reconstruction acceptances respectively.

Muon trigger and reconstruction efficiencies
The reconstructed J/ψ and third muon candidates are taken from muons satisfying the selection criteria previously described. To extract the fiducial cross sections, the data must be corrected to account for inefficiencies in the selection of events containing muons. Corrections for the muon trigger and reconstruction efficiencies are obtained from data-driven techniques and applied to the observed events via an event weight.
The trigger efficiency correction [41] is factored into several components: • The single-muon trigger efficiency is calculated with a tag-and-probe method using J/ψ candidate data. The efficiency is dependent on the kinematics of the muon and is parameterised as a function of p T and q × η, where q is the reconstructed muon charge. A correction derived from simulation is required to remove bias arising from the difference between the single-muon triggers used for the tag-and-probe method (which require p T (µ) > 18 GeV), and the dimuon trigger used in the analysis (which requires p T (µ) > 4 GeV for both muons).
• The dimuon trigger efficiency correction is calculated with data in two parts, a vertex-finding and opposite-charge correction, and a correction for the spatial overlap of muons in the trigger system, which typically results in a reduction in efficiency when ∆R η (µ, µ) < 0.2. The spatial overlap correction is calculated as a function of the angular separation between the two muons calculated in three separate dimuon rapidity intervals.
• A correction taken from simulation is applied to account for trigger inefficiencies in events with three muons where a pair of muons falls in the same trigger-level object, also typically for ∆R η (µ, µ) < 0.2. This correction is derived by fitting a linear function to the ratio of the number of b-hadron pair production events accepted by the trigger to all signal events, as a function of the separation of the third muon from the closest muon in the J/ψ candidate.
The per-event trigger efficiency is given by the product of each single-muon efficiency, the additional dimuon trigger efficiency corrections and the three-muon efficiency correction.
The data are also corrected for the muon reconstruction efficiency [42] which is applied per muon for each of the three muons in the event. The efficiency for a single muon is calculated in two parts: the efficiency for a muon track to be reconstructed in the inner detector is parameterised as a function of muon p T and η; then the efficiency for reconstructing a muon given that an inner detector track has already been reconstructed is calculated using a tag-and-probe method using Z → µ + µ − and J/ψ → µ + µ − data. The correction is derived as a function of muon p T and q × η.

Signal extraction
While the event selection described in Section 3 provides a high-purity J/ψ +µ sample, the signal of interest in this analysis must still be extracted from the data. This is done in the following stages: 1. The yield of J/ψ resulting from the decay of a b-hadron (the signal) is extracted using a simultaneous fit to the distributions of dimuon mass and pseudo-proper decay time, τ, defined as where L xy is the transverse distance between the primary vertex and the dimuon vertex positively (negatively) signed for a vertex with momentum vector pointing away from (towards) the primary vertex. The primary vertex is defined as the vertex formed from at least two tracks, each with p T > 400 MeV, that has the largest summed track p 2 T in the event. The quantity p T (µ + µ − ) is the transverse momentum of the dimuon system. This fit is described in Section 5.1, and the resulting determination of J/ψ background contributions is used as an input to the next step.
2. Next, the yield of events containing an additional (i.e. third) muon resulting from the decay of a b-hadron is determined. First, the proportion of signal relative to background is enhanced by requiring τ > 0.25 mm/c. The remaining contribution from certain backgrounds is determined from the J/ψ fit, and other backgrounds are determined using a simultaneous fit to the transverse impact parameter significance, S d 0 , and the output of a boosted decision tree (BDT) trained to separate signal muons from misreconstructed muons, as described in Section 5.2. The transverse impact parameter significance is defined as where the transverse impact parameter, d 0 , is the distance of closest approach of the track to the primary vertex point in the r-φ projection, with the d 0 sign given by the sign of the angular momentum of the track around the beam evaluated at the point of closest approach; and σ d 0 is the (unsigned) uncertainty in the reconstructed d 0 .
3. Some remaining irreducible sources of background are then subtracted from the fitted yields, as described in Section 5.3.

4.
Having determined the yield of J/ψ and third muons, this is corrected for the effect of the τ requirement (τ > 0.25 mm/c), as described in Section 5.4.
5. Finally, the effects of detector resolution are corrected for, as described in Section 5.5, determining the measured cross section for the signal.
This process is repeated for each kinematic bin of each differential cross section, resulting in the cross section in that bin. The width of these bins was chosen in order to retain the maximum information about the shape of each differential cross section, while maintaining a sufficient number of events in each bin to minimise the uncertainties. The data used in all fits are already corrected by the muon trigger and reconstruction efficiencies previously described.

Extraction of the J/ψ composition
To identify the J/ψ signal and background contributions to the data, a two-dimensional unbinned extended maximum-likelihood fit to the J/ψ candidate mass and pseudo-proper decay time is performed. There are two components of the J/ψ candidate dimuon mass spectrum: the real J/ψ contribution, which is peaked at the J/ψ mass; and the fake J/ψ background contribution, which forms a continuum under the J/ψ peak. The pseudo-proper decay time distribution for a real J/ψ has two components. The first is for a J/ψ from direct strong production in the hard scatter; this component is peaked at low τ and is referred to as prompt.
The second corresponds to the J/ψ signal from b-hadron decays that has, on average, larger values of τ due to the lifetime of the b-hadron, and is referred to as non-prompt. Both prompt and non-prompt J/ψ include feed-down from excited charmonium states produced in either the hard scatter or the decay of a b-hadron. Similarly, there are three components of the fake J/ψ background included in the fit, to account for different contributions to the pseudo-proper decay time (see below). The pseudo-proper decay time probability density function (p.d.f) components are convolved with a detector resolution function, modelled by a double Gaussian distribution centred at τ = 0 mm/c, with the widths of those Gaussian functions and their relative normalisation allowed to float freely in the fit.
In order to extract the number of non-prompt J/ψ, a fit model with five functional forms is used, based on the model used in Ref. [26]. The forms are: • Non-prompt J/ψ: The dimuon invariant mass is modelled using the sum of a Crystal Ball [43][44][45] and a Gaussian function. The τ distribution is modelled using a single-sided exponential decay function, convolved with the resolution function.
• Prompt J/ψ: The dimuon invariant mass is modelled using the same Crystal Ball and Gaussian function used for the non-prompt J/ψ. The τ distribution is modelled using a delta function at τ = 0 mm/c, convolved with the resolution function.
• Prompt fake J/ψ background: The dimuon invariant mass is modelled using a constant distribution (fits using a first order polynomial yield slopes consistent with zero for this component). The τ distribution is modelled using a delta function at τ = 0 mm/c, convolved with the resolution function.
• Single-sided fake J/ψ background: The dimuon invariant mass distribution is modelled by a negativeslope exponential function. The τ distribution is modelled using a single-sided exponential decay function, convolved with the resolution function.
• Double-sided fake J/ψ background: The dimuon invariant mass distribution is modelled by an exponential decay function. The τ distribution is modelled using a double-sided exponential decay function, convolved with the resolution function.
In the mass fit, the Gaussian and Crystal Ball functions share the same mean value. The Crystal Ball n and α parameters are fixed to values derived from an inclusive J/ψ fit, but the width is allowed to float.
The mean values of the decay time model's Gaussian function and double-sided exponential functions are fixed at zero. All other fit parameters are allowed to float.
The functional forms are combined into the fit model used in the unbinned extended maximum likelihood fit to the data. The fit for an example kinematic bin is shown in Figure 1, where the prompt, single-sided and double-sided fake J/ψ backgrounds are combined for clarity.
The stability and performance of the fit are checked by a closure test on simulated samples. Pseudodata sets are produced by combining different numbers of prompt and non-prompt J/ψ events. The two-dimensional J/ψ model is then fitted to each pseudo-data set and the fractions of prompt and nonprompt J/ψ events is extracted from the fit. The fractional difference between the input and fitted number of prompt and non-prompt J/ψ events is compared, demonstrating that the fit is performing very well with deviations from the input composition consistently below 2%, which is comparable to the statistical uncertainty in the fit.

Extraction of the non-prompt muon signal
Having determined the J/ψ contributions, the next step is to extract the signal yield: a non-prompt muon from the same hard scatter as a non-prompt J/ψ. To do this, a second two-dimensional maximumlikelihood fit is performed, in this case using two observables that allow separation of the signal muons from background: the transverse impact parameter significance, and a BDT trained to separate signal muons from instrumental backgrounds.
The mechanisms of prompt and non-prompt J/ψ production differ significantly, resulting in very different background contributions to the third-muon sample in each case. This leads to difficulties in fitting the third-muon distributions across the entire J/ψ τ range. To increase the signal muon purity and improve the third-muon fit performance, the selected J/ψ in each event is first required to have τ > 0.25 mm/c, removing all of the prompt J/ψ candidates. In addition, to reduce the J/ψ candidates arising from the continuum background, a tighter invariant mass requirement of 2.95 < m(µ + , µ − ) < 3.25 GeV is applied. To account for the signal efficiency loss from these criteria, a correction is made once the signal yield is extracted from the fits.
There are several contributions to the third-muon background that are then considered: prompt muons produced at the primary vertex, muons produced in the decays of charged pions or kaons, third muons in events where the J/ψ candidate is not a real J/ψ but from the continuum background, and events where the J/ψ and third muon are produced from different hard scatters in the same bunch crossing (referred to as the pile-up background).
There are two sources of fake muon background. Decay-in-flight (DIF) muons are the result of the decay of a charged pion or kaon. The small mass difference between the hadrons and resulting muons can result in only a small deflection of the track at the decay vertex. Therefore, the track of both the hadron and resulting muon can be reconstructed as coming from a single particle and a muon candidate may be identified. Muon candidates can also be reconstructed when charged hadrons leave tracks in the inner detector and charged particles from the shower in the hadronic calorimeter leave tracks in the muon spectrometer, referred to as hadronic shower leakage muons. Both the DIF and hadronic shower leakage sources of fake muons contribute significantly at low angular separations between the J/ψ and third muon, as they can arise through decays such as B ± → J/ψ + K ± .
To discriminate between the signal and fake muons, a BDT is constructed from kinematic variables that are sensitive to the production mechanism of the muons. These variables are: • Track deflection significance: This parameter is the maximum value of the significance of the difference in track curvature calculated upstream and downstream of a point somewhere along the track reconstructed in the inner detector. DIF muons originating from a point inside the inner detector typically have higher values of track deflection significance than the signal muons.
• Track deflection neighbour significance: Computed considering track segments, between adjacent hits, along the inner detector track. The largest value over the whole track of the significance of the angular difference between adjacent track segments is taken. This variable quantifies the significance of a deflection along a muon track; DIF muons originating from a point inside the inner detector populate larger values.
• Momentum balance significance: The significance of the difference between the track momenta reconstructed in the inner detector and in the muon spectrometer. If a pion or kaon decays outside the inner detector, the inner detector track may be matched to a lower-momentum muon spectrometer track produced by the resulting muon. The imbalance between the two track momenta is higher for these DIF muons than for inner detector and muon spectrometer tracks produced by a single muon. This variable also offers discrimination between signal and hadronic shower leakage muons.
• Absolute pseudorapidity, |η|: Muon candidates produced in the background processes are more likely to be produced at high absolute pseudorapidities.
The BDT is developed in the TMVA framework [46]. The Pythia8 simulation is separated into two independent samples; one is used for training the BDT and the other is used to validate the performance. Figure 2 (left) shows a clear discrimination between the distribution of the BDT output for the signal and the fake muon background.
The BDT is trained using signal muons and fake muons, both taken from the simulated signal sample using the list of particles produced in each simulated collision, the Monte Carlo (MC) event record. Signal muons are defined as reconstructed muons, µ reco , that can be associated to a muon from the MC event record, µ MC , which was produced in the decay of a b-hadron. The association between reconstructed and simulated muons is performed by requiring ∆R η (µ MC , µ reco ) < 0.02. Muons from DIF are identified as reconstructed muons which fail the signal definition, but are associated to a muon in the MC event record which has a charged pion or kaon as a parent. In this case, a looser association is required due to the change in the momentum at the point the pion/kaon decays: ∆R η (µ MC , µ reco ) < 0.15. Finally, the remaining reconstructed muons may be identified as hadronic shower leakage if they do not match any muon from the MC event record, but do match a charged pion or kaon. Again, a looser association of ∆R η (π/K MC , µ reco ) < 0.15 is used to correctly identify all of this background.
There is also a background of third muons in events where the J/ψ candidate comes from the dimuon continuum background (the "fake J/ψ" contribution introduced in Section 5.1). Events lying in the regions outside the dimuon signal mass region (2.95 < m(µ + , µ − ) < 3.25 GeV) but within the range 2.60 < m(µ + , µ − ) < 3.50 GeV consist almost entirely of this fake J/ψ background. These events are therefore used to form the S d 0 and BDT templates used for the fake J/ψ background. These templates are normalised using the results from the two-dimensional J/ψ fit (described in Section 5.1) to determine the number of fake J/ψ events in the signal mass region.
The background from pile-up is studied using ∆z 0 , defined as the difference between the reconstructed z-position (at their respective points of closest approach to the beam axis) of the third-muon track and the J/ψ candidate muon which maximises the value of ∆z 0 . Figure 2 (right) shows the ∆z 0 distribution for data after all event selection criteria are applied. The distribution consists of two components: a peaked structure centred on zero, representing events where the J/ψ candidate and third muon are produced in the same proton-proton interaction, and a Gaussian-distributed component from pile-up spanning a wide ∆z 0 range. To suppress the pile-up background, events are required to have |∆z 0 | < 40 mm, and the pile-up background within this signal region is estimated by fitting a Gaussian model to the broad ∆z 0 distribution, excluding the signal region from the fit. The integral of the Gaussian function within |∆z 0 | < 40 mm gives the number of residual pile-up events in the signal region, and the shape of third muon BDT and S d 0 distributions for pile-up events is taken from the pure pile-up region outside this |∆z 0 | range.
In summary, the two-dimensional fit of S d 0 vs BDT for the third muon is constructed from the following components: • Signal µ: Both the BDT and S d 0 fit templates are taken from the Pythia8 simulated sample, using reconstructed muons matched to a muon in the simulated event record which derives from a bhadron. This component is expected to populate the high values of the BDT output, signifying real muons, and have a wide S d 0 distribution indicating production away from the interaction point. The shapes of the templates are fixed but the normalisation floats in the fit.
• Prompt µ: Both the BDT and S d 0 templates are taken from J/ψ muons in the inclusive pp → J/ψ Pythia8 simulation, where J/ψ production is dominated by prompt production. These muons are real and thus should occupy the high values in the BDT output distribution and will have a narrow S d 0 distribution as they are produced at the interaction point. The shapes of the templates are fixed but the normalisation floats in the fit.  Figure 3: The two-dimensional S d 0 (left)-BDT(right) fit for a single differential observable bin: 0.4 < ∆R(J/ψ, µ) < 0.8. The points with error bars are data, plotted at bin centres. The solid line is the projection of the unbinned maximum-likelihood fit to the data. The signal µ, prompt µ, non-prompt fake µ, prompt fake µ, fake J/ψ and pile-up contributions are represented by dashed lines.
• Prompt and non-prompt fake µ: Both the BDT and S d 0 fake muon templates are taken from the Pythia8 simulated sample. The same BDT template is used for both the prompt and non-prompt components. The BDT shape has a large contribution at low values. The S d 0 template is derived separately for the prompt and non-prompt components as fake muons can have both prompt as well as non-prompt sources. The shape of the templates are fixed but the normalisation floats in the fit.
• Fake J/ψ: The BDT and S d 0 templates are derived from data and fixed in the fit.
• Pile-up: The BDT and S d 0 templates are derived from data and fixed in the fit.
The three S d 0 templates derived from simulation have a small shift in the mean of the distribution relative to data, due to the modelling of the beam-spot position. To correct for this, a small shift in the mean is derived independently for each of these three templates, by rerunning the fit once and allowing the shift to float; the shifted mean is then fixed at the fitted value for all other fits. The shifts for the three samples are consistent within uncertainties, and approximately −0.05.
Having derived templates for each of the expected third muon components in the data, an extended maximum-likelihood fit is carried out. The data are fitted in each bin of the various observables with the signal muon, prompt muon, fake J/ψ and pile-up templates derived in the same kinematic bin. Figure 3 shows the result of the third-muon fits to the data for an example bin. While it can be seen that some of the templates suffer from statistical fluctuations, these have a very small effect on the fit result, and a systematic uncertainty is derived to cover these effects, as described in Section 6.4.
The stability of the fit is verified with a closure test using simulated samples. Pseudo-data sets are created with varying fractions of prompt and non-prompt third-muon events and the fits repeated. The numbers of fitted prompt and non-prompt third muon events are compared to the values used to construct the pseudodata sets. The differences between input and fitted values are typically below the 1% level, demonstrating that the fits perform well. Two additional qualitative cross-checks on the modelling of the fake muon component are performed by considering data control regions with requirements orthogonal to those of the signal region such that they are expected to contain more fake muons. The first control region is defined by reversing the pile-up rejection requirement so that the probability of a charged pion or kaon faking a third muon increases. The second control region looking at prompt dimuon events is defined by reversing the pseudo-proper decay time requirement where the fraction of J/ψ production via weak decays is reduced, and the fraction of events containing a charged pion or kaon faking a third muon is increased. The fits in these control regions are performed inclusively as there is insufficient data to split into differential bins. The third-muon fit procedure is the same as described above except that no pile-up template is used in the pile-up control region. In both control regions, the fit behaves as expected, giving a good description of the data and returning a higher fraction of fake muons.

Irreducible backgrounds
There are three additional sources of background that could not be constrained in the fits, either because their contribution is too small to be reliably determined, or because their characteristics are very similar to the signal. As no reliable data-driven determination is possible, these irreducible backgrounds are instead subtracted from the post-fit signal yield based on estimates derived from simulation.
The first source of irreducible background is B c → J/ψ + µ + X production. As both the J/ψ and third muon originate in the decay of the same hadron, this is considered to be a source of background, and is concentrated at low values of ∆R(J/ψ, µ), the region of particular interest in this analysis. The production fraction of B c [47] and branching fraction [31] of B c → J/ψµ + X mean that this background is expected to be very small and, in the absence of an identifiable signal in the data, the estimate is taken directly from simulation. A prediction of the B c contribution passing the event selection is calculated in each differential observable bin from both the Pythia8 and Herwig++ simulated samples. The average of the two predictions is then subtracted from the fitted signal yield to remove the background from B c decays.
Another source of signal-like muons in this analysis is semileptonic decays of c-hadrons. In the case where the c-hadron is produced in the decay of a b-hadron, these muons are counted as part of the signal, but all other c-hadrons are considered as a source of background muons. This population of events is again expected to be small, as it requires a displaced J/ψ produced in a b-hadron decay, as well as a separate c-hadron; production modes include separate g → bb and g → cc splittings in the same hard scatter, or double parton scattering producing bb + cc + X in a single proton-proton collision. To determine the rate of such events in data, it is possible to use the third-muon S d 0 fits: c-hadrons have shorter lifetimes than b-hadrons, producing a narrower S d 0 distribution. However, the contribution is found to be so small that it is not possible to reliably extract the c-hadron background with the number of events available in each fit. Instead the rate is derived from simulation, where it is found that approximately 5% of all third muons passing the selection originate from this source of background. To derive a correction for each bin of the measured kinematic distributions, the rate of b+c-hadron production is derived separately from Pythia8 and Herwig++ simulation, two event generators which have different models of the parton shower, underlying event and double parton scattering, and the average of these is used to remove the expected contribution from the fitted number of signal events.
The final background considered is from events where a charged pion or kaon traverses the detector to the muon spectrometer without interacting with the detector material or decaying, referred to as sail-through. This background has a signature very similar to the signal third muons due to the presence of a welldefined track in the inner detector and a well-matched track in the muon spectrometer. The estimate of this background is taken from simulation, where a reconstructed muon not associated with a muon in the MC event record is matched to a charged kaon or pion which has no decay vertex inside the detector. The third-muon yield in each differential bin is corrected, after fitting, by removing the expected number of sail-through events. This number is estimated from the Pythia8 simulation, by calculating the ratio of the numbers of sail-through and fake muons and using this to scale the background estimate from the number of fake muons fitted in data.
A summary of the relative sizes and distributions of all the backgrounds as a function of the ∆R(J/ψ, µ) observable is given in Figure 4.

Extrapolation to the full range of pseudo-proper decay time
Once the signal yield has been determined, a correction must be applied to extrapolate the results obtained in the third muon fit (for τ > 0.25 mm/c) to the full range of J/ψ pseudo-proper decay time. The rate of non-prompt third muons is expected to be constant as a function of τ for non-prompt J/ψ mesons, and this is confirmed in two tests. First in data, by repeating the third muon fit in different bins of τ and observing that the ratio of non-prompt J/ψ events to each of the third muon fit components remains constant. Second with simulation, where the rate of non-prompt muons is indeed found to be constant as a function of τ in events containing a J/ψ originating from a b-hadron decay.
The extrapolation to the full τ-spectrum is then performed by simply correcting the third muon yield found in the τ > 0.25 mm/c region by an extrapolation factor taken as the ratio of all non-prompt J/ψ events over the full τ range to the number of J/ψ events found above the τ > 0.25 mm/c requirement, as determined from the J/ψ fit. The correction is derived individually for each differential observable bin, based on the fit in that bin, and is typically a factor of around 1.9 with no significant kinematic dependence.

Resolution corrections
The final step in converting the measurement to a full particle-level cross section is to correct for the effects of detector resolution on the momentum and η of the muons. Detector resolution can have two main effects, causing events to migrate between bins, or in and out of the fiducial volume.
Migration between bins can occur when events passing both the particle-level and detector-level selections are reconstructed in different bins of the differential cross sections. In this analysis, the analysis bins are all significantly wider than the detector resolution, so migrations between bins are a very small effect.
Migrations in and out of the fiducial volume must also be considered. Detector resolution effects can move an event into or out of the fiducial region (for example, by migrating individual muons above or below the muon p T > 6 GeV requirement). These effects are again found to be very small.
The final resolution correction factor combines both the bin-to-bin and fiducial migration effects, and as both these effects are small, a simple correction to each bin is sufficient. These corrections are derived by simply taking the ratio of the particle-level distributions to the detector-level distributions in the Pythia8 sample, where each sample distribution is derived independently and events are not required to pass both selections simultaneously. This ratio is then applied to the data to correct from detector-level quantities to particle-level quantities. The size of the correction is typically less than 2%, although for certain kinematic bins it can be as large as 5%.

Systematic uncertainties
Various systematic uncertainties are accounted for in this measurement. They broadly fit into three categories: uncertainties associated with the muon efficiency corrections to data, J/ψ fit model systematic uncertainties and uncertainties in the background components in the fits. Each source of systematic uncertainty is considered individually by repeating the full differential analysis from the beginning with the systematic change implemented; the deviation from the nominal result is then taken as the uncertainty due to that change. All of the systematic uncertainties, apart from those concerning J/ψ modelling, are allowed to vary independently upwards and downwards; from these two changes, the largest deviation from the nominal result is symmetrised and assigned as the uncertainty in the measurement from that source. The total upward or downward systematic uncertainty in the cross section is then calculated as the sum in quadrature of all contributions in the upward or downward directions respectively. When calculating the uncertainty in the normalised differential cross sections, shape-only systematic uncertainties are derived by varying each source of uncertainty up and down, while preserving the overall normalisation of the distribution. The total upwards or downwards systematic uncertainty for each bin is then the quadrature sum of these shape-only systematic uncertainties in the upwards or downwards directions respectively.
The main sources of systematic uncertainty in the cross section measurement are shown in Figure 5 for the ∆R(J/ψ, µ) observable. The statistical uncertainty includes the statistical uncertainty of the data and the statistical uncertainty of the third-muon templates taken from simulation. The trigger uncertainty includes the uncertainties in the single-muon trigger efficiency, the dimuon efficiency correction, the nearby-muon correction and the simulation-based correction to the trigger efficiency. The muon reconstruction uncertainty includes the uncertainties in the muon reconstruction efficiency and the inner detector track reconstruction efficiency. The fit model uncertainty includes several variations of the functional forms used in the fit model and the fitting procedure. The BDT uncertainty includes several uncertainties on the simulation-derived templates and a data-driven uncertainty. The Fake J/ψ, B c and c-hadron uncertainties are included as described in Section 6.4. The uncertainties in the resolution correction, pile-up and pileup double counting, also described in Section 6.4, are omitted from the figure for clarity but are included in all following calculations. The luminosity uncertainty is constant and therefore also not included in the figure.
All these uncertainties are described in more detail in the remainder of this section. 6.1 Luminosity uncertainty A 1.9% uncertainty is assigned to the delivered integrated luminosity. The methodology used to determine this uncertainty is described in Ref. [48].

Muon trigger and reconstruction efficiency uncertainties
As described in Section 4, the trigger efficiency has several components: the single-muon efficiency; the efficiency to identify pairs of muons with opposite-charge and vertex requirements; and the efficiency to resolve pairs of nearby muons including a third-muon correction.
The single-muon efficiency is parameterised as a function of p T and q · η. Each bin in the trigger efficiency map has an associated uncertainty resulting primarily from the limited data available to derive the efficiency. A Gaussian p.d.f. is formed for each map bin, with the mean being given by the central value and the width by the uncertainty in that bin. Modified efficiency maps are formed by sampling randomly from the p.d.f. in each map bin; multiple maps can be created by repeating this procedure. The data are corrected for trigger efficiency using each modified map in turn to determine the number of events in the reweighted data set after applying the trigger efficiency correction. The distribution of the corrected data yields using all the maps is then fitted with a Gaussian function; the mean of this Gaussian function gives the nominal event yield after corrections, and the width gives the systematic uncertainty in the trigger efficiency weighting procedure.
The trigger efficiency map is also corrected by a factor derived from simulation to remove the bias in the maps due to the tag-and-probe threshold being different to that of the analysis selection, and a systematic uncertainty in this correction is defined by using the data-driven maps without the simulation-based correction applied and repeating the analysis. The change in the extracted differential cross sections from using the uncorrected maps is taken as a systematic uncertainty.
The efficiency to identify pairs of muons with opposite-charge and vertex requirements also has an uncertainty, and the impact of this is assessed separately by varying the nominal correction for this efficiency by one standard deviation when reweighting the data set. In addition, an uncertainty assigned to the efficiency correction for cases where the third muon is close to a trigger muon is assigned by varying the correction parameters within their uncertainties.
The uncertainty in the muon reconstruction efficiency is factorised into two components. A constant 0.5% uncertainty is included for the efficiency of reconstructing a muon track in the inner detector [42]. This is added coherently for each of the three muons in an event, resulting in a 1.5% systematic uncertainty. As in the case of the trigger map efficiency, the uncertainty in the muon reconstruction maps is defined by the spread of the data set yields when using a set of modified maps created by sampling from p.d.f.s defined in each bin of the efficiency map by the nominal efficiency and the uncertainty in that value.
The combined systematic uncertainty in the reconstruction and trigger efficiencies on the cross section is 10%. The relative normalised uncertainty can be as large as 10% in certain kinematic regions. This is dominated by the uncertainty in the dimuon correction to the single trigger efficiency except in the small ∆φ(J/ψ, µ) and ∆R(J/ψ, µ) regions where the systematic uncertainty due to a third muon being close to a triggered muon is largest.

J/ψ mass-lifetime model uncertainty
To assess potential bias in the fitted number of non-prompt J/ψ candidates extracted from the twodimensional dimuon fit due to the models used, various changes are made to the functions describing the fit components: • The J/ψ mass model is changed to a combination of two Gaussian functions.
• The non-prompt J/ψ pseudo-proper decay time model is changed to a double exponential function convolved with the same resolution function.
• The resolution model is changed to a single Gaussian function.
• The fixed parameters in the Crystal Ball function are varied by ±10%.
• The dimuon mass model of the prompt and double-sided fake J/ψ backgrounds is changed to be a first-order polynomial function.
• The single-sided fake J/ψ background dimuon mass model is changed to a first order polynomial function.
• The single-sided fake J/ψ pseudo-proper decay time model is changed to a double exponential function.
The analysis is repeated for each of the varied J/ψ models, with only one change at a time. When performing fits to lower statistics regions, the fits are naturally less constrained, leading to a potential double counting of statistical uncertainties when performing the varied fits. To avoid this, the envelope of the largest deviation from the nominal event yield when considering all the individual model changes is taken as the total systematic uncertainty for the J/ψ model uncertainty. This envelope is calculated separately in each differential bin.
The fit model variations give a 5% uncertainty in the total cross section and at most a 15% relative uncertainty in the normalised differential cross section. This is dominated by the change in decay time parameterisation of the non-prompt J/ψ mesons and the mass parameterisation of the single-sided fake J/ψ background.

Third-muon uncertainties
There are several contributions to the systematic uncertainty in the third-muon fit from the derivation of the templates and the estimation of the various background components.
There is a statistical uncertainty in the templates used for the third-muon fits, and the effect of this uncertainty is assessed through an ensemble test. Pseudo-templates are created by randomly sampling from the default templates, and the fit repeated using each of the modified pseudo-templates. This procedure provides a distribution of results for the extracted signal muon yield, and this distribution is fitted with a Gaussian function, the width of which is taken as a systematic uncertainty from the statistical fluctuations of the templates. The uncertainty is typically at the level of 1-2%. The shapes of the templates derived from simulation are found to have minimal dependence on the physics modelling in the simulation, due to the similar lifetimes of all b-hadrons and the very weak kinematic dependence of S d 0 , so no additional systematic uncertainty was required.
The contribution from events with a fake J/ψ candidate which is combined with a third muon from another proton-proton interaction is double-counted by the fitting process. It would be possible to remove this double-counting by creating a four-dimensional fit but the complexity and lack of stability in such a fit make this very impractical and, since the level of double counting is very small, this is not necessary. Instead, each kinematic bin can be corrected by determining the number of fake J/ψ events that are due to pile-up and removing that from the number of fake J/ψ events extracted from the fit. The number of fake J/ψ events that are due to pile-up is determined by fitting the ∆z 0 distribution of events outside the signal mass window. Approximately 2% of fake J/ψ events are found to be from pile-up events and the fake J/ψ contribution is of order 10% of the selected data, therefore the signal yield is typically altered by 0.2%, but varies depending on the kinematic region.
The fake muon template contains two types of background with similar behaviour: DIF and hadronic leakage, as explained in Section 5.2. Both backgrounds are due to the decays or interactions of charged pions and kaons. To assess the robustness of the simulation of the fake muon background, the templates used in the third muon fits are systematically altered. The BDT response is subtly different for pions and kaons but, due to the limited number of fake muon candidates in simulation, the two sources of fake muons are combined. The ratio of pions to kaons populating the fake muon templates is changed by ±50% to cover any effect of the combination. In addition, the fraction of pions and kaons decaying inside and outside the inner detector is varied by ±50%. Finally the ratio of DIF muons and hadronic leakage muons in the fake muon template is changed by ±50%. The BDT response is different for the two types of fake muons but the available number of simulated events do not allow separation of the two contributions in the template which is composed of approximately 75% DIF muons. The fractional composition of the fake muon template is changed to cover any mismodelling in simulation of the composition of the two sources of fake muons. The effects these systematic shifts have on the BDT template result in an uncertainty of less than 1% in the total cross section and up to 2% relative uncertainty in the differential normalised cross section in certain kinematic regions.
Furthermore, a data-driven uncertainty in the shape of the BDT distribution for fake muons is derived using two control regions. A pile-up region is defined by reversing the |∆z 0 | requirements, and a fake J/ψ region is formed by selecting for events outside the dimuon mass window. The third-muon BDT distribution in these events is fitted with the usual simulation-derived BDT templates for real and fake muons. The fitted real component is then subtracted from data, leaving an estimate of the BDT template for fake muons. Due to statistical limitations, this process can only be performed on the inclusive data set and not differentially. This data-driven template is then used instead of the simulation-derived fake muon templates for each differential fit. The difference with respect to the usual result is then used to form the systematic uncertainty. The envelope from the largest deviation in each bin from either the pile-up or fake J/ψ derived template is assigned as a systematic uncertainty in the modelling of fake muons in simulation. The corresponding uncertainty in the total cross section is 1% and can be as large as a 10% relative uncertainty in the normalised differential cross sections.
The uncertainty in the fake J/ψ background estimate is assessed by changing the normalisation of the fake J/ψ templates in the third-muon fits. The number of fake J/ψ events is derived from the two-dimensional dimuon fits, given by normalisation of the three fake J/ψ components within the dimuon signal mass window. Due to the dimuon pseudo-proper decay time requirement in the third-muon fit region, the single-sided fake J/ψ component is the only background to contribute non-negligibly to the high pseudoproper decay time region. The differential fits are repeated with fake J/ψ template normalisation altered by the uncertainty (±1σ) in the normalisation of the single-sided background. The systematic uncertainty assigned is less than 1% on the total cross section but as much as 3% relative uncertainty in the normalised differential cross sections.
For the uncertainty in the pile-up background, a similar procedure is used. The templates used in the thirdmuon fits are changed by altering the normalisation within the uncertainty. The uncertainty is derived from the Gaussian fit to ∆z 0 and is applied as a ±1σ variation to the nominal pile-up templates. The corresponding systematic uncertainties are very small, 0.4% in the total cross section and up to 2% in the normalised differential cross sections.
The B c background prediction is taken from the average of Pythia8 and Herwig++ simulation predictions. The difference between the two predictions is assigned as an uncertainty in the number of B c -hadrons in the data set. Similarly the number of events estimated to be from b+c-hadrons is taken from the average of Pythia8 and Herwig++ simulation predictions. A systematic uncertainty is assigned to this prediction, using the difference between Pythia8 and Herwig++ for the rate of b+c-events. The systematic uncertainty assigned to the B c -hadron prediction gives a small uncertainty of below 0.1% on the total cross section. Typically the relative uncertainty in the normalised differential cross section is also very small, except at low ∆R(J/ψ, µ) where it can be almost 2%. For the b+c-hadron estimation, the systematic uncertainty in the total cross section is 2% and the relative uncertainty in the normalised differential cross section can be as large as 9%.
The sail-through background prediction is difficult to constrain with data, so the simulation-based estimate is varied by ±50% to assign an uncertainty due to mismodelling. The corresponding uncertainty in the total cross section is 0.1% and the relative uncertainty in the normalised differential cross section is less than 1% everywhere.

Resolution correction uncertainty
The uncertainty in the factors used to correct for events migrating in and out of the acceptance is estimated based on the statistical uncertainty in the simulated sample used to derive the correction. The correction, described in Section 5.5, is derived from the ratio of events passing the particle-level selection to events passing the detector-level selection in simulation. The uncertainty in this ratio is calculated assuming these samples are uncorrelated. That is not entirely the case as they are derived from the same simulated sample, so this represents a conservative estimate of an uncertainty in this correction. The relative fractional uncertainty due to the resolution corrections for ∆φ(J/ψ, µ) is typically at the 1% level.

Results and interpretation
The total measured cross section in the fiducial region, defined in Section 3.3, is While leading order calculations are not expected to accurately reproduce this total cross section, the normalised differential cross sections are used to test the accuracy of a number of predictions. First, comparisons are made using Pythia8, exploring several different options for the g → bb splitting kernel, as this process dominates the region of particular interest: small-angle b-hadron production. The details of these settings are given in Ref. [49] and summarised in Table 1. The settings explore one of the main theoretical degrees of freedom when evaluating gluon-splitting to heavy quarks: whether to use the relative p T (Opt. 1 and 4) or mass (Opt. 5, 8, 5b and 8b) of the splitting to set the scale when determining the value of α s to be used in that splitting.
Option label

Descriptions
Opt. 1 The same splitting kernel, (1/2)(z 2 + (1 − z) 2 ), for massive as massless quarks, only with an extra β phase-space factor. This was the default setting in Pythia8.1, and currently must also be used with the MC@NLO [50] method. Opt. 4 A splitting kernel z 2 + (1 − z) 2 + 8r q z(1 − z), normalised so that the z-integrated rate is (β/3)(1 + r/2), and with an additional suppression factor (1 − m qq 2 /m 2 dipole ) 3 , which reduces the rate of high-mass qq pairs. This is the default setting in Pythia8.2. Opt. 5 Same as Option 1, but reweighted to an α s (km 2 qq ) rather than the normal α s (p 2 T ), with k = 1. Opt. 5b Same as Option 5, but setting k = 0.25. Opt. 8 Same as Option 4, but reweighted to an α s (km 2 qq ) rather than the normal α s (p 2 T ), with k = 1. Opt. 8b Same as Option 8, but setting k = 0.25. Table 1: Description of Pythia8 options. Options 2, 3, 6 and 7 are less well physically motivated and not considered here. The notation used is as follows: r q = m 2 q /m 2 qq , β = 1 − 4r q , with m q the quark mass and m qq the qq pair invariant mass.
The distributions of ∆R(J/ψ, µ), m(J/ψ, µ), ∆φ(J/ψ, µ) and p T /m are shown in Figure 6. In general, Pythia8 does not reproduce the shape of the angular distributions in data within uncertainties. The p Tbased scale splitting kernels (Opt. 1 and 4) generally give a better description of the low ∆R(J/ψ, µ) region, with the kernel of Opt. 4 performing the best. This region is more suppressed in the mass-based scale kernels, although this suppression is overcome when lowering the scale by a factor of four (Opt. 5b and 8b), with Opt. 8b in particular performing comparably to Opt. 4. This pattern is repeated across the other differential cross sections considered.     Figure 6: Measured normalised differential cross sections as a function of ∆R(J/ψ, µ), m(J/ψ, µ), ∆φ(J/ψ, µ) and p T /m compared to Pythia8 predictions with the different gluon-splitting parameter settings described in Table 1.
The bottom pane shows the ratio to data of the options that use a mass-based splitting scale. The middle pane shows the p T -based scale options, along with the mass-based option that agrees best with data (Opt. 8b) for comparison.
To extend the comparisons, the Herwig++ sample described in Section 3 is included. Two samples are also simulated using MadGraph5_aMC@NLOv2.2.2 [51] at leading order in QCD interfaced to the Pythia8.186 parton shower model. In both samples, the CKKW-L [52,53] merging procedure is applied, with a merging scale of 15 GeV. The A14 [54] tuned parameter settings are used together with the NNPDF2.3LO PDF set [55] for Pythia8. The EvtGen1.2.0 program [56] is used for properties of the b-hadron and c-hadron decays. Both samples are generated with up to one additional parton in the matrix element calculation. One is generated in the 5-flavour scheme where massless b-quarks are included in PDF (the NNPDF3.0NLO [57] PDF set is used) and as possible initial state partons in the matrix element calculation; this is referred to as 5fl in the figures. The other is generated in the 4-flavour scheme where b-quark mass is included in the calculation, but b-quarks are excluded from the PDF (the NNPDF3.0NLO 4fl PDF set is used), but are generated in the matrix element; this is referred to as 4fl in the figures.
Finally, a sample is simulated using the Sherpa2.1.1 [58] event generator. Matrix elements are calculated with two or three outgoing partons at leading order, and merged with the Sherpa parton shower [59] using the ME+PS@LO prescription [60]. The CT10nlo [61] PDF set is used in conjunction with dedicated parton shower tuning developed by the Sherpa authors. Only the 5-flavour scheme sample is considered for the Sherpa event generator.
Due to the computational demands in producing sufficient events in the three-muon fiducial volume, both the MadGraph5_aMC@NLO+Pythia8 and Sherpa samples were produced only to the level of two bhadrons. These predictions are corrected to a three-muon prediction using the transfer functions described in Appendix A; the theoretical uncertainty in these transfer functions completely dominates the statistical uncertainty of the samples, and the quadrature sum of these two uncertainties is shown as an uncertainty bar for the MadGraph5_aMC@NLO+Pythia8 and Sherpa samples in the figures. Figures 7, 8 and 9 compare the various predictions to the measured normalised differential cross sections. The best performing Pythia8 prediction ("Opt. 4") is included again for reference along with the Herwig++, MadGraph5_aMC@NLO+Pythia8 and Sherpa predictions. Considering first the ∆R(J/ψ, µ) distribution, agreement with data is slightly better for Herwig++ than Pythia8. The 4-and 5-flavour MadGraph5_aMC@NLO+Pythia8 predictions tend to sit on either side of the data, with the 4-flavour prediction generally being closer in shape. The 5-flavour scheme Sherpa prediction is similar in shape to the 5-flavour scheme MadGraph5_aMC@NLO+Pythia8 prediction, although the modelling is worse overall. The differences between 4-and 5-flavour predictions are enhanced in the high-p T ∆R(J/ψ, µ) distribution, with the 4-flavour remaining close to the data while both 5-flavour predictions move further away.
The general trends seen in ∆R(J/ψ, µ) are also visible in ∆φ(J/ψ, µ), while in ∆y(J/ψ, µ) the Mad-Graph5_aMC@NLO+Pythia8 and Sherpa predictions all provide a good description of the data, while Pythia8 and Herwig++ tend to fall away at high ∆y(J/ψ, µ). In y boost , a distribution expected to be more sensitive to the PDFs than to the details of event generators, a comparable picture is seen across all predictions.
Moving to the other kinematic variables, the low m(J/ψ, µ) region again provides discrimination between the 4-and 5-flavour predictions, and is one case where the 5-flavour MadGraph5_aMC@NLO+Pythia8 prediction lies closer to the data than the 4-flavour prediction. However, at high values of the ratio p T /m, the 4-flavour prediction clearly provides a much better description of the data than either of the 5-flavour predictions. Indeed, considering all distributions, the 4-flavour prediction from Mad-Graph5_aMC@NLO+Pythia8 provides the best description of the data overall. The predictions of Py-thia8 and Herwig++ are generally comparable, with indications that some further tuning could yield an  Figure 7: Measured normalised differential cross sections as a function of ∆R(J/ψ, µ), ∆φ(J/ψ, µ), lowand high-p T ∆R(J/ψ, µ). Comparisons are made with predictions of Pythia8 and Herwig++. Mad-Graph5_aMC@NLO+Pythia8 and Sherpa predictions are also compared having been corrected from the twob-hadron production to the three-muon final state via transfer functions (indicated with *). There is no entry for these predictions in the lowest low-p T ∆R(J/ψ, µ) bin as the transfer function is not defined in this bin. The Pythia8 "Opt. 4" gluon-splitting parameter settings are described in Table 1. The ratio to data of the Mad-Graph5_aMC@NLO+Pythia8 and Sherpa (middle pane), and Pythia8 and Herwig++ (bottom pane) are also shown. and Sherpa predictions are also compared having been corrected from the two-b-hadron production to the threemuon final state via transfer functions (indicated with *). The Pythia8 "Opt. 4" gluon splitting parameter settings are described in Table 1. The ratio to data of the MadGraph5_aMC@NLO+Pythia8 and Sherpa (middle pane), and Pythia8 and Herwig++ (bottom pane) are also shown.  Table 1.
The ratio to data of the MadGraph5_aMC@NLO+Pythia8 and Sherpa (middle pane), and Pythia8 and Herwig++ (bottom pane) are also shown.
improved description of the data. It should be noted, however, that the theoretical uncertainties in these predictions were not evaluated for this study.

Conclusion
A measurement of the production of b-hadron pairs in the B(→ J/ψ[→ µ + µ − ]+X)B(→ µ+X) decay mode has been presented, using an integrated luminosity of 11.4 fb −1 of proton-proton collisions at √ s = 8 TeV recorded by the ATLAS detector at the LHC. A fiducial volume is defined by requiring two muons from the decay of a J/ψ, which itself originates from the decay of a b-hadron (including feed-down), and a third muon from the decay of a different b-hadron (including cascade decays). All muons are required to have p T > 6 GeV, the two muons from the J/ψ must have |η| < 2.3 and the third muon must have |η| < 2.5. The total cross section in this fiducial volume is measured to be 17.7 ± 0.1(stat) ± 2.0(syst) nb.
Normalised differential cross sections were measured for ten kinematic observables designed to probe the underlying mechanisms of b-hadron production. These include a determination of nearby b-hadron pair production down to zero opening angle. This region is particularly sensitive to the production of b-quarks via gluon-splitting, which suffers from large theoretical uncertainties. Constraining this region is vital for many LHC measurements, including H → bb in the V H production mode.
Predictions for the three-muon cross section are compared to the data. Several choices for the g → bb splitting kernel in Pythia8 are considered, along with nominal predictions from Herwig++, Mad-Graph5_aMC@NLO+Pythia8 and Sherpa. These cover a range of different matrix element calculations and parton shower models, as well as 4-and 5-flavour treatments of the calculation. Of the Py-thia8 options considered, the p T -based splitting kernel gives the best agreement with data, performing comparably to Herwig++. The best overall agreement with data comes from the 4-flavour Mad-Graph5_aMC@NLO+Pythia8 prediction, which outperformed Pythia8 and Herwig++, and the 5-flavour predictions from MadGraph5_aMC@NLO+Pythia8 and Sherpa, though it should be noted that the associated theoretical uncertainties must be evaluated before the level of agreement with data can be fully quantified.
In conclusion, the measurements presented here provide a new test of QCD calculations, motivate the choices of predictions used to model b-hadron production, and further tuning of parameters in the calculations to improve the description of the data.              [24] ATLAS Collaboration, Measurement of the b-hadron production cross section using decays to D * µ − X final states in pp collisions at √ s = 7 TeV with the ATLAS detector, Nucl. Phys. B 864 (2012) 341, arXiv: 1206.3122 [hep-ex].

A Transfer Functions
The cross sections measured in this analysis are computationally costly to simulate using available event generators, due to the low production rate of b-hadrons from inclusive simulation, and the low branching ratio for the three-muon final state. This is potentially a limiting factor in the use of these results for testing and tuning theory predictions. To get around this problem, a set of transfer functions are derived, which can be used to translate the predictions for a less computationally intensive inclusive two-b-hadron final state into the desired three-muon final state. These transfer functions therefore capture the average kinematics of b-hadron decays.
The transfer functions are calculated using simulation, by taking the ratio of a three-muon prediction to a two-b-hadron prediction. The three-muon prediction is calculated for the fiducial volume defined in Section 3.3, and an equivalent two-b-hadron fiducial volume is chosen such that the shape of the ∆R(bhadron,b-hadron) distribution is as similar as possible to the shape of the ∆R(J/ψ, µ) distribution for the three-muon prediction. The dependence of the ∆R(b-hadron,b-hadron) distribution on the definition of the two-b-hadron fiducial volume is studied using the Pythia8 and Herwig++ event generators described in Section 7, and based on requiring two b-hadrons with a minimum p T requirement and within a defined rapidity region. The dependence on the rapidity requirement was found to be small, so |y| < 2.4 was chosen for both b-hadrons as it lies between the η requirements on muons in the three-muon selection. The dependence on p T is more significant, and the optimal value of p T > 15.5 GeV was chosen based on a fine scan of p T selections.
While these requirements give reasonable agreement between three-muon and two-b-hadron selections for angular variables, the p T and mass of the two-b-hadron system requires further treatment in order to reproduce more closely the equivalent three-muon distribution. Due to the additional particles produced in the decay, the muons carry only a fraction of the momentum of the b-hadrons, so the b-hadron momenta are scaled down in order to match the equivalent three-muon quantity more closely. While no choice of value for a simple scaling parameter gives good agreement across all p T and mass values, a scaling factor of 1.75 provides a reasonable compromise. This scaling is only used for mass and p T . The two-b-hadron fiducial volume is therefore defined by requiring two weakly decaying b-hadrons with p T > 15.5 GeV and |y| < 2.4. Both the mass and p T of the combined two-b-hadron system are scaled down by a factor of 1.75.
Transfer functions are then derived for each of the measured differential cross sections using the six different Pythia8 configurations described in Section 7, and Herwig++. For each event generator, predictions for the two-b-hadron fiducial definition and the three-muon fiducial definition are derived independently, both are normalised to unit area, and the ratio of three-muon to two-b-hadron cross section is taken. The six Pythia8 and one Herwig++ transfer functions for each distribution are all slightly different, and used to define an envelope. The geometric centre of the envelope is taken as the default value for the transfer function in each bin, and the spread defines an uncertainty around that central value.
The resulting transfer functions for m(b-hadron,b-hadron) and ∆R(b-hadron,b-hadron) are shown in Figure 10. It can be seen that the transfer function is relatively constant in ∆R(b-hadron,b-hadron), but shows more structure in mass around the nearby (low mass) and back-to-back (higher mass) kinematic edges. The kinematic requirements on the b-hadrons result in no event populating the lowest ∆R(b-hadron,bhadron) bin for the p T < 20 GeV selection, and so the transfer function is not defined in this one bin.
The transfer functions are applied as a bin-by-bin multiplicative factor to two-b-hadron predictions from MadGraph5_aMC@NLO+Pythia8 and Sherpa, before these predictions are normalised to unit area. The uncertainty in the transfer functions is added in quadrature to any statistical uncertainty in the predictions from these event generators, and completely dominates those statistical uncertainties.