Measurements of observables sensitive to colour reconnection in tt̄ events with the ATLAS detector at √s = 13 TeV

A measurement of observables sensitive to effects of colour reconnection in top-quark pair-production events is presented using 139 fb⁻¹ of 13 TeV proton–proton collision data collected by the ATLAS detector at the LHC. Events are selected by requiring exactly one isolated electron and one isolated muon with opposite charge and two or three jets, where exactly two jets are required to be b-tagged. For the selected events, measurements are presented for the charged-particle multiplicity, the scalar sum of the transverse momenta of the charged particles, and the same scalar sum in bins of charged-particle multiplicity. These observables are unfolded to the stable-particle level, thereby correcting for migration effects due to finite detector resolution, acceptance and efficiency effects. The particle-level measurements are compared with different colour reconnection models in Monte Carlo generators. These measurements disfavour some of the colour reconnection models and provide inputs to future optimisation of the parameters in Monte Carlo generators.


Introduction
Monte Carlo event generators are of the utmost importance for experimental particle physics as a tool providing a source of simulated events for analysis design, signal and background modelling, unfolding detector effects, and training of multivariate discriminants. A rich set of measurements provided by the Large Hadron Collider (LHC) experiments can be used to probe both soft and hard quantum chromodynamics (QCD), and thereby test many features of the Monte Carlo (MC) event generators. Even though the overall performance of the event generators is good, some phenomena are insufficiently understood or their modelling can be significantly improved. One of these is colour reconnection (CR). None of the models describing CR in hadron collisions is derived from first principles of QCD.
Measurements are required to exclude certain models or to constrain their parameter space.
The simulation of the hard-scattering process is accompanied by the so-called underlying event (UE), consisting of multiple parton interactions (MPIs), initial- and final-state radiation (ISR and FSR), and remnants of the interacting protons. According to QCD, the strong interactions between quarks and gluons are related to their colour charges. In order to trace the colour information in an event, MC event generators make use of the leading-colour (LC) approximation [1,2]. Here, each quark is colour-connected to only one other parton (quark or gluon), while gluons are each connected to two other partons, since they carry both a colour and an anticolour charge. Additional colour lines are added to the final state by each MPI system, leading to a non-negligible possibility of phase-space overlaps between final states from different MPI systems and the hard-scattering process [3,4]. Colour reconnection [5] models introduce a mechanism for reassigning colour connections between partons to resolve ambiguities between overlapping colour lines in space and time, and thus allow colour topologies different from those in the simple LC approach. The interactions and interference among the different processes leading to a given generated event are poorly understood, and several models are implemented in the three parton-shower MC programs Pythia 8 [6], Herwig 7 [7], and Sherpa 2 [8].
In general, these models define some momentum-based measure of the colour configuration of partons in the event. In their workflow, the programs execute changes to the colour connections to reduce the defined measure and thereby correct for deficiencies of the LC approximation and allow colour connections between different MPIs. The modified colour configuration effectively leads to a reduction of the energy available to produce new particle-antiparticle pairs in the space between partons. Thus, the particle multiplicity is reduced and the produced particles carry, on average, more momentum than they would have in events without CR. Details can be found in Refs. [4,9] and references therein.
An unresolved question is the involvement of the top quark and its decay products in CR. The typical hadronisation scale is about 1 fm, while the average transverse decay length of top quarks is 0.2 fm [10]. Thus, it is possible that top quarks interact with the colour fields stretching between partons in the final state [11]. However, in the current implementation of CR models, the reconnection algorithm is always performed before the top quark decays, i.e. only the top quark itself takes part in the CR machinery, not its decay products.
In recent top-quark mass (m_top) measurements, which reach a precision well below 1 GeV [12,13], the uncertainty in the CR modelling is estimated to be as large as 400 MeV [13]. Thus, this uncertainty is becoming increasingly important as direct measurements of m_top become more precise.
Several measurements sensitive to the modelling of the UE and (partially) to CR in different scattering processes, e.g. minimum-bias events [14], Z boson production [15–17] and top-quark–antiquark (tt̄) production [18] events, have been performed at the LHC. However, dedicated measurements of CR in top-quark events have not been performed yet. This is particularly relevant since a well-defined prescription of the uncertainty due to CR is needed for top-quark mass measurements.
This article presents measurements of normalised differential cross-sections of tt̄ production as a function of observables sensitive to CR effects, namely the charged-particle multiplicity outside of jets (n_ch), excluding leptons from the top-quark decay; the scalar sum of the transverse momenta of these charged particles (Σp_T^ch); and the double-differential measurement of the two quantities (Σp_T^ch in bins of n_ch). The last of these measurements contains partial information about the 'standard' observable which is often used in CR and UE measurements: the average transverse momentum of charged particles as a function of the charged-particle multiplicity. The observables are corrected for tracks from additional inelastic collisions in the same or neighbouring bunch crossings (pile-up), for tracks from secondary vertices, and for the effect of tracking inefficiencies. The distributions of the corrected observables are unfolded to the particle level in a fiducial phase space chosen to maximise the sensitivity to CR effects. The fiducial cross-section is obtained by integrating the differential cross-section.
In proton–proton collisions, the dominant top-quark production process is tt̄ production via the strong interaction. As a result of top-quark and top-antiquark decays, two on-shell W bosons emerge in tt̄ events. In order to select a nearly background-free sample of tt̄ events, this analysis targets events in which one W boson decays in the W → eν channel and the second W boson decays via W → μν.¹ Excluding dilepton final states with two charged leptons of the same flavour (e⁺e⁻ or μ⁺μ⁻) essentially eliminates the background arising from Z+jets production. Thus, the only relevant backgrounds are single top-quark production in the Wt channel and events where jets are misidentified as leptons. The experimental signature of the selected events is given by exactly one prompt isolated electron, exactly one prompt isolated muon, and two or three hadronic jets, two of which each originate from a b-quark and are called b-jets. The selected events were recorded with the ATLAS detector [19] during Run 2 of the LHC in proton–proton collisions at a centre-of-mass energy of 13 TeV. The resulting data set corresponds to an integrated luminosity of 139 fb⁻¹.

Colour reconnection in tt̄ events
In the following, the CR models implemented in the two multipurpose MC generators, Pythia 8 and Herwig 7, are described. The Sherpa 2 event generator has an MPI model based on the Pythia one, but without any CR model. CR is currently being implemented in the upcoming Sherpa 3.

¹ Events involving W → τν decays with a subsequent decay of the τ-lepton to either an electron or a muon are included in the signal.

Colour reconnection models in Pythia 8
Pythia 8 [20] uses the so-called Lund string model for the hadronisation, which is based on the observation that at large distances the potential energy of colour sources increases linearly with their separation. As soon as the potential energy reaches values similar to hadron masses, the string breaks and creates a new quark-antiquark pair. The two string segments then begin to stretch and break again until all the energy has been converted into quark-antiquark pairs connected by short string segments, identified as hadrons.
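The iterative string-breaking idea can be illustrated with a deliberately crude toy model (all numbers and the splitting function below are assumptions for illustration, not the actual Lund fragmentation function used in Pythia):

```python
import random

def toy_string_fragmentation(string_energy_gev, hadron_mass_gev=0.14, rng=None):
    """Toy sketch of Lund-style iterative string breaking.

    Repeatedly splits a colour string by creating quark-antiquark pairs,
    emitting one 'hadron' per break, until the leftover energy is too
    small to create another pair.  Purely illustrative.
    """
    rng = rng or random.Random(42)
    hadrons = []
    energy = string_energy_gev
    while energy > 2 * hadron_mass_gev:
        # each break carries away a random fraction of the remaining energy
        z = rng.uniform(0.1, 0.9)
        hadron_energy = max(hadron_mass_gev, z * energy * 0.5)
        hadrons.append(hadron_energy)
        energy -= hadron_energy
    return hadrons

frags = toy_string_fragmentation(20.0)
print(len(frags), sum(frags))
```

The essential feature this sketch reproduces is that the number of produced hadrons grows with the energy stored in the string, which is why CR, by shortening strings, reduces the particle multiplicity.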
Three general CR models are currently implemented in Pythia 8:

MPI-based model (CR0)
Starting from the lowest-p_T interaction in a set of MPIs, the reconnection probability for an MPI system with transverse momentum p_T is given by

P_rec(p_T) = (R_rec · p_T0)² / ((R_rec · p_T0)² + p_T²),

where R_rec is a phenomenological parameter with a value in the 'colour reconnection range', i.e. 0 ≤ R_rec ≤ 10, and p_T0 is an energy-dependent dampening parameter used for MPIs. The latter parameter regularises the partonic cross-section to avoid divergence at low p_T and it depends on the centre-of-mass energy, √s = E_CM:

p_T0 = p_T0^ref · (E_CM / E_CM^ref)^(E_CM^pow),

where p_T0^ref is the value of p_T0 at a reference energy E_CM^ref, and E_CM^pow is a tunable parameter. For each MPI system that undergoes a reconnection, partons from lower-p_T MPI systems are added to the colour dipoles defined by the higher-p_T MPI system, in a way that minimises the total colour-string length, defined as the sum of the logarithm of normalised squared invariant masses of all two-parton combinations in the event. This model is used in all ATLAS simulated samples generated by Pythia 8 with the A14 set of tuned parameters [21].
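For concreteness, the two formulas above can be evaluated numerically. The parameter values below are illustrative stand-ins loosely inspired by typical Pythia 8 MPI settings, not the tuned values used in the analysis:

```python
def pt0(ecm_gev, pt0_ref=2.28, ecm_ref=7000.0, ecm_pow=0.215):
    """Energy-dependent dampening parameter p_T0(E_CM).

    pt0_ref, ecm_ref and ecm_pow are illustrative stand-ins for the
    tunable MPI parameters (reference p_T0, reference energy and
    energy-scaling power).
    """
    return pt0_ref * (ecm_gev / ecm_ref) ** ecm_pow

def p_reconnect(pt_gev, r_rec=1.8, ecm_gev=13000.0):
    """MPI-based (CR0) reconnection probability for an MPI system at p_T."""
    damp2 = (r_rec * pt0(ecm_gev)) ** 2
    return damp2 / (damp2 + pt_gev ** 2)

# Soft MPI systems are far more likely to be reconnected than hard ones:
print(p_reconnect(1.0), p_reconnect(20.0))
```

The functional form makes the physics explicit: P_rec → 1 as p_T → 0 and falls off as 1/p_T² for hard systems, so reconnection predominantly reshuffles the colour flow of the soft MPI activity.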

QCD-based model (CR1)
This model is based on the CR0 model but uses a more complete treatment of the QCD multiplet structure which allows reconnections of dipoles to produce structures with three (anti)colour indices, referred to as 'junctions', thereby enhancing the production of (anti)baryons. Only reconnections which lower the string length are performed.

Gluon-move model (CR2)
In the gluon-move model, reconnections are performed in the same way as in the CR0 model, but only gluons are considered for reconnection. For each gluon, all the reconnections to all MPI systems are considered (not only the ones for softer MPIs), so in principle the colour connections from the hard interaction can be affected more significantly than in the default model. Again, only reconnections which lower the string length are performed.
In all three models the top-quark lifetime is long enough to avoid colour reconnection of the top-quark decay products with the rest of the event. Only the top quark itself is involved in the CR process and the top-quark decays are performed after the colour reconnections have taken place. Thus, there is also no CR among the top-quark decay products. The first tuning of these models to ATLAS data is documented in Ref. [22] and a recent tuning to CMS data is presented in Ref. [23].
Additionally, a class of CR models that is designed to affect the top-quark decay products separately is considered. These models are described in detail in Ref. [11]. Two different sets of gluons are considered: gluons radiated from the top-quark decay products ({g_t}) and gluons from the rest of the event ({g_r}).
Performing colour exchanges in different ways leads to five models:

Forced random (TCR1)
A gluon from {g_t} is forced to exchange colours with a random gluon from the other set, {g_r}.

Forced nearest (TCR2)
A gluon from {g_t} is forced to exchange colours with the gluon from {g_r} that minimises the invariant mass m²(g_t, g_r).

Forced farthest (TCR3)
A gluon from {g_t} is forced to exchange colours with the gluon from {g_r} that maximises the invariant mass m²(g_t, g_r).

Forced smallest (TCR4)
A gluon from {g_t} is forced to exchange colours with the gluon from {g_r} for which the change in the string-length measure λ (the available rapidity range of particle production) is smallest.

Smallest Δλ (TCR5)
This is the same as the previous model, except that gluons exchange colours only if Δλ < 0.
For all of these models a phenomenological strength parameter s is used, where each gluon from the set {g_t} is tested for reconnection with a probability s.

Colour reconnection models in Herwig 7
Herwig 7 uses the cluster model [24] to model the hadronisation of quarks into hadrons. After the parton-shower calculation, gluons are split into quark-antiquark pairs, and a cluster is formed from each colour-connected pair of quarks. Before hadrons are produced from clusters, CR can modify the configuration of the clusters.
Three general CR models are currently implemented in Herwig 7:

• Plain CR
In the plain CR model [25], quarks from two clusters can be rearranged into two alternative clusters. A cluster is randomly chosen from the list of clusters and compared with all other clusters on the list. The rearrangement is performed for the combination of clusters with the lowest sum of the masses. The reconnection is accepted with a probability p_R, which is the only parameter of this model. This model typically leads to clusters with smaller invariant mass and thus the overall activity in the UE is reduced. This model is used in all ATLAS samples generated by Herwig 7, with different versions using different sets of tuned parameters ('tunes') for the UE and the CR. The H7-UE-MMHT tune is used in Herwig 7.0.4, the H7.1-Default tune is used in Herwig 7.1.3, and the H7.2-Default tune is used in Herwig 7.2.1. Also, in the latest version the MPI model was improved [26].
• Statistical CR The statistical CR model uses the Simulated Annealing algorithm [27] to find the configuration of clusters that results in the absolute lowest value of the colour length, defined as the sum of the squared invariant masses of all clusters in the event. In this model the only reconnections that are not allowed are those connecting a quark and an antiquark produced in the non-perturbative splitting of the same gluon, which would lead to the production of a colour-singlet object [28].
• Baryonic CR In contrast to the other two models, the baryonic CR model [29] uses a simple geometric picture of nearest neighbours, and the algorithm tries to find quarks that populate approximately the same phase-space region, based on their rapidity. Suitable combinations of clusters are then categorised as mesonic or baryonic, and reconnected with probabilities p_R or p_B respectively. This model is only available since Herwig 7.1.5.
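The idea shared by the plain and statistical models, rearranging cluster constituents to reduce an invariant-mass-based colour length, can be sketched with a toy greedy pass (hypothetical data structures; the statistical model uses simulated annealing rather than this greedy loop, and Herwig's actual bookkeeping is far richer):

```python
import itertools

def mass2(p1, p2):
    """Squared invariant mass of two massless four-vectors (E, px, py, pz)."""
    e = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    return e * e - px * px - py * py - pz * pz

def colour_length(clusters):
    """Sum of squared cluster masses, as in the statistical CR model."""
    return sum(mass2(q, qbar) for q, qbar in clusters)

def greedy_reconnect(clusters):
    """Greedily swap antiquarks between cluster pairs while doing so
    lowers the colour length.  Toy version of cluster reconnection."""
    clusters = list(clusters)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(clusters)), 2):
            (q1, a1), (q2, a2) = clusters[i], clusters[j]
            if mass2(q1, a2) + mass2(q2, a1) < mass2(q1, a1) + mass2(q2, a2):
                clusters[i], clusters[j] = (q1, a2), (q2, a1)
                improved = True
    return clusters

# two back-to-back clusters whose constituents are 'crossed':
q1, a1 = (10.0, 0.0, 0.0, 10.0), (10.0, 0.0, 0.0, -10.0)
q2, a2 = (5.0, 0.0, 0.0, -5.0), (5.0, 0.0, 0.0, 5.0)
print(colour_length([(q1, a1), (q2, a2)]),
      colour_length(greedy_reconnect([(q1, a1), (q2, a2)])))
```

In the example, swapping the antiquarks pairs each quark with a collinear partner, driving the cluster masses (and hence the hadronic activity produced from them) down.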

ATLAS detector
The ATLAS detector [19] at the LHC covers nearly the entire solid angle around the collision point. 2 It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range |η| < 2.5. The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [30,31]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to |η| = 2.0. The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering |η| < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 Tm across most of the detector. A set of precision chambers covers the region |η| < 2.7 with three layers of monitored drift tubes, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range |η| < 2.4 with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [32].

² ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Angular distance is measured in units of ΔR ≡ √((Δη)² + (Δφ)²).

Samples of simulated events

Large sets of events simulating tt̄ and background processes were produced with event generator programs based on the Monte Carlo (MC) method to model the recorded and selected data. After event generation, the response of the ATLAS detector was simulated using either the Geant4 toolkit [40] with a full detector model or a fast simulation [41] which employs a parameterisation of the calorimeter response [42]. Samples using the fast simulation were employed to estimate systematic uncertainties associated with the event generators.
To account for pile-up effects, simulated minimum-bias interactions were overlaid on the hard-scattering events and the resulting events were weighted to reproduce the observed pile-up distribution. The minimum-bias events are simulated using Pythia 8.186 [6] with a set of tuned parameters called the A3 tune [43] and the NNPDF2.3lo PDF set [44]. To account for differences in the pile-up profile between individual years, simulated events representing the detector conditions in the years 2015 and 2016 were weighted to reproduce the distribution of the number of interactions averaged over each bunch crossing observed in that data-taking period, while simulated events modelling the detector conditions in the years 2017 and 2018 were weighted to reproduce the actual distribution of the number of interactions per bunch crossing observed in that time. Finally, the simulated events were reconstructed using the same software as applied to the collision data, the same event selection requirements were applied and the selected events were passed through the same analysis chain. More details about the samples of simulated events are provided in the following subsections.

Simulation of tt̄, Wt and tt̄+V production
Samples of events simulating tt̄ and Wt production were generated with the Powheg Box v2 [45–51] next-to-leading-order (NLO) matrix-element generator, setting m_top = 172.5 GeV. The NNPDF3.0nlo PDF set [52] implementing the five-flavour scheme was used. Parton showers, hadronisation and the underlying event were modelled using Pythia 8.230 with the A14 tune [21] and the NNPDF2.3lo PDF set. The Powheg Box + Pythia generator set-up implements a matching scheme for the modelling of hard emissions that avoids double counting them. The matrix-element-to-parton-shower matching is steered by the h_damp parameter, which controls the p_T of the first additional gluon emission beyond the LO Feynman diagram and therefore regulates the high-p_T emission against which the tt̄ system recoils. Harder emissions in the parton shower are then vetoed. Event generation was run with h_damp = 1.5 × m_top [53]. The renormalisation and factorisation scales were set dynamically on an event-by-event basis, namely to the transverse mass of the top quark, m_T(t), for tt̄ production and to μ_r = μ_f = m_top for Wt production. Decays of bottom and charm hadrons were simulated using the EvtGen 1.6.0 program [54]. When generating Wt events, the diagram-removal scheme [55] was employed to handle the interference with tt̄ production [53].
The cross-section of tt̄ production was scaled to the next-to-next-to-leading-order (NNLO) predictions by the Top++ 2.0 program (see Ref. [56] and references therein), which include the resummation of next-to-next-to-leading logarithmic (NNLL) soft-gluon terms. A value of σ(tt̄) = 832 +47 −51 pb was used, assuming m_top = 172.5 GeV. The total cross-section for Wt production was computed at NLO in QCD with the addition of third-order corrections from soft-gluon emissions by resumming NNLL terms [57], and the simulated event sample was scaled to σ(tW + t̄W) = 71.7 ± 3.8 pb.
The production of a tt̄ pair in association with a vector boson was modelled at NLO using the MadGraph5_aMC@NLO 2.3.3 generator [58] with the NNPDF3.0nlo set of PDFs. The events were interfaced to Pythia 8.210 for modelling of the parton showers, using the A14 tune and the NNPDF2.3lo PDF set. The decays of bottom and charm hadrons were simulated using EvtGen 1.2.0. Cross-sections were calculated at NLO QCD and NLO EW accuracy using MadGraph5_aMC@NLO as reported in Ref. [59]. The predicted values are 0.88 +0.09 −0.11 pb and 0.60 +0.08 −0.07 pb for tt̄+Z and tt̄+W production, respectively, where the uncertainties were estimated from variations of α_s and the renormalisation and factorisation scales.

Simulation of +jets and diboson production
Backgrounds from Z boson production in association with jets, especially heavy-flavour jets, and on-shell diboson production (WW, WZ and ZZ) were simulated with the Sherpa 2.2.1 generator [8]. To simulate Z+jets (diboson) production, NLO-accurate matrix elements for up to two additional partons (one additional parton) and LO-accurate matrix elements for up to four (three) additional parton emissions were calculated with the Comix [60] and OpenLoops 1 [61–63] libraries. The matrix elements used for diboson production contain all diagrams with four electroweak vertices. The default Sherpa parton shower [64], based on Catani–Seymour dipole factorisation, and the cluster hadronisation model [65] were used. The samples employ the dedicated set of tuned parameters developed by the Sherpa authors and the NNPDF3.0nlo PDF set. The NLO matrix elements of a given jet multiplicity were matched to the parton shower using a colour-exact variant of the MC@NLO algorithm [66]. Different jet multiplicities were then merged into an inclusive sample using an improved CKKW matching procedure [67,68] which was extended to NLO accuracy using the MEPS@NLO prescription [69]. The merging threshold was set to 20 GeV.
The Z+jets sample was normalised to an NNLO prediction [70] of the total cross-section, obtained with the FEWZ package [71]. Considering the dilepton event selection used in the analysis, the simulation efficiency of the Z+jets samples was increased by forcing the Z bosons to decay into a pair of charged leptons (e⁺e⁻, μ⁺μ⁻ or τ⁺τ⁻). In diboson events, at least one of the two bosons was required to decay leptonically. The simulated diboson event samples were normalised to the total cross-sections provided by Sherpa at NLO in QCD.

Object reconstruction and definitions of observables
Event candidates are identified by means of isolated electrons and muons, and jets, some of which are b-tagged. In the following, the definitions of these reconstructed objects at detector level, together with the corresponding objects reconstructed at particle level, as well as the event selection are described. In addition, the analysis observables are introduced.

Detector-level object definitions
Vertices are reconstructed from at least two ID tracks with p_T > 500 MeV. The primary vertex of an event is defined as the vertex with the highest sum of p_T² of all associated ID tracks [72]. Tracks are reconstructed from the measured positions of 'hits' in the ID caused by the passage of charged particles [73]. The track reconstruction consists of an iterative track-finding algorithm seeded by combinations of at least three silicon-detector hits followed by a combinatorial Kalman filter [74] to build track candidates based on hits compatible with the extrapolated trajectory. Ambiguities between the track candidates are then resolved by applying quality criteria to suppress combinations of hits unlikely to originate from a single charged particle. Finally, a high-resolution track fit is performed, resulting in more precise track reconstruction parameters. Track candidates are required to pass the Tight Primary selection [75]. In particular, they must have p_T > 500 MeV and be within |η| < 2.5. At least one hit in the two innermost layers is required if the extrapolated track crosses the sensitive region of an active sensor module. The number of hits in the pixel and SCT detectors must be larger than 9 for |η| ≤ 1.65 or larger than 11 for |η| > 1.65, with no more than 2 missing SCT hits on a track if the respective SCT modules are operational. Additionally, track-to-vertex-matching criteria are applied to mitigate the effect of pile-up tracks and improve the rejection of tracks from secondary interactions: the transverse impact parameter calculated relative to the beam line must satisfy |d_0| < 0.5 mm, and the longitudinal impact parameter calculated relative to the event primary vertex must satisfy |z_0 sin(θ)| < 0.5 mm, where θ is the polar angle of the track.
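The kinematic, hit-count and track-to-vertex requirements listed above can be summarised in a small selection function. The flat track record and its field names are assumptions for illustration; the Tight Primary working point includes further criteria (e.g. the innermost-layer hit requirement, which needs the extrapolated trajectory and module status) that are omitted here:

```python
import math

def passes_track_selection(trk):
    """Apply the track cuts quoted in the text to a dict-like track record.

    Assumed keys: pt (MeV), eta, d0 and z0 (mm), theta (rad),
    pixel_hits, sct_hits, sct_holes.
    """
    # kinematic acceptance
    if trk["pt"] <= 500.0 or abs(trk["eta"]) >= 2.5:
        return False
    # eta-dependent silicon hit requirement: > 9 (|eta| <= 1.65) or > 11
    min_si_hits = 9 if abs(trk["eta"]) <= 1.65 else 11
    if trk["pixel_hits"] + trk["sct_hits"] <= min_si_hits:
        return False
    # no more than 2 missing SCT hits
    if trk["sct_holes"] > 2:
        return False
    # track-to-vertex matching against pile-up and secondaries
    if abs(trk["d0"]) >= 0.5:
        return False
    if abs(trk["z0"] * math.sin(trk["theta"])) >= 0.5:
        return False
    return True
```

The |z_0 sin θ| form of the longitudinal cut makes the requirement correspond to a distance of closest approach in three dimensions rather than along the beam axis alone.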
Electron candidates are reconstructed from clusters of energy deposited in the electromagnetic calorimeter with a matched track reconstructed in the ID [76]. The pseudorapidity of clusters, η_cluster, is required to be in the range |η_cluster| < 2.47. However, clusters are excluded if they are in the transition region 1.37 < |η_cluster| < 1.52 between the barrel and endcap electromagnetic calorimeters. A likelihood-based method is used to simultaneously evaluate several properties of electron candidates, including shower shapes in the electromagnetic calorimeter, track quality, and transition radiation produced in the TRT. Electron candidates must have p_T > 25 GeV and satisfy the Tight identification criteria as defined in Ref. [76]. Electrons are required to be isolated from other activity by applying the Tight requirements on the sum of nearby energy in the calorimeter and the sum of the momenta of nearby tracks [76].
Muon candidates are reconstructed by combining tracks in the MS with tracks in the ID [77,78] and must have |η| < 2.5, p_T > 25 GeV, satisfy the Medium identification criteria, and pass the Tight isolation requirements as defined in Ref. [78].
The tracks associated with electron and muon candidates must satisfy slightly different requirements on d_0, i.e. d_0-significance requirements of |d_0/σ(d_0)| < 5.0 for electrons and |d_0/σ(d_0)| < 3.0 for muons are imposed. The lepton energy and momentum scales and resolutions are calibrated in data by using Z → ee [76] or Z → μμ events and inspecting the dilepton mass spectrum near the Z-boson peak.
Jets are reconstructed from particle-flow objects [79] with the anti-k_t clustering algorithm [80,81] using a radius parameter of 0.4. Their energy is calibrated [82], and they must fulfil p_T > 25 GeV and |η| < 2.5. Jets with p_T < 60 GeV and |η| < 2.5 are required to pass a requirement on the jet-vertex-tagger (JVT) discriminant [83] to suppress jets originating from pile-up collisions. The JVT discriminant is required to be above 0.5, which corresponds to an efficiency of 92% for non-pile-up jets. Jets containing b-hadrons are identified (b-tagged) with the DL1r algorithm [84], which uses a deep feed-forward neural network with several b-tagging algorithms as input [85]. The algorithms exploit the impact parameters of charged-particle tracks, parameters of reconstructed secondary vertices and the topology of b- and c-hadron decays inside the jets. In order to strongly reduce the misidentification rate of c-jets and light-flavour (u, d or s) jets, a specific working point of the DL1r algorithm was defined and calibrated, using Run 2 data and the standard calibration technique, i.e. a likelihood-based method in a sample highly enriched in tt̄ events [84]. The b-tagging efficiency for jets that originate from the hadronisation of b-quarks is 70% in simulated tt̄ events.
To avoid double-counting objects satisfying more than one set of selection criteria, a procedure called overlap removal is applied. Reconstructed objects are removed in the following order: electrons sharing an ID track with a muon; jets within ΔR = 0.2 of an electron, thereby avoiding double-counting electron energy deposits as jets; electrons within ΔR = 0.4 of a remaining jet, to reduce the impact of non-prompt electrons; jets within ΔR = 0.2 of a muon if they have less than three associated tracks; muons within ΔR = 0.4 of a remaining jet, reducing the rate of non-prompt muons.
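Because each step acts on the objects surviving the previous ones, the order matters. The procedure can be sketched as follows (the minimal object records, their field names and the ΔR helper are illustrative assumptions):

```python
import math

def delta_r(a, b):
    """Angular distance between two objects with 'eta' and 'phi' keys."""
    dphi = math.atan2(math.sin(a["phi"] - b["phi"]),
                      math.cos(a["phi"] - b["phi"]))
    return math.hypot(a["eta"] - b["eta"], dphi)

def overlap_removal(electrons, muons, jets):
    """Apply the removal steps in the order listed in the text."""
    # 1) electrons sharing an ID track with a muon
    muon_tracks = {m["track_id"] for m in muons}
    electrons = [e for e in electrons if e["track_id"] not in muon_tracks]
    # 2) jets within dR = 0.2 of a remaining electron
    jets = [j for j in jets if all(delta_r(j, e) >= 0.2 for e in electrons)]
    # 3) electrons within dR = 0.4 of a remaining jet
    electrons = [e for e in electrons if all(delta_r(e, j) >= 0.4 for j in jets)]
    # 4) jets within dR = 0.2 of a muon, if the jet has < 3 associated tracks
    jets = [j for j in jets
            if not (j["n_tracks"] < 3 and any(delta_r(j, m) < 0.2 for m in muons))]
    # 5) muons within dR = 0.4 of a remaining jet
    muons = [m for m in muons if all(delta_r(m, j) >= 0.4 for j in jets)]
    return electrons, muons, jets
```

For example, a jet built largely from an electron's calorimeter deposit is removed in step 2, so the electron is not then discarded in step 3 for overlapping with "its own" jet.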
In order to build observables in this analysis, tracks within ΔR = 0.4 of a jet or within ΔR = 0.01 of the selected electron or muon are discarded.

Particle-level object definitions
Particle-level objects are defined in simulated events by using only stable particles with a mean lifetime τ > 30 ps, using the following criteria.
Charged particles are required to have p_T > 500 MeV and |η| < 2.5. Charged particles that are generated through interactions with matter, referred to as secondary particles, are excluded.
Electrons and muons are required not to originate from a generated hadron in the MC event, either directly or through a τ-lepton decay. This ensures that they originate from a W-boson decay without requiring a direct match. The four-momenta of the bare leptons are modified (dressed) by adding the four-momenta of prompt photons within ΔR = 0.1. The dressed leptons must then have p_T > 25 GeV and |η| < 2.5.
Jets are clustered using all stable particles, excluding those used in the definition of dressed leptons and prompt neutrinos, using the anti-k_t algorithm with a radius parameter of 0.4. Jets are required to have p_T > 25 GeV and |η| < 2.5 and are identified as b-jets via ghost-matching [86] to a b-hadron with p_T > 5 GeV.
To avoid double-counting, particle-level objects are subject to overlap removal criteria. Dressed muons and electrons with a separation ΔR < 0.4 from a jet are excluded.
In order to obtain a track collection outside of jets, charged particles within ΔR = 0.4 of a jet or within ΔR = 0.01 of the electron or muon from the top-quark decay are excluded.

Event selection
Events are required to have at least one primary vertex, exactly one electron and one muon with p_T > 25 GeV for the 2015 data-taking period and p_T > 27 GeV for data taken from 2016 to 2018 to match the trigger thresholds. Furthermore, the events are required to contain two or three jets with p_T > 25 GeV. Exactly two of the jets must be b-tagged. The third jet is mainly accepted in order to increase the statistics of the selected sample while keeping the modelling uncertainties manageable. Hadronic resonances are suppressed by requiring the dilepton invariant mass to satisfy m_ℓℓ > 15 GeV. Finally, events with track multiplicities n_trk,out (defined below) above 100 are excluded because the dominant contribution is due to pile-up tracks. Events with an opposite-sign (OS) eμ pair define the nominal sample, whilst events with a same-sign (SS) eμ pair are used to estimate the background from fake leptons. The particle-level events are required to pass the same selection requirements as the data selection but applied to the particle-level objects. In the case of the lepton p_T the lower requirement of 25 GeV is applied.

Definition of the observables at particle level and detector level
Particle-level observables
Particle-level observables are built from charged particles, as defined in Section 5.2. The three observables are defined as: • n_ch: Charged-particle multiplicity, with the bin edges defined as [0, 10, 20, 30, 40, 50, 60, 80, 100].
• Σp_T^ch: Scalar sum of the transverse momenta of the selected charged particles, with the bin edges defined as [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, ∞) in units of GeV. This observable is highly correlated with the charged-particle multiplicity, but contains additional information about the kinematic configuration of the event.
• Σp_T^ch in bins of n_ch: This two-dimensional observable is built from the two variables above, as Σp_T^ch in bins of n_ch, with the binning defined in Table 1.
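Computing the two one-dimensional observables and locating their analysis bins can be sketched as follows; `float("inf")` stands in for the open-ended last Σp_T^ch bin.

```python
import bisect

# Bin edges quoted in the text above.
NCH_EDGES = [0, 10, 20, 30, 40, 50, 60, 80, 100]
SUMPT_EDGES = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, float("inf")]

def event_observables(charged_pts):
    """n_ch and Σp_T^ch (GeV) for one event, from its charged-particle pT list."""
    return len(charged_pts), sum(charged_pts)

def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) containing value."""
    return bisect.bisect_right(edges, value) - 1
```

For the two-dimensional observable, the same `bin_index` lookup would be applied to n_ch first, selecting which Σp_T^ch binning of Table 1 to use.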

Detector-level observables
At detector level, the observables are calculated using reconstructed tracks with the same binning as at particle level. The corresponding variables are then defined as: • n_trk,out: Number of selected tracks outside of jets. • Σp_T^trk,out: Scalar sum of the transverse momenta of the selected tracks outside of jets.

Background estimation and corrections to observables
Two different sources can lead to background contamination in the measured observables: event-based contributions from background processes with a final state similar to tt̄ dileptonic events, discussed in Section 6.1, and track-based contributions originating from sources other than the hard-scattering process, discussed in Section 6.2. Since the track-based background has to be subtracted from the measured variables n_trk,out and Σp_T^trk,out, the following procedure is applied: • The pile-up and secondary-particle contribution per event is estimated in a stochastic way. This procedure is detailed in Section 6.2.
• A tracking-efficiency correction (TEC) is applied to the background-subtracted observables to ease the unfolding. This procedure is detailed in Section 6.3.

Event-based backgrounds
After event selection, several background processes with a final state similar to tt̄ dileptonic events pass the selection. Sources that contain top quarks contribute to the background, with events containing a single top quark being the dominant contribution. The production of tt̄+V events, with V being either a W or a Z boson, contributes an irreducible background. Events that contain either two electroweak bosons or a Z boson produced in association with jets, Z(→ ττ)+jets, can be misidentified as a tt̄ dilepton event. Finally, there is a contribution from multijet processes when at least one of the reconstructed leptons is wrongly identified as an isolated prompt lepton, arising from either a heavy-flavour hadron decay, an electron from a photon conversion, a jet misidentified as an electron, or a muon produced from the decay-in-flight of a pion or kaon. This last category is collectively called the fake-lepton background.
The fake-lepton background contribution is estimated using a partially data-driven approach. Events with an enriched fake-lepton contribution are obtained from the observed same-sign sample. The total number of fake-lepton background events, N_fake, is given by
N_fake = R × (N_data,SS − N_prompt,SS),
where N_data,SS is the number of same-sign events in data, N_prompt,SS is the estimated number of prompt same-sign events from simulation, and R is the ratio of opposite-sign (N_fake,OS) to same-sign (N_fake,SS) events with fake leptons in simulation, with R being 2.4 ± 0.1. The kinematic distributions of the fake-lepton background are determined from the simulated tt̄ events with single-lepton and dilepton final states.
The composition of the same-sign samples is illustrated in Figure 1. Electron and muon pT distributions obtained from same-sign selected data events are shown together with the corresponding predictions from simulated events. The contributions are broken down into prompt leptons and various fake-lepton categories. The prompt contributions are about 35% of the same-sign sample. They include wrong-sign (WS) contributions, dominated by tt̄ dilepton events where the electron charge has been wrongly reconstructed, and right-sign (RS) contributions, with same-sign prompt leptons, from tt̄+V events, Z+jets and diboson production. The fake-lepton contributions are dominated by electrons from photon conversions, which represent 62% of the same-sign sample. Sub-leading contributions come from leptons produced in semileptonic decays of heavy-flavour hadrons, and other sources, such as misidentified hadrons or decays-in-flight of pions and kaons. A conservative 50% uncertainty is assigned to the fake-lepton background [87].
All other background processes are modelled by simulated events, and the expected number of events is calculated using the theoretical cross-sections and the acceptance from the simulated samples scaled to the integrated luminosity. The expected and observed event yields are shown in Table 2. The estimated purity of tt̄ dilepton events is approximately 96%, with the backgrounds from single top quarks being the largest impurities. Observed distributions of the leading jet pT and the jet multiplicity compared with the model obtained from simulated events are shown in Figure 2. In the leading jet pT distribution, a slope can be seen in the ratio of data to prediction. This is in agreement with previous differential cross-section measurements by ATLAS [87–90] and CMS [91–94], which have consistently observed a softer top-quark pT spectrum (and related distributions) than what is predicted by NLO+PS MC generators. This discrepancy is at least partially due to missing NNLO corrections [95–99]. The difference between data and prediction is covered by the systematic uncertainties. Figure 1: Distributions of (a) the electron pT and (b) the muon pT, in events with a same-sign eμ pair and two or three jets, of which two are b-tagged jets. The simulation's prediction is normalised to the number of expected events and broken down into contributions where both leptons are prompt (RS or WS), or one is a fake lepton from a photon conversion originating from a top-quark decay or from background, from heavy-flavour decay or from other sources. The lower panel shows the ratio of data to the prediction in each bin. The hatched (grey) uncertainty band represents the statistical uncertainty due to the limited size of the simulated samples. Events beyond the x-axis range are included in the last bin. The blue triangular markers in the lower panels point to data-to-prediction ratio values which lie beyond the shown y-axis range. Table 2: Event yields obtained after the event selection.
The expected event yields from tt̄ production and the various background processes are compared with the observed event yield. The fractional contributions from tt̄ production and the background processes to the expected event yield are given in %. The processes labelled 'Others' include production of Z+jets and diboson background events. The uncertainties include the MC statistical uncertainty and the normalisation uncertainty.

Figure 2: Distributions of (a) the leading jet pT and (b) the jet multiplicity in events with an opposite-sign eμ pair and two or three jets, of which two are b-tagged jets. The observed data is compared with the expectation from simulated tt̄ and background events, where the background includes contributions from tW, tt̄+V, Z+jets and diboson processes and the estimated fake-lepton background. The lower panel shows the ratio of data to the prediction in each bin. The hatched (grey) uncertainty band includes the MC statistical uncertainty, background normalisation uncertainties, detector systematic uncertainties and tt̄ modelling systematic uncertainties. Events beyond the x-axis range are included in the last bin.

Track-based backgrounds
Selected tracks other than the hard-scatter primary tracks can arise from additional sources. Tracks originating from additional pp collisions, referred to as pile-up, are the dominant background source. In addition, secondary particles produced through interactions with matter rather than as a result of direct fragmentation processes form a small contribution, while secondary particles produced in heavy-flavour decays are negligible, since tracks within jets are excluded. Finally, tracks that are formed by a random combination of hits or from a combination of hits from several particles, referred to as fake tracks, are considered. Corrected quantities n_trk,prim and Σp_T^trk,prim are obtained from the measured observables n_trk,out and Σp_T^trk,out at detector level by subtracting these contributions on an event-by-event basis in the following way:
n_trk,prim = n_trk,out − f_PU(μ, n_trk,out) · n_PU − f_sec · n_sec,   (1)
Σp_T^trk,prim = Σp_T^trk,out − f_PU(μ, n_trk,out) · Σp_T^PU − f_sec · Σp_T^sec,   (2)
where μ is the number of interactions per bunch crossing, n_PU (Σp_T^PU) is the number (scalar pT sum) of pile-up tracks, n_sec (Σp_T^sec) is the number (scalar pT sum) of secondary-particle tracks including the contribution from fake tracks, and f_PU, binned in μ and n_trk,out, and f_sec are scale factors to correct for differences between simulation and observed data. The corrected Σp_T^trk,prim in bins of the n_trk,prim observable is built from Eqs. (1) and (2), and thus no additional background estimation is needed.
The pile-up and secondary-particle contamination (n_PU and n_sec) per event is subtracted from the measured observables in a stochastic way based on a method using MC templates. The determination of the two contributions is based on the assumption that they are uncorrelated, which was also checked to be true in simulation. The procedure is performed as follows: • n_trk,prim: Templates are created from simulated tt̄ events for both the pile-up and secondary-particle track multiplicities, in bins of n_trk,out. The background contamination in n_trk,out is expected to be higher for higher track multiplicities, since more tracks are likely to be selected from additional pile-up vertices overlapping with the primary vertex. For each event a random number is drawn from each template corresponding to the n_trk,out value of the event. In this way the contamination is subtracted from the measured observables in a stochastic way.
• Σp_T^trk,prim: A procedure similar to that for the n_trk,prim observable is used, where templates are created from simulated tt̄ events for the pile-up and secondary-particle track pT. In order to compensate for deficiencies in the modelling of pile-up in simulation, a reweighting method is used. To obtain a corrected pT distribution (pT^corr), tracks matched to pile-up vertices, in contrast to nominal tracks matched to the primary vertex, are compared between simulation and observed data and their ratio is used as a reweighting function. This function is used to reweight the pT distribution obtained from pile-up tracks matched to the primary vertex. Since the contribution of tracks from secondary particles is small, no correction is applied in this case. These templates, together with the previously estimated n_PU (n_sec), are then used to estimate the pile-up (secondary-particle) track contamination in the measured Σp_T^trk,out observable, by drawing random numbers n_PU (n_sec) times from the corresponding track pT template for each event.
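The stochastic, template-based subtraction described in the bullets above can be sketched as follows. The template container (a dictionary keyed by lower n_trk,out bin edges) and the single scale factor are simplifying assumptions of this sketch; the analysis bins f_PU in μ as well.

```python
import random

def subtract_stochastic(n_trk_out, pu_templates, f_pu, rng=None):
    """Stochastic pile-up subtraction sketch: draw a pile-up track multiplicity
    from the template matching the event's n_trk,out bin (keys are lower bin
    edges), scale by f_PU and subtract.  Secondary-particle tracks would be
    handled identically with their own template and f_sec."""
    rng = rng or random.Random(0)
    bin_low = max(edge for edge in pu_templates if edge <= n_trk_out)
    n_pu = rng.choice(pu_templates[bin_low])       # one random draw per event
    return max(0, n_trk_out - round(f_pu * n_pu))
```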
In order to validate the procedure, a closure test is performed, comparing the obtained distributions with distributions built from true pile-up tracks in MC events. Differences between the expected and observed rates of pile-up tracks and secondary-particle tracks are accounted for by the scale factors f_PU and f_sec. Templates of the z0 and d0/σ(d0) distributions are created from simulated tt̄ events for hard-scatter primary tracks, pile-up tracks and secondary-particle tracks. The distributions for pile-up tracks are corrected for differences between collision data and simulation using high-purity pile-up tracks originating from pile-up vertices. Two separate binned maximum-likelihood fits are performed to estimate f_PU and f_sec. Studies have shown that these two contributions can be determined independently in the following way: to obtain f_PU, hard-scatter primary-track and secondary-particle-track templates of the z0 distribution are used, while to obtain f_sec, hard-scatter primary-track and pile-up-track templates of the d0/σ(d0) distribution are used. The scale factors of each contribution in the two fits are kept free-floating. Studies have shown that f_PU depends on μ and the track multiplicity. Therefore, f_PU is determined in different intervals of μ and n_trk,out. Examples of fitted z0 and d0/σ(d0) distributions are shown in Figure 3 and a summary of the estimated f_PU(μ, n_trk,out) is shown in Table 3. The obtained scale factors range from 0.9 to 1.4, with higher values obtained for higher μ and n_trk,out values. A value of 2.34 ± 0.02 is estimated for f_sec. The estimation of systematic uncertainties of f_PU is described in Section 8.2. Figure 3: Illustration of the estimation of the pile-up (secondary-particle) track background by fitting the z0 (d0/σ(d0)) distribution. As representative examples, the fitted z0 distribution is shown in the region with 20 ≤ μ < 40 and 60 ≤ n_trk,out < 80 in (a) and the fitted d0/σ(d0) distribution is shown in (b).
The stacked histograms are normalised to the fit result. The lower panel shows the ratio of data to the prediction in each bin. The hatched (grey) uncertainty band represents the statistical uncertainty due to the limited size of the simulated samples.
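A binned maximum-likelihood determination of one such scale factor can be sketched with a simple grid scan of the Poisson negative log-likelihood. This is a deliberately simplified one-parameter version (the published fits float the scale factor of each template); all yields are invented.

```python
import math

def fit_scale_factor(data, fixed, template, scan=None):
    """One-parameter binned ML fit of data ≈ fixed + f * template: return the
    f on the scan grid minimising the Poisson negative log-likelihood."""
    scan = scan or [0.5 + 0.01 * i for i in range(151)]   # f in [0.5, 2.0]
    def nll(f):
        total = 0.0
        for d, a, b in zip(data, fixed, template):
            mu = max(a + f * b, 1e-9)                     # expected bin content
            total += mu - d * math.log(mu)                # Poisson NLL (no d! term)
        return total
    return min(scan, key=nll)
```

In the analysis this would be repeated in each (μ, n_trk,out) interval to build the f_PU table.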

Corrections to observables
To place the leading elements of the migration matrix on the principal diagonal, the background-subtracted quantities n_trk,prim and Σp_T^trk,prim are scaled with correction factors f_ntrk and f_sum-pT, respectively, accounting for the limited track-reconstruction efficiency:
n_trk = f_ntrk(n_trk,prim) · n_trk,prim,   Σp_T^trk = f_sum-pT(Σp_T^trk,prim) · Σp_T^trk,prim.   (3)
The TEC factors are calculated in bins of the background-subtracted quantity by averaging the ratio of the quantity at the background-subtracted level to its value at particle level across all selected events of the nominal simulated tt̄ sample. The function f_ntrk(n_trk,prim) has values between 0.68 and 0.89 in its range from 0 to 100, while the function f_sum-pT(Σp_T^trk,prim) rises stepwise from 0.66 to 0.89 across the bins of its full range. The corrections are applied to the observables in measured data events, in simulated events from all scattering processes (i.e. the tt̄ process and all background processes), and in simulated events of the samples with systematically varied parameters. The corrected quantities, n_trk and Σp_T^trk, are the final corrected-level observables forming the distributions to be unfolded, and are shown in Figure 4. The corrected Σp_T^trk in bins of the n_trk observable, shown in Figure 4(c), is built from Eq. (3) without any additional correction. The uncertainty band includes detector systematic uncertainties and the pile-up background-track-rate uncertainty. Within these experimental uncertainties there are considerable differences between the observed data and the nominal prediction, reflecting non-optimal tuning of the underlying event (UE) and multi-parton interactions (MPI). Table 4 summarises the naming convention for the observables at different levels of the analysis. Figure 4: Distributions of (a) n_trk, (b) Σp_T^trk and (c) Σp_T^trk in bins of n_trk for events with an opposite-sign eμ pair and two or three jets, of which two are b-tagged jets.
The corrected-level observed distributions are compared with the expectation from simulated events, broken down into contributions from tt̄ and the background processes tW, tt̄+V, Z+jets and diboson, together with the estimated fake-lepton background. The estimated contributions from pile-up tracks and secondary particles are subtracted from measured data events and simulated tt̄ events. The TEC is applied to all processes. The lower panel shows the ratio of observed data to the prediction in each bin. The hatched (grey) uncertainty band includes detector systematic uncertainties and the pile-up background-track-rate uncertainty. Events beyond the x-axis range are included in the last bin.
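The derivation of the TEC factors — a per-bin average of the background-subtracted-to-particle-level ratio — can be sketched as follows. The `(reco, particle)` pair list and the lower-edge bin convention are assumptions of this sketch.

```python
from collections import defaultdict

def tec_factors(pairs, edges):
    """Per-bin TEC factor: average of (background-subtracted value /
    particle-level value), binned by the background-subtracted value
    using lower bin edges."""
    acc = defaultdict(lambda: [0.0, 0])
    for reco, particle in pairs:
        edge = max(e for e in edges if e <= reco)
        acc[edge][0] += reco / particle
        acc[edge][1] += 1
    return {e: s / n for e, (s, n) in acc.items()}
```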

Unfolding
The differential cross-section distributions in the fiducial phase space defined in Section 5.3 are obtained from the corrected-level events using an unfolding technique that corrects for acceptance, efficiency and resolution effects. The iterative Bayesian method [100] as implemented in RooUnfold [101] is used for this unfolding.
The unfolding procedure starts from the observables described in the previous section, i.e. after subtracting the track-based backgrounds and applying the TEC, and consists of two stages. First, the event-based background contributions N_bkg, summarised in Table 2, are subtracted bin-by-bin from the track-based background-subtracted and tracking-efficiency-corrected distribution N_corr. The contributing processes are normalised either to the corresponding theory predictions or, in the case of the fake-lepton background, to the partially data-driven estimate. Second, the distributions are unfolded using a detector response model, the migration matrix M, obtained from simulated tt̄ events. As part of this second step, two correction factors are applied to correct for the non-overlap of the fiducial phase space at corrected level and particle level. The corrections account for events that fall within the fiducial phase space of one level but not the other. The extraction of the absolute differential cross-section for an observable X, being n_trk, Σp_T^trk or Σp_T^trk in bins of n_trk, is then summarised by the equation
dσ_fid/dX^i = 1/(L · Δ^i · ε^i) · Σ_j (M⁻¹)_ij · f_acc^j · (N_j^corr − N_j^bkg),
where the index i (j) iterates over the bins of observable X at particle (corrected) level, Δ^i is the bin width, L is the integrated luminosity, the inversion of M symbolises unfolding using the iterative Bayesian method, and f_acc and ε are the phase-space correction factors. These last two parameters are defined as
f_acc^j = N_j^(corr∧part) / N_j^corr,   ε^i = N_i^(corr∧part) / N_i^part.
The number N^(corr∧part) indicates the number of corrected-level events that satisfy the particle-level selection and N^part is the number of events passing the particle-level selection. The response model and phase-space correction factors are obtained from simulated tt̄ events.
The inclusive fiducial cross-section is obtained by integrating the unfolded differential cross-section over the kinematic bins, and its value is used to compute the normalised differential cross-section 1/σ_fid · dσ_fid/dX.
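The iterative Bayesian (D'Agostini) step at the heart of the procedure can be sketched in a few lines. This is not the RooUnfold implementation — it is a minimal pure-Python version without uncertainty propagation or the phase-space corrections, with a flat prior and the row-sum of the migration matrix playing the role of the efficiency.

```python
def bayes_unfold(data, migration, iterations=3):
    """Iterative Bayesian unfolding sketch.  migration[t][r] is the probability
    for truth bin t to be reconstructed in bin r; row sums give the efficiency."""
    n_t = len(migration)
    eff = [sum(row) for row in migration]
    truth = [sum(data) / n_t] * n_t                     # flat starting prior
    for _ in range(iterations):
        # Fold the current truth estimate to get the expected reco spectrum.
        folded = [sum(truth[t] * migration[t][r] for t in range(n_t))
                  for r in range(len(data))]
        # Bayes' theorem: redistribute the observed counts over truth bins.
        truth = [sum(migration[t][r] * truth[t] / folded[r] * data[r]
                     for r in range(len(data)) if folded[r] > 0) / eff[t]
                 for t in range(n_t)]
    return truth
```

With a diagonal migration matrix the input spectrum is returned unchanged, and for fully efficient matrices the total event count is conserved at every iteration.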
The binning is chosen according to the resolution of each observable such that the fraction of events in the diagonal bins is greater than 50% and the data statistical uncertainty in each bin is below 10%. The unfolding is performed using three iterations. This value is chosen because using more iterations produces only small event-count changes in each of the bins and gives lower correlation factors between them.
The unfolding procedure was validated by performing four tests. To ensure that the results are not biased by the MC generator used in the unfolding procedure, a study was performed in which the particle-level spectra from the default Powheg Box + Pythia simulation are altered by changing the shape of distributions using either a linear or a sinusoidal function. Here, maximal variations of 40% are considered. Alternatively, the shapes are also changed by a smooth function that describes the ratio of the background-subtracted data distribution to the distribution obtained from simulated tt̄ events. Here, variations of up to 30% are applied. The reweighted corrected-level tt̄ distributions are then unfolded using the nominal migration matrices and unfolding corrections to see if the reweighted particle-level distributions can be recovered. These tests are referred to as 'stress tests'. The altered shapes are recovered within statistical uncertainties by the unfolding procedure in the case of the linear reweighting, while deviations of 5% in the tails are observed for the sinusoidal reweighting and the MC-to-data reweighting. In order to explore the impact of discrepancies between data and MC simulations in variables other than the unfolded observables, a cross-reweighting scenario is also considered. This uses the same procedure as in the stress tests above, but the reweighting function is obtained from one of the one-dimensional observables, either n_ch or Σp_T^ch, and applied to the other two observables. The reweighting function is defined as a smooth function that describes the ratio of the measured distribution (unfolded data) to the distribution obtained from simulated tt̄ events at the particle level. Similarly to the nominal stress tests, the altered shapes are recovered with a maximum deviation of 3.6%.
Non-closure deviations after applying the background-subtracted-data to tt̄-MC ratio reweighting and the cross-reweighting scenario are added as systematic uncertainties (see Section 8.4).
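A linear stress weight of the kind used in these tests can be written down directly; the functional form below (linear between 1 − A at the lower edge and 1 + A at the upper edge) is one plausible parameterisation of the "maximal variations of 40%" quoted above, not the exact function used in the analysis.

```python
def linear_stress_weight(x, x_min, x_max, amplitude=0.4):
    """Linear stress-test weight varying from (1 - amplitude) at x_min to
    (1 + amplitude) at x_max; amplitude=0.4 reproduces a 40% maximal variation."""
    t = (x - x_min) / (x_max - x_min)          # position in the observable range
    return (1.0 - amplitude) + 2.0 * amplitude * t
```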

Systematic uncertainties
Systematic uncertainties from several sources affect the measured observables. Experimental uncertainties are associated with the reconstruction of the four-momenta of final-state objects: electrons, muons, jets, and tracks. Systematic uncertainties in the estimation of the background event rates and uncertainties related to the unfolding procedure are taken into account. Modelling uncertainties are related to parameters and methods used in MC event generation, and the statistical uncertainties are also considered. The total uncertainty is then calculated assuming all the systematic uncertainties are uncorrelated. Finally, the sensitivity of the measured observables to variations of initial- and final-state radiation and of h_damp is tested. In the following, the different uncertainty components are explained in more detail.
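Under the stated assumption of uncorrelated components, the combination reduces to a sum in quadrature:

```python
import math

def total_uncertainty(components):
    """Combine relative uncertainties in quadrature, assuming they are uncorrelated."""
    return math.sqrt(sum(c * c for c in components))
```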

Detector systematic uncertainties
Detector systematic uncertainties, due to residual differences between data and MC simulations after calibration, and uncertainties in corrective scale factors are propagated through the analysis and evaluated for tt̄ production and the background. Non-significant uncertainties with negligible differences between the systematically varied distribution and the nominal distribution at the corrected level are not considered further. The uncertainties are evaluated by building distributions from events contributed by the systematically varied tt̄ production process and the systematically varied background processes such as tW. The resulting distribution is then unfolded after subtracting the nominal background, using the nominal migration matrix and unfolding corrections. The relative deviation from the nominal unfolded distribution is then assigned as a systematic uncertainty. The following sources are considered:

• Jet uncertainties
The jet energy scale (JES) uncertainties are derived from test-beam data, in situ calibration measurements and simulation, and are parameterised in bins of jet pT and η [82]. The JES uncertainty is decomposed into a set of 30 uncorrelated components. Sources of uncertainty contributing to the JES uncertainties include the η-intercalibration, jet flavour composition and response, differences between jets induced by b-quarks and those from gluons or light quarks, detector modelling, non-closure, punch-through losses and several pile-up properties. The effect of single high-pT particles is also included. Furthermore, uncertainties related to the jet energy resolution (JER) are considered. The uncertainty of the JVT efficiency [83] is found to be negligible.
Since this analysis makes use of b-tagging, the associated uncertainties in the tagging efficiency for b-jets, c-jets, and light-flavour jets are considered by using an uncertainty model containing nine/four/four independent variations for the b-/c-/light-flavour jet calibrations and two components for the MC-based extrapolation to high-pT jets [84].
The combination of these uncertainties is labelled 'Jet' in Figure 5, with the main contribution coming from the jet flavour response.
• Track uncertainties
Systematic uncertainties associated with the d0 and z0 impact parameter resolutions are applied, based on measurements of minimum-bias data and Z → μμ events and parameterised in pT and η. In addition, the impact parameter resolution is particularly sensitive to the accuracy of the simulation's modelling of the number of inactive modules in the pixel detector [75]. An uncertainty is estimated by randomly disabling 5% of the pixel modules.
Weak modes of the detector alignment are accounted for by applying bias corrections to tracks as a function of their angular position to simulate the impact of such modes. Variations biasing d0 and z0 are taken into account, as well as sagitta biases [102].
Finally, uncertainties in the track-reconstruction efficiency as well as the track fake-rate are evaluated. Two main sources of uncertainty are considered for the track-reconstruction efficiency: amount-of-material uncertainties and the physics model used in simulation. The effects of these uncertainties are assessed by comparing the efficiency in samples with different configurations: the passive material of the ID is scaled by 5%, the passive material of the IBL is scaled by 10%, the passive material in the Patch Panel 0 (PP0) region is scaled by 25%, and finally the Geant4 physics model is varied [103]. The estimation of the track fake-rate is based on observations of differences in the non-linear component of the evolution of the track multiplicity as a function of ⟨μ⟩, as described in Ref. [75], and the uncertainty is assumed to be 100%.
The combination of these uncertainties is labelled 'Track' in Figure 5, with the main contribution coming from the global tracking efficiency.

• Lepton uncertainties
Uncertainties due to the lepton trigger, reconstruction and selection efficiencies in simulation are estimated with a tag-and-probe method applied to electrons and muons from leptonic decays of Z bosons and J/ψ mesons, using methods similar to those in Refs. [77,104]. The same processes, using methods described in Refs. [76,77], are used to estimate uncertainties in the lepton momentum scale and resolution. The combination of these uncertainties is labelled 'Lepton' in Figure 5.
• Average number of additional pile-up events
Scale factors are applied to reweight simulated events so that the distribution of the average number of additional pile-up events matches the distribution obtained in data. An uncertainty in these reweighting scale factors is considered, based on the difference between the instantaneous luminosities in data and simulation, and is labelled 'μ-reweight'.

Track-based-background uncertainties
The uncertainty in the rate of background pile-up tracks is evaluated by subtracting a varied number of pile-up tracks from the number of reconstructed tracks in tt̄ simulated events. The variation is given by a re-evaluated pile-up scale factor f_PU, described in Section 6.2, from either systematically varied samples, alternative samples, or sideband fits. The resulting distribution is then unfolded using the nominal migration matrix and efficiency. The relative deviation from the nominal unfolded distribution is assigned as a systematic uncertainty. The following uncertainties in the scale factor are considered: • The uncertainty associated with the z0 impact parameter resolution is evaluated using systematically varied tt̄ samples and is found to be smaller than 2%.
• The uncertainty in the reweighting procedure used to correct the modelling of pile-up tracks in simulation is evaluated by varying the selection of pile-up vertices and leads to an uncertainty of 1.4% in the pile-up scale factor.
• Uncertainties due to the influence of hard-scatter primary tracks are estimated by fitting a sideband region with |z0| > 1 mm. The resulting uncertainty ranges from less than 1% for most regions to as much as 14% in the high-n_trk,out and low-μ region.
• The uncertainty due to different beam conditions and alignment settings is evaluated by re-fitting the rate of background pile-up tracks for different collision-data periods and leads to an uncertainty of 0.4% in the pile-up scale factor.
• Uncertainties in the modelling of parton showers, hadronisation and the underlying event are evaluated by comparing the nominal samples with alternative samples in which Powheg Box is interfaced to Herwig 7.1.3 instead of Pythia 8.230. The main difference is caused by the different underlying-event models and thus by a modified z0 distribution of the hard-scatter primary tracks used in the fit. The largest and dominant uncertainty is found for low n_trk,out values, at a level of 17%, while otherwise the uncertainty is a few percent.
All pile-up scale-factor uncertainties are added in quadrature and the resulting relative uncertainties are presented in Table 5. The largest uncertainty is observed for events with n_trk,out < 20, mainly due to differences in the modelling of the UE between Pythia and Herwig. Finally, the uncertainty due to the non-closure of the subtraction method in a few bins is added in quadrature to the combined pile-up scale-factor uncertainty and labelled 'PU subtraction' in Figure 8.

Event-based-background uncertainties
Uncertainties due to the normalisation of the background processes are assessed by varying the normalisation of each background process according to its uncertainty. The following uncertainties, based on inclusive cross-section uncertainties, are considered: 10% for the single-top-quark and tt̄+V processes, 50% for fake leptons, and 20% for other processes. The combination of these uncertainties is labelled 'Background rates'.
Uncertainties related to the modelling of the tW background are assessed using samples in which the renormalisation and factorisation scales in the matrix-element calculations and the intensity of initial-state and final-state radiation are varied. These uncertainties are labelled 'tW Scale μ_r', 'tW Scale μ_f', 'tW ISR α_s' and 'tW FSR', respectively. The parameter variations in the generation of these samples are identical to those used for the nominal tt̄ process, described in Section 8.5. The uncertainty from the scheme for removing the overlap of the tW process with tt̄ production is evaluated by comparing the nominal sample, using the diagram-removal scheme, with a sample produced with an alternative scheme (diagram subtraction) [55], and is labelled 'tW DS vs. DR'.
The above uncertainties are evaluated by building distributions where the tt̄ and systematically varied background contributions are added together. Each resulting distribution is then unfolded after subtracting the nominal background, using the nominal migration matrix and efficiency. Again, the relative deviation from the nominal unfolded distribution is assigned as a systematic uncertainty. Figure 6 shows the fractional uncertainty from each of the above-mentioned categories for the three observables, and the combination of these uncertainties is labelled 'Event background' in Figure 8.

Unfolding-technique uncertainties
Based on the non-closure of the stress tests using the MC distribution reweighted to background-subtracted data and the cross-reweighting scenario, an unfolding-technique uncertainty is defined. It was checked that the cross-reweighting scenario together with the generator uncertainties covers potential differences in the responses of the measured observables due to track-pT differences between simulation and data. The reweighted corrected-level tt̄ distribution is unfolded using the nominal migration matrix and unfolding corrections. The envelope of relative differences between these unfolded distributions is assigned as a systematic uncertainty, referred to as 'Unfolding technique' in Figure 8. For the distributions of n_ch and Σp_T^ch, the unfolding-technique uncertainty ranges between 0.1% and 2.2%, and is less than the statistical uncertainty of the data in all bins except the first one in each of these distributions. In the two-dimensional measurement of Σp_T^ch in bins of n_ch, the unfolding-technique uncertainty ranges between 0.3% and 3.6%, and exceeds the statistical uncertainty of the data in five of the nine bins.

tt̄ modelling uncertainties
The choice of MC generator and its settings determines the kinematic properties of simulated tt̄ events. In order to assess uncertainties due to different detector responses or different modellings of the event kinematics, an alternative MC sample is unfolded using the nominal Pythia 8 migration matrix and corrections and then compared with the generated 'truth' spectrum of that sample. The systematic uncertainty is then given by the size of the non-closure, which is propagated as a relative uncertainty. This procedure is applied to all of the following uncertainties unless stated otherwise. The uncertainty due to the choice of parton-shower and hadronisation model is evaluated using an alternative sample in which Powheg Box is interfaced to Herwig 7.1.3 instead of Pythia 8; for this sample the MMHT2014lo [105] PDF set was used with the H7-UE-MMHT [7] set of tuned parameters. This uncertainty is labelled as 'H7' in the following.
Uncertainties related to the choice of renormalisation and factorisation scales in the matrix-element calculations, labelled 'Scale μ_R' and 'Scale μ_F', are evaluated by varying the scales in a correlated way by factors of 2 and 0.5. The scale variations are implemented as generator weights in the nominal sample.
The uncertainty due to the scale choice for matching the matrix-element calculation of the tt̄ process to the parton shower is estimated using an additional tt̄ sample produced with the h_damp parameter set to 3 × m_top instead of 1.5 × m_top, while keeping all other generator settings the same as for the nominal sample of tt̄ events. This uncertainty is labelled 'h_damp' in the following. Uncertainties in the modelling of colour reconnection effects are studied using samples with modified CR settings or using different CR models. A detailed description of the tuned parameter settings is given in Ref. [22], and the CR models are described in Section 2.1. The final CR uncertainty is then given by the envelope of the non-closure differences of the CR0–CR2 comparisons and is labelled 'CR'.
The uncertainty due to the choice of PDF is evaluated using the PDF4LHC15 combined PDF set [106] with 30 symmetric eigenvectors. The tt̄ production sample is reweighted with the central value and eigenvectors of the combined PDF set to construct a set of systematically varied templates. This uncertainty is labelled 'PDF4LHC'.
In all uncertainty evaluations mentioned above the alternative samples or reweighted samples are normalised to the total cross-section of the nominal samples.

Luminosity
The uncertainty in the combined 2015–2018 integrated luminosity is 1.7% [35], obtained using the LUCID-2 detector [37] for the primary luminosity measurements. The combination of the uncertainty in the luminosity and all detector uncertainties is labelled 'Detector' in Figure 8. A summary of the fractional uncertainty from each of the previously mentioned categories is shown for the three observables in Figure 8. The pile-up-track background and the modelling uncertainties generally have the largest impact on the measurement.

Impact of scale variations in Pythia 8
In order to understand the sensitivity of the measured differential cross-section distributions to variations of the Powheg Box + Pythia 8 set-up which are not directly connected with CR or UE modelling, the differential cross-sections are compared with those obtained with varied ISR/FSR and h_damp parameter values.
Additional radiation can produce additional charged particles, either by leakage from high-p_T jets or by producing more soft jets. Thus, the normalised differential cross-sections are compared in Figure 9 for variations of the h_damp, ISR and FSR parameters in the nominal Powheg Box + Pythia 8 set-up as described in Section 8.5. The largest differences are seen for variations of h_damp. In the bulk of the n_ch and Σp_T distributions the differences are only a few percent. However, for low and high values of n_ch and Σp_T, the differences are as large as 15%. The influence of FSR and ISR is much weaker and is only significant for low Σp_T values. It can also be seen that the effect of ISR is the exact opposite of that of FSR in these bins. Based on these observations, it can be concluded that the measurement is only weakly sensitive to the modelling of additional radiation in tt̄ events and that differences seen among the predictions from various generator set-ups can be attributed mainly to differences in the CR or UE modelling.

Results
In this section, the analysis results are presented by comparing the normalised differential cross-sections with predictions from various MC set-ups with different colour reconnection models and settings. The absolute differential cross-sections are used to calculate the inclusive cross-section in the fiducial phase space, from which the normalised differential cross-sections are obtained. At particle level the following requirements, as defined in Section 5.3, are imposed: events must contain exactly one electron and one opposite-sign muon with p_T > 25 GeV and |η| < 2.5, and two or three jets with p_T > 25 GeV and |η| < 2.5, where exactly two of the jets must be b-tagged. Requirements of m_ℓℓ > 15 GeV and n_ch < 100 are also applied. The resulting fiducial cross-section is: σ_fid = 4.94 +0.43 −0.41 (syst) ± 0.03 (stat) ± 0.08 (lumi) pb.
In order to quantify how well the measured differential cross-sections agree with the various model predictions, their respective χ² values are calculated. Given the unfolded data, the model prediction, and the covariance matrix, the χ² value is defined as

χ² = V_b^T · Cov_b⁻¹ · V_b,

where b is the number of bins in the spectrum, V_b is the vector of differences between the unfolded data and the prediction, and Cov_b is the covariance matrix.
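As an illustration, the χ² computation above can be sketched in a few lines of Python; the function name and the toy numbers below are purely illustrative and not part of the analysis code:

```python
import numpy as np

def chi2_ndf(data, prediction, cov):
    """chi2 = V_b^T Cov_b^{-1} V_b for a binned spectrum.

    data, prediction: length-b arrays; cov: (b, b) covariance matrix.
    """
    v = data - prediction
    chi2 = float(v @ np.linalg.solve(cov, v))  # solve() avoids an explicit inverse
    return chi2, len(v)

# Toy example: three bins with an uncorrelated 10% uncertainty.
data = np.array([120.0, 80.0, 40.0])
pred = np.array([110.0, 85.0, 45.0])
cov = np.diag((0.1 * data) ** 2)
chi2, ndf = chi2_ndf(data, pred, cov)
print(chi2 / ndf)
```

With a diagonal covariance matrix this reduces to the familiar sum of squared pulls; the full matrix form is needed once bin-to-bin correlations are present.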
The covariance matrix incorporates both the statistical and the systematic uncertainties. The systematic uncertainties include experimental uncertainties, signal modelling and background-related uncertainties as described in the previous section. In addition, uncertainties in the theoretical predictions, which are independent of the CR modelling, are considered and referred to as 'theory uncertainties'. These theory uncertainties include scale variations in the matrix element and parton shower as well as the h_damp variation. The relative difference between the unfolded variation and the nominal unfolded distribution is taken as the uncertainty.
The covariance matrix is obtained by performing pseudo-experiments in the following way: each bin of the measured distribution at the corrected level is varied according to a Poisson distribution with the number of observed events in this bin as the mean value. Gaussian-distributed shifts are coherently added for each detector-modelling systematic uncertainty and the MC statistical uncertainty by scaling each Poisson-fluctuated bin with the expected relative variation from the associated systematic uncertainty effect. In the case of a one-sided uncertainty, the corresponding contribution is symmetrised. The resulting distribution is then corrected with the nominal event-based background subtraction and passed through the standard unfolding procedure using the nominal corrections. Modelling uncertainties and an unfolding-procedure uncertainty are then coherently added with additional Gaussian-distributed shifts. For the normalised differential cross-sections, V_b is replaced by V_{b−1}, which is the vector of differences between the unfolded data and the prediction, discarding one of the b elements. Consequently, Cov_{b−1} is the sub-matrix derived from the full matrix of the normalised differential cross-sections by discarding the corresponding row and column. Hence, the obtained sub-matrix is invertible and the χ² does not depend on which element is discarded. In practice the last row and column are removed.
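A minimal sketch of this pseudo-experiment covariance construction and of the reduced sub-matrix used for normalised cross-sections, assuming a single illustrative systematic source (all bin contents and relative variations below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(12345)

def covariance_from_toys(nominal, rel_systs, n_toys=4000):
    """Build a covariance matrix from pseudo-experiments: Poisson-fluctuate
    each bin, then apply one coherent Gaussian shift per systematic source,
    scaled by the relative effect of that source in each bin."""
    toys = np.empty((n_toys, len(nominal)))
    for i in range(n_toys):
        toy = rng.poisson(nominal).astype(float)
        for rel in rel_systs:
            toy *= 1.0 + rng.normal() * rel  # coherent across bins for this source
        toys[i] = toy
    return np.cov(toys, rowvar=False)

def reduced_cov(cov):
    """For normalised cross-sections: discard one row and column
    (here the last) so that the sub-matrix is invertible."""
    return cov[:-1, :-1]

nominal = np.array([400.0, 250.0, 120.0, 30.0])
rel_systs = np.array([[0.02, 0.03, 0.05, 0.08]])  # one hypothetical source
cov = covariance_from_toys(nominal, rel_systs)
```

Because each systematic shift is applied coherently across bins, it induces positive off-diagonal covariance, while the Poisson fluctuations contribute only to the diagonal.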
In addition, a covariance matrix including correlations of systematic uncertainties between the two observables n_ch and Σp_T is calculated and used to derive a so-called global χ² value, denoted by 'Global(n_ch, Σp_T)'. All of the covariance matrices have large off-diagonal elements, mainly due to the modelling uncertainties. In the following, comparisons of the differential cross-sections with predictions from different event generators using the nominal settings, with CR models implemented in Pythia 8, and with CR models implemented in Herwig 7 are presented and discussed.

Comparisons with predictions from different generators
The measured normalised differential cross-sections are compared in Figure 10 with predictions from different generators, and Table 6 shows the χ²/NDF values resulting from these comparisons for each observable. Figure 10 shows that the unfolded data disagree with the predictions from Sherpa 2.2.10, which does not include CR effects, thus illustrating their importance. The measured n_ch cross-section is approximately equally well described by Pythia 8.230 and Herwig 7, while the measured normalised Σp_T cross-section has a much better χ² value for Herwig 7. With increasing Herwig 7 version number, the predicted and observed n_ch distributions show worse agreement, while for Σp_T the level of agreement remains about the same. The distribution of Σp_T in bins of n_ch is best described by the three versions of Herwig 7 in comparison with Pythia 8.230 and Sherpa 2. Relative to Pythia 8.230, the lower χ² value for Sherpa 2 is due to the inclusion of the theory uncertainties. Overall, the best agreement is achieved using Herwig 7.0.4.

Table 6: The χ² and NDF for the measured normalised differential cross-sections, obtained by comparing the different predictions with the unfolded data. Global(n_ch, Σp_T) denotes the scenario in which the covariance matrix is built including correlations of systematic uncertainties between the two observables n_ch and Σp_T.

Figure 10: Unfolded data are shown as the black line with the grey band corresponding to the total uncertainty in each bin. The results are compared with the predictions of different MC event generators. Events beyond the x-axis range are included in the last bin. The lower panels show the ratio of each prediction to data in each bin. The black triangular markers in the lower panels point to prediction-to-data ratio values which lie beyond the shown y-axis range.

Comparison with different CR models in Pythia 8
In Figure 11, the normalised differential cross-sections are compared with predictions from different CR models implemented in Pythia 8. Table 6 shows the χ²/NDF values resulting from the comparisons between the measured differential cross-sections and the predictions. The CR0 model is based on the same MPI-based CR model as the nominal Powheg+Pythia 8.230 sample. The unfolded data are in better agreement with the sample using the dedicated CR0 set of tuned parameters than with the nominal sample without a dedicated CR tune, although the global χ² is somewhat larger. For the CR1 model, Σp_T is described best, while n_ch is described worst among the three CR models. In contrast, the Σp_T distribution obtained with the CR2 model is in the worst agreement with data, while the other distributions have χ²/NDF values comparable to those obtained from Powheg+Pythia 8.230 with the A14 set of tuned parameters. It can also be seen that neither these CR models nor the A14 tune can describe the lower part of the Σp_T distribution.

Pythia 8 CR and UE parameters
In this section, the normalised differential cross-sections are compared with the predictions obtained from Pythia 8 with variations of specific parameters connected with the CR and UE modelling. These variations are applied to the nominal Pythia 8 set-up using CR0 and the A14 tune. The varied parameters are:
• The CR range parameter R_rec is set to its maximal value of R_rec = 10, such that the reconnection probability reaches saturation (maxCR), and to its minimal value, switching colour reconnection off (noCR). The default is 1.71.
• The MPI parameter p_T0^ref, by default set to 2.09, is lowered to 2.0 and raised to 2.2.
• The UE activity is varied by using the Var1 eigentune of the A14 tune [21]. This eigentune includes variations of α_s and variations of R_rec.
In Figure 12 the normalised differential cross-sections are compared with both the predictions with these varied parameter values and the predictions of the nominal Pythia 8 sample. The predictions without CR are not compatible with the data, while predictions using a maximal probability for colour reconnection in the default Pythia 8 CR0 model are still compatible with the measurement. Another observation is that the overall Σp_T distribution is not sensitive to the CR strength, but it is sensitive within the n_ch ranges. Finally, a clear sensitivity to the variation of p_T0^ref is observed, and the measurement suggests a value of p_T0^ref lower than the one used in the A14 tune. Thus, it can be concluded that further tuning of CR models should include both the parameters of the CR model itself and the parameters of the UE.

Herwig 7.2 CR models
In this section, the normalised differential cross-sections are compared with the predictions obtained from different CR models implemented in Herwig 7.2, as described in Section 2.2. In Figure 13 the normalised differential cross-sections are compared with both the predictions of the different CR models and predictions without any CR. Similarly to Pythia 8, predictions without CR do not describe the data well, but in the case of Herwig 7.2 some deficiencies are compensated for by the UE modelling. Table 6 shows the χ² values resulting from the comparisons between the measured differential cross-sections and the predictions. For the n_ch distribution the newly introduced statistical CR model performs best, while for Σp_T the baryonic CR model agrees better with the observed distribution. Thus, even without a dedicated retuning of the alternative CR models, they agree better with the measured data than the default plain CR model. In general, the agreement with the Σp_T distribution is better than for any other generator considered in this analysis.

Top-quark-specific CR models of Pythia 8
In this section, the normalised differential cross-sections are compared with the predictions obtained from different top-quark-specific CR models implemented in Pythia 8, as described in Section 2.1. In Figure

Summary and conclusion
A measurement of three observables sensitive to colour reconnection effects in tt̄ events is presented using a data sample of 139 fb⁻¹ recorded by the ATLAS experiment at the LHC in proton–proton collisions at a centre-of-mass energy of √s = 13 TeV from 2015 to 2018. The charged-particle multiplicity n_ch, the scalar sum Σp_T of the transverse momenta of the charged particles, and this scalar sum in bins of charged-particle multiplicity are measured in a fiducial phase space in events with exactly one isolated electron and one isolated muon with opposite charge, and two or three jets, where exactly two jets are required to be b-tagged. The observables are unfolded to stable-particle level and compared with predictions from parton-shower programs. The measurement has a precision of 2%–10% in the central bins and up to 50% in the outer bins. The combination of the first two observables, and a double-differential measurement of the two quantities, provide stringent constraints on the colour reconnection models in mainstream Monte Carlo generators. Using these results as a benchmark, more precise models can be developed in a dedicated tuning effort. Distributions of Σp_T obtained from Herwig 7.2 describe the data well, while predictions from Pythia 8.2 for n_ch are better than those for Σp_T. The event generator Sherpa 2.2.10 does not include a colour reconnection model, which should lead to a higher particle multiplicity because the suppression mechanism from CR is missing. Instead, it predicts too few particles, and its Σp_T distribution is softer than measured in data. Among the studied Pythia 8 colour reconnection models, the best agreement with data is achieved with the default MPI-based set of tuned parameters and the largest disagreement is found for the set of parameters based on the gluon-move model.
In the case of the MPI-based model, variations of the colour-reconnection range parameter to its minimal and maximal values are investigated and, similarly to Sherpa, a model without colour reconnection is excluded, while a maximal strength is still compatible with the measurement. Finally, top-quark-specific colour reconnection models are compared with the measured distributions, and sensitivity to strength-parameter variations is mainly seen in the distribution of Σp_T in bins of n_ch for two of the five models. Variations of the Pythia 8 MPI model parameter p_T0^ref showed that a value lower than the one used in the A14 set of tuned parameters is suggested by the measurement. In the case of Herwig 7.2 the newly introduced statistical colour-reconnection model agrees best for the n_ch distribution, while for the Σp_T distribution the baryonic CR model agrees best.
The effects of varying the colour reconnection models can be used to develop estimates of a colour reconnection uncertainty. The chosen observables are sensitive not only to parameters related to colour reconnection models, but also to the MPI parameters. Thus, these results could be used as input to future tuning of Monte Carlo generators for both colour reconnection and MPI parameters, which should be handled simultaneously.

A Migration matrices

Figure 15 shows the migration matrices used as input to the unfolding method for the observables n_ch, Σp_T, and Σp_T in bins of n_ch. The corrected-level observable is shown on the x-axis, while the particle-level observable is shown on the y-axis. The values are normalised so that each row sums to unity.
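The row normalisation of the migration matrices can be sketched as follows; the event counts below are purely illustrative:

```python
import numpy as np

def row_normalise(counts):
    """Normalise a migration matrix so each row sums to unity, giving the
    probability of each corrected-level bin for a given particle-level bin."""
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0.0] = 1.0  # leave empty rows as all zeros
    return counts / row_sums

# Hypothetical 3x3 counts: rows = particle level, columns = corrected level.
counts = np.array([[80.0, 15.0, 5.0],
                   [10.0, 70.0, 20.0],
                   [2.0, 18.0, 80.0]])
probs = row_normalise(counts)
```

A strongly diagonal normalised matrix indicates small bin-to-bin migrations and hence a well-conditioned unfolding problem.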

B Absolute differential cross-sections
Absolute differential cross-sections are compared with predictions from various MC event generators and presented in Figures 16-19.

C Additional material for the measured distributions
In addition to the quantitative comparisons of normalised differential cross-sections with predictions from different MC models, this appendix presents two alternative χ² scenarios:
• Using only the uncertainties described in Section 8, i.e. modelling uncertainties are included only as non-closure uncertainties, while the extra theory uncertainties defined in Section 9 are omitted. This scenario is referred to as 'Total' in the following tables.
• Using the detector covariance matrix, and adding the modelling uncertainties, the scale variations in the matrix element and parton shower, and the h_damp variation only to the diagonal elements of the covariance matrix. This scenario is referred to as 'De-correlate modelling' in the following tables.

Figure 18: Absolute differential cross-section as a function of (a) n_ch, (b) Σp_T, and (c) Σp_T in bins of n_ch. Unfolded data are shown as the black line with the grey band corresponding to the total uncertainty in each bin. The results are compared with the predictions of different CR and UE parameter variations in Pythia 8. Events beyond the x-axis range are included in the last bin. The lower panels show the ratio of each prediction to data in each bin. The black triangular markers in the lower panels point to prediction-to-data ratio values which lie beyond the shown y-axis range.

Figure 19: Absolute differential cross-section as a function of (a) n_ch, (b) Σp_T, and (c) Σp_T in bins of n_ch. Unfolded data are shown as the black line with the grey band corresponding to the total uncertainty in each bin. The results are compared with the predictions of different CR models in Herwig 7.2. Events beyond the x-axis range are included in the last bin. The lower panels show the ratio of each prediction to data in each bin.
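The 'De-correlate modelling' prescription can be sketched as follows; the covariance matrix, bin contents, and relative variations below are hypothetical:

```python
import numpy as np

def decorrelate_modelling(detector_cov, model_rels, nominal):
    """Add each modelling uncertainty only to the diagonal of the detector
    covariance matrix, i.e. with no bin-to-bin correlation.

    model_rels: (n_sources, n_bins) relative modelling variations.
    """
    cov = detector_cov.copy()
    for rel in np.atleast_2d(model_rels):
        cov[np.diag_indices_from(cov)] += (rel * nominal) ** 2
    return cov

nominal = np.array([100.0, 60.0, 20.0])
detector_cov = np.diag([16.0, 9.0, 4.0])
model_rels = np.array([[0.05, 0.05, 0.10]])  # one hypothetical source
cov = decorrelate_modelling(detector_cov, model_rels, nominal)
```

Dropping the off-diagonal modelling terms in this way removes the bin-to-bin correlations that otherwise dominate the χ², which is why the 'De-correlate modelling' χ² values are smaller than the 'Total' ones.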
The χ² values corresponding to these two scenarios for the normalised differential cross-sections are shown in Table 7, while the χ² values for the absolute differential cross-sections are shown in Table 8. In both tables, a clear reduction of the χ² values can be seen in the 'De-correlate modelling' scenario compared to the 'Total' scenario. This shows that the modelling-uncertainty correlations between the bins are the reason for the large χ² values in the 'Total' scenario. The results for the two scenarios follow similar trends. The measured n_ch cross-section is approximately equally well described by Pythia 8.230 and Herwig 7, while the measured normalised and absolute Σp_T differential cross-sections are in much better agreement with Herwig 7. With increasing Herwig 7 version number, the n_ch distribution agreement gets worse, while the Σp_T agreement remains at a similar level. The distribution of Σp_T in bins of n_ch is described best by the three versions of Herwig 7, not so well by Pythia 8.230, and worst by Sherpa 2. Overall, the best agreement is achieved using Herwig 7.0.4.

The ATLAS Collaboration