1 Introduction

A typical proton–proton (pp) collision studied at the LHC consists of a short-distance hard-scattering process and accompanying activity collectively termed the underlying event (UE). The hard-scattering processes have a momentum transfer sufficiently large that the strong coupling constant is small and the cross-section may be calculated perturbatively in quantum chromodynamics (QCD). The driving mechanisms for the production of the UE are at a much lower momentum scale. These mechanisms include partons not participating in the hard-scattering process (beam remnants), radiation processes and additional hard and semi-hard scatters in the same pp collision, termed multiple parton interactions (MPI). Phenomenological models are required to describe these processes using several free parameters determined from experiment. In addition to furthering the understanding of the proton’s internal structure and the related soft-QCD processes, accurate modelling of the UE is crucial for many data analyses at a hadron collider, either to precisely determine Standard Model quantities or to search for new particles and interactions.

The UE is not distinguishable from the hard scatter on an event-by-event basis. However, there are observables which are sensitive to the UE properties, as first introduced by the CDF Collaboration in proton–antiproton (\(p\bar{p}\)) collisions at a centre-of-mass energy of 1.8 \(\text {Te}\text {V}\) [1]. Such observables can be defined by topological considerations, based on measuring the activity in the direction transverse to a reference object.

The object in the event with the leading transverse momentum relates the UE activity to the scale of the momentum transfer in the hard interaction. In general, processes with leptonic final states like Drell–Yan events are experimentally clean and theoretically well understood, allowing reliable identification of the particles from the UE. The absence of QCD final-state radiation (FSR) permits a study of different kinematic regions with varying transverse momenta of the Z boson due to harder or softer initial-state radiation (ISR).

Previous measurements of distributions sensitive to the properties of the UE in Drell–Yan events were performed in pp collisions at a centre-of-mass energy of 7 \(\text {Te}\text {V}\) by the ATLAS [2] and CMS [3] Collaborations and at a centre-of-mass energy of 13 \(\text {Te}\text {V}\) by the CMS Collaboration [4]. Both measurements at \(\sqrt{s}={7}~\hbox {TeV}\) verified that the dependence of the UE activity on the dimuon invariant mass is qualitatively well described by the Powheg+Pythia8 and Herwig++ sets of tuned parameters but with some significant discrepancies. Reference [2] provides distributions which are sensitive to the choice of parameters used in the various UE models.

This paper presents distributions of four observables sensitive to the UE in events containing a Z boson produced in pp collisions at a centre-of-mass energy of 13 \(\text {Te}\text {V}\) in the ATLAS detector at the LHC, where the singly produced Z boson decays into \(\mu ^{+}\mu ^{-}\). Observables measured as a function of the transverse momentum of the Z boson, \(p_\mathrm {T}^{Z}\), in various regions of phase space are compared with predictions from several Monte Carlo (MC) event generators.

2 Underlying event observables and measurement strategy

Events containing two muons originating from the decay of a singly produced Z boson form a particularly interesting sample for studying the UE. The final-state Z boson is well identified and colour neutral, so that interaction between the final-state leading particle and the UE is minimal. Gluon radiation from the quarks or gluons initiating the hard scatter is, however, an important consideration, as it gives the remainder of the event a non-zero transverse momentum and changes the kinematics of the final state. Observables are therefore measured in different regions of the transverse plane, which are defined relative to the direction of the Z boson as illustrated in Fig. 1.

Fig. 1

a Illustration of the away, transverse, and toward regions in the transverse plane, defined relative to the direction of the Z boson. b Illustration of an isotropic and a balanced event topology in the transverse plane with their corresponding values of thrust \(T_{\perp }\). In these figures, the beams are travelling perpendicular to the plane of the page

A charged particle lies in the away region if its azimuthal angle relative to the Z boson direction, \(|\Delta \phi |\), is greater than \(120^\circ \). This region is heavily dominated by the hadronic recoil against the Z boson from initial-state quark and gluon radiation and is therefore not particularly sensitive to the UE. The toward (\(|\Delta \phi |\le 60^\circ \)) and transverse (\(60^\circ <|\Delta \phi |\le 120^\circ \)) regions contain less contamination from the hard process once the two muons from the Z boson decay are removed. The transverse region is sensitive to the UE because, by construction, it is perpendicular to the direction of the Z boson and is hence expected to receive less activity from the hard-scattering process than the away region. The transverse region consists of two sides, one on each side of the Z boson direction, which are distinguished on an event-by-event basis by their scalar sum of charged-particle \(p_{\text {T}}\): the side with the larger sum is labelled trans-max and the other trans-min [5, 6]. The trans-min region is highly sensitive to the UE activity because activity from recoiling jets is less likely to leak into it.
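The region definitions above can be made concrete with a short sketch. It is a minimal illustration in Python/NumPy, not the analysis code: the function names are hypothetical, tracks are assumed to be given as arrays of azimuthal angle and \(p_{\text {T}}\) with the two muons already removed, and ties between the two transverse sides (equal scalar \(p_{\text {T}}\) sums) are broken arbitrarily, a case the text does not specify.

```python
import numpy as np


def delta_phi(phi, phi_ref):
    """Signed azimuthal difference phi - phi_ref, wrapped into [-pi, pi)."""
    return (np.asarray(phi, dtype=float) - phi_ref + np.pi) % (2 * np.pi) - np.pi


def classify_regions(track_phi, track_pt, z_phi):
    """Label each track 'toward', 'away', 'trans-max' or 'trans-min'.

    Assumes the two Z-decay muons were already removed from the track list.
    """
    dphi = delta_phi(track_phi, z_phi)
    adphi = np.abs(dphi)
    pt = np.asarray(track_pt, dtype=float)
    labels = np.full(dphi.shape, "transverse", dtype=object)
    labels[adphi <= np.pi / 3] = "toward"     # |dphi| <= 60 deg
    labels[adphi > 2 * np.pi / 3] = "away"    # |dphi| > 120 deg
    # The two transverse sides lie on opposite sides of the Z direction.
    side_a = (labels == "transverse") & (dphi > 0)
    side_b = (labels == "transverse") & (dphi < 0)
    # The side with the larger scalar pT sum becomes trans-max
    # (ties go to side_a, an arbitrary choice).
    if pt[side_a].sum() >= pt[side_b].sum():
        max_side, min_side = side_a, side_b
    else:
        max_side, min_side = side_b, side_a
    labels[max_side] = "trans-max"
    labels[min_side] = "trans-min"
    return labels
```

Because the trans-max/trans-min assignment depends on the summed \(p_{\text {T}}\) of each side, the labelling has to be done per event, after all tracks have been assigned to a side.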

Four distributions are studied to understand the UE activity. The first is the charged-particle transverse momentum distribution \(\text {d}N_\text {ch}/\text {d} p_\mathrm {T}^{\text {ch}}\), inclusive over all selected particles; this spectrum is accumulated over all events and then normalized. The other three are evaluated on an event-by-event basis: the charged-particle multiplicity \(\text {d}N_{\text {ev}}/\text {d}(N_{\text {ch}}/\delta \eta \delta \phi )\), the scalar sum of the transverse momenta of those particles \(\text {d}N_{\text {ev}}/\text {d}(\Sigma p_{\text {T}}/\delta \eta \delta \phi )\), and the mean transverse momentum \(\text {d}N_{\text {ev}}/\text {d}(\text {mean}\ p_{\text {T}})\), where mean \(p_{\text {T}}\) is the quotient of \(\Sigma p_{\text {T}} \) and \(N_{\text {ch}}\) (defined only if \(N_{\text {ch}}>0\) in the corresponding region). The distributions of these variables are produced separately for charged particles in each of the regions described above. The charged-particle multiplicity and the scalar sum of transverse momenta are normalized to the area of the corresponding region in \(\eta \)–\(\phi \) space, which simplifies the comparison of the activity in different regions. The distributions are measured separately in different ranges of the Z boson transverse momentum \(p_\mathrm {T}^{Z}\) and in two regions of transverse thrust \(T_{\perp }\) [7]. Transverse thrust characterizes the topology of the tracks in the event and is defined as

$$T_{\perp } = \frac{ \sum _i | \vec {p}_{\text {T},i} \cdot \hat{n} | }{ \sum _i | \vec {p}_{\text {T},i} | }. \tag{1}$$

The thrust axis \(\hat{n}\) is the unit vector which maximizes \(T_{\perp }\). The summation is performed event by event over the transverse momenta \(p_{\text {T}}\) of all charged particles except the two muons. Transverse thrust has a maximum value of 1 for a pencil-like dijet topology and a minimum value of \(2/\pi \) for a circularly symmetric distribution of particles in the transverse plane, as illustrated in Fig. 1. As proposed in Ref. [8], events with lower values of \(T_{\perp }\) are more sensitive to the MPI component of the UE. The two regions of thrust examined in this paper, \(T_{\perp} < 0.75\) and \(T_{\perp} \geq 0.75\), are optimized to distinguish extra jet activity from the actual UE activity. A measurement of transverse thrust in combination with the UE activity was performed at \(\sqrt{s}={7}~\hbox {TeV}\) [9], but it did not distinguish the transverse regions.
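Equation (1) leaves the maximization over \(\hat{n}\) implicit. One simple way to evaluate it numerically, sketched below under the assumption that a fine grid of axis angles is accurate enough, is to scan candidate axes and keep the largest projected sum. This grid scan is a swapped-in, approximate technique chosen for clarity; the function name and the grid size are hypothetical, and the inputs are the transverse-momentum components of the charged particles with the muons removed.

```python
import numpy as np


def transverse_thrust(px, py, n_steps=3600):
    """Approximate T_perp of Eq. (1) by scanning candidate axes on a grid.

    Returns (t_perp, axis_angle). Since |p.n| is unchanged under n -> -n,
    axis angles in [0, pi) are sufficient.
    """
    px = np.asarray(px, dtype=float)
    py = np.asarray(py, dtype=float)
    denominator = np.hypot(px, py).sum()
    theta = np.linspace(0.0, np.pi, n_steps, endpoint=False)
    # |p_T,i . n(theta)| for each candidate axis (rows) and particle (columns)
    projections = np.abs(np.outer(np.cos(theta), px) + np.outer(np.sin(theta), py))
    numerators = projections.sum(axis=1)
    best = int(np.argmax(numerators))
    return numerators[best] / denominator, theta[best]
```

For two back-to-back particles this returns exactly 1, and for many particles spread uniformly in azimuth it approaches \(2/\pi \), matching the limits quoted above. With 3600 steps the angular resolution is 0.05°, so the bias from the discrete scan is negligible for the \(T_{\perp} < 0.75\) / \(T_{\perp} \geq 0.75\) split.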

In this paper, all measurements are also performed inclusively in \(T_{\perp }\). In total, the spectra of the four observables are measured in 96 regions of phase space, i.e. in eight bins of \(p_\mathrm {T}^{Z}\); in the away, toward, trans-max, and trans-min regions; and for low, high, and inclusive \(T_{\perp }\). The bin boundaries in \(p_\mathrm {T}^{Z}\) are (0, 10, 20, 40, 60, 80, 120, 200, 500) \(\text {Ge}\text {V}\). In addition to distributions of the four observables, the arithmetic means \(\langle N_{\text {ch}}\rangle \), \(\langle \Sigma p_{\text {T}} \rangle \), and \(\langle \text {mean}\ p_{\text {T}} \rangle \) are evaluated as functions of \(p_\mathrm {T}^{Z}\) in each of the various regions of phase space.
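The bookkeeping of the 96 phase-space regions and the per-event observables can be sketched as follows. This is an illustration under stated assumptions, not the analysis implementation: the function and constant names are hypothetical, \(p_\mathrm {T}^{Z}\) bins are taken as lower-edge inclusive with events above 500 \(\text {Ge}\text {V}\) excluded, and the region areas \(\delta \eta \, \delta \phi \) are our reading of the definitions (\(\delta \eta = 5\) from the \(|\eta |<2.5\) track acceptance, \(\delta \phi = 2\pi /3\) for the toward and away regions and \(\pi /3\) for a single transverse side). In the measurement itself the arithmetic means are evaluated after unfolding, which this sketch does not include.

```python
import bisect
from itertools import product

import numpy as np

PTZ_EDGES = [0, 10, 20, 40, 60, 80, 120, 200, 500]  # GeV
REGIONS = ["away", "toward", "trans-max", "trans-min"]
THRUST_CLASSES = ["low", "high", "inclusive"]
ETA_WIDTH = 5.0  # track acceptance |eta| < 2.5
# Azimuthal widths implied by the region definitions: toward and away span
# 120 degrees each, a single transverse side spans 60 degrees.
PHI_WIDTH = {"away": 2 * np.pi / 3, "toward": 2 * np.pi / 3,
             "trans-max": np.pi / 3, "trans-min": np.pi / 3}


def ptz_bin(ptz):
    """Index (0..7) of the pT^Z bin, or None outside [0, 500) GeV."""
    if not PTZ_EDGES[0] <= ptz < PTZ_EDGES[-1]:
        return None
    return bisect.bisect_right(PTZ_EDGES, ptz) - 1


def thrust_classes(t_perp):
    """Thrust classes an event enters: its own class plus 'inclusive'."""
    return ["low" if t_perp < 0.75 else "high", "inclusive"]


def region_observables(track_pts, region):
    """N_ch and sum-pT densities per unit eta-phi area, and mean pT, for one region."""
    pts = np.asarray(track_pts, dtype=float)
    area = ETA_WIDTH * PHI_WIDTH[region]
    n_ch, sum_pt = pts.size, float(pts.sum())
    mean_pt = sum_pt / n_ch if n_ch > 0 else None  # mean pT needs N_ch > 0
    return n_ch / area, sum_pt / area, mean_pt


# 8 pT^Z bins x 4 regions x 3 thrust classes = 96 phase-space regions
PHASE_SPACE = list(product(range(len(PTZ_EDGES) - 1), REGIONS, THRUST_CLASSES))
```

Each event therefore contributes to two thrust classes (its own and the inclusive one) in each of the four regions of its \(p_\mathrm {T}^{Z}\) bin.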

3 The ATLAS detector

The ATLAS detector [10,11,12] at the LHC covers nearly the entire solid angle around the collision point. It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer (MS) incorporating three large superconducting toroid magnets.

The ID is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range \(|\eta | < 2.5\). A high-granularity silicon pixel detector typically provides four measurements per track and is surrounded by a silicon microstrip tracker (SCT), which usually provides four three-dimensional measurement points per track. These silicon detectors are complemented by a transition radiation tracker, which enables radially extended track reconstruction up to \(|\eta | = 2.0\).

The MS comprises separate trigger and precision tracking chambers which measure the deflection of muons in a magnetic field generated by superconducting air-core toroids. The precision chamber system covers the region \(|\eta | < 2.7\) with three layers of monitored drift tubes, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range \(|\eta | < 2.4\) with resistive-plate chambers in the barrel and thin-gap chambers in the endcap regions.

A two-level trigger system is used to select interesting events [13]. The level-1 trigger is implemented in hardware and uses a subset of the muon spectrometer and calorimeter information to reduce the event rate to around 100 kHz. This is followed by a software-based high-level trigger, which runs algorithms similar to those used in the offline reconstruction and reduces the event rate to approximately 1 kHz.

4 Data and simulated event samples

Data recorded in 2015 with the ATLAS detector at the LHC in proton–proton collisions at a centre-of-mass energy of 13 \(\text {Te}\text {V}\) are used in this analysis. The data set corresponds to an integrated luminosity of 3.2 fb\(^{-1}\). Only events recorded when the detector was fully operational are considered.

Simulated MC events are used both to estimate the contamination from background processes in data and to correct the measured data for detector inefficiency and resolution effects (Sect. 6.1).

The \({ Z} \rightarrow \mu \mu \) signal process was simulated using the next-to-leading-order Powheg [14, 15] event generator with the CT10 set of parton distribution functions (PDFs) [16] and interfaced to the Pythia 8.170 event generator [17, 18] to simulate the parton shower, hadronization and UE with the CTEQ6L1 PDF set and the AZNLO set of tuned parameters [19]. The AZNLO tune was derived from the \(p_\mathrm {T}^{Z}\) measurement at \(\sqrt{s}=7\,\text {Te}\text {V}\) [19]. In addition, it retunes the overall UE activity by adjusting the Pythia MPI cut-off parameter to match the UE activity of the previous measurement [2] in the lowest \(p_\mathrm {T}^{Z}\) bin (0 to \(5\,\text {Ge}\text {V}\)). Photos [20] was used to simulate final-state electromagnetic radiation. The Pythia generator uses \(p_{\text {T}}\)-ordered parton showers and a hadronization model based on the fragmentation of colour strings. Its MPI model interleaves the ISR and FSR emissions with MPI scatters.

An alternative signal sample used for cross-checks and systematic uncertainty evaluations was simulated using Sherpa 2.2.0 [21], which has an independent implementation of the parton shower, hadronization, UE and FSR. The Sherpa samples utilize the NNPDF30NNLO PDF set [22] and were generated with the nominal tune set of version 2.2.0. The Sherpa generator uses leading-order matrix elements with a model for MPI similar to that of Pythia 8 but without interleaving the FSR. It implements a cluster hadronization model similar to that of Herwig++. Sherpa and Pythia impose the infrared cut-off for MPI as a smooth function, whereas Herwig++ implements it as a step function. A signal sample produced with the MC generator Herwig++ [23] using the UE-EE-5 tune [24] provided by the generator’s authors and the corresponding CTEQ6L1 PDF set is compared with unfolded data in Sect. 7. This tune uses energy extrapolation and was developed to describe the UE and the double parton interaction effective cross-section. Like Pythia, Herwig++ uses a leading-logarithm parton shower model matched to leading-order matrix element calculations, but it implements a cluster hadronization scheme with parton showering ordered by emission angle.

Three sources of background are estimated using MC samples: \({ Z} \rightarrow \tau \tau \), \(WW\rightarrow \mu \nu \mu \nu \), and the \(t{\bar{t}}\) process. Each was simulated using Powheg [25, 26] interfaced to Pythia 8, except for \(t{\bar{t}}\), for which Pythia 6 was used. The Pythia tune set for \({ Z} \rightarrow \tau \tau \) and \(WW\rightarrow \mu \nu \mu \nu \) is the same as that used for the signal process (AZNLO). The Perugia 2012 [27] tune set was used for the simulation of the \(t{\bar{t}}\) process.

Overlaid MC-generated minimum-bias events [28] simulate the effect of multiple interactions in the same bunch crossing (pile-up). These samples were produced with Pythia 8 using the A2 tune set [29] in combination with the MSTW2008LO PDF set. The A2 tune set was matched to the ATLAS minimum-bias measurement at \(\sqrt{s}={7}~\hbox {TeV}\) [30]. The mean number of interactions per bunch crossing \(\langle \mu \rangle \) during the 2015 data-taking with 25 ns bunch spacing was 13.5. The simulated samples are reweighted to reproduce the distribution of the number of interactions per bunch crossing observed in the data.
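The reweighting of simulated events to the pile-up distribution observed in data is commonly done with per-bin ratios of normalized histograms. The sketch below illustrates that generic technique; it is not the ATLAS reweighting tool, the function names are hypothetical, and the per-event pile-up variable and its binning are assumptions.

```python
import numpy as np


def pileup_weights(mu_data, mu_mc, bins):
    """Per-bin ratio of the normalized data and MC pile-up distributions."""
    h_data, _ = np.histogram(mu_data, bins=bins)
    h_mc, _ = np.histogram(mu_mc, bins=bins)
    p_data = h_data / h_data.sum()
    p_mc = h_mc / h_mc.sum()
    # MC bins without events cannot be reweighted; give them weight 0.
    return np.divide(p_data, p_mc, out=np.zeros_like(p_data), where=p_mc > 0)


def event_weights(mu_mc, bins, weights):
    """Look up the weight of each MC event from its pile-up value."""
    idx = np.clip(np.digitize(mu_mc, bins) - 1, 0, len(weights) - 1)
    return weights[idx]
```

After applying these weights, the weighted MC pile-up histogram reproduces the shape of the data histogram in every bin populated in both samples.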

The Geant4 [31] program simulated the passage of particles through the ATLAS detector. Differences in muon reconstruction, trigger, and isolation efficiencies between MC simulation and data are evaluated using a tag-and-probe method [32], and the simulation is corrected accordingly. Additional factors applied to the MC events correct for the description of the muon energy and momentum scales and resolution, which are determined from fits to the observed Z boson line shapes in data and MC simulations [32]. Finally, correction factors adjust the distribution of the longitudinal position of the primary pp collision vertex [33] to the one observed in the data.

5 Event and track selection

Candidate \({ Z} \rightarrow \mu \mu \) events are selected by requiring that at least one out of two single-muon triggers be satisfied. A high-threshold trigger requires a muon to have \(p_{\text {T}} {} > 40~\text {Ge}\text {V}\), whilst a low-threshold trigger requires \(p_{\text {T}} > 20~\text {Ge}\text {V}\) and the muon to be isolated from additional nearby tracks. All events are required to have a primary vertex (PV). The PV is defined as the reconstructed vertex in the event with the highest \(\Sigma p_{\text {T}} \) of the associated tracks, consistent with the beam-spot position (spatial region inside the detector where collisions take place) and with at least two associated tracks with \(p_{\text {T}} > 400\,\text {Me}\text {V}\).
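The primary-vertex choice described above can be sketched as a simple loop. The input format (one list of associated-track \(p_{\text {T}}\) values per vertex) and the function name are hypothetical, beam-spot compatibility is assumed to have been checked beforehand, and the \(\Sigma p_{\text {T}}\) is taken over all associated tracks, since the text does not say whether the sum is restricted to tracks above 400 \(\text {Me}\text {V}\).

```python
def select_primary_vertex(vertex_track_pts, min_tracks=2, min_pt=0.4):
    """Return the index of the primary vertex, or None if no vertex qualifies.

    vertex_track_pts: one list of associated-track pT values (GeV) per
    reconstructed vertex. Beam-spot compatibility is assumed to be checked
    beforehand and is not modelled here.
    """
    best_index, best_sum = None, float("-inf")
    for index, track_pts in enumerate(vertex_track_pts):
        if sum(1 for pt in track_pts if pt > min_pt) < min_tracks:
            continue  # fewer than two tracks above 400 MeV
        vertex_sum = sum(track_pts)
        if vertex_sum > best_sum:
            best_index, best_sum = index, vertex_sum
    return best_index
```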

The main selections defining the regions of phase space are summarized in Table 1. The reconstruction of muon candidates combines tracks reconstructed in the inner detector with tracks reconstructed in the MS [32]. The reconstructed muons are required to have \(p_{\text {T}}\) > 25 \(\text {Ge}\text {V}\) and \(|\eta |<2.4\). Track quality requirements are imposed to suppress backgrounds, and the muon candidate is required to be isolated using a \(p_{\text {T}}\)- and \(\eta \)-dependent ‘gradient’ isolation criterion [32] based on track and calorimeter information. Muon candidates consistent with having originated from the decay of a heavy quark are rejected by requiring the significance of the transverse impact parameter, \(|d_{0}/\sigma (d_{0})|\), to be below 3, where \(d_{0}\) is the transverse impact parameter and \(\sigma (d_{0})\) its uncertainty. Furthermore, the muon candidates must be associated with the PV, i.e. their longitudinal impact parameter \(|z_{0}\sin {\theta }|\) must be less than 0.5 mm. The variables \(d_{0}\) and \(z_{0}\) are measured relative to the PV.

Events are required to have exactly two oppositely charged muons satisfying the selection criteria above. The invariant mass of the dimuon system must be between 66 \(\text {Ge}\text {V}\) and 116 \(\text {Ge}\text {V}\).
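The muon and dimuon requirements can be summarized in a short sketch. It covers only the kinematic, impact-parameter, charge and mass-window cuts listed above; the isolation and track-quality criteria are not modelled, and the `Muon` record and function names are hypothetical. The dimuon invariant mass is computed from the muon four-momenta built from \(p_{\text {T}}\), \(\eta \) and \(\phi \).

```python
import math
from dataclasses import dataclass

MUON_MASS = 0.1056584  # GeV


@dataclass
class Muon:
    pt: float               # GeV
    eta: float
    phi: float
    charge: int
    d0_significance: float  # d0 / sigma(d0)
    z0_sin_theta: float     # mm, relative to the PV


def passes_muon_cuts(mu):
    """Kinematic and impact-parameter cuts; isolation and quality are not modelled."""
    return (mu.pt > 25.0 and abs(mu.eta) < 2.4
            and abs(mu.d0_significance) < 3.0 and abs(mu.z0_sin_theta) < 0.5)


def four_momentum(mu):
    px, py = mu.pt * math.cos(mu.phi), mu.pt * math.sin(mu.phi)
    pz = mu.pt * math.sinh(mu.eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + MUON_MASS**2)
    return energy, px, py, pz


def invariant_mass(mu1, mu2):
    e, px, py, pz = (a + b for a, b in zip(four_momentum(mu1), four_momentum(mu2)))
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))


def select_z_candidate(muons):
    """Return (mu1, mu2, mass) for a Z -> mumu candidate, else None."""
    good = [mu for mu in muons if passes_muon_cuts(mu)]
    if len(good) != 2 or good[0].charge * good[1].charge >= 0:
        return None  # need exactly two oppositely charged selected muons
    mass = invariant_mass(*good)
    return (good[0], good[1], mass) if 66.0 < mass < 116.0 else None
```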

Tracks reconstructed in the ID from the passage of charged particles are used to form the UE observables. Each reconstructed track is required to have \(p_{\text {T}} > 0.5~\text {Ge}\text {V}\) and \(|\eta |<2.5\), a hit in the innermost layer (if one is expected), and in total at least one hit in the pixel detector and at least six hits in the SCT. The tracks must be assigned to the PV, i.e. their transverse and longitudinal impact parameters relative to the PV must be smaller than \({2}\hbox { mm}\) and \({1.5}\hbox { mm}\), respectively. An additional requirement on the quality of the fit of the track to the hits in the detector applies to tracks with \(p_{\text {T}} > 10~\text {Ge}\text {V}\) in order to suppress mismeasured tracks at high \(p_{\text {T}}\). This criterion mainly affects the tracks associated with the muon candidates and has little impact on the predominantly low-\(p_{\text {T}}\) tracks of the UE activity.
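The track requirements can be written as a single predicate. In this sketch the `Track` record and its field names are hypothetical. The boolean `good_fit` is a placeholder for the fit-quality criterion, whose exact definition is not given in the text. The longitudinal impact parameter is stored as a single number because the text does not state whether \(z_0\) or \(z_0\sin \theta \) is cut on.

```python
from dataclasses import dataclass


@dataclass
class Track:
    pt: float                     # GeV
    eta: float
    d0: float                     # transverse impact parameter w.r.t. the PV, mm
    z0: float                     # longitudinal impact parameter w.r.t. the PV, mm
    n_pixel_hits: int
    n_sct_hits: int
    innermost_hit_expected: bool
    has_innermost_hit: bool
    good_fit: bool                # hypothetical flag standing in for the fit-quality cut


def passes_track_selection(trk):
    if trk.pt <= 0.5 or abs(trk.eta) >= 2.5:
        return False
    if trk.innermost_hit_expected and not trk.has_innermost_hit:
        return False
    if trk.n_pixel_hits < 1 or trk.n_sct_hits < 6:
        return False
    if abs(trk.d0) >= 2.0 or abs(trk.z0) >= 1.5:
        return False  # not assigned to the PV
    if trk.pt > 10.0 and not trk.good_fit:
        return False  # fit-quality requirement applies only above 10 GeV
    return True
```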

The kinematics of the Z boson and of the charged particles in the event define the phase space of the fiducial region (particle level), which closely reflects the detector-level selection outlined above. Simulated events are required to have two prompt muons that satisfy \(p_{\text {T}} >25\) \(\text {Ge}\text {V}\) and \(|\eta |<2.4\), with each muon defined at the ‘bare’ level (after final-state QED radiation). Since the measurements are all reported in bins of \(p_\mathrm {T}^{Z}\), the results presented in this paper are not sensitive to the predicted shape of the \(p_\mathrm {T}^{Z}\) spectrum, even though they are sensitive to jet activity in the event. As a cross-check, the observables are also constructed with the muons unfolded to the ‘dressed’ level (i.e. collinear QED FSR is added to the ‘bare’-level muons), similar to the previous UE measurement in Z events [2]. The difference between the results unfolded to the two generator levels is below the percent level and is smaller than the uncertainty related to the unfolding procedure. Charged particles must be stable, i.e. have a proper lifetime with \(c\tau >{10}\hbox { mm}\), and must satisfy \(p_{\text {T}} >0.5\) \(\text {Ge}\text {V}\) and \(|\eta |<2.5\).

The statistical uncertainties of the data and the MC simulations are propagated using the bootstrap method [34]. While the statistical uncertainty of the data is the limiting factor for all distributions at high \(p_\mathrm {T}^{Z}\), it does not limit the measurements in the phase-space regions of lower \(p_\mathrm {T}^{Z}\), which are particularly important for tuning MC simulations.
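The bootstrap idea can be illustrated with the Poisson-weight variant, in which every event receives an independent Poisson(1) weight in each replica and the spread of a quantity over the replicas estimates its statistical uncertainty. The sketch below applies this only to a simple mean. The function name, the number of replicas and the seed are assumptions, and the same per-event weights would in practice have to be carried through every later step, including the unfolding.

```python
import numpy as np


def bootstrap_mean_uncertainty(values, n_replicas=200, seed=1):
    """Statistical uncertainty of a mean from Poisson(1)-weighted replicas."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each replica re-weights every event by an independent Poisson(1) draw.
    weights = rng.poisson(1.0, size=(n_replicas, values.size))
    replica_means = (weights * values).sum(axis=1) / weights.sum(axis=1)
    return replica_means.std(ddof=1)
```

For a large sample the result agrees with the familiar \(\sigma /\sqrt{N}\) estimate of the uncertainty of a mean, which makes a convenient sanity check.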

Table 1 A summary of the fiducial volume definition of the measurement, the particle-level definition, and the main observables. The first row lists the selection criteria for the signal muons (indicated by a superscript \(\mu \)), which are limited by the detector geometry, while the requirement on the dimuon invariant mass \(m^{\ell \ell }\) ensures a low background contamination

6 Corrections and systematic uncertainties

6.1 Unfolding

An iterative Bayesian unfolding technique is used to correct the data for detector inefficiencies and resolution [35,36,37]. Response matrices connect each observable at the detector and particle levels; these are constructed using the Powheg+Pythia8 signal MC sample, which is overlaid with pile-up events at detector level. Each response matrix corresponds to a bin of \(p_\mathrm {T}^{Z}\) or thrust, and the migration of events between \(p_\mathrm {T}^{Z}\) or thrust bins is corrected using a per-bin purity correction factor. In the context of MC simulations, the purity of a bin is defined as the fraction of events that are reconstructed in the same bin as the original particle-level quantity. The bin intervals in \(p_\mathrm {T}^{Z}\) and thrust are chosen to yield high purities (\(>0.9\) for the bins in \(p_\mathrm {T}^{Z}\) and \(>0.85\) for the two bins in \(T_{\perp }\)), enabling the per-bin corrections. For the observable \(\text {d}N_\text {ch}/\text {d} p_\mathrm {T}^{\text {ch}}\), two unfolding iterations are sufficient for convergence of the unfolding results, while for all other observables eight iterations are performed. The mean value of each observable in a bin of \(p_\mathrm {T}^{Z}\) and thrust is evaluated after unfolding. The bin boundaries are the same at the detector and particle levels.
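A single iterative Bayesian unfolding step combines Bayes' theorem with the response matrix: the current particle-level estimate is folded, the posterior probability of each particle-level bin given each detector-level bin is formed, and the measured spectrum is redistributed and corrected for efficiency. The sketch below is a textbook-style illustration, not the implementation used in the measurement; the function name is hypothetical and the per-bin purity corrections for \(p_\mathrm {T}^{Z}\) and thrust migrations are not included.

```python
import numpy as np


def iterative_bayes_unfold(measured, response, prior, n_iter):
    """Iterative Bayesian (D'Agostini-style) unfolding.

    response[j, i] = P(reconstructed in bin j | particle-level bin i);
    its column sums are the reconstruction efficiencies.
    """
    measured = np.asarray(measured, dtype=float)
    response = np.asarray(response, dtype=float)
    truth = np.asarray(prior, dtype=float)
    efficiency = response.sum(axis=0)
    safe_eff = np.where(efficiency > 0, efficiency, 1.0)
    for _ in range(n_iter):
        folded = response @ truth  # expected detector-level spectrum
        safe_folded = np.where(folded > 0, folded, 1.0)
        # Bayes' theorem: posterior[j, i] = P(particle-level bin i | detector-level bin j)
        posterior = response * truth[np.newaxis, :] / safe_folded[:, np.newaxis]
        truth = np.where(efficiency > 0, posterior.T @ measured / safe_eff, 0.0)
    return truth
```

The number of iterations controls how far the result moves away from the prior: few iterations (two for the \(p_{\text {T}}\) spectrum, eight for the other observables in this analysis) keep some regularization, while very many iterations approach the unregularized inverse.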

6.2 Background subtraction

The background contributions to the selected data from the \({ Z} \rightarrow \tau \tau \), \(t{\bar{t}}\), and \(WW\rightarrow \mu \nu \mu \nu \) processes are estimated using MC simulations. In total, they amount to about 0.7% of the selected data events, varying from 0.9% in the lowest \(p_\mathrm {T}^{Z}\) bin to the per-mille level in the highest \(p_\mathrm {T}^{Z}\) bin. The background contribution from multijet processes is estimated using a data-driven technique based on the isolation and charge of the two reconstructed muons, similar to previous analyses [2]. The multijet contribution in the data is less than 0.1%. The data are unfolded after the subtraction of all MC and data-driven background estimates.

6.3 Systematic uncertainties

Systematic uncertainties can arise from possible mismodelling of the muon momentum scale or resolution, as well as of the reconstruction, identification, and isolation efficiencies. Furthermore, limited knowledge of the ID material distribution [38] dominates the uncertainties in the track reconstruction efficiency. The presence of fake tracks (reconstructed tracks with no corresponding charged particle) also affects all observables.

All uncertainties related to imperfect modelling of the detector are assessed using MC simulations. The data are first unfolded using the nominal MC simulation samples. Then the data are unfolded with MC samples where the parameter of the simulation which is affected by the mismodelling is varied by \(\pm 1\sigma \) of its estimated uncertainty. The average of the up and down shifts is assigned as the corresponding systematic uncertainty.
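The symmetrization described above can be expressed bin by bin. This sketch reads "the average of the up and down shifts" as the mean of the absolute differences between the varied and nominal unfolded results; that reading, and the function name, are assumptions.

```python
import numpy as np


def symmetrized_uncertainty(nominal, up, down):
    """Average of the absolute up and down shifts, bin by bin."""
    nominal, up, down = (np.asarray(a, dtype=float) for a in (nominal, up, down))
    return 0.5 * (np.abs(up - nominal) + np.abs(down - nominal))
```

Here `nominal` is the data unfolded with the nominal MC samples, and `up` and `down` are the data unfolded with the \(\pm 1\sigma \) varied samples.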

Since the observables are primarily track-based, the track-related systematic uncertainties dominate the total detector-related uncertainty. These are of the order of 2% regardless of the observable and region. Systematic uncertainties related to the muon reconstruction are a negligible fraction of the overall uncertainty.

Uncertainties due to mismodelling of the background processes are also considered. For the background processes modelled with MC simulations, the electroweak background normalization is varied by \(\pm 5\%\) and the \(t{\bar{t}}\) background normalization by \(\pm 15\%\) (approximately within their theoretical uncertainties [39, 40]) and the effect on the final measurements is estimated. The full effect of including the multijet background or not is taken as an uncertainty. The combined background-related uncertainties form a negligible fraction of the total systematic uncertainty. The dependence of the background uncertainty on \(p_\mathrm {T}^{Z}\) is negligible for this measurement.

An important consideration for these measurements is the modelling of the pile-up, since the MC simulations must correct for contamination from pile-up tracks through the unfolding procedure. Averaged over all simulated events, about 13% of the selected tracks that are compatible with the primary vertex originate from pile-up.

A variation in the pile-up reweighting of the MC simulations is included to cover the uncertainty on the ratio between the predicted and measured inelastic cross-section in the fiducial volume defined by \(M_{X} >{13}\hbox { GeV}\) where \(M_{X}\) is the mass of the hadronic system [41]. The value of \(\left<\mu \right>\) assumed in the MC simulations for the unfolding process is varied by \(\pm 9\%\) from the nominal value. This uncertainty in the pile-up modelling is one of the largest sources of systematic uncertainty in the tails of the distributions of \(p_\mathrm {T}\), \(N_{\text {ch}}\), \(\Sigma p_{\text {T}} \), and \(\text {mean}\ p_{\text {T}} \), and for the mean distributions. The uncertainties related to the inaccuracies of the detector and pile-up modelling are combined and referred to as the ‘Detector’ uncertainty in the following figures.

Two additional cross-checks validate the pile-up modelling and the consistency of removing the pile-up effects via the unfolding technique. First, the unfolding procedure for all observables in all measurement bins is repeated for three intervals of \(\left<\mu \right> \), namely [8–10], [11–13] and [14–16]. A mismodelling of pile-up in MC simulations would manifest itself less in the interval of \(8\le \left<\mu \right>\le 10\) and more in the interval of \(14\le \left<\mu \right> \le 16\). The unfolded results for the three intervals are found to be fully compatible within their associated statistical uncertainties, confirming the consistency of the handling of pile-up in the unfolding process.

Secondly, a complementary data-driven technique based on the Hit Backspace Once More (HBOM) method [42] is used, with the aim of reproducing pile-up contamination as realistically as possible. The track information associated with non-primary vertices in the data is collected into a pile-up library. Random samples drawn from this library are added to individual events, increasing their pile-up contamination. Six iterations of this pollution are applied, i.e. up to six random samples from the pile-up library are added to each event, and the observables are then constructed from these additionally contaminated events. Assuming that the values of the observables evolve smoothly with each iteration of additional pile-up, an extrapolation in each bin to the value with zero pile-up vertices yields the HBOM estimate of the pile-up-subtracted data. These data are then unfolded using a version of the Powheg+Pythia signal MC samples without pile-up vertices. The results obtained with this method are consistent with those of the nominal procedure, and no additional uncertainty is assigned.
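The extrapolation step of the HBOM cross-check can be sketched as a polynomial fit per bin. The text only requires that the observables evolve smoothly, so both the choice of the average number of pile-up vertices as the abscissa and the quadratic default are assumptions, as is the function name.

```python
import numpy as np


def hbom_extrapolate(n_pileup, values, degree=2):
    """Extrapolate one bin of an observable to zero pile-up vertices.

    n_pileup[k]: average number of pile-up vertices after k pollution steps
    (k = 0 is the original data); values[k]: the observable in that bin.
    """
    coeffs = np.polyfit(n_pileup, values, degree)
    return float(np.polyval(coeffs, 0.0))
```

With the original data and six pollution iterations, each bin provides seven points for the fit. A higher polynomial degree follows the points more closely but makes the extrapolation to zero less stable.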

The uncertainty associated with the unfolding technique is evaluated using a data-driven method. It accounts for the dependence of the unfolding on the prior knowledge taken from the MC simulation, i.e. the particle-level quantities. The ratio of data to simulation at detector level is evaluated and smoothed for each observable. The smoothed ratio is then used to reweight the simulated events, each event being weighted according to its particle-level quantity. The reweighted detector-level distribution is then unfolded using the nominal response matrix. The relative difference between the reweighted particle-level distribution and the reweighted, unfolded detector-level distribution is treated as a systematic uncertainty. This dependence on prior knowledge from the MC simulation is the dominant systematic uncertainty in most distributions at lower values of \(p_\mathrm {T}^{Z}\). An additional estimate of the uncertainty related to the unfolding is obtained by unfolding the detector-level MC distributions generated with Sherpa using the unfolding matrices based on the Powheg+Pythia MC sample, and comparing the results with the particle-level quantities predicted by Sherpa. After the MC prior uncertainty is taken into account, a slight discrepancy between the unfolded Sherpa sample and its particle-level distributions remains. Therefore, an additional contribution to the MC prior uncertainty is introduced to cover this remaining non-closure between the unfolded result and the Sherpa generator level. In general, it does not exceed the 2–4% level and is smoothed over the full range of the observable. In a few cases, this non-closure component dominates the MC prior uncertainty. The two unfolding uncertainties are added in quadrature in all figures.

All sources of systematic uncertainty are considered uncorrelated and are combined in quadrature. The MC prior uncertainty is one of the largest contributors to the total systematic uncertainty at all values of \(p_\mathrm {T}\) and in each \(p_\mathrm {T}^{Z}\) region. The statistical uncertainty of the data rises with increasing \(p_\mathrm {T}^{Z}\), contributing a significant fraction of the overall uncertainty. The breakdown of the individual sources of uncertainties for the four observables, \(p_\mathrm {T}\), \(N_{\text {ch}}\), \(\Sigma p_{\text {T}} \), and \(\text {mean}\ p_{\text {T}} \) is illustrated in Fig. 2 for the example of events with 10 < \(p_\mathrm {T}^{Z}\) < 20 \(\text {Ge}\text {V}\) in the trans-min region (the region most sensitive to the UE), inclusively in \(T_{\perp }\).
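Combining uncorrelated sources in quadrature is a one-line operation per bin. In the sketch below the function name and the dictionary keys are hypothetical labels; the same function also covers adding the two unfolding-related components in quadrature.

```python
import numpy as np


def total_uncertainty(components):
    """Combine uncorrelated per-bin uncertainties in quadrature."""
    stacked = np.vstack([np.asarray(c, dtype=float) for c in components.values()])
    return np.sqrt((stacked ** 2).sum(axis=0))
```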

Figure 3 shows the systematic uncertainties in the arithmetic mean of the \(N_{\text {ch}}\) and \(\Sigma p_{\text {T}} \) spectra in the trans-min region as a function of \(p_\mathrm {T}^{Z}\) inclusively in \(T_{\perp }\). The largest contributions to the total systematic uncertainties of the mean distributions at all \(p_\mathrm {T}^{Z}\) values come from either the MC prior uncertainty or the track-related uncertainties. The statistical uncertainties of the data become large for \(p_\mathrm {T}^{Z}\) greater than around 200 \(\text {Ge}\text {V}\).

Fig. 2

Breakdown of systematic uncertainties in the \(p_\mathrm {T}\) spectrum (upper left), the charged-particle multiplicity (\(N_{\text {ch}}\), upper right), the scalar sum of the transverse momenta (\(\Sigma p_{\text {T}} \), lower left) and the mean transverse momentum (\(\text {mean}\ p_{\text {T}} \), lower right) for events with 10 < \(p_\mathrm {T}^{Z}\) < 20 \(\text {Ge}\text {V}\) in the trans-min region inclusively in \(T_{\perp }\). Here ‘Prior’ combines the two approaches to estimate the unfolding-related uncertainties. ‘Detector’ includes the modelling of the detector and the pile-up conditions

Fig. 3

A summary of the systematic uncertainties in the arithmetic mean of the \(N_{\text {ch}}\) and \(\Sigma p_{\text {T}} \) spectra in the trans-min region as a function of \(p_\mathrm {T}^{Z}\). Here ‘Prior’ combines the two approaches to estimate the unfolding-related uncertainties. ‘Detector’ includes the modelling of the detector and the pile-up conditions

7 Unfolded observables and comparison with model predictions

7.1 Overview of the results

Distributions of \(p_\mathrm {T}\), \(N_{\text {ch}}\), \(\Sigma p_{\text {T}} \), and \(\text {mean}\ p_{\text {T}} \) are obtained in slices of \(p_\mathrm {T}^{Z}\) for the different regions defined in the transverse plane and different regions of \(T_{\perp }\). The results for \(N_{\text {ch}}\) and \(\Sigma p_{\text {T}} \) are normalized relative to the area of the region in \(\eta \) and \(\phi \). In addition to the measurements in slices of \(p_\mathrm {T}^{Z}\), the arithmetic means of \(N_{\text {ch}}\), \(\Sigma p_{\text {T}} \), and \(\text {mean}\ p_{\text {T}} \) (\(\langle N_{\text {ch}}{}\rangle \), \(\langle \Sigma p_{\text {T}} \rangle \), and \(\langle \text {mean}\ p_{\text {T}} \rangle \)) are measured as a function of \(p_\mathrm {T}^{Z}\). Only a selection of the most relevant results is discussed in this section: the comparison of the unfolded data to the predictions of different MC generators focuses on the trans-min region. While the toward region provides insights of similar importance for tuning MC generators after having removed the two muons, the discussion focuses on the trans-min region to better facilitate comparison with previous measurements. The UE activity in the toward region is higher compared with that in trans-min. This is expected since the trans-min region is defined as the subregion of the transverse region with the lower activity and for \({ Z} \rightarrow \mu \mu \) events the UE activity is expected to be of similar magnitude in the toward and transverse regions. The trans-min region is statistically less affected by radiation and it is essentially the region where the contribution from ISR is subtracted. Apart from this difference in the amount of activity, the predictive performance of the different MC generators is comparable in the toward and trans-min regions. No significant difference in the predictive power between these regions is observed. 
Both \(\langle N_{\text {ch}}{}\rangle \) and \(\langle \Sigma p_{\text {T}} \rangle \) measured in the trans-min region are also compared with previous measurements of the UE in Z boson events at lower centre-of-mass energies.
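The per-event observables and the area normalization can be illustrated with a short Python sketch. It is not the analysis code: the acceptance \(|\eta |<2.5\), the transverse-region boundaries \(60^\circ <|\Delta \phi |<120^\circ \) relative to the Z boson, and the choice of splitting trans-max and trans-min by the per-side \(\Sigma p_{\text {T}} \) are assumptions made here for illustration, and the paper's own region definitions take precedence. The function and class names are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical acceptance: charged particles with |eta| < ETA_MAX (muons removed upstream).
ETA_MAX = 2.5


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference phi1 - phi2 wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


@dataclass
class RegionObservables:
    n_ch: float     # N_ch per unit eta-phi area
    sum_pt: float   # Sigma pT per unit eta-phi area
    mean_pt: float  # mean pT = Sigma pT / N_ch (nan if the region is empty)


def _observables(pts, area):
    n, s = len(pts), sum(pts)
    return RegionObservables(n / area, s / area, s / n if n else float("nan"))


def transverse_observables(particles, phi_z, eta_max=ETA_MAX):
    """particles: iterable of (pt, eta, phi); phi_z: azimuth of the Z boson.

    Assumes each transverse side covers 60 < |dphi| < 120 degrees, i.e. an
    eta-phi area of 2 * eta_max * pi / 3, and labels the side with the larger
    Sigma pT as trans-max (an assumption of this sketch).
    """
    side_area = 2.0 * eta_max * math.pi / 3.0
    sides = {+1: [], -1: []}
    for pt, eta, phi in particles:
        if abs(eta) >= eta_max:
            continue
        dphi = delta_phi(phi, phi_z)
        if math.pi / 3.0 < abs(dphi) < 2.0 * math.pi / 3.0:
            sides[+1 if dphi > 0 else -1].append(pt)
    hi, lo = sorted(sides.values(), key=sum, reverse=True)
    return {
        "trans-max": _observables(hi, side_area),
        "trans-min": _observables(lo, side_area),
        "transverse": _observables(hi + lo, 2.0 * side_area),
    }
```

The mean \(p_\mathrm {T}\) is only defined when a region contains at least one particle, so the sketch returns `nan` otherwise; \(N_{\text {ch}}\) and \(\Sigma p_{\text {T}} \) are divided by the region area, whereas \(\text {mean}\ p_{\text {T}} \) is not.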

7.2 Differential distributions

Figures 4 and 5 show the unfolded \(p_\mathrm {T}\) spectrum, \(N_{\text {ch}}\), \(\Sigma p_{\text {T}} \), and \(\text {mean}\ p_{\text {T}} \) in the trans-min region, inclusively in \(T_{\perp }\), for events with \(p_\mathrm {T}^{Z}\) between 10 and 20 \(\text {Ge}\text {V}\) and between 120 and 200 \(\text {Ge}\text {V}\), respectively. The predictions from Powheg+Pythia, Sherpa, and Herwig++ are compared with the data, and the ratio of prediction to data is shown beneath each plot. None of the tested MC generators describes all aspects of the data well, and in some regions the differences exceed 70%. In general, the MC generators predict more particles with small \(p_\mathrm {T}\) than are observed in data (see top left of Figs. 4, 5). This is consistent with the MC predictions tending towards lower values of \(\text {mean}\ p_{\text {T}} \), as shown in the lower-right plots of Figs. 4 and 5. The largest differences between data and simulation occur at low \(N_{\text {ch}}\) and low \(\Sigma p_{\text {T}} \), and arise from the steeper charged-particle transverse momentum spectrum in the MC simulations. Powheg+Pythia and Sherpa predict a higher fraction of events with few charged particles and a consistently smaller sum of \(p_\mathrm {T}\). Herwig++, in contrast, slightly overestimates the fraction of particles with \(p_\mathrm {T}\) > 2.5 \(\text {Ge}\text {V}\) and is qualitatively closer to the shapes of the \(N_{\text {ch}}\) and \(\Sigma p_{\text {T}} \) distributions. With rising \(p_\mathrm {T}^{Z}\), the measured \(p_\mathrm {T}\) spectrum becomes harder, and \(N_{\text {ch}}\), \(\Sigma p_{\text {T}} \), and \(\text {mean}\ p_{\text {T}} \) increase, while the relative discrepancies with respect to the generator predictions remain unchanged.
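The prediction-over-data ratio panels can be mimicked with a minimal sketch that divides a binned prediction by the data and flags bins where the deviation from unity exceeds either the relative data uncertainty or a fixed threshold (70% by default, matching the level quoted above). The function name and the use of the data uncertainty alone, ignoring the MC statistical uncertainty, are assumptions of this illustration rather than the analysis procedure.

```python
def compare_to_data(pred, data, data_err, threshold=0.7):
    """Return per-bin tuples (ratio, rel_unc, outside_unc, above_threshold).

    pred, data, data_err: equal-length sequences of bin contents. Data bins are
    assumed to be non-zero. rel_unc is the data uncertainty band drawn around 1.
    """
    if not (len(pred) == len(data) == len(data_err)):
        raise ValueError("inputs must have the same number of bins")
    rows = []
    for p, d, e in zip(pred, data, data_err):
        ratio = p / d          # prediction over data, as in the ratio panels
        rel_unc = e / d        # relative data uncertainty
        deviation = abs(ratio - 1.0)
        rows.append((ratio, rel_unc, deviation > rel_unc, deviation > threshold))
    return rows
```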

Fig. 4

Measured spectra of \(p_\mathrm {T}\) (upper left), the charged-particle multiplicity, \(N_{\text {ch}}\) (upper right), the scalar sum of the transverse momentum of those particles, \(\Sigma p_{\text {T}} \) (lower left), and the mean transverse momentum, \(\text {mean}\ p_{\text {T}} \) (lower right), in the trans-min region inclusively in \(T_{\perp }\) for events with 10 < \(p_\mathrm {T}^{Z}\) < 20 \(\text {Ge}\text {V}\). Predictions of Powheg+Pythia, Sherpa, and Herwig++ are compared with the data. The ratios shown are predictions over data

Fig. 5

Measured \(p_\mathrm {T}\) spectra (upper left), the charged-particle multiplicity \(N_{\text {ch}}\) (upper right), the scalar sum of the transverse momentum of those particles \(\Sigma p_{\text {T}} \) (lower left), and the mean transverse momentum, \(\text {mean}\ p_{\text {T}} \) (lower right) in the trans-min region inclusively in \(T_{\perp }\) for events with 120 < \(p_\mathrm {T}^{Z}\) < 200 \(\text {Ge}\text {V}\). Predictions of Powheg+Pythia, Sherpa, and Herwig++ are compared with the data. The ratios shown are predictions over data

The dependence on \(T_{\perp }\) is illustrated in Fig. 6 for the unfolded \(p_\mathrm {T}\) spectrum in the trans-min region for events with 10 < \(p_\mathrm {T}^{Z}\) < 20 \(\text {Ge}\text {V}\) and 120 < \(p_\mathrm {T}^{Z}\) < 200 \(\text {Ge}\text {V}\). As for the measurement inclusive in \(T_{\perp }\), the MC generators predict a higher fraction of particles with low \(p_\mathrm {T}\) than is present in data. The predictions of Powheg+Pythia are closer to the measured distributions in the lower \(p_\mathrm {T}^{Z}\) bin, whereas Sherpa better describes the full \(p_\mathrm {T}\) range in the higher \(p_\mathrm {T}^{Z}\) bin. The Herwig++ predictions show significant statistical fluctuations at higher \(p_\mathrm {T}\). The most striking difference between the two \(T_{\perp }\) regions is observed for Powheg+Pythia in the low \(p_\mathrm {T}^{Z}\) bin of the \(N_{\text {ch}}\) distribution, presented in Fig. 7. In the MPI-sensitive region (left plot of Fig. 7), the \(N_{\text {ch}}\) distribution predicted by Powheg+Pythia is shifted towards higher charged-particle multiplicities relative to the data, overshooting the data in the range \(1\le N_{\text {ch}}/\delta \eta \delta \phi \le 2.5\). In the high-\(T_{\perp }\) region (right plot), however, this generator underestimates the data over almost the full range, except for the first two bins. In contrast, the performance of Sherpa and Herwig++ for \(N_{\text {ch}}\) is consistent between the low- and high-\(T_{\perp }\) regions: Herwig++ overestimates \(N_{\text {ch}}\), and Sherpa underestimates it. The same effect is observed for the \(\Sigma p_{\text {T}} \) distributions, but it is less pronounced and therefore not shown. As pointed out in Ref. [8], regions of high \(T_{\perp }\) are dominated by extra jet activity, which is not adequately modelled in Powheg+Pythia, as seen in the right plots of Figs. 6 and 7.
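For reference, a sketch of the \(T_{\perp }\) split, assuming the standard definition of transverse thrust, \(T_{\perp } = \max _{\hat{n}} \sum _i |\vec{p}_{\mathrm {T},i}\cdot \hat{n}| / \sum _i |\vec{p}_{\mathrm {T},i}|\); which particles enter the sums is fixed by the analysis definition and is simply treated as an input list of \((p_x, p_y)\) vectors here. The sketch computes \(T_{\perp }\) exactly by enumerating half-plane splits of the particles, using the fact that for a fixed split into sets \(A\) and \(B\) the sum is maximized by \(\hat{n}\) parallel to \(\vec{S}_A-\vec{S}_B\). The function names are hypothetical, and the 0.75 cut mirrors the regions used in this section.

```python
import math


def transverse_thrust(momenta):
    """Exact T_perp for a list of (px, py) transverse-momentum vectors.

    For any split of the particles by a line through the origin into sets A and B,
    sum_i |p_i . n| is maximised by n parallel to S_A - S_B, giving |S_A - S_B|.
    Every optimal split can be realised by a line passing through one of the
    particles, so it suffices to try each particle direction as the dividing line,
    with particles collinear to it placed on either side.
    """
    vecs = [(px, py) for px, py in momenta if px != 0.0 or py != 0.0]
    if not vecs:
        raise ValueError("need at least one particle with non-zero pT")
    scalar_sum = sum(math.hypot(px, py) for px, py in vecs)
    best = 0.0
    for jx, jy in vecs:
        for same_ray_side in (+1, -1):
            dx = dy = 0.0  # accumulates S_A - S_B
            for px, py in vecs:
                cross = jx * py - jy * px
                if cross > 0.0:
                    side = +1
                elif cross < 0.0:
                    side = -1
                else:  # collinear with p_j: same ray or opposite ray
                    side = same_ray_side if jx * px + jy * py > 0.0 else -same_ray_side
                dx += side * px
                dy += side * py
            best = max(best, math.hypot(dx, dy))
    return best / scalar_sum


def thrust_region(t_perp, cut=0.75):
    """'low' (MPI-sensitive) for T_perp < cut, 'high' otherwise."""
    return "low" if t_perp < cut else "high"
```

Two back-to-back particles give \(T_{\perp }=1\), while three equal momenta at \(120^\circ \) give \(T_{\perp }=2/3\), which would fall into the low-\(T_{\perp }\) (MPI-sensitive) category. The \(O(n^2)\) enumeration is chosen for transparency rather than speed.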

Fig. 6

Measured \(p_\mathrm {T}\) spectra in the trans-min region for \(T_{\perp }<0.75\) (left) and \(0.75\le T_{\perp }\) (right) for events with 10 < \(p_\mathrm {T}^{Z}\) < 20 \(\text {Ge}\text {V}\) (upper row) and 120 < \(p_\mathrm {T}^{Z}\) < 200 \(\text {Ge}\text {V}\) (lower row). Predictions of Powheg+Pythia, Sherpa, and Herwig++ are compared with the data. The ratios shown are predictions over data

Fig. 7

Measured number of charged particles in the trans-min region for \(T_{\perp }<0.75\) (left) and \(0.75\le T_{\perp }\) (right) for events with 10 < \(p_\mathrm {T}^{Z}\) < 20 \(\text {Ge}\text {V}\). Predictions of Powheg+Pythia, Sherpa, and Herwig++ are compared with the data. The ratios shown are predictions over data

7.3 Underlying event activity as a function of \(p_\mathrm {T}^{Z}\)

Figure 8 shows the mean number of charged particles and the mean scalar sum of the transverse momenta of those particles per unit \(\eta \)–\(\phi \) space as functions of \(p_\mathrm {T}^{Z}\) in the transverse, trans-min, and trans-max regions, inclusively in \(T_{\perp }\). In the right-hand plots of Fig. 8, the trans-min region is further separated by \(T_{\perp }\). In the trans-min region, the UE-sensitive variables \(N_{\text {ch}}\) and \(\Sigma p_{\text {T}} \) rise slowly with increasing Z boson transverse momentum. In contrast, the observables in the trans-max region depend strongly on \(p_\mathrm {T}^{Z}\), because this region is heavily contaminated by the hadronic recoil of the Z boson leaking into the transverse region. For events with high \(T_{\perp }\), the slope of the UE activity in the trans-min region as a function of \(p_\mathrm {T}^{Z}\) is similar to that of the inclusive measurement. The total activity measured in the trans-min region for events with high \(T_{\perp }\) is, however, lower than in the inclusive measurement, owing to the correlation between the activity in the transverse region and \(T_{\perp }\). Furthermore, the right-hand plots of Fig. 8 demonstrate that the UE activity is higher for events with lower \(T_{\perp }\), as expected [8]. Lower values of \(T_{\perp }\) also increase the dependence on \(p_\mathrm {T}^{Z}\) in the trans-min region.
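Curves such as those in Fig. 8 are profiles, i.e. the arithmetic mean of an observable in bins of \(p_\mathrm {T}^{Z}\). The sketch below computes such a profile from per-event pairs and the finite-difference slope between adjacent bin centres, one simple way to quantify the slopes discussed above. It assumes per-event values at a single level (for example generator-level particles) and ignores unfolding and uncertainties; the bin edges and function names are illustrative only.

```python
from bisect import bisect_right


def profile_means(x_values, y_values, edges):
    """Arithmetic mean of y in bins of x (edges ascending, bins are [lo, hi))."""
    nbins = len(edges) - 1
    sums, counts = [0.0] * nbins, [0] * nbins
    for x, y in zip(x_values, y_values):
        i = bisect_right(edges, x) - 1
        if 0 <= i < nbins:  # values outside [edges[0], edges[-1]) are dropped
            sums[i] += y
            counts[i] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def local_slopes(edges, means):
    """Finite-difference slope of the profile between adjacent bin centres."""
    centres = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    return [
        (m2 - m1) / (c2 - c1)
        for (c1, m1), (c2, m2) in zip(zip(centres, means), zip(centres[1:], means[1:]))
    ]
```

With \(p_\mathrm {T}^{Z}\) as `x_values` and, for example, the area-normalized \(N_{\text {ch}}\) in the trans-min region as `y_values`, `profile_means` yields an estimate of \(\langle N_{\text {ch}}\rangle \) per bin, and a slope that is large in the first bins and flattens later would correspond to the turn-on behaviour described in this section.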

Fig. 8

The mean number of charged particles (upper row) and the mean of the scalar sum of the transverse momentum of those particles (lower row) per unit \(\eta \)–\(\phi \) space as a function of \(p_\mathrm {T}^{Z}\) in the full transverse region and for the trans-min and trans-max regions inclusively in \(T_{\perp }\) (left) and in the trans-min region separated in \(T_{\perp }\) (right)

The MC modelling of individual measurements in all 96 phase-space regions is further investigated by comparing the measured arithmetic means of \(N_{\text {ch}}\), \(\Sigma p_{\text {T}} \), and \(\text {mean}\ p_{\text {T}} \) as functions of \(p_\mathrm {T}^{Z}\) with the predictions. Figures 9 and 10 show comparisons with the predictions of Powheg+Pythia, Sherpa, and Herwig++ for the trans-min and toward regions inclusively in \(T_{\perp }\). None of the predictions describes the data in either region. For \(p_\mathrm {T}^{Z}\) > 20 \(\text {Ge}\text {V}\), Herwig++ predicts a slower rise in UE activity with increasing \(p_\mathrm {T}^{Z}\) than is measured. Powheg+Pythia and Sherpa, on the other hand, qualitatively describe the ‘turn-on’ effect of the UE activity, i.e. a steeper slope at low \(p_\mathrm {T}^{Z}\) which vanishes at higher values of \(p_\mathrm {T}^{Z}\). For Powheg+Pythia, the rise of the UE activity is underestimated; hence the discrepancy with data grows with \(p_\mathrm {T}^{Z}\) and stabilizes around \(p_\mathrm {T}^{Z}\) = 100 \(\text {Ge}\text {V}\). Sherpa is in good agreement with the data only for \(\langle \text {mean}\ p_{\text {T}} \rangle \) in the toward region.

Fig. 9

Comparison of measured arithmetic means of the \(N_{\text {ch}}\) (upper row) and \(\Sigma p_{\text {T}} \) (lower row) as functions of \(p_\mathrm {T}^{Z}\) for the trans-min (left) and toward (right) regions inclusively in \(T_{\perp }\). Predictions of Powheg+Pythia, Sherpa, and Herwig++ are compared with the data. The ratios shown are predictions over data

The \(p_\mathrm {T}^{Z}\) dependence in the two \(T_{\perp }\) regions of the trans-min region is summarized in Figs. 11 and 12. In the low-\(T_{\perp }\) region, the Sherpa prediction improves; for \(N_{\text {ch}}\), for example, the discrepancy shrinks from about 30% to roughly 10%. For the same observable, Powheg+Pythia agrees with the data within the uncertainties for \(p_\mathrm {T}^{Z}\) > 80 \(\text {Ge}\text {V}\) in the low-\(T_{\perp }\) regime. For events with high \(T_{\perp }\), all generators underestimate the UE activity. Sherpa provides the best description of \(\langle \text {mean}\ p_{\text {T}} {}\rangle \): apart from the toward region, it shows an approximately constant underestimation but reproduces the overall shape. The agreement of Powheg+Pythia with data is better for \(T_{\perp }\) < 0.75 than for the inclusive measurement. The Herwig++ predictions in the trans-min region improve at higher values of \(p_\mathrm {T}^{Z}\) and for events with lower \(T_{\perp }\). However, the discrepancy between Herwig++ and the data in the lowest \(p_\mathrm {T}^{Z}\) bins remains regardless of the selected region.

Fig. 10

Comparison of measured arithmetic means of \(\text {mean}\ p_{\text {T}} \) as functions of \(p_\mathrm {T}^{Z}\) for the trans-min (left) and toward (right) regions inclusively, and in regions of \(T_{\perp }\). Predictions of Powheg+Pythia, Sherpa, and Herwig++ are compared with the data. The ratios shown are predictions over data

Fig. 11

Comparison of measured arithmetic means of the \(N_{\text {ch}}\) (upper row) and \(\Sigma p_{\text {T}} \) (lower row) as functions of \(p_\mathrm {T}^{Z}\) for \(T_{\perp }{}<0.75\) (left) and \(0.75\le T_{\perp }{}\) (right) for the trans-min region. Predictions of Powheg+Pythia, Sherpa, and Herwig++ are compared with the data. The ratios shown are predictions over data

Fig. 12

Comparison of the measured arithmetic mean of \(\text {mean}\ p_{\text {T}} \) as a function of \(p_\mathrm {T}^{Z}\) for ranges of \(T_{\perp }\) in the trans-min region. Predictions of Powheg+Pythia, Sherpa, and Herwig++ are compared with the data. The ratios shown are predictions over data

7.4 Comparison with other centre-of-mass energies

Figure 13 compares the measured \(\langle N_{\text {ch}}{}\rangle \) and \(\langle \Sigma p_{\text {T}} \rangle \) at different centre-of-mass energies. The results for \(\sqrt{s}={7}~\hbox {TeV}\) are taken from the previous ATLAS measurement of the UE activity in Z boson events [2]. Its event selection criteria are similar to those of the analysis presented in this paper, but the previous measurement also includes the \(Z\rightarrow e^{+}e^{-}\) channel. The CDF measurements at \(\sqrt{s}={1.96}~\text {TeV}\) [43] are also included in the comparison. The CDF analyses used Drell–Yan lepton pairs in a narrower invariant-mass window (\(70<m_{\mu \mu }<110\) \(\text {Ge}\text {V}\)) in \(p\bar{p}\) collisions. The relative uncertainties of the two ATLAS measurements are of similar size, while the CDF measurements show large statistical fluctuations for \(p_\mathrm {T}^{Z/\mu \mu }> 30\,\text {Ge}\text {V}\). All three measurements show qualitatively the same behaviour, i.e. UE activity that grows with increasing \(p_\mathrm {T}^{Z}\). At higher centre-of-mass energies, more energy is available for the processes forming the UE, e.g. MPI, so a rise of the UE activity with \(\sqrt{s}\) is expected.
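The invariant-mass window quoted for the CDF selection can be illustrated with a short sketch that builds the dimuon four-momentum from each muon's \((p_\mathrm {T}, \eta , \phi )\) and returns \(m_{\mu \mu }\) and the transverse momentum of the pair. The 70–110 \(\text {Ge}\text {V}\) defaults follow the CDF window quoted above; the function names are hypothetical, and no other selection requirement of either experiment is applied.

```python
import math

MUON_MASS = 0.1056583755  # GeV (PDG value)


def four_momentum(pt, eta, phi, mass=MUON_MASS):
    """(E, px, py, pz) in GeV from transverse momentum, pseudorapidity and azimuth."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return math.sqrt(px * px + py * py + pz * pz + mass * mass), px, py, pz


def dimuon_mass_and_pt(mu1, mu2):
    """mu1, mu2: (pt, eta, phi) tuples. Returns (m_mumu, pT of the dimuon system)."""
    e, px, py, pz = (a + b for a, b in zip(four_momentum(*mu1), four_momentum(*mu2)))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0)), math.hypot(px, py)  # guard against rounding below 0


def in_mass_window(m, lo=70.0, hi=110.0):
    """True if the dimuon mass lies inside the (open) window lo < m < hi, in GeV."""
    return lo < m < hi
```

Two muons with \(p_\mathrm {T}=45.6\,\text {Ge}\text {V}\) emitted back to back at \(\eta =0\) give \(m_{\mu \mu }\approx 91.2\,\text {Ge}\text {V}\) and a pair \(p_\mathrm {T}\) of essentially zero.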

Fig. 13

The distributions of \(\langle N_{\text {ch}}{}\rangle \) and \(\langle \Sigma p_{\text {T}} \rangle \) measured at \(\sqrt{s}{}={13}\hbox { TeV}\) compared with the results of the previous ATLAS measurement at \(\sqrt{s}{}={7}\hbox { TeV}\) [2] and the CDF measurements at \(\sqrt{s}={1.96}~\text {TeV}\) [43]. The error bars correspond to the full uncertainties of the corresponding measurement

8 Discussion and conclusion

Measurements of four observables sensitive to the activity of the UE in \({ Z} \rightarrow \mu \mu \) events are presented using 3.2 fb\(^{-1}\) of \(\sqrt{s}\) = 13 \(\text {Te}\text {V}\) pp collision data collected with the ATLAS detector at the LHC in 2015. These observables are the \(p_{\text {T}}\) of charged particles, the number of charged particles per event (\(N_{\text {ch}}\)), the sum of charged-particle \(p_{\text {T}}\) per event (\(\Sigma p_{\text {T}} \)), and the mean charged-particle \(p_{\text {T}}\) per event (\(\text {mean}\ p_{\text {T}} \)). They are measured in intervals of the Z boson \(p_{\text {T}}\) and in different azimuthal regions of the detector relative to the Z boson direction. The arithmetic means of the distributions are presented as functions of the Z boson \(p_{\text {T}}\), both inclusively and in regions of transverse thrust.

The predictions from three Monte Carlo generators (Powheg+Pythia8, Sherpa, and Herwig++) are compared with the data. In general, all tested generators and tunes show significant deviations from the measured distributions, regardless of the observable. The arithmetic means of the observables predicted by Powheg+Pythia8 and Sherpa reproduce the main features of the UE activity in the fiducial region: the turn-on effect, i.e. the rising activity as a function of the hard-scatter scale (here \(p_\mathrm {T}^{Z}\)), is visible, as is the saturation of this effect at higher values of \(p_\mathrm {T}^{Z}\). In contrast to the other generators, Herwig++ fails to reproduce the turn-on effect at low \(p_\mathrm {T}^{Z}\), since it predicts that the UE activity decreases as a function of \(p_\mathrm {T}^{Z}\) in the region \(p_\mathrm {T}^{Z}< 20\,\text {Ge}\text {V}{}\). Otherwise, all generators underestimate the UE activity when it is quantified as the arithmetic mean of the observables for inclusive \(T_{\perp }\). The generators describe the mean values better in the MPI-sensitive regions. Powheg+Pythia8 agrees with the data within the uncertainties for \(\langle N_{\text {ch}}{}\rangle \) and \(\langle \Sigma p_{\text {T}} \rangle \), indicating an adequate modelling of the MPI activity. However, since the predictive power decreases for the region with \(T_{\perp} \geq 0.75\) relative to the inclusive measurement, the simulation of contributions to the UE activity other than MPI needs to be improved. Reference [8] points out that the region with \(T_{\perp }>0.75\) is dominated by extra jet activity, which gives a first indication of how the MC generator predictions might be improved. This conclusion also holds when comparing the Powheg+Pythia8 predictions in different regions of \(T_{\perp }\) for individual bins of \(p_\mathrm {T}^{Z}\).

In comparison with the measurement at \(\sqrt{s}\) = 7 \(\text {Te}\text {V}\) [2], the performance of Herwig++ is consistent for \(p_\mathrm {T}^{Z}\) > 20 \(\text {Ge}\text {V}\). Both measurements use the energy-extrapolation tunes [24] provided by the Herwig++ authors: UE-EE-3 for \(\sqrt{s}={7}~\hbox {TeV}\) and UE-EE-5 in the analysis presented here. The latter tune was additionally validated against Tevatron and LHC measurements at \(\sqrt{s}={900}\hbox { GeV}\) and \(\sqrt{s}={7}~\hbox {TeV}\) [44]. The Herwig++ prediction is slightly better for the distributions of \(\langle N_{\text {ch}}{}\rangle \) and \(\langle \Sigma p_{\text {T}} \rangle \) at higher values of \(p_\mathrm {T}^{Z}\). In the previous measurement, the discrepancy increased with \(p_\mathrm {T}^{Z}\), which might be related to improper modelling of the impact parameter. Apart from overestimating the mean activity, Herwig++ describes the shapes of \(\text {d}N_{\text {ev}}/\text {d}(\Sigma p_{\text {T}}/\delta \eta \delta \phi )\), \(\text {d}N_{\text {ev}}/\text {d}(\text {mean}\ p_{\text {T}})\), and \(\text {d}N_{\text {ev}}/\text {d}(N_{\text {ch}}/\delta \eta \delta \phi )\) in the presented \(p_\mathrm {T}^{Z}\) bins better than in the \(\sqrt{s}={7}~\hbox {TeV}\) measurement. Qualitatively, it describes these shapes better than the other generators.

Powheg+Pythia8 performs as well at \(\sqrt{s}={13}~\text {TeV}\) as at \(\sqrt{s}={7}~\hbox {TeV}\), although the previous measurement used the AU2 tune (for which only the MPI part was tuned by ATLAS using \(\sqrt{s}={7}~\hbox {TeV}\) UE data). Nevertheless, this indicates that the MPI energy extrapolation of Pythia8 works well, which is consistent with the better description of the distributions at low \(T_{\perp }\).

In contrast, at \(\sqrt{s}={7}~\hbox {TeV}\) Sherpa version 1.4.0 with the CT10 PDF set consistently overestimated the UE activity metrics \(\langle N_{\text {ch}}{}\rangle \) and \(\langle \Sigma p_{\text {T}} \rangle \) by 5% to 15%, whereas the present analysis, with the Sherpa version used here, reveals a consistent underestimation. At \(\sqrt{s}={13}~\text {TeV}\), the discrepancy relative to the data decreases with increasing \(p_\mathrm {T}^{Z}\).