CMS PYTHIA 8 colour reconnection tunes based on underlying-event data

New sets of parameter tunes for two of the colour reconnection models, quantum chromodynamics-inspired and gluon-move, implemented in the PYTHIA 8 event generator, are obtained based on the default CMS PYTHIA 8 underlying-event tune, CP5. Measurements sensitive to the underlying event performed by the CMS experiment at centre-of-mass energies $\sqrt{s} = 7$ and 13 TeV, and by the CDF experiment at 1.96 TeV, are used to constrain the parameters of colour reconnection models and multiple-parton interactions simultaneously.
The new colour reconnection tunes are compared with various measurements at 1.96, 7, 8, and 13 TeV, including measurements of the underlying event, strange-particle multiplicities, jet substructure observables, jet shapes, and colour flow in top quark pair ($\mathrm{t}\bar{\mathrm{t}}$) events. The new tunes are also used to estimate the uncertainty related to colour reconnection modelling in the top quark mass measurement using the decay products of $\mathrm{t}\bar{\mathrm{t}}$ events in the semileptonic channel at 13 TeV.


Introduction
Monte Carlo (MC) event generators, such as PYTHIA 8 [1], are indispensable tools for measurements at the LHC proton-proton (pp) collider. To provide an accurate description of high-energy collisions, both the hard scattering and the so-called underlying event (UE) are computed for each simulated event. In the hard scattering process, two initial partons interact with a large exchange of transverse momentum, $p_\mathrm{T} > \mathcal{O}(\mathrm{GeV})$ (we use natural units with c = 1 throughout the paper). The UE represents additional activity occurring at lower energy scales that accompanies the hard scattering. It consists of multiple-parton interactions (MPIs), initial- and final-state radiation (ISR and FSR), and beam-beam remnants (BBR). According to quantum chromodynamics (QCD), strong interactions are affected by colour charges that are carried by quarks and gluons. All of the coloured partons produced by these components are finally combined to form colourless hadrons through the hadronisation process.
Particularly relevant for the characterisation of the UE are the MPI, which consist of additional 2-to-2 parton-parton interactions occurring within the single collision event. With increasing collision energy, the interaction probability for partons with small longitudinal momentum fractions also increases, which enhances MPI contributions.
The PYTHIA 8 generator regularises the cross sections of the primary hard scattering processes and MPIs with respect to the perturbative 2-to-2 parton-parton differential cross section through an energy-dependent dampening parameter $p_{\mathrm{T}0}$, which depends on the centre-of-mass energy $\sqrt{s}$. The energy dependence of the $p_{\mathrm{T}0}$ parameter in PYTHIA 8 is described with a power law function of the form

$$p_{\mathrm{T}0}(\sqrt{s}) = p_{\mathrm{T}0}^{\mathrm{ref}} \left( \frac{\sqrt{s}}{\sqrt{s_0}} \right)^{\epsilon}, \qquad (1)$$

where $p_{\mathrm{T}0}^{\mathrm{ref}}$ is the value of $p_{\mathrm{T}0}$ at a reference energy $\sqrt{s_0}$, and $\epsilon$ is a tunable parameter that determines the energy dependence. At a given $\sqrt{s}$, the mean number of additional interactions from MPI depends on $p_{\mathrm{T}0}$, the parton distribution functions (PDFs), and the overlap of the matter distributions of the two colliding hadrons [2].
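As a rough numerical illustration of the power-law energy dependence described above, the following Python sketch evaluates the dampening parameter at a few centre-of-mass energies. The function name `pt0` and the numerical values of the reference scale, the reference energy, and the exponent are placeholders assumed for illustration only; they are not the fitted values of any tune discussed in this paper.

```python
def pt0(sqrt_s, pt0_ref, sqrt_s0, eps):
    """Power-law energy dependence: pT0 = pT0_ref * (sqrt_s / sqrt_s0) ** eps."""
    if sqrt_s <= 0 or sqrt_s0 <= 0:
        raise ValueError("energies must be positive")
    return pt0_ref * (sqrt_s / sqrt_s0) ** eps


# Placeholder inputs (assumed for illustration; NOT fitted tune values).
PT0_REF = 1.5      # GeV, value of pT0 at the reference energy
SQRT_S0 = 7000.0   # GeV, reference centre-of-mass energy
EPS = 0.05         # dimensionless exponent of the power law

for energy in (1960.0, 7000.0, 13000.0):
    value = pt0(energy, PT0_REF, SQRT_S0, EPS)
    print(f"sqrt(s) = {energy:7.0f} GeV -> pT0 = {value:.3f} GeV")
```

With a positive exponent the dampening scale grows slowly with energy, which is why the exponent and the reference value are expected to be correlated when both are left free in a fit.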
To track the colour information during the development of the parton shower, partons are represented and connected by colour lines. Quarks and antiquarks are represented by colour lines with arrows pointing in the direction of the colour flow, and gluons are represented by a pair of colour lines with opposite arrows. Rules for colour propagation are shown in Fig. 1. Because each MPI system adds coloured partons to the final state, a dense net of colour lines is created that overlap with the coloured parton fields of the hard scattering and with each other. Parton shower algorithms, in general, use the leading colour (LC) approximation [3,4], in which each successively emitted parton is colour connected only to its parent emitters, in the limit of an infinite number of colours. Colour reconnection (CR) models allow colour lines to be formed also between partons from different interactions and thus allow different colour topologies compared with a simple LC approach. The CR was first included in minimum-bias (MB) simulations (see Sec. 4.1) to reproduce the increase of the average transverse momentum $p_\mathrm{T}$ of charged particles as a function of the measured charged-particle multiplicity, $N_\mathrm{ch}$, and also to describe the $\mathrm{d}N_\mathrm{ch}/\mathrm{d}\eta$ distribution [6,7]. The pseudorapidity is defined as $\eta = -\ln[\tan(\theta/2)]$, where the polar angle $\theta$ is defined with respect to the anticlockwise-beam direction. Introducing correlations between partons, including those resulting from MPIs, generally changes the number of charged particles in an event and allows a more realistic simulation of the $N_\mathrm{ch}$ and $p_\mathrm{T}$ versus $N_\mathrm{ch}$ distributions than in an event scenario without CR [7].
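The pseudorapidity definition quoted above can be checked with a few lines of Python. This is a minimal sketch; the function name `pseudorapidity` is a hypothetical helper introduced here, and the angle is assumed to be given in radians.

```python
import math


def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)) for a polar angle theta in radians, 0 < theta < pi."""
    if not 0.0 < theta < math.pi:
        raise ValueError("theta must lie strictly between 0 and pi")
    return -math.log(math.tan(theta / 2.0))


# A particle emitted perpendicular to the beam has eta close to 0; particles
# close to the beam axis have large |eta|, with the sign set by the hemisphere.
for theta in (0.1, math.pi / 2.0, math.pi - 0.1):
    print(f"theta = {theta:.3f} rad -> eta = {pseudorapidity(theta):+.3f}")
```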
The CR effects are also important for processes occurring at larger scales in pp collisions. For example, in $\mathrm{t}\bar{\mathrm{t}}$ events, the inclusion of CR effects can lead to a significant improvement in the description of UE variables [8]. The effects of CR may become more prominent in precision measurements, such as that of the top quark mass $m_\mathrm{t}$. Uncertainties in $m_\mathrm{t}$ related to CR are usually estimated by comparing the predictions of a given model with and without CR, which might underestimate their effect [9]. A better way to approach the uncertainty estimation would be to consider a variety of CR models and variations of their parameters [10] that probe the effects of the underlying soft physics of pp collisions on the relevant observable.
Various phenomenological models for CR have been developed and are included in PYTHIA 8. In these models, the general idea is to determine the partonic configuration that reproduces the minimal total string length. In the Lund string fragmentation model [11] used in PYTHIA 8, the confining colour field between two partons is approximated by a one-dimensional string stretched between the partons according to the colour flow. The fragmentation of a string with a probability given by the fragmentation function produces a set of hadrons. Thus, the colour flow of an event determines the string configuration and therefore hadronic production.
None of the MPI processes or the CR models are completely determined from first principles, and they all include free parameters. A specified set of such parameters that is adjusted to better fit some aspects of the data is referred to as a "tune". It is possible to derive a tune that describes the data at a particular $\sqrt{s}$. However, such a tune, lacking any energy dependence, is biased and cannot provide reliable information at other values of $\sqrt{s}$. Thus, whenever the collision energy changes, additional constraints on the models must be applied using the information obtained from the new measurements. This is not a straightforward procedure since no single tune can describe all the data with the same precision. The default CMS PYTHIA 8 tune for 7 TeV, CUETP8M1 [12], was derived using inputs from the 0.9, 1.96, and 7 TeV measurements, and it describes the data at 7 TeV quite well. The default CMS PYTHIA 8 tune for 13 TeV, CP5 [13] (where CP stands for "CMS PYTHIA 8"), was derived using inputs from the 1.96, 7, and 13 TeV measurements. The CUETP8M1 tune describes data at 7 TeV better than CP5, but the overall performance of CP5 is much better than that of CUETP8M1 when 13 TeV data are also included.
This paper presents results from two tunes, which make use of the QCD-inspired [14] and the gluon-move [9] CR models. The new CR tunes presented are based on the default CMS PYTHIA 8 tune CP5. Along with the CP5 tune, which uses the MPI-based CR model, the performance of the new CR tunes (CP5-CR1 and CP5-CR2, defined below) is studied using several observables. These tunes can be used for the evaluation of the uncertainties due to CR effects and for deepening the understanding of the CR mechanism.
The paper is organised as follows. In Section 2, the different colour reconnection models implemented in PYTHIA 8 and used in this study are introduced. In Section 3, the tuning strategy is explained in detail and the parameters of the new tunes are presented. Section 4 shows a selection of validation plots related to observables measured at √ s = 1.96, 7, 8, and 13 TeV by various experiments compared with the predictions of the new tunes. In Section 5 a study of the uncertainty in the top quark mass m t measurement because of the CR modelling is presented before summarising the results in Section 6.

Colour reconnection models
The MPI-based CR model was the only CR model implemented in PYTHIA 8 until PYTHIA 8.2, which was released with two additional CR models. The models implemented in PYTHIA 8.2, referred to as the "MPI-based", "QCD-inspired", and "gluon-move" CR models, are briefly described in the following:
• MPI-based model (CP5): The simplest model [6,15] implemented in MC event generators introduces only one tunable parameter. In this model, the partons are classified according to the MPI system to which they belong. Each parton interaction is originally a 2 → 2 scattering. For an MPI system with a hardness scale $p_\mathrm{T}$ of the 2 → 2 interaction, a CR probability is defined as:

$$P = \frac{(p_\mathrm{T}^{\mathrm{Rec}})^2}{(p_\mathrm{T}^{\mathrm{Rec}})^2 + p_\mathrm{T}^2}, \qquad (2)$$

with $p_\mathrm{T}^{\mathrm{Rec}} = r\,p_{\mathrm{T}0}$, where $r$ is a tunable parameter and $p_{\mathrm{T}0}$ is the energy-dependent dampening parameter defined in Eq. (1). The parameter $p_{\mathrm{T}0}$ avoids a divergence of the partonic cross section at low $p_\mathrm{T}$. According to Eq. (2), MPI systems at high $p_\mathrm{T}$ would tend to escape from the interaction point without being colour reconnected to the hard scattering system. Colour fields originating from a low-$p_\mathrm{T}$ MPI system would instead more likely exchange colour. Once the systems to be connected are determined, partons of low-$p_\mathrm{T}$ systems are added to strings defined by the highest $p_\mathrm{T}$ system to achieve a minimal total string length.
• QCD-inspired model (CP5-CR1): The QCD-inspired model [14] implemented in PYTHIA 8 adds the QCD colour rules on top of the minimisation of the string length.
The model constructs all pairs of QCD dipoles allowed to be reconnected by QCD colour rules that determine the colour compatibility of two strings. This is done iteratively until none of the allowed reconnection possibilities results in a shortening of the total string length. It uses a simple picture to causally connect the produced strings in spacetime through a string length measure λ to determine favoured reconnections. The default parametrisation for λ is

$$\lambda = \ln\left(1 + \frac{\sqrt{2}\,E_1}{m_0}\right) + \ln\left(1 + \frac{\sqrt{2}\,E_2}{m_0}\right), \qquad (3)$$

where $E_1$ and $E_2$ represent the energies of the coloured partons in the rest frame of the QCD dipole, and $m_0$ is a constant with the dimension of energy [14]. In addition, the QCD-inspired model allows the creation of junction structures. A junction is a topological structure that is formed when three colour lines meet at a single point. The presence of junctions reduces the number of colour lines that need to be connected to the beam remnant, which in turn can affect the number of particles produced in a collision. Since the QCD-inspired CR model allows for different colour topologies beyond LC, it can successfully describe the baryon production measured at the CMS experiment [14,16], which is not the case for previously available PYTHIA 8 tunes.
• Gluon-move model (CP5-CR2): In this scheme [9], final-state gluons are identified along with all the colour-connected pairs of partons. Then an iterative process starts. The difference between string lengths when a final-state gluon belonging to two connected partons is moved to another connected two-parton system is calculated. The gluon is moved to the string for which the move gives the largest reduction in total string length. This procedure can be repeated for all or a fraction of the gluons in the final state, which is controlled by the PYTHIA 8 parameter ColourReconnection:fracGluon.
In this scheme, quarks would not be reconnected, i.e. they would remain in the same position without any colour exchange. To improve this picture, the flip mechanism of the gluon-move model can be included. The flip mechanism basically allows reconnection of two different string systems, i.e. a quark can connect to a different antiquark. Junctions (Y-shaped three-quark configurations) are allowed to take part in the flip step as well, but no considerable differences are expected due to the limitation of the junction formation in this model. The flip mechanism has not been extensively studied and its effect on diffractive events is not known. For this reason the flip mechanism is switched off in PYTHIA 8 and not used in this paper. The main free parameters of the gluon-move model account for the lower limit of the string length allowed for colour reconnection, the fraction of gluons allowed to move, and the lower limit of the allowed reduction of the string lengths.
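The two quantities that drive the MPI-based and QCD-inspired models can be sketched numerically. The Python block below assumes the forms commonly quoted for PYTHIA 8: a reconnection probability equal to pTRec² / (pTRec² + pT²) with pTRec = r · pT0, and a string length measure λ = ln(1 + √2 E1/m0) + ln(1 + √2 E2/m0). The function names and all numerical inputs are hypothetical and chosen only for illustration; this is not the generator's implementation.

```python
import math


def reconnection_probability(pt, pt0, r):
    """MPI-based model (assumed form): P = pTRec^2 / (pTRec^2 + pT^2), pTRec = r * pT0."""
    pt_rec_sq = (r * pt0) ** 2
    return pt_rec_sq / (pt_rec_sq + pt ** 2)


def string_length(e1, e2, m0):
    """QCD-inspired model (assumed form): sum of ln(1 + sqrt(2) * E / m0) over both partons."""
    return (math.log1p(math.sqrt(2.0) * e1 / m0)
            + math.log1p(math.sqrt(2.0) * e2 / m0))


# Illustrative inputs only (GeV); not tune values.
for pt in (0.5, 3.0, 20.0):
    print(f"pT = {pt:5.1f} GeV -> P = {reconnection_probability(pt, pt0=2.0, r=1.5):.3f}")

print(f"lambda(E1=5, E2=10, m0=0.3) = {string_length(5.0, 10.0, 0.3):.3f}")
```

Soft MPI systems (low pT) are reconnected with a probability close to one, whereas hard systems largely keep their original colour topology. In the λ measure, a larger m0 shortens all strings logarithmically, which changes which candidate reconnections actually reduce the total length.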
In addition to these models, the effects of early resonance decay (ERD) [9] in top quark decays are also studied. With this option, top quark decay products are allowed to participate directly in CR. Normally the ERD option is switched off in PYTHIA 8 but in Section 4.5 we investigate the ERD effects.
Usually, MPI and CR effects are investigated and constrained using fits to measurements sensitive to the UE in hadron collisions. The UE measurements have been performed at various collision energies by the ATLAS, CMS, and CDF Collaborations [17-21]. The measurements are typically performed by studying the multiplicity and the scalar $p_\mathrm{T}$ sum of the charged particles ($p_\mathrm{T}^{\mathrm{sum}}$), measured as a function of the $p_\mathrm{T}$ of the leading charged particle in the event, $p_\mathrm{T}^{\mathrm{max}}$. Different regions of the plane transverse to the direction of the beams are defined by the direction of the leading charged particle. A sketch of the different regions is shown in Fig. 2. The "toward" region includes mainly the products of the hard scattering, whereas the "away" region includes the recoiling objects belonging to the hard scattering. The two "transverse" regions contain the products of MPIs and are affected by contributions from ISR and FSR.
In Refs. [17,18,21], the transverse region is further subdivided into "transMIN" and "transMAX", defined to be the regions with the minimum and maximum number of particles between the two transverse regions. This is done to disentangle contributions from MPI, ISR, and FSR. For events with large ISR or FSR, the transMAX region contains at least one "transverse-side" jet, whereas both the transMAX and transMIN regions contain particles from the MPI and BBR. Thus, the transMIN region is sensitive to MPI and BBR, whereas the difference between transMAX and transMIN (referred to as the transDIFF region) is sensitive to ISR and FSR.
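The region definitions above can be illustrated with a short Python sketch that counts charged particles in the two transverse regions and derives the transMIN, transMAX, and transDIFF multiplicities. The azimuthal boundaries used here (transverse region at 60° < |Δφ| < 120° from the leading particle) are the conventional choice in UE analyses and are an assumption of this sketch, since the text does not state them. The function names are hypothetical, and the division by the η-φ area needed to obtain densities is omitted.

```python
import math


def delta_phi(phi, phi_lead):
    """Signed azimuthal difference with respect to the leading particle, in (-pi, pi]."""
    dphi = (phi - phi_lead) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def transverse_multiplicities(phis, phi_lead):
    """Count particles on each transverse side and order the two sides by multiplicity.

    phis: azimuthal angles (radians) of the charged particles, leading one excluded.
    Assumed boundaries: pi/3 < |dphi| < 2*pi/3 defines the transverse regions.
    """
    n_side = {+1: 0, -1: 0}
    for phi in phis:
        dphi = delta_phi(phi, phi_lead)
        if math.pi / 3.0 < abs(dphi) < 2.0 * math.pi / 3.0:
            n_side[+1 if dphi > 0 else -1] += 1
    n_min, n_max = sorted(n_side.values())
    return {"transMIN": n_min, "transMAX": n_max, "transDIFF": n_max - n_min}


# Toy event: one toward, three on one transverse side, one on the other, one away.
toy_phis = [0.1, 1.4, 1.5, 1.6, -1.5, 3.0]
print(transverse_multiplicities(toy_phis, phi_lead=0.0))
```

In this toy event the busier transverse side, which would typically host the extra radiated jet, defines transMAX, and the quieter side defines transMIN.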
The CMS Collaboration showed that a consistent description of the N ch and the p sum T distributions is not possible using only the PYTHIA 8 hadronisation model without taking into account the CR effects [12]. In general, the largest difference between the predictions from tunes and the data is observed in the soft region (p T ∼ 2-5 GeV), where CR effects are expected to be more relevant.
The new CR models, QCD-inspired and gluon-move, were implemented in PYTHIA 8.226 after tuning the model parameters to the existing data at $\sqrt{s} = 7$ TeV and at lower centre-of-mass energies [9,14]. The models were tuned to different data sets starting from different baseline tune settings. The model predictions, with their default parameter settings in PYTHIA 8.226 and CP5, are given in Fig. 3 for the $N_\mathrm{ch}$ and $p_\mathrm{T}^{\mathrm{sum}}$ densities measured by the CMS experiment at 13 TeV [17] in the transMIN and transMAX regions, and in Fig. 4 for the $\mathrm{d}N_\mathrm{ch}/\mathrm{d}\eta$ distribution measured by CMS at 13 TeV [22]. In these figures, the data points, shown in black, are well described by CP5. The predictions for CP5-"QCD-inspired" and CP5-"gluon-move" were obtained by replacing the MPI-based CR model in CP5 with the QCD-inspired and gluon-move CR model, respectively. As mentioned earlier, these models were tuned to data at 7 TeV and at lower centre-of-mass energies. The comparisons show that the models must be retuned to describe the underlying soft physics of pp collisions at 13 TeV.

Figure 2: The schematic description of the result of a typical hadron-hadron collision. The "toward" region contains the "toward-side" jet, whereas the "away" region may contain an "away-side" jet.

The new CMS colour reconnection tunes
A new set of event tunes, based on UE data from the CMS and CDF experiments, is derived using the QCD-inspired and the gluon-move CR models, as implemented in the PYTHIA 8.226 event generator. Having tunes for different CR models allows a consistent evaluation of the systematic uncertainties due to colour reconnection effects in specific measurements. The RIVET 2.4.0 [24] routines used as inputs to the fits, as well as the centre-of-mass energy values, the names of the RIVET distributions, the x-axis ranges (fit ranges), and the relative importance (R) of the distributions, are displayed in Table 1 for the tunes CP5-CR1 and CP5-CR2. The CP5 tune is used as a baseline for the CR tuning since it is the default PYTHIA 8 tune for most of the new CMS analyses using data at $\sqrt{s} = 13$ TeV published since 2017, and it has explicitly been tested against a large number of different final states (MB, QCD, top quark, and vector boson + jets) and observables [13].
The parameters and their ranges in the fits are listed in a separate table.

Figure 3: The $N_\mathrm{ch}$ and $p_\mathrm{T}^{\mathrm{sum}}$ densities in the transMIN (upper) and transMAX (lower) regions as functions of the $p_\mathrm{T}$ of the leading charged particle, $p_\mathrm{T}^{\mathrm{max}}$, measured by the CMS experiment at $\sqrt{s} = 13$ TeV [17]. The predictions of the tunes CP5, CP5-"QCD-inspired", and CP5-"gluon-move", using their default parameter settings in Refs. [9,14], are compared with data. The coloured band and error bars on the data points represent the total experimental uncertainty in the data, where the model uncertainty is also included. The comparisons show that the models do not describe the data and need to be retuned.

Figure 4: The $\mathrm{d}N_\mathrm{ch}/\mathrm{d}\eta$ distribution measured by the CMS experiment at $\sqrt{s} = 13$ TeV [23]. The predictions of the tunes CP5, CP5-"QCD-inspired", and CP5-"gluon-move", using their default parameter settings in Refs. [9,14], are compared with data. The coloured band and error bars on the data points represent the total experimental uncertainty in the data, where the model uncertainty is also included. The comparisons show that the models need to be retuned to achieve better agreement with the data.

Tune CP5 uses the next-to-next-to-leading order (NNLO) NNPDF31_nnlo_as_0118 [26] PDF set, the strong coupling parameter $\alpha_S$ value of 0.118 for ISR, FSR, and MPI, and the MPI-based CR model. It also uses a double-Gaussian functional form with two tunable parameters, coreRadius and coreFraction, to model the overlap of the matter distribution of the two colliding protons [6]. The tune parameters are documented in Ref. [13] and displayed in Table 3. In Figs. 11 and 12 of Ref. [13], predictions of the CP5 tune are also compared with event shape observables measured at LEP. The results show that a value of $\alpha_S^{\mathrm{FSR}}(m_\mathrm{Z}) \sim 0.120$ describes the data better than higher values of $\alpha_S^{\mathrm{FSR}}(m_\mathrm{Z})$, which generally overestimate the number of final-state partons. As concluded in Ref. [13], LEP event shape observables are well described by MADGRAPH5_aMC@NLO + PYTHIA 8 with CP5.
The new tunes are obtained by simultaneously constraining the parameters controlling the contributions of the MPI and of each of the CR models. The strategy followed to obtain the CP5-CR1 and CP5-CR2 tunes is similar to that used for the CP5 tune, i.e. the same observables sensitive to MPI are considered to constrain the parameters. These are the $N_\mathrm{ch}$ and average $p_\mathrm{T}^{\mathrm{sum}}$ as functions of the leading charged particle transverse momentum $p_\mathrm{T}^{\mathrm{max}}$, measured in the transMIN and transMAX regions by the CMS experiment at $\sqrt{s} = 13$ TeV [17] and 7 TeV [19], and by the CDF experiment at 1.96 TeV [21]. The $N_\mathrm{ch}$ as a function of η, measured by CMS at $\sqrt{s} = 13$ TeV [22], is also used in the fit. In Ref. [17], the transMIN and transMAX regions are defined with respect to both the leading charged particles and the leading charged-particle jets as reference objects. The uncertainty in measurements using leading charged particles as reference objects is lower than that in measurements using leading charged-particle jets. This is one of the reasons why we choose to use leading charged particle observables instead of leading charged-particle jet observables in the fits. Another reason is that we want to use the same observables that were used to derive CP5, which was obtained using leading charged particle observables in the fits. As a cross-check, we also derived another version of the CP5-CR1 tune using leading charged-particle jet observables, such as $N_\mathrm{ch}$ and average $p_\mathrm{T}^{\mathrm{sum}}$ as functions of the transverse momentum of the leading charged-particle jets, in the fits. The results showed that the use of leading charged-particle jet observables in the fits makes a very small difference, which is negligible when tune uncertainties are taken into account.
As for CP5, the region with $p_\mathrm{T}^{\mathrm{max}}$ between 0.5 and 2.0 or 3.0 GeV, depending on the distribution, is excluded from the fit, since this region is affected by diffractive processes whose free parameters are not considered in the tuning procedure.
The MPI-related parameters that are kept free in both the CP5-CR1 and CP5-CR2 tunes are:
• MultipartonInteractions:pT0Ref, the parameter $p_{\mathrm{T}0}^{\mathrm{ref}}$ included in the regularisation of the partonic QCD cross section as described in Eq. (1). It sets the lower cutoff scale for MPIs;
• MultipartonInteractions:ecmPow, the exponent of the $\sqrt{s}$ dependence as shown in Eq. (1);
• MultipartonInteractions:coreRadius, the width of the core when a double-Gaussian matter profile is assumed for the overlap distribution between the two colliding protons [6]. A double-Gaussian form identifies an inner, dense part, which is called the core, and an outer, less dense part;
• MultipartonInteractions:coreFraction, the fraction of quarks and gluons contained in the core when a double-Gaussian matter profile is assumed.
The tunable CR parameters in CP5-CR1 that are considered in the fit are:
• ColourReconnection:m0, the variable that determines whether a possible reconnection is actually favoured in the λ measure in Eq. (3);
• ColourReconnection:junctionCorrection, the multiplicative correction for junction formation, applied to the m0 parameter;
• ColourReconnection:timeDilationPar, the parameter controlling the time dilation that forbids colour reconnection between strings that are not in causal contact.
More details on these parameters are reported in Ref. [1]. For the CP5-CR1 tune, the parameters related to the hadronisation, StringZ:aLund, StringZ:bLund, StringFlav:probQQtoQ, and StringFlav:probStoUD, proposed in Ref. [14], are also used as fixed inputs to the tune. The first two of these parameters govern the longitudinal fragmentation function used in the Lund string model in PYTHIA 8, whereas the latter two are the probability of diquark over quark fragmentation, and the ratio of strange to light quark production, respectively.
For the optimisation of CP5-CR2, the following parameters are considered:
• ColourReconnection:m2lambda, an approximate hadronic mass-square scale and the parameter used in the calculation of λ;
• ColourReconnection:fracGluon, the probability that a given gluon will be moved. It thus gives the average fraction of gluons being considered.
The remaining parameters of PYTHIA 8 are kept the same as in the CP5 tune.
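For orientation, switching between the CR models in a generator configuration amounts to a handful of settings strings. The Python sketch below assembles such strings. It assumes that the CR model is selected through `ColourReconnection:mode`, with values 0, 1, and 2 for the MPI-based, QCD-inspired, and gluon-move models as in the PYTHIA 8.2 settings documentation. The helper `cr_settings` and the numerical parameter values are hypothetical placeholders, not the fitted values of the tunes, and a given model may require further settings that are not shown here.

```python
# Assumed mapping of CR models to ColourReconnection:mode values (PYTHIA 8.2 convention).
CR_MODE = {"MPI-based": 0, "QCD-inspired": 1, "gluon-move": 2}


def cr_settings(model, parameters):
    """Return PYTHIA-style 'key = value' strings for a CR model and its free parameters."""
    if model not in CR_MODE:
        raise KeyError(f"unknown CR model: {model}")
    lines = [
        "ColourReconnection:reconnect = on",
        f"ColourReconnection:mode = {CR_MODE[model]}",
    ]
    lines += [f"{name} = {value}" for name, value in parameters.items()]
    return lines


# Placeholder values for illustration only (NOT the CP5-CR2 fit results).
example = cr_settings(
    "gluon-move",
    {
        "ColourReconnection:m2lambda": 1.0,
        "ColourReconnection:fracGluon": 0.5,
    },
)
print("\n".join(example))
```

Each returned string has the form accepted by the `readString` method of a PYTHIA 8 instance, so the list can be applied on top of a baseline tune configuration.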
The fits are performed using the PROFESSOR 1.4.0 software, which takes random values for each parameter in the defined multidimensional parameter space, and RIVET, which provides the data points and uncertainties, and produces the individual generator predictions for the considered observables. About 200 different choices of parameters are considered to build a random grid in the parameter space. For each choice of parameters, one million pp inelastic scattering events, including contributions from single-diffractive dissociation (SD), double-diffractive dissociation (DD), central diffraction (CD), and nondiffractive (ND) processes, are generated. The bin-by-bin envelopes of the different MC predictions are checked. After building the grid in the parameter space, PROFESSOR performs an interpolation of the bin values for the observables in the parameter space using a third-order polynomial function. We verified that the degree of the polynomial used for the interpolation does not affect the tune results significantly. The function $f_b(\mathbf{p})$ models the MC response of each bin $b$ of the observable $\mathcal{O}$ as a function of the parameter vector $\mathbf{p}$. The final step is the minimisation of the $\chi^{*2}$ function given by:

$$\chi^{*2}(\mathbf{p}) = \sum_{\mathcal{O}} \sum_{b \in \mathcal{O}} \frac{\left(f_b(\mathbf{p}) - R_b\right)^2}{\Delta_b^2}, \qquad (4)$$

where $R_b$ is the data value for each bin $b$, and $\Delta_b^2$ expresses the total bin uncertainty of the data. The $\chi^{*2}$ is not a true $\chi^2$ function, as explained in the following. Treating equally all distributions that are used as inputs to the fit for the CP5-CR2 tune results in a tune that describes the data poorly; in particular, it underestimates the $\mathrm{d}N_\mathrm{ch}/\mathrm{d}\eta$ distribution measured in data at $\sqrt{s} = 13$ TeV by about 30%. This is because the $\chi^2$ definition treats all bins equally and the importance of $\mathrm{d}N_\mathrm{ch}/\mathrm{d}\eta$ may be lost because of its relatively low precision with respect to other observables.
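Two aspects of the fitting procedure described above lend themselves to a quick numerical check: the size of the cubic polynomial used for the bin-wise interpolation, and the goodness-of-fit sum over bins. The Python sketch below is an illustration only and is not the PROFESSOR implementation. The function names are hypothetical, and the goodness-of-fit function assumes the form of a sum over bins of the squared difference between interpolated prediction and data, divided by the squared total bin uncertainty.

```python
from math import comb


def n_poly_coefficients(n_params, order=3):
    """Number of monomials of total degree <= order in n_params variables."""
    return comb(n_params + order, order)


def chi2_star(predictions, data, uncertainties):
    """Sum over bins of (f_b - R_b)^2 / Delta_b^2 (assumed form of the fit function)."""
    if not (len(predictions) == len(data) == len(uncertainties)):
        raise ValueError("all inputs must have the same number of bins")
    return sum(
        (f - r) ** 2 / delta ** 2
        for f, r, delta in zip(predictions, data, uncertainties)
    )


# A cubic polynomial in 7 (6) parameters has 120 (84) coefficients per bin.
for n in (6, 7):
    print(f"{n} parameters -> {n_poly_coefficients(n)} coefficients")

# Toy bins: one bin agrees exactly, the other is off by two standard deviations.
print(chi2_star([1.0, 2.0], [1.0, 1.0], [0.5, 0.5]))
```

For tunes with 6 or 7 free parameters, the number of coefficients per bin stays below the roughly 200 sampled parameter points, so the polynomial coefficients are overconstrained by the sampled grid.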
The dN ch /dη distribution is one of the key observables that is sensitive to a number of processes and, therefore, increasing the importance of this observable in the fit is reasonable.
In PROFESSOR, this is done by using weights with a nonstandard $\chi^2$ definition. To keep the standard properties of a $\chi^2$ fit, we instead increase the total uncertainties of the other distributions. The total uncertainty in each bin is scaled up by $1/\sqrt{R}$, with the R (relative importance) values displayed in Table 1. Therefore, the total uncertainty of each bin of $p_\mathrm{T}^{\mathrm{sum}}$ in the transMIN and transMAX regions at $\sqrt{s} = 13$ TeV is scaled up by $\sqrt{2}$ and that of all other distributions by $\sqrt{10}$. These scale factors ensure that the distributions are well described after the tuning. For the CP5-CR1 tune, a good description of the input observables is obtained without scaling, meaning that all distributions are considered equally important.
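The effect of the uncertainty scaling can be made explicit with a short Python sketch. Scaling the bin uncertainty by 1/√R is equivalent to multiplying that bin's contribution to the fit function by R. The values R = 0.5 and R = 0.1 used below are inferred here from the quoted scale factors √2 and √10 and are an assumption of this sketch, as are the function names and the toy bin values.

```python
import math


def scaled_uncertainty(delta, relative_importance):
    """Scale the total bin uncertainty by 1 / sqrt(R)."""
    if relative_importance <= 0:
        raise ValueError("relative importance must be positive")
    return delta / math.sqrt(relative_importance)


def chi2_term(prediction, data, delta, relative_importance=1.0):
    """Single-bin contribution computed with the scaled uncertainty."""
    scaled = scaled_uncertainty(delta, relative_importance)
    return (prediction - data) ** 2 / scaled ** 2


# Assumed R values reproducing the quoted scale factors sqrt(2) and sqrt(10).
for r_value in (1.0, 0.5, 0.1):
    factor = scaled_uncertainty(1.0, r_value)
    print(f"R = {r_value:3.1f} -> uncertainty scaled by {factor:.3f}")

# Equivalence check on a toy bin: scaling by 1/sqrt(R) multiplies the term by R.
print(chi2_term(3.0, 1.0, 2.0), chi2_term(3.0, 1.0, 2.0, relative_importance=0.5))
```

A distribution with R = 1 keeps its full weight, so lowering R for the other inputs raises the relative influence of that distribution without changing the functional form of the fit.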
The experimental uncertainties used in the fit, in general, have bin-to-bin correlations. However, some of the bins of the UE distributions used in the fit, e.g. p max T > 10 GeV, are dominated by statistical uncertainties, which are uncorrelated between bins. In the minimisation procedure, because the correlations between bins are not available for the input measurements, the experimental uncertainties are assumed to be uncorrelated between data points.
The parameters obtained from the CP5-CR1 and CP5-CR2 fits, as well as the value of the goodness of the fit, are shown in Table 3. Uncertainties in the parameters of these tunes are discussed in Appendix B. In Ref. [13], the number of degrees of freedom ($N_\mathrm{dof}$), defined as the sum of the number of bins of fit observables minus the number of fit parameters, for the tune CP5 is given as 63. However, this value of $N_\mathrm{dof}$ corresponds to the case when only 13 TeV distributions are used. The value of $N_\mathrm{dof}$ for CP5 consistent with our calculation in this paper is 183. The tune CP5 was derived using two additional distributions in the fits: $\mathrm{d}N_\mathrm{ch}/\mathrm{d}\eta$ at 13 TeV with NSD-enhanced selection and with SD-enhanced selection. Since these two observables depend on the modelling of single-diffractive dissociation, which is not well understood, they are not included in the fits for the CP5-CR tunes. Therefore, the $N_\mathrm{dof}$ values for the CP5-CR tunes are lower than the $N_\mathrm{dof}$ of CP5. The slight difference in the $N_\mathrm{dof}$ values between the CP5-CR1 and CP5-CR2 tunes is due to the difference in the number of fit parameters used in each tune, which are 7 and 6, respectively. Although the fit ranges for the CP5-CR tunes differ slightly, as shown in Table 1, the sum of the number of bins of fit observables is the same for both tunes.
A preliminary version of the CP5-CR2 tune was derived including several jet substructure observables [27-29] in the fits. This tune, called CP5-CR2-j, has been used in the MC production in the CMS experiment. The CP5-CR2 and CP5-CR2-j tunes have very similar predictions in all final states discussed in this paper, because the tunes differ slightly only in the following parameters, where the listed values are for CP5-CR2-j:

Table 3: The parameters obtained in the fits of the CP5-CR1 and CP5-CR2 tunes, compared with those of the CP5 tune. The upper part of the table displays the fixed input parameters of the tune, whereas the lower part shows the fitted tune parameters. The number of degrees of freedom ($N_\mathrm{dof}$) and the goodness of fit divided by $N_\mathrm{dof}$ are also shown.

The CP1 and CP2 tunes are the two tunes in the CPX (X = 1-5) tune family [13] that use a leading order (LO) PDF set [26]. We also derive CR tunes based on the CP1 and CP2 settings to study the effect of using an LO PDF set with alternative CR models, although they are not used in precision measurements. We find that the predictions of the CR tunes based on CP1 and CP2 for the MB and UE observables are similar to the predictions of CR tunes based on CP5. However, CP1-CR1 (i.e. CP1 with the QCD-inspired colour reconnection model) has a different trend in particle multiplicity distributions compared with the predictions of other tunes discussed in this study. This different trend of CP1-CR1 cannot be attributed to the use of an LO PDF set, because both CP1 and CP2 use the same LO PDF set and we do not see a different trend with CP2-CR1. The different trend observed with CP1-CR1 in the particle multiplicity distributions may be a collective effect rather than a single-parameter effect, and could be an input for further tuning and development of the QCD-inspired model.
Therefore, in Appendix A of this paper, we present the tune settings of the CR tunes based on CP1 and CP2, along with their predictions in the particle multiplicity distributions.

Performance of the tunes
In Figs. 5-18 we show the observables measured at centre-of-mass energies of 1.96, 7, 8, and 13 TeV. The data points are shown in black, and are compared with simulations obtained from the PYTHIA 8 event generator with the tunes CP5 (red), CP5-CR1 (blue), and CP5-CR2 (green). For simplicity, the tunes CP5-CR1 and CP5-CR2 will be referred to as CP5-CR when convenient. The lower panels show the ratios between each MC prediction and the data.

Underlying-event and minimum-bias observables
MB is a generic term used to describe events collected with a loose selection process that are dominated by relatively soft particles. Although these events generally correspond to inelastic scattering, including ND and SD+DD+CD processes, these contributions may vary depending on the trigger requirements used in the experiments. For example, a sample of non-single-diffractive-enhanced (NSD-enhanced) events is selected by suppressing the SD contribution at the trigger level.
The UE observables measured by the CMS experiment at $\sqrt{s} = 13$ TeV [17], namely the $N_{\mathrm{ch}}$ density and the average $p_{\mathrm{T}}^{\mathrm{sum}}$ in the transMIN and transMAX regions, are well described by all tunes in the plateau region, as shown in Fig. 5. The region up to ≈5 GeV of $p_{\mathrm{T}}^{\mathrm{max}}$ is highly sensitive to diffractive contributions [30]. There is a lack of measurements in this region, where the tunes, in general, do not perform well. Although the optimisation of these components is beyond the scope of this study, we have extended the fit range to ≈2-3 GeV as long as the data are well described. The rising part of the spectrum, excluding the region up to ≈5 GeV, of the $N_{\mathrm{ch}}$ density distributions is similarly described by all tunes, whereas in the $p_{\mathrm{T}}^{\mathrm{sum}}$ density distributions the predictions of CP5 differ slightly from the predictions of the CR tunes. This shows that the CP5 tune has a harder $p_{\mathrm{T}}$ spectrum at low $p_{\mathrm{T}}^{\mathrm{max}}$ values. Through tuning the $N_{\mathrm{ch}}$ and average $p_{\mathrm{T}}^{\mathrm{sum}}$ density in the transMIN and transMAX regions, a satisfactory agreement is obtained for the same observables in the transDIFF region as well. Figure 6 shows the pseudorapidity distribution of charged hadrons in inelastic pp collisions measured by the CMS experiment at $\sqrt{s} = 13$ TeV [22]. This observable is sensitive to the softer part of the MPI spectrum and is well described by all tunes.
A crucial test for the performance of UE tunes, and of the CR simulation in particular, is the description of the average $p_{\mathrm{T}}$ of the charged particles as a function of $N_{\mathrm{ch}}$. Comparisons of the mean average $p_{\mathrm{T}}$ with the measurements by the ATLAS Collaboration at $\sqrt{s} = 13$ TeV in the transMAX and transMIN regions [18] are displayed in Fig. 7. The tune CP5 describes the central values of the data perfectly for $N_{\mathrm{ch}} > 7$, whereas the CR tunes show an almost constant discrepancy of 5-10% because of the harder $p_{\mathrm{T}}$ spectrum predicted by the tune CP5 for low-$p_{\mathrm{T}}$ particles. All CR tunes show a reasonable agreement with the data, confirming the accuracy of the parameters obtained for the new CR models. The improvement in the tuned CR models and their success in describing the data is seen by comparing Fig. 5 with Fig. 3, and Fig. 6 with Fig. 4. In these figures, the CP5 tune predictions are also shown for easier comparison of the CR tune predictions with CP5.
In Fig. 8, charged-particle and $p_{\mathrm{T}}^{\mathrm{sum}}$ densities measured by the CMS experiment at $\sqrt{s} = 7$ TeV [19] in the transMIN and transMAX regions, as functions of $p_{\mathrm{T}}^{\mathrm{max}}$, are compared with predictions from the tunes CP5 and CP5-CR. The data are reasonably well described for $p_{\mathrm{T}}^{\mathrm{max}} > 5$ GeV.
In Fig. 9, charged-particle and $p_{\mathrm{T}}^{\mathrm{sum}}$ densities in the transverse region, as functions of $p_{\mathrm{T}}^{\mathrm{max}}$, and the average $p_{\mathrm{T}}$ in the transverse region as functions of $p_{\mathrm{T}}^{\mathrm{max}}$ and of $N_{\mathrm{ch}}$, measured by the ATLAS experiment at $\sqrt{s} = 7$ TeV [20], are compared with the predictions from the tunes CP5 and CP5-CR. The central values of the average $p_{\mathrm{T}}$ in bins of the leading charged particle $p_{\mathrm{T}}$ and of $N_{\mathrm{ch}}$ are consistent with the data points within 10%. A similar level of agreement as observed at 13 TeV is achieved by the new tunes at 7 TeV.
The performance of the new tunes is also checked at 7 TeV using inclusive measurements of charged-particle pseudorapidity distributions. In Fig. 10, the CMS measurements of $\mathrm{d}N_{\mathrm{ch}}/\mathrm{d}\eta$ at 7 TeV [31] with at least one charged particle in $|\eta| < 2.4$ are compared with predictions from the tunes CP5 and CP5-CR. The CP5 and CP5-CR1 tunes have similar predictions, while CP5-CR2 predicts about 4% fewer charged particles than the first two tunes in all $\eta$ bins of the measurement. Although all tunes provide a reasonable description of $\mathrm{d}N_{\mathrm{ch}}/\mathrm{d}\eta$ with deviations up to ≈10%, the data and MC simulation show different trends for $|\eta| > 1.2$, where the trend for the data is not described well by the tunes. In the more central region, i.e. $|\eta| < 1.2$, the shape of the predictions agrees well with the data but there is a difference in normalisation. For example, CP5 and CP5-CR1 predict 3-4% and CP5-CR2 predicts about 7% fewer charged particles in all bins for $|\eta| < 1.2$ compared with the data.
In Fig. 11, charged-particle and $p_{\mathrm{T}}^{\mathrm{sum}}$ densities measured as functions of $p_{\mathrm{T}}^{\mathrm{max}}$ at $\sqrt{s} = 1.96$ TeV by the CDF experiment [21] in the transMIN and transMAX regions are compared with predictions from the tunes CP5 and CP5-CR. All predictions reproduce the UE observables within ≈10% at $\sqrt{s} = 1.96$, 7, and 13 TeV.
We also compare the new CMS tunes with MB and UE data measured at forward pseudorapidities. The energy density, $\mathrm{d}E/\mathrm{d}\eta$, measured in MB events and in NSD events by the CMS experiment at $\sqrt{s} = 13$ TeV, is shown in Fig. 12. The data are well described by CP5-CR2 within uncertainties and for all measured $|\eta|$ bins. The predictions of CP5 and CP5-CR1 overestimate the data in $4.2 < |\eta| < 4.9$.
For the CMS-TOTEM measurement, the events are required to have at least one charged particle in $5.3 < \eta < 6.5$ or $-6.5 < \eta < -5.3$ with $p_{\mathrm{T}} > 0$. All tunes describe the data within the uncertainties. Additionally, Fig. 13 shows the pseudorapidity distribution of charged particles, $\mathrm{d}N_{\mathrm{ch}}/\mathrm{d}\eta$, in $5.3 < |\eta| < 6.4$ in events with at least one charged particle with $p_{\mathrm{T}} > 40$ MeV, measured by the TOTEM experiment at $\sqrt{s} = 7$ TeV [33]. Both CP5 and CP5-CR1 describe the data within the uncertainties, whereas CP5-CR2 underestimates the data by 15%.
Figure 7: The mean charged-particle average transverse momentum as a function of charged-particle multiplicity in the transMAX (left) and transMIN (right) regions, measured by the ATLAS experiment at $\sqrt{s} = 13$ TeV [18]. The predictions of the CP5 and CP5-CR tunes are compared with data. The coloured band and error bars on the data points represent the total experimental uncertainty in the data.
Figure 9: The charged-particle (upper left) and $p_{\mathrm{T}}^{\mathrm{sum}}$ densities (upper right) in the transverse region, as functions of the $p_{\mathrm{T}}$ of the leading charged particle, and average transverse momentum in the transverse region as functions of the leading charged particle $p_{\mathrm{T}}$ (lower left) and of the charged-particle multiplicity (lower right), measured by the ATLAS experiment at $\sqrt{s} = 7$ TeV [20]. The predictions of the CP5 and CP5-CR tunes are compared with data. The coloured band and error bars on the data points represent the total experimental uncertainty in the data.
Figure 10: The pseudorapidity distribution of charged particles, $\mathrm{d}N_{\mathrm{ch}}/\mathrm{d}\eta$, with at least one charged particle in $|\eta| < 2.4$, measured by the CMS experiment at $\sqrt{s} = 7$ TeV [31]. The predictions of the CP5 and CP5-CR tunes are compared with data. The coloured band and error bars on the data points represent the total experimental uncertainty in the data.
Figure 11: The charged-particle (left) and $p_{\mathrm{T}}^{\mathrm{sum}}$ densities (right) in the transMIN (upper) and transMAX (lower) regions, as functions of the $p_{\mathrm{T}}$ of the leading charged particle, $p_{\mathrm{T}}^{\mathrm{max}}$, measured by the CDF experiment at $\sqrt{s} = 1.96$ TeV [21]. The predictions of the CP5 and CP5-CR tunes are compared with data. The coloured band and error bars on the data points represent the total experimental uncertainty in the data.
Figure 13: [...] The coloured band and error bars on the data points represent the total experimental uncertainty in the data. For the CMS-TOTEM measurement, at least one charged particle with $p_{\mathrm{T}} > 0$ is required in $5.3 < \eta < 6.5$ or $-6.5 < \eta < -5.3$. For the TOTEM measurement, at least one charged particle with $p_{\mathrm{T}} > 40$ MeV is required in $5.3 < |\eta| < 6.4$.

Particle multiplicities
Figure 14 shows the strange particle production for $\Lambda$ baryons and $K^0_S$ mesons as a function of rapidity ($y$) measured by the CMS experiment [16] in NSD events at $\sqrt{s} = 7$ TeV. The rapidity is defined as $y = \frac{1}{2}\ln\frac{E + p_L}{E - p_L}$, where $E$ is the particle energy and $p_L$ is the particle momentum along the anticlockwise-beam direction. It is shown in Ref. [14] that the new CR models might be beneficial for describing the ratios of strange particle multiplicities, for example $\Lambda/K^0_S$ in pp collisions. We observe that all CP5 tunes, regardless of the CR model, describe particle production for $K^0_S$ mesons as a function of rapidity very well. However, they underestimate particle production for $\Lambda$ versus rapidity by about 30%. Therefore, the ratio $\Lambda/K^0_S$ is not perfectly described, but this could be improved by different hadronisation models [35,36]. Including these observables, as well as the recent measurements of baryon production from the ALICE and LHCb experiments [37,38], could be beneficial in future tune derivations. This is discussed in Appendix C.
Figure 14: The strange particle production, $\Lambda$ baryons (left) and $K^0_S$ mesons (right), as a function of rapidity, measured by the CMS experiment at $\sqrt{s} = 7$ TeV [16]. The predictions of the CP5 and CP5-CR tunes are compared with data. The coloured band and error bars on the data points represent the total experimental uncertainty in the data.
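As a minimal illustration of the rapidity definition used for these distributions, the sketch below evaluates $y$ from a particle energy and longitudinal momentum. The function name `rapidity` and the choice to raise an error for unphysical inputs are assumptions made for this example and are not part of the CMS analysis code.

```python
import math


def rapidity(energy, p_long):
    """Return y = 0.5 * ln((E + pL) / (E - pL)).

    energy and p_long must be in the same units (e.g. GeV).
    Hypothetical helper for illustration only.
    """
    if energy <= abs(p_long):
        # y diverges for E == |pL| and is undefined for E < |pL|
        raise ValueError("rapidity requires E > |pL|")
    return 0.5 * math.log((energy + p_long) / (energy - p_long))
```

A particle with no longitudinal momentum has $y = 0$, and reversing the sign of $p_L$ reverses the sign of $y$.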
The multiplicities of identified particles are also investigated in simulated MB events (ND+SD+DD+CD). Figure 15 shows the ratio of proton over pion production, as a function of particle $p_{\mathrm{T}}$ [39]. All the tunes predict a similar trend, showing that the new CR models do not lead to a significant improvement in the description of the ratio of proton to pion production. However, it is known that this observable is strongly correlated with the event particle multiplicity [39-41] and not only with CR, since hadronisation and MPI also play a key role in describing the ratios of particle yields.

Jet substructure observables
The number of charged particles contained in jets is an important observable that makes it possible to distinguish quark-initiated jets from gluon-initiated jets. The average number of charged hadrons with $p_{\mathrm{T}} > 500$ MeV inside the jets measured by the CMS experiment as a function of the jet $p_{\mathrm{T}}$ is shown in Fig. 16 [27]. The predictions of the CR tunes are comparable, and produce roughly 5% fewer charged particles than the CP5 tune. All predictions show a reasonable description of the data.

Drell-Yan events
Drell-Yan (DY) events [42,43] with the Z boson decaying to $\mu^+\mu^-$ were generated with PYTHIA 8 and compared with CMS data at $\sqrt{s} = 13$ TeV. Figure 18 shows the $N_{\mathrm{ch}}$ and $p_{\mathrm{T}}$ flow as a function of the Z boson $p_{\mathrm{T}}$ (in the invariant $\mu^+\mu^-$ mass window of 81-101 GeV) in the region transverse to the boson momentum [44], which is expected to be dominated by the UE.
The CP5 tunes predict up to 15% too many charged particles at low Z boson $p_{\mathrm{T}}$, where additional effects, such as the intrinsic transverse momentum of the interacting partons (i.e. primordial $k_{\mathrm{T}}$), are expected to play a role. Higher-order corrections, as implemented in MADGRAPH5_aMC@NLO v2.4.2 [45] with FxFx merging [46], are necessary to describe the total $p_{\mathrm{T}}$ flow. The impact of the different CR models is negligible in DY events.

Jet substructure in tt events
A study of the UE in tt events [8] also estimated the effects of the CR on the top quark decay products by investigating the differences between predictions using PYTHIA 8 with the ERD off and on options. In Ref. [8], in addition to the QCD-inspired and gluon-move models, predictions of the rope hadronisation model [47,48] are also compared with the data. In the rope hadronisation model, overlapping strings are treated as acting coherently as a "rope". The interactions between overlapping strings are described by an interaction potential inspired by the phenomenology of superconductors [35,36,47-52]. The ERD off and on options allow the CR to take place before or after the top quark decay, respectively. In particular, the ERD option allows the top quark decay products to be colour reconnected with the partons from MPI systems. Ref. [8] showed that these different models and options produce similar predictions for UE observables in tt events. However, some jet-shape distributions in tt events display a more significant effect [53], e.g. in the number of charged particles in jets. In the following, we investigate how the PYTHIA 8 CR tunes describe the CMS tt jet substructure data [53]. In the CMS measurement, jets reconstructed using the anti-$k_{\mathrm{T}}$ algorithm [54] with a distance parameter of $R = 0.4$ as implemented in FASTJET 3.1 [55] are used. Jets with $p_{\mathrm{T}} > 30$ GeV within $|\eta| < 2$ are selected. Jet pairs ($j_1$ and $j_2$) are required to be far from each other in $\eta$-$\phi$ space, $\Delta R(j_1, j_2) = \sqrt{(\eta_{j_1} - \eta_{j_2})^2 + (\phi_{j_1} - \phi_{j_2})^2} > 0.8$. Jet substructure observables are calculated from jet constituents with $p_{\mathrm{T}} > 1$ GeV, i.e. in the plateau region of high track finding efficiency and low misidentification rate. Here we focus on two variables, (i) $\lambda_0^0$ ($N$), which is the number of charged particles with $p_{\mathrm{T}} > 1$ GeV in the jet, and (ii) the separation between two groomed subjets, $\Delta R_g$, which are shown in Fig. 19 for gluon jets and inclusive jets, respectively. A "groomed jet" refers to a jet with soft and wide-angle radiation removed by a dedicated grooming algorithm [56,57].
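The jet-pair separation requirement can be sketched as follows, using the standard definition of $\Delta R$ as the quadrature sum of the differences in $\eta$ and $\phi$. The function names, the representation of a jet as an `(eta, phi)` tuple, and the explicit wrapping of the azimuthal difference into $(-\pi, \pi]$ are assumptions of this example; the actual CMS selection code is not reproduced here.

```python
import math
from itertools import combinations


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(jet1, jet2):
    """Delta R between two jets given as (eta, phi) tuples."""
    return math.hypot(jet1[0] - jet2[0], delta_phi(jet1[1], jet2[1]))


def well_separated(jets, min_delta_r=0.8):
    """True if every jet pair satisfies Delta R > min_delta_r."""
    return all(delta_r(a, b) > min_delta_r for a, b in combinations(jets, 2))
```

Without the wrapping step, two jets on either side of $\phi = \pm\pi$ would appear far apart even though they are close in azimuth.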
The compatibility of data and MC predictions is evaluated using a measure defined as $\chi^2 = \Delta^{\mathrm{T}} C^{-1} \Delta$, where $\Delta$ is the difference vector between measured and predicted values, and $C$ is the total covariance matrix of the measurement. Since the measured distribution is normalised to unity, its covariance matrix is singular, i.e. not invertible. To render $C$ invertible, the vector entry and matrix row/column corresponding to one measured bin need to be discarded; we choose to remove the last bin. The results are displayed in Table 4 for all jets inclusively as well as for each jet flavour separately. We observe that none of the tunes describe the $\lambda_0^0$ ($N$) data well for all jet flavours. As concluded in Ref. [53], flavour-dependent improvements in the nonperturbative physics modelling may be required for a better description of the data. The angle between the groomed subjets, on the other hand, is infrared and collinear safe and can be described very well by an increase in $\alpha_S^{\mathrm{FSR}}(m_Z)$, which corresponds to a decrease in the FSR renormalisation scale $\mu_R^{\mathrm{FSR}}$. Table 4 shows the results obtained by varying $\mu_R^{\mathrm{FSR}}$ by factors of 0.5 and 2.
Figure 19: Distributions of the particle multiplicity in gluon jets (left) and the angle $\Delta R_g$ between two groomed subjets in inclusive jets (right) measured by the CMS experiment in tt events at $\sqrt{s} = 13$ TeV [53]. The coloured band and error bars on the data points represent the total experimental uncertainty in the data.
Table 4: The $\chi^2$ values and the numbers of degrees of freedom ($N_{\mathrm{dof}}$) for the comparison of tt data with the predictions of the different PYTHIA 8 tunes, for the distributions of the charged-particle multiplicity $\lambda_0^0$, the angle between the groomed subjets $\Delta R_g$ at $\sqrt{s} = 13$ TeV [53], and the pull angle measured in the ATLAS analysis of the colour flow at 8 TeV [58]. The FSR up and down entries denote variations of the renormalisation scale in $\alpha_S^{\mathrm{FSR}}(m_Z)$ by factors of 0.5 and 2, respectively.
Columns of Table 4: charged-particle multiplicity $\lambda_0^0$, angle between groomed subjets $\Delta R_g$, and pull angle $\phi(j_1, j_2)$.

Pull angle in tt events
Figure 20 displays the normalised tt differential cross section for the jet pull angle [59] defined using the jets originating from the decay of a W boson in tt events, as measured by the ATLAS experiment [58]. The observable is shown for the case where only the charged constituents of the jets are used in the calculation, and the corresponding $\chi^2$ values are given in Table 4. The pull angle is particularly sensitive to the setting of the ERD option. With ERD turned off, the decay products of the W boson in tt events are not included in CR, and the predictions using the tunes with the various CR models are similar to each other. With ERD enabled, CR modifies the pull angle between the two jets, which is observed in Fig. 20. The predictions of each tune also show significant changes when ERD is enabled. For both the nominal and CP5-CR1 (QCD-inspired) tunes, the prediction with ERD improves the description of the data, and the difference between the predictions with or without ERD is larger for the CP5-CR1-based tune. We observe the opposite for the CP5-CR2-based (gluon-move) tunes, for which the choice without ERD is preferable. This picture might be different if the flip mechanism had been added in the tuning of the gluon-move model. The move step in the gluon-move model is more restrictive because it allows only gluons to move between the string end-points. The inclusion of the flip mechanism would also allow the string end-points to be mixed with each other and, therefore, could further reduce the total string length in an event. However, as indicated earlier, the effect of the flip mechanism on diffractive events is not well understood and, therefore, this mechanism is not used in this paper.
Overall, the QCD-inspired model with ERD provides the best description of the jet pull angle. The differences between the predictions using the different tunes observed here indicate that the inclusion of observables, such as the jet pull angle and other jet substructure observables, could be beneficial in future tune derivations.
Figure 20: Normalised tt differential cross section for the pull angle between jets from the W boson in top quark decays, calculated from the charged constituents of the jets, measured by the ATLAS experiment using $\sqrt{s} = 8$ TeV data [58] to investigate colour flow. The coloured band and error bars on the data points represent the total experimental uncertainty in the data.
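The compatibility measure $\chi^2 = \Delta^{\mathrm{T}} C^{-1} \Delta$ used for Table 4, with the last bin removed to make the covariance matrix of a normalised distribution invertible, can be sketched in plain Python as below. The function names and the use of Gaussian elimination instead of an explicit matrix inverse are choices made for this illustration only; the inputs in any usage are hypothetical and are not the measured values.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def chi2_normalised(data, prediction, covariance):
    """chi2 = Delta^T C^-1 Delta after discarding the last bin.

    Returns (chi2, number of bins kept).
    """
    n = len(data) - 1  # drop the last bin of the normalised distribution
    delta = [data[i] - prediction[i] for i in range(n)]
    reduced_cov = [row[:n] for row in covariance[:n]]
    weighted = solve_linear(reduced_cov, delta)  # C^-1 * Delta
    return sum(d * w for d, w in zip(delta, weighted)), n
```

Solving $C x = \Delta$ and contracting with $\Delta$ gives the same number as forming $C^{-1}$ explicitly, while avoiding a separate inversion step.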

Uncertainty in the top quark mass due to colour reconnection
The top quark mass has been measured with high precision using the 7, 8, and 13 TeV tt data at the LHC [10, 61-73]. The most precise value of $m_t = 172.44 \pm 0.13\,\text{(stat)} \pm 0.47\,\text{(syst)}$ GeV was measured by the CMS Collaboration combining 7 and 8 TeV data [67]. To further improve the precision of $m_t$ measurements, a complete analysis of the systematic uncertainties in the measurement is crucial. One of the dominant systematic uncertainties is due to the modelling of CR in top quark decays [67]. The procedure for estimating this uncertainty used for the LHC Run 1 (years 2009-2013) analyses at $\sqrt{s} = 7$ and 8 TeV was based on a comparison of two values of $m_t$, calculated by using predictions with the same UE tune with and without CR effects. In Ref. [67], this is done using the tune "Perugia 2011" with and without CR effects included. The "Perugia 2011" tune family is the updated version of the "Perugia (Tevatron)" tune family and also takes into account lessons learned from LHC MB and UE data at 0.9 and 7 TeV [74].
The new CMS tunes, presented in Section 2, which use different CR models, can be used to give a better evaluation of the CR uncertainty. In particular, the uncertainty is now calculated by comparing results for $m_t$ values obtained from different realistic CR models, such as CP5, CP5-CR1, and CP5-CR2.
Additionally, one can also estimate the effects of the CR on the top quark decay products by investigating the differences between predictions using PYTHIA 8 with the option ERD off and on, which was done for the UE observables [8].
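The set of configurations compared in such a study (three CR models, each with ERD off and on) can be enumerated as in the sketch below. It assumes the PYTHIA 8 setting names `ColourReconnection:mode` (0 for the MPI-based model used in CP5, 1 for the QCD-inspired model, 2 for the gluon-move model) and `PartonLevel:earlyResDec` for the ERD switch. The tuned parameter values of each model are deliberately not included, and the helper names are hypothetical.

```python
# Assumed mapping of tune label to ColourReconnection:mode
CR_MODES = {"CP5": 0, "CP5-CR1": 1, "CP5-CR2": 2}


def cr_settings(tune, erd):
    """Return the CR-related PYTHIA 8 setting strings for one configuration.

    Only the model choice and the ERD switch are listed; the fitted tune
    parameters would have to be appended separately.
    """
    return [
        f"ColourReconnection:mode = {CR_MODES[tune]}",
        f"PartonLevel:earlyResDec = {'on' if erd else 'off'}",
    ]


def all_configurations():
    """Map labels such as 'CP5-CR2 ERD' to their setting strings."""
    return {
        f"{tune}{' ERD' if erd else ''}": cr_settings(tune, erd)
        for tune in CR_MODES
        for erd in (False, True)
    }
```

Each list of strings could be passed line by line to a generator configuration; comparing the six resulting predictions isolates the effect of the CR model and of the ERD choice.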
A determination of $m_t$ using a kinematic reconstruction of the decay products in semileptonic tt events at $\sqrt{s} = 13$ TeV is reported in Ref. [10]. In these events, one of the W bosons from the top quark decays into a muon or electron and a neutrino, and the other into a quark-antiquark pair.
In this analysis, $m_t$ and the jet energy scale factor were determined simultaneously through a joint-likelihood fit to the selected events. The results with the QCD-inspired and gluon-move models were also compared. The PYTHIA 8 CUETP8M2T4 [75] UE tune was used, and the parameters of the CR models were tuned to UE and MB data at $\sqrt{s} = 13$ TeV [10]. That analysis found that the gluon-move model results in a 0.31 GeV shift from the $m_t$ value obtained with the default simulation. This shift, which is larger than the shifts caused by the other CR models, is assumed to be the uncertainty due to the modelling of CR in the measured $m_t$. It is much larger than the shift, 0.01 GeV, due to the CR modelling in the Run 1 measurement [67]. This is the largest source of uncertainty in the measured $m_t$, where the total uncertainty is 0.62 GeV [10]. Similar studies using single top quark final states are reported in Refs. [71,76].
We compare the $m_t$ and W boson mass values obtained with different tune configurations based on our new tunes in Table 5. Top quark candidates are constructed by a RIVET routine in a sample of simulated semileptonic tt events. Events must contain exactly one lepton with $p_{\mathrm{T}} > 30$ GeV and $|\eta| < 2.1$. Leptons are "dressed" with the surrounding photons within a cone of $\Delta R = 0.1$ and, when combined with a neutrino in the event, are required to yield an invariant mass within a window of 5 GeV centred at 80.4 GeV. The events must also contain at least four jets, reconstructed with the anti-$k_{\mathrm{T}}$ algorithm, with $p_{\mathrm{T}} > 30$ GeV within $|\eta| < 2.4$. At least two of the jets are required to originate from the fragmentation of bottom quarks, and at least two other jets, referred to as light-quark jets, must not originate from bottom quarks. One jet originating from a bottom quark is combined with the lepton and neutrino to form a leptonically decaying top quark candidate, whereas the other jet originating from a bottom quark is combined with two other jets to form a hadronically decaying top quark candidate. The difference between the invariant masses of the two top quark candidates is required to be less than 20 GeV, and the invariant mass of the two light-quark jets is required to be within a window of 10 GeV centred at 80.4 GeV. If more than one combination of jets satisfies these criteria when combined with the lepton and neutrino, then only one combination is chosen based on how similar the invariant masses of the two top quark candidates are to each other and on how close the invariant mass of the light-quark jets is to 80.4 GeV. The invariant mass of the hadronically decaying top quark candidates constructed in this way for each of the different tune configurations is shown in Fig. 21. The top quark and W boson mass values are obtained from these hadronically decaying top quark candidates by fitting a Gaussian function within an 8 GeV mass window around the corresponding mass peak.
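The choice of the jet combination described above can be sketched as follows. Four-vectors are represented as `(E, px, py, pz)` tuples in GeV. The text does not specify how the two closeness criteria are combined into a single ranking, so the score used here (the sum of the two absolute mass differences) is an assumption, as are all function names. The lepton and jet kinematic requirements and the lepton-neutrino mass window are taken as already applied.

```python
import math
from itertools import combinations, permutations

M_W_REF = 80.4  # GeV, reference W boson mass used in the selection


def inv_mass(*vectors):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def best_combination(lepton, neutrino, b_jets, light_jets):
    """Pick one jet assignment; return (score, m_top_had, m_w_had) or None."""
    best = None
    for b_lep, b_had in permutations(b_jets, 2):
        for q1, q2 in combinations(light_jets, 2):
            m_w = inv_mass(q1, q2)
            m_top_lep = inv_mass(lepton, neutrino, b_lep)
            m_top_had = inv_mass(q1, q2, b_had)
            # 20 GeV top mass difference; 10 GeV window centred at 80.4 GeV
            if abs(m_top_lep - m_top_had) >= 20.0 or abs(m_w - M_W_REF) >= 5.0:
                continue
            # Assumed ranking: smaller combined mismatch is better
            score = abs(m_top_lep - m_top_had) + abs(m_w - M_W_REF)
            if best is None or score < best[0]:
                best = (score, m_top_had, m_w)
    return best
```

The returned hadronic top quark and W boson candidate masses are the quantities that would then be histogrammed and fitted around their peaks.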
Table 5 also contains the differences from the nominal $m_t$ and $m_W$ values ($\Delta m_t$ and $\Delta m_W$) and $\Delta m_t^{\mathrm{hyb}}$, a quantity that was introduced in Ref. [67] to incorporate both an in situ jet scale factor determined from the reconstructed $m_W$ and prior knowledge about the jet energy scale in a hybrid approach to extract $m_t$. Here, $\Delta m_t^{\mathrm{hyb}}$ is approximated as $\Delta m_t - 0.5\,\Delta m_W$. From Table 5, we observe that the largest deviation from the predictions of CP5 is found for CP5-CR2 ERD (0.32 GeV), similar to the largest shift found in Ref. [10] using CUETP8M2T4. However, CP5-CR2 ERD is not able to describe the available colour flow data, and can therefore be excluded from the list of modelling uncertainties.
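The mass shifts quoted in Table 5 follow from simple differences with respect to the nominal configuration, together with the approximation $\Delta m_t^{\mathrm{hyb}} \approx \Delta m_t - 0.5\,\Delta m_W$. A minimal sketch is given below; the function name, the dictionary keys, and the numbers in the usage check are hypothetical and are not the values of Table 5.

```python
def mass_shifts(nominal, variant):
    """Shifts of a variant configuration relative to the nominal one.

    Both arguments are dicts with keys "m_t" and "m_W" in GeV.
    """
    dm_t = variant["m_t"] - nominal["m_t"]
    dm_w = variant["m_W"] - nominal["m_W"]
    return {
        "dm_t": dm_t,
        "dm_W": dm_w,
        # hybrid shift, approximated as dm_t - 0.5 * dm_W
        "dm_hyb": dm_t - 0.5 * dm_w,
    }
```

A variant that shifts both fitted masses in the same direction therefore yields a smaller hybrid shift than its raw $\Delta m_t$.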

Summary and conclusion
New sets of parameters for two of the colour reconnection (CR) models implemented in the PYTHIA 8 event generator, QCD-inspired and gluon-move, are obtained, based on the default CMS PYTHIA 8 tune CP5. Measurements sensitive to underlying-event (UE) contributions performed at hadron colliders at $\sqrt{s} = 1.96$, 7, and 13 TeV are used to constrain the parameters for the CR and for the multiple-parton interactions simultaneously. Various measurements at 1.96, 7, 8, and 13 TeV are used to evaluate the performance of the new tunes. The central values predicted by the new CR tunes for the UE and minimum-bias events describe the data significantly better than the CR models with their default parameters before tuning. The predictions of the new tunes achieve a reasonable agreement in many UE observables, including the ones measured at forward pseudorapidities. However, the models after tuning do not generally perform better than the CP5 tune for the observables presented in this study. Although the new CR tunes presented in this work are not intended to improve the description of the measurements of strange particle multiplicities for $\Lambda$ baryons and $K^0_S$ mesons, we test the new tunes against them. We find that the new CR models, when tuned using only measurements that are sensitive to the UE, do not provide a better description of the distribution of strange particle production as a function of rapidity for $\Lambda$ baryons. However, we observe that all CP5 tunes, irrespective of the CR model, describe particle production for $K^0_S$ as a function of rapidity well. Including these observables in the fits, along with the latest measurements of baryon/meson production, could be beneficial for future tune derivations.
The predictions of the new tunes for jet shapes and colour flow measurements done with top quark pair events are also compared with data. All tunes give similar predictions, but none of the tunes describe the jet shape distributions well. Some differences are also observed with respect to the colour flow data, which is particularly sensitive to the early resonance decay option in the CR models. The differences between the predictions using the different tunes observed here indicate that the inclusion of observables, such as the jet pull angle and other jet substructure observables, could be beneficial in tuning studies. A study of the uncertainty in the top quark mass measurement due to CR effects is also presented. The new CR tunes will play a role in the evaluation of systematic uncertainties associated with the modelling of colour reconnection.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid and other centres for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC, the CMS detector, and the supporting computing infrastructure provided by the following funding agencies:

A Colour reconnection tunes with a leading-order PDF set
The list of input RIVET routines used as inputs for the fits, as well as the centre-of-mass energy values, the η ranges, the names of the distributions, the x-axis ranges, and the R values of the distributions are displayed in

B Parameter ranges and uncertainties in the tunes
The parameter ranges are chosen such that the sampled MC space does not destroy the definition of a particular observable in the fits. In Fig. B.1, some sample histograms showing the range of variation available on the observable histograms are given for CP5-CR1. The results are similar for other observables used in the fits as well as for CP5-CR2.
The CP5-CR1 and CP5-CR2 tunes were developed to evaluate the uncertainty in CP5 that results from different colour reconnection models. The uncertainties in the parameters for these tunes were estimated using eigentunes provided by PROFESSOR. Eigentunes represent variations of the tuned parameters in the parameter space along the maximally independent directions. The magnitude of the variation corresponds to a change in the $\chi^{*2}$ ($\Delta\chi^{*2}$) equal to the $\chi^{*2}$ of the fit. The choice of $\Delta\chi^{*2}$, which is recommended by the PROFESSOR Collaboration, is based on empirical grounds since modifying it in Eq. (4) does not yield a statistically meaningful variation. Such a change, $\Delta\chi^{*2} = \chi^{*2}$, is considered reasonable for reflecting the combined statistical and systematic uncertainty in the model parameters, and results in variations similar in magnitude to the uncertainties in the fitted data points. However, this approach may result in uncertainties that do not fully encompass the data in every bin. If the uncertainties in the fitted data points are uncorrelated, their magnitudes will depend on the bin widths. For the data used in the fit, the uncertainties are mostly correlated between bins. However, for UE observables with high $p_{\mathrm{T}}^{\mathrm{max}}$ ($p_{\mathrm{T}}^{\mathrm{max}} \gtrsim 10$ GeV), statistical uncertainties, which are uncorrelated between bins, dominate. This creates some dependence of the eigentunes on the bin widths of the data used in the fit, and leads to uncertainties in the tunes that are much larger and more asymmetric than those on the data points when all eigentunes are added in quadrature.
The number of eigentunes is equal to twice the number of free parameters used in the fit. For the QCD-inspired and gluon-move models, there are 14 and 12 eigentunes, respectively. However, using all 14 or 12 eigentunes to calculate the tune uncertainty for a given observable is computationally inefficient. Therefore, the "up" and "down" tune settings are calculated by comparing the positive and negative differences between each eigentune and the central prediction of the nominal tune for each bin of the observable. The upper bound of the uncertainty in each bin is defined by adding the positive differences in quadrature, and the lower bound is obtained in the same way from the negative differences. These "up" and "down" variations are then fitted using the same procedure as in Section 3 to obtain new parameter sets that can be used to estimate the uncertainties in the nominal tune.
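The bin-by-bin construction of the "up" and "down" bounds from the eigentune predictions can be sketched as below. The function name and the representation of each prediction as a plain list of bin values are assumptions of this example; the subsequent refit of the variations with PROFESSOR is not shown.

```python
import math


def eigentune_envelope(nominal, eigentunes):
    """Quadrature sums of positive and negative eigentune deviations per bin.

    nominal: list of bin values of the nominal tune prediction.
    eigentunes: list of predictions, each a list with the same binning.
    Returns (up, down), both as lists of non-negative deviations.
    """
    up, down = [], []
    for i, central in enumerate(nominal):
        diffs = [tune[i] - central for tune in eigentunes]
        up.append(math.sqrt(sum(d * d for d in diffs if d > 0)))
        down.append(math.sqrt(sum(d * d for d in diffs if d < 0)))
    return up, down
```

The "up" and "down" target histograms are then `nominal[i] + up[i]` and `nominal[i] - down[i]`, which is why the resulting band can be asymmetric around the nominal prediction.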
The predictions of the CP5-CR1 and CP5-CR2 tunes are compared with observables at 13 TeV in Figs. B.2-B.5. The shaded bands in these figures correspond to the envelope of the predictions of the eigentunes of each tune. The parameters of the "up" and "down" tunes for CP5-CR1 and CP5-CR2 are given in Tables B.1

C The ColourReconnection:junctionCorrection parameter
The QCD-inspired model implemented in PYTHIA 8 allows for the creation of string junctions when three colour lines meet at a single point. The presence of these junctions can affect the number of particles produced in a collision and result in the production of additional gluons and quark-antiquark pairs. The junctionCorrection parameter in PYTHIA 8 controls the strength of this effect and can impact various observables, including the charged particle pseudorapidity distribution. The effect is illustrated with the UE observables measured by the CMS experiment at $\sqrt{s} = 13$ TeV [17] in the transMIN and transMAX regions. The predictions for CP5-"QCD-inspired" were obtained by replacing the MPI-based CR model in CP5 with the QCD-inspired model, where the default value of the junctionCorrection parameter is 1.2. The other predictions presented in the figures were obtained by setting the junctionCorrection parameter to 4.0 and to 0.05, respectively. These values were chosen arbitrarily to test how the prediction changes when a relatively high or low value is set for the junctionCorrection parameter. These comparisons demonstrate the sensitivity of these observables to the junctionCorrection parameter. According to Ref. [14], the junctionCorrection parameter has the largest effect on the baryon/meson ratio in pp collisions. The sensitivity of the production of $\Lambda$ baryons and $K^0_S$ mesons, measured by the CMS experiment at $\sqrt{s} = 7$ TeV [16], to the junctionCorrection parameter is shown in [...].
We also derived a new version of CP5-CR1 by including the rapidity distributions of $\Lambda$ baryons and $K^0_S$ mesons, as well as some recent baryon and meson measurements from the ALICE and LHCb experiments [37,38]. The new tune, named CP5-CR1-v2, resulted in a significant improvement in the description of the $\Lambda$ rapidity distribution and reasonable agreement with the data for $\Lambda/K^0_S$, but it was not able to reproduce the $\mathrm{d}N_{\mathrm{ch}}/\mathrm{d}\eta$ at 13 TeV. The values of the parameters obtained in the fits of the CP5-CR1-v2 tune are presented in Table C.1. The remaining parameters of PYTHIA 8 are kept the same as in the CP5 tune.
[...] $\sqrt{s} = 13$ TeV [22]. The prediction of the CP5-CR2 tune is compared with data. The coloured band represents the tune uncertainties.
Figure B.5: The charged-particle (left) and $p_{\mathrm{T}}^{\mathrm{sum}}$ densities (right) in the transMIN (upper) and transMAX (lower) regions as functions of the $p_{\mathrm{T}}$ of the leading charged particle, $p_{\mathrm{T}}^{\mathrm{max}}$, measured by the CMS experiment at $\sqrt{s} = 13$ TeV [17]. The predictions of the tune CP5-CR2 are compared with data. The coloured band represents the tune uncertainties.
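A scan of the junctionCorrection parameter of the kind described in this appendix could be set up as in the sketch below, which only builds the PYTHIA 8 setting strings for the three values quoted in the text (0.05, the default 1.2, and 4.0). It assumes that the QCD-inspired model is selected with `ColourReconnection:mode = 1` and that this model is run together with `BeamRemnants:remnantMode = 1`. The function name is hypothetical and all other CP5 settings are omitted.

```python
def junction_scan(values=(0.05, 1.2, 4.0)):
    """Return {value: list of setting strings} for a junctionCorrection scan."""
    base = [
        "ColourReconnection:mode = 1",   # QCD-inspired CR model (assumed name)
        "BeamRemnants:remnantMode = 1",  # assumed companion setting for mode 1
    ]
    return {
        value: base + [f"ColourReconnection:junctionCorrection = {value}"]
        for value in values
    }
```

Each entry corresponds to one generator run whose predictions would be overlaid on the same observable to display the sensitivity.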