Thermodynamical String Fragmentation

The observation of heavy-ion-like behaviour in pp collisions at the LHC suggests that more physics mechanisms are at play than traditionally assumed. The introduction e.g. of quark-gluon plasma or colour rope formation can describe several of the observations, but as of yet there is no established paradigm. In this article we study a few possible modifications to the Pythia event generator, which describes a wealth of data but fails for a number of recent observations. Firstly, we present a new model for generating the transverse momentum of hadrons during the string fragmentation process, inspired by thermodynamics, where heavier hadrons naturally are suppressed in rate but obtain a higher average transverse momentum. Secondly, close-packing of strings is taken into account by making the temperature or string tension environment-dependent. Thirdly, a simple model for hadron rescattering is added. The effect of these modifications is studied, individually and taken together, and compared with data mainly from the LHC. While some improvements can be noted, it turns out to be nontrivial to obtain effects as big as required, and further work is called for.


Introduction
QCD, the theory of strong interactions, is at the origin of a wide range of phenomena. In one extreme, progress on high-energy perturbative calculations offers an increasingly precise and successful description of hard processes, as a large community is steadily improving calculational techniques. NLO calculations, once rare, are now standard, NNLO is getting there, and even NNNLO is starting to appear (see e.g. [1] and references therein). In another extreme, the nonperturbative aspects of low-energy interactions are less well understood. Lattice QCD can be used to calculate static hadron properties, but not (yet?) dynamical processes. Specifically, the description of hadronization, the step whereby partons turn into hadrons in high-energy collisions, cannot be derived directly from the QCD Lagrangian within any currently known formalism. Instead string [2] and cluster [3][4][5] models, developed in the early eighties, have been used almost unchanged from PETRA/LEP e + e − events to SppS/Tevatron/LHC pp/pp ones -the assumed "jet universality". Differences have been attributed to the quite disparate parton-level configurations that undergo hadronization: while e + e − involves only hard process and final-state radiation (FSR), pp adds aspects such as initial-state radiation (ISR), multiparton interactions (MPIs), beam remnants and colour reconnection (CR).
Cracks have started to appear in this picture as new LHC data have been presented. Specifically, several studies have shown how high-multiplicity pp events have properties similar to those observed in heavy-ion AA collisions. Some observations may have an explanation within the current framework, e.g. CR may give some flow-like patterns [6], but others do not. An early example was the discovery of "the ridge", an enhanced particle production around the azimuthal angle of a trigger jet, stretching away in (pseudo)rapidity [7][8][9]. A more recent example is the smoothly increasing fraction of strange baryon production with increasing charged multiplicity, a trend that lines up with pA data before levelling out at the AA results [10] 1 . Conventional wisdom holds that the formation of a quark-gluon plasma (QGP) requires a larger volume and longer time for thermalization than pp or pA systems can offer, so such trends are unexpected, see e.g. [12].
It is therefore time to rethink the picture of hadronization in high-energy and highmultiplicity collisions. One possible approach is to imagine that a QGP is at least partly formed in pp collisions, such that individual colour fields (strings) cease to exist. Such a behaviour is already implemented in the EPOS model [13]. Another is to imagine that strings survive as a vehicle e.g. of short-range flavour correlations, but that their properties are modified. Colour ropes [14][15][16] is one such example, wherein several colour-triplet strings combine to a higher colour-representation field. A detailed implementation of rope dynamics is found in the DIPSY program [17]. Both EPOS and DIPSY qualitatively describe several of the new key features, such as the increasing rate of strangeness production at higher multiplicities.
With the studies described in this article we want to add to the set of alternative models that can be used to compare with data. At best it may offer some new insights, at worst it will act as a straw man model. Firstly, rather than the particle-mass-independent Gaussian p ⊥ spectrum assumed in the standard string model, it introduces an exponential p ⊥ dependence, exp(−p ⊥ /T ). This is split among possible flavours according to hadronic m ⊥ , exp(−m ⊥ /T ). Such p ⊥ and m ⊥ shapes were used to describe early pp data, e.g. at the ISR [18], and has some foundation in the Hagedorn temperature [19][20][21] and in related [22] ideas. (Later powerlike p ⊥ ansätze [21,23,24] or two-component exponential + powerlike ones [25] can be viewed as a consequence of perturbative jet production, and is in our framework generated as such, in an earlier stage than the nonperturbative hadronization.) Secondly, it assumes that the close-packing of several strings leads to an increased effective temperature and thereby both a changed particle composition and changed p ⊥ spectra. In spirit this is close to the rope model, but it does not have to assume that the individual strings either fuse or melt away. Thirdly, if the fragmenting strings are close-packed this also implies the initial formation of a dense hadronic gas, wherein rescattering may lead to collective-flow effects. Such effects are simulated in a crude first approximation.
The impact of these mechanisms on experimental distributions is studied, in order to quantify their significance. As a prime example, consider the p ⊥ (n ch ) distribution, with a characteristic rising trend that has been proposed as a signal for colour reconnection [26]. Alternative interpretations are now offered in terms of close-packing of strings and/or hadrons, and these are presented and compared with data individually. At the end of the day, we should expect the "true" nature of high-multiplicity pp collisions to contain many contributing mechanisms, however. To be more more specific, in quantum mechanics any process that is not explicitly forbidden by some selection rule is bound to occur, the question is only with what rate. The final task therefore is try to constrain the relative importance of the mechanisms, not to prove a specific one "right" or "wrong".
The new model components are implemented as options in the standard Pythia event generator [27,28], which makes it easily accessible for further experimental tests. They should be viewed as a first iteration. Should they prove useful there is room for further improvements, as we will indicate.
The article is organized as follows. Section 2 outlines relevant features of the existing Lund string model and introduces key observables, with emphasis on those new ones that are not well described by the current Pythia generator. Section 3 introduces the alternative approaches explored in this article, and presents some first toy studies for simplified string topologies. Comparisons with data are presented in section 4, highlighting what seems to work where and what not. Finally section 5 contains a summary and outlook.

The Lund string model
The Lund string fragmentation model [2] is very successful in many respects, but more so for the overall longitudinal fragmentation structure than for its description of the particle composition.
The central assumption in the string model is that of linear confinement, V (r) = κr, with a string tension κ ≈ 1 GeV/fm. The word "string" should here not be taken literally; the physical object is a kind of flux tube stretched between the endpoints, with a typical transverse size of the order of the proton one, r p ∼ 0.7 fm. The one-dimensional "mathematical" string should then be viewed as a description of the location of the center of the flux tube. By analogy with superconductivity the tube could be viewed as a vortex line like in a type II superconductor, alternatively as an elongated bag in a type I one.
In the case of a simple stable back-to-back qq system, with m q = p ⊥ q = 0, quarks move with the speed of light in "yo-yo"-mode oscillations, as energy moves between being stored in the endpoint quarks and in the intermediate string. If creation of new qq pairs is allowed the original system can break up into smaller ones, each a colour singlet in its own right. Denoting the original pair q 0 q 0 , and ordering the new pairs q i q i , 1 ≤ i ≤ n − 1 from the quark end, results in the production of n hadrons q 0 q 1 , q 1 q 2 , . . . , q n−1 q 0 .
Aligning the x axis with the string axis, the breakup vertices are characterized by their location (t i , x i ). These vertices have a spacelike separation, and so have no unique time ordering. (Except for the original (t 0 , x 0 ) = (0, 0) of course. But here it is actually the turning points of the q 0 and q 0 that define the vertices in eq. (2.1) below, and then spacelike separation is restored.) Two adjacent ones are correlated by the constraint that the hadron produced should have the correct mass m i : If the vertices are assigned from the quark end, say, each new vertex therefore corresponds to one degree of freedom, which should be selected according to some probability function. Imposing consistency constraints, mainly that results should be the same (on the average) if fragmentation is instead considered from the antiquark end, gives the solution [2] f (z) with a and b two free parameters, and where m 2 → m 2 ⊥ once transverse momentum is introduced. Here z is the fraction of available lightcone momentum E + p x taken by a hadron, with the remainder 1 − z retained by the string for subsequent particle production.
This ansatz leads to vertices having an equilibrium distribution (after having taken a few steps away from the endpoints) with the same a and b as above. (For the special case a = 0 this result agrees with the Artru-Mennessier model [29], which is based on constant decay probability per string area dt dx, without any mass constraint.) The associated probability for producing n particles can be written as [30] where A tot is the total space-time area under the breakup vertices. The relation between dP n and dP n−1 (at a reduced c.m. energy) is then given by the fragmentation function eq. (2.2), where it is easy to show that the exponentials match, and somewhat less trivial that a larger N (i.e. larger weight for higher multiplicities) corresponds to a larger a (i.e. less momentum taken away in each step). The simple qq fragmentation picture can be extended to qqg topologies if the gluon is viewed as having separate colour and anticolour indices, as in the N C → ∞ limit [31]. Then one string piece is stretched between the quark and the gluon, and another between the gluon and the antiquark. The absence of a string piece stretched directly between the quark and antiquark leads to predicted asymmetries in the particle production [32] that rapidly were observed experimentally [33]. In general, a string can stretch from a quark end via a number of intermediate gluons to an antiquark end. Technically the motion and fragmentation of such a string system can become rather complicated [34], but the fragmentation can be described without the introduction of any new principles or parameters. This is the most powerful and beautiful aspect of the string fragmentation framework. Note that the leading hadron in a gluon jet can take momentum from both the string pieces that attaches it to colour-adjacent partons. This is unlike cluster models, where gluons are forced to branch into qq pairs, such that smaller colour singlets are formed rather than one single string winding its way between the partons. The string model is easily extended to closed gluon loops and, with rather more effort [35], to junction topologies, where three string pieces come together in a single vertex.
We now turn to the breakup mechanism. If a qq pair is massless and has no transverse momentum it can be produced on-shell, in a single vertex, and then the q and q can move apart, splitting the string into two in the process. But if the q (and q) transverse mass m ⊥q = m 2 q + p 2 ⊥q > 0 this is no longer possible. By local flavour conservation the qq pair is still produced at a common vertex, but as virtual particles that each needs to tunnel out a distance d = m ⊥ /κ. Using the WKB approximation [2] to calculate the tunneling probability for the pair gives a factor where the Gaussian answer allows a convenient separation of the m and p ⊥ dependencies (with implicit phase space d 2 p ⊥ ). The latter is implemented by giving the q and q opposite and compensating p ⊥ kicks, with p 2 ⊥q = κ/π = σ 2 ≈ (0.25 GeV) 2 . A hadron receives its p ⊥ as the vector sum of it q and q constituent kicks, and thus p 2 ⊥had = 2σ 2 . Empirically the tuned σ value comes out larger than this, actually closer to σ = 0.35 GeV. This implies that almost half of the p 2 ⊥ kick is coming from other sources than tunneling. One source could be soft gluon radiation below the perturbative (parton shower) cutoff, where α s becomes so big that perturbation theory breaks down [36]. Effectively radiation near the perturbative/nonperturbative border is thus shoved into an artificially enhanced tunneling answer, with the further assumption that the Gaussian shape and the p ⊥ balancing inside each new qq pair still holds.
Uncertainties also arise in the interpretation of the mass suppression factor of eq. (2.5): what quark masses to use? If current quark masses then the u and d ones are negligible while the s is below 0.2 GeV, predicting less strangeness suppression than observed, while with constituent masses m u ≈ m d ≈ 0.33 GeV and m s ≈ 0.51 GeV [37] too much suppression is predicted. Intermediate masses and suppression factors closer to data can be motivated e.g. by noting that an expanding string corresponds to confinement in the two transverse dimensions but not in the longitudinal one. In the end, however, the s/u suppression is viewed as an empirical number to be tuned to data. Whichever values are used, c and b quark tunneling production is strongly suppressed, so this mechanism can be totally neglected relative to the perturbative ones.
Considering only mesons in radial and rotational ground states, i.e. only the pseudoscalar and vector multiplets, naive spin counting predicts relative rates 1 : 3, whereas data prefers values closer to 1 : 1, at least for π : ρ. It is possible to explain a suppression of the vector mesons based on the difference in the hadronic wave functions, from the spin-spin interaction term [2], but the amount has to be tuned to data. And further brute-force suppression factors are needed specifically for the η and η mesons, which have "unnaturally" large masses owing to the U (1) anomaly.
Baryon production can be introduced by allowing diquark-antidiquark breakups of the string [38], to be viewed as occurring in two consecutive qq creation steps [39]. A baryon and the matching antibaryon would normally be nearest neighbours along the string, but the "popcorn mechanism" also allows one (or more) mesons to be produced in between. Diquark masses can be used to derive approximate suppressions, but again free parameters are used, for qq/q, sq/qq, qq 1 /qq 0 and others. Unfortunately the tuned values do not always match so well with the tunneling-formula expectations.
In total O (20) parameters are used to describe the outcome of the string/tunneling mechanism for particle production. Notable is that the particle masses do not enter explicitly in these considerations. This is unlike cluster models, e.g., where hadron masses occur in the phase space available for different cluster decay channels. A fair overall description of the particle composition is then obtained with very few parameters [40,41]. Note that while most fragmentation parameters in Herwig++ exist in different copies for light (u, d, s), c, and b quarks, the ones for heavy quarks have either been set equal to the values of those for light quarks [40] or have not been included in further tuning processes [41].
The hadron masses can be explicitly introduced into the Lund framework by assuming that the integral 1 0 f (z) dz, with f (z) given by eq. (2.2), provides the relative normalization of possible particle states. This concept has been developed successfully within the UCLA model [42,43], in that particle rates come out quite reasonably with minimal further assumptions. There are some other issues with this approach, however, and we do not pursue it further here.

Key data
An immense number of studies have been published based on hadron collider data, and it is not the intention here to survey all of that. Instead we here bring up some of the key data and distributions that have prompted us to this study. Several of them will be shown repeatedly in the following. We note that all histograms we will present in this article are produced by utilizing Rivet [44].
The list of key observables includes: • The change of flavour composition with event multiplicity. Specifically, high-multiplicity events have a higher fraction of heavier particles, meaning particles with a higher strangeness content [10]. Pythia contains no mechanism to generate such a behaviour. On the contrary, within a single fixed-energy string a higher multiplicity means more lighter particles, for phase space reasons. In pp collisions a higher multiplicity is predominantly obtained by more MPIs, however, so the composition stays rather constant.
• The average transverse momentum p ⊥ is larger for heavier particles, both at RHIC [45] and LHC [46]. This is a behaviour that is present also in Pythia, and comes about quite naturally e.g. by lighter particles more often being decay products, with characteristic p ⊥ values smaller than the primary particles in the string fragmentation. The mass dependence is underestimated, however. That is, π ± obtains a too large p ⊥ in Pythia and baryons a too small one. Recently p ⊥ has also been presented as a function of n ch , inclusive [47] and for different hadron species [48], providing a more differential information on this mismatch. In figure 1 we show these observables and compare default Pythia with data, with the above expected conclusions. Note that the data in figure 1 of [48] is not (yet) publicly available. To obtain an estimate of the data that is comparable to MC predictions we used an estimate of the logarithmic fits shown in figure 1 of [48] and used n ch values on the x axis rather than dn ch /dη |η|<0.5 .
• The charged particle p ⊥ spectrum is not correctly modelled at low p ⊥ scales, with Pythia producing too few particles at very low values [47,49,50]. Often tunes then compensate by producing a bit too many at intermediate p ⊥ scales. The issue shows up e.g. in minimum-bias dn ch /dη distributions, where it is not possible to obtain a good description for data analyzed with p ⊥ > 0.1 GeV and p ⊥ > 0.5 GeV simultaneously.
• In the p ⊥ spectra for identified particles [51] it turns out that the deficit at low p ⊥ is from too little π ± production. This is not unexpected, given the previous two points, but stresses the need to revise the mass dependence of p ⊥ spectra.
• The Λ/K p ⊥ spectrum ratio, measured by CMS [52], where Pythia is not able to reproduce the peak at ∼ 2.5 GeV completely and overshoots the distribution for large-p ⊥ values.
• The observation of a ridge in pp collisions was one of the major surprises in the 7 TeV data [7], and has been reconfirmed in the 13 TeV one [8,9]. The ridge is most clearly visible at the very highest multiplicities, but more careful analyses hints the effect is there, to a smaller extent, also at lower multiplicities. Like in heavy-ion collisions one may also seek a description in terms of correlation functions, C(∆φ) ∝ 1 + n≥2 v n cos(n∆φ), notably the v 2 coefficient, with a similar message. These phenomena are not at all described by Pythia: there is no mechanism that produces a ridge and, once the effects of back-to-back jet production have been subtracted, also no rise of v 2 .
There are also some other reference distributions that have to be checked. These are ones that already are reasonably well described, but that inevitably would be affected by the introduction of new mechanisms.
• The charged particle multiplicity distribution P (n ch ) is sensitive to all mechanisms in minimum-bias physics, but especially the MPI and CR modelling. A mismatch in n ch is most easily compensated by modifying the p ⊥0 scale of the MPI description. This parameter is used to tame the dp 2 ⊥ /p 4 ⊥ divergence of the QCD cross section to a finite dp 2 ⊥ /(p 2 ⊥0 + p 2 ⊥ ) 2 shape. It can be viewed as the the inverse of the typical colour screening distance inside the proton. A mismatch in the width of the n ch distribution can be compensated by a modified shape of the b impact-parameter distribution of the two colliding protons. Specifically, a distribution more sharply peaked at b = 0 gives a longer tail towards high multiplicities.    [46,48] and ATLAS [47] data. The data in the bottom plots is taken to be an estimate of the logarithmic fits in [48] and therefore no error bars are included.
• An p ⊥ increasing with n ch was noted already by UA1 [53], and has remained at higher energies [47,48]. It offers a key argument for introducing CR in pp/pp collisions, as follows [26]. The tail towards large n ch is driven by events with more MPI activity, rather than e.g. by events with higher-p ⊥ jets. If each MPI subcollision produces particles essentially independently the p ⊥ (n ch ) would be rather flat. CR implies that fewer and fewer extra particles are produced for each further MPI, as the possibilities to reduce the total string length by CR increase the more partons are already present. The amount of p ⊥ from the MPIs thus increases faster than the n ch , meaning more p ⊥ per particle. (To this comes the normal hadronization p ⊥ contribution, which raises the overall p ⊥ level but does not contribute to the p ⊥ (n ch ) slope.) The exact nature of CR is not known, meaning that many models have been developed [26,54,55]. In most of them there is some overall CR strength parameter that can be adjusted to fit the p ⊥ (n ch ) slope.
• A natural reference for hadronization properties always is e + e − data. The principle of jet universality -or, in our case, string universality -is deeply rooted, so it it useful to check that no changes of fundamental string properties have too adverse an impact on e + e − . There is also a possibility of improvements in some places, like the inclusive p ⊥in and p ⊥out spectra; unfortunately these are not available for identified particles.

The New Models
In this section we outline the basic ideas and implementations that we have developed to offer new options to the traditional Pythia hadronization framework. As we later compare with data we will have reason to go into more detail and discuss some variations.

Variations of the normal string model
As described above, the standard tunneling framework suggests a Gaussian suppression of the production of heavier quarks and diquarks, with a further suppression based on the hadronic spin state, but no obvious room for an explicit dependence on the hadron mass. It also provides a common Gaussian p ⊥ spectrum for all new qq pairs. We will study a few variations of this framework, mainly as a reference for the thermodynamical ansatz below. Firstly, consider a Gaussian suppression associated with the masses of the produced hadrons rather than with the quarks. That is, let the relative production rate of different hadron species be given by a factor exp(−m 2 ⊥had /2σ 2 ), which factorizes into a speciesindependent p ⊥ spectrum and an exp(−m 2 had /2σ 2 ) mass suppression. The question is then whether this would give the appropriate suppression for the production of heavier particles.
Secondly, the universal p ⊥ spectrum could be broken by assigning a larger width in string breakups of the ss and qqqq kinds, relative to the baseline uu and dd ones. The issue to understand here is how dramatic differences are required to get a better description of the individual π, K and p p ⊥ spectra.
Thirdly, assume that more MPIs leads to a closer packing of strings in the event, but that each string "flux tube" remains as a separate entity. The transverse region of the string shrinks and, essentially by Heisenberg's uncertainty relations, this should correspond to a higher energy, i.e. a larger string tension κ. (Such a relation comes out naturally e.g. for bag models of confinement [56].) Overall the dense-packing effect on κ and related parameters should scale as some power of n MPI , i.e. the number of MPIs in the current event. Since n ch and n MPI are strongly correlated it is thus interesting to study how the particle composition and p ⊥ depend on n ch . For a more differential picture it should be preferable to estimate the number of strings in the neighbourhood of each new hadron being produced. This is done by making a reasonable guess for the momentum of the hadron that is the next to be produced on the current string. Using an average hadron mass and p ⊥ , defined in the frame of the parent string, and an average Γ value of Γ = (1 + a)/b, the momentum of the "average expected" hadron is calculated. Using this information, we determine the number of strings that cross the rapidity of the expected hadron. For this purpose the rapidity range that a string will populate is defined by the rapidity of the endpoint partons of each string piece, where m 2 min has the purpose to protect against strings with low-m ⊥ endpoints from populating the full rapidity range. The rapidity-density measure is reasonable for low-p ⊥ hadroproduction, but does not reflect the phase space inside a high-p ⊥ jet, where closepacking of strings should be rare. Therefore the effective number of strings is calculated as where p ⊥had is the physical hadron p ⊥ and p ⊥ 0 is the MPI regularization parameter.
As two examples, the rapidity distribution of the strings in a typical QCD event and in a diffractive event are shown in figure 2. Using eq. (3.2), the string tension in eq. (2.5) is modified to be where the exponent r is a left as a free parameter, that can be used to tune the model to data. Note that while junctions 2 contribute to the calculation of n string by assuming one string stretched between the highest-and lowest-rapidity parton, their fragmentation does not make use of eq. (3.3). Junctions are rare in the models we study, so this is not a significant simplification. The effect of modifying the string tension due to the local density has also been studied in other Monte Carlo programs, which are primarily for heavy-ion collisions. Some of them have hardly been used for pp physics as they miss out on other physics aspects such as QCD jet production. In the RQMD model [57] for studying relativistic nucleus-nucleus collisions, colour strings are allowed to fuse into ropes if they are overlapping, which weakens the suppression of strangeness and baryon production due to the increased string tension [58]. A similar model with string fusion into colour ropes is implemented in the DIPSY event generator [17] and shows improvement in the description of identified particle spectra in pp minimum bias data. For the UrQMD model [59,60], used for relativistic heavy-ion and hadron-hadron collisions, the authors of [61] show that a better description of particle yields is achieved with an enhanced string tension. The effect of an increased string tension in a densely populated environment on strangeness and diquark production, antibaryonto-baryon ratios and other observables has been investigated in [62]. In AMPT [63,64], a Monte Carlo transport model for heavy-ion collisions at relativistic energies, parameters in the Lund string fragmentation model have been modified, as the string tension is expected to be increased in the dense matter formed in heavy-ion collisions [65]. In the PSM [66,67] Monte Carlo model for simulating nuclear collisions, string fusion associated with high string densities is taken into account to reduce multiplicities and increase p ⊥ , baryon and strangeness production. Ref. [68] presents a model which introduces the interaction between strings via their fusion and percolation analytically. The p ⊥ of the produced particles, and therefore also the string tension, depends on the string density and how much strings overlap [69][70][71].

One-string toy model
A very simple toy model is introduced to validate the modifications to the string tension in the conventional string model. A single string with energy m Z is spanned along the z axis. The flavour of the endpoint quarks is chosen random from the set (u, d, s, c, b). The study includes only primary produced hadrons, i.e. no hadron decays, and also excludes the hadrons containing the endpoint quarks. (Such hadrons would have lower p ⊥ since the endpoint quarks by definition have p ⊥ = 0.) The p ⊥ and the mean multiplicity for different hadron species are shown in figure 3. As expected, increasing the string tension either for s quarks or for diquarks leads to an increased p ⊥ value for the hadrons concerned. Note that for η + η the p ⊥ is only increased slightly due to the uu + dd quark component being more frequently produced compared to ss. There is a slight reduction of the production probability for hadrons with s quarks or diquarks, shown in the top right plot in figure 3, due to the increased string default increased κ for diquarks increased κ for strangeness Mean multiplicity for different hadron species Multiplicity 0.6 0.8  tension leading to fewer particles being produced in affected events. The bottom row of figure 3 shows the K and p/n p ⊥ spectra, shifted to larger values as the string tension for that hadron species is increased.

Multi-string toy model
To investigate the effect of the close-packing of strings, as in eq. (3.3), the above toy model is extended to include several strings along the z axis. The number of strings is picked randomly between two and eight and the string energies are chosen such that they sum up to 1 TeV. Figure 4 shows p ⊥ as a function of the number of charged particles and the p ⊥ distribution and compares the modified model to default Pythia. Two different choices for the baseline value for the string tension are made in case of taking the close-packing of strings into account. In the first case the tension is denoted with κ and its value is  adjusted such that p ⊥ agrees with default Pythia for small values of n ch . In the second case, where the string tension is denoted by κ the value is adjusted to obtain the same p ⊥ , averaged over all hadrons and charged multiplicities. The latter case serves as a cross check when investigating the influence on the p ⊥ spectrum of charged hadrons.
As expected the p ⊥ increases with the charged multiplicity, eventually flattening out at large multiplicities. The left histogram in figure 4 also nicely shows that the rise is independent of the baseline string tension value.
When fitting the string tension such that the same overall p ⊥ is reached as in the default model, the charged hadron p ⊥ spectrum exhibits only small changes; making the spectrum somewhat broader.

Gaussian m 2
⊥had suppression To test the applicability of the Gaussian transverse mass suppression, the quark p ⊥ is generated according to exp −p 2 ⊥q /σ 2 , see eq. (2.5), with the hadron flavour chosen based on exp −m 2 ⊥had /2σ 2 . The additional factor of two arises from the hadron receiving p ⊥ contributions from two quarks. As the comparison to data is of interest here, realistic e + e − → jets events with s = m 2 Z are investigated. In figure 5 the particle composition is shown as a function of mass. This clearly indicates that the suppression based on the transverse mass squared of the hadrons is suppressing heavier hadrons too much. We will therefore not consider this option further.

The thermodynamical string model
The most radical departure from standard Lund string principles that we explore in this article is to replace the Gaussian suppression factor in mass and p ⊥ by an exponential one.  Figure 5: The conventional string model with its default options (red) and with the relative production rate of different hadron species given by a factor exp(−m 2 ⊥had /2σ) (blue), compared to PDG data [72].
To be more explicit, instead of a quark-level suppression governed by eq. (2.5) there will be a hadron-level suppression The inspiration clearly comes from a thermodynamical point of view, which is why we choose to associate the dimensional parameter with a temperature T . This association should not be taken too literally, however; there are many differences relative to a purely thermal model. The main one is that we keep the longitudinal string fragmentation structure unchanged, which ensures local flavour conservation. Another is that e.g. the Hagedorn approach [19,20] is based on the assumption of a steeply increasing density of excited states as a function of mass, whereas we only include a few of the lowest multiplets. (By default only the ground states corresponding to no radial or orbital excitation, optionally also the lowest L = 1 meson multiplets.) This means that, although our T comes out to be a number of the order of the Hagedorn temperature, there is no exact correspondence between the two. Also, T ∼ κ/π = σ from dimensional considerations, so our T could be viewed as a manifestation of the string energy per unit length, not directly linked to a temperature.
There is also an experimental historical background to the choice of an exponential shape, in that already fixed-target and ISR data showed that a distribution like exp(−Bp ⊥ ) offered a good fit to the inclusive dn ch /dp 2 ⊥ spectrum, with B ≈ 6 GeV −1 [18,[73][74][75]. With data split by particle type, a lower B value is noted for kaons and protons than for pions, but with the modified form exp(−Bm ⊥ ) all the spectra can be described by almost the same B ≈ 6 value.
As an aside, the preference for an exponential shape was and is not a show-stopper for the Gaussian approach in the normal string fragmentation. At larger p ⊥ the spectrum is dominated by the fragmentation of (mini)jets, giving a larger rate than the nonperturbative hadronization one. And at smaller p ⊥ the pattern of decays makes the spectrum more steep than the Gaussian one of the primary hadrons. So at the end of the day a Gaussian ansatz lands not that far away from an exponential spectrum, although differences remain. See further section 3.4.1, in particular figure 9.
In more detail, our model is intended to give each new hadron in the string fragmentation a p ⊥ according to an exponential distribution. We want to preserve the concept of local p ⊥ conservation in each qq breakup vertex, so seek a distribution that convoluted with itself (in two transverse dimensions) gives an exponential, (3.5) Using Fourier transforms to turn the convolution into a product, The transformation back off q then gives [76] f is a regular Bessel function of the first kind, and K 1/4 is the modified Bessel function of the second kind of order 1/4. An implementation of K 1/4 has been included in Pythia based on [77], using a power series for p ⊥ q /T < 2.5 and an asymptotic expansion for p ⊥ q /T > 2.5.
Consider the fragmentation of a string, where the quark q of one breakup has a certain p ⊥ 1 . The transverse momentum p ⊥ 2 of the (di)quark of the next breakup pair q q is constructed by picking its absolute value according to eq. (3.7) and a random azimuthal angle. The partner anti(di)quark must thus have −p ⊥ 2 due to local momentum conservation. The hadron transverse momentum is simply the sum of the p ⊥ of the two contributing quarks, p ⊥had = p ⊥ 1 − p ⊥ 2 . Having p ⊥had at hand we decide on the flavour of the breakup pair q q , and therefore also on the hadron species, as follows: calculate the transverse mass m ⊥had of all hadrons whose flavour content includes the incoming quark q and determine the basic probability for each hadron as P had = exp(−m ⊥had /T ) . (3.8) Assuming the production of two hadrons with different masses m 1 and m 2 , then eq. (3.8) implies the same production rate for p ⊥ m 1 , m 2 , but more suppression of the heavier hadron at low p ⊥ . Thus there is less production of heavier states, but they come with a larger p ⊥ .
As mentioned above, by default we only include the light-flavour (u, d, s) meson and baryon multiplets without radial or orbital excitation 3 . However, if desired, more hadrons can be added to the procedure. Depending on the flavour content of the hadron, the probability in eq. (3.8) receives additional multiplicative factors: • Due to spin-counting arguments vector mesons receive a factor of 3 and tensor mesons a factor of 5. • For same-flavour mesons we include the diagonal meson mixing factors, similar to what has been done previously in the conventional Lund string model. • Baryons receive a free overall normalization factor with respect to mesons, as well as an additional factor stemming from the SU (6) symmetry factors, see [38]. The relative weight of spin 1/2 baryons with respect to those with spin 3/2 is 2 : 4, similar to the factors for mesons arising from the spin-counting arguments in point 1.
• For the special case of octet baryons with three different flavours, e.g. Λ and Σ 0 , their probability for different internal spin configurations is taken into account. • An extra suppression factor for hadrons with strange (di)quarks is included to get more control over the relative hadron production and thus a better description of data. All probabilities are then rescaled to sum up to unity and the hadron species and therefore the flavour of the next (di)quark pair is chosen accordingly. Note that we have not (yet) implemented popcorn baryon production, i.e. no mesons are produced in between a baryon and its antibaryon partner.
Similar to eq. (3.3) the temperature can be modified as with n eff string given in eq. (3.2) to take into account the effect of close-packed strings. Note that in [78] the temperature has been related to the density in the context of the percolation of color sources (the density is however defined differently).

Asymmetry in different flavour transitions
Consider a very simple model, where only string breaks with dd and ss quark pairs are allowed to produce only pseudoscalar mesons, and the mixing of diagonal mesons is ignored. Then it is rather easy to see that the p ⊥ spectra and p ⊥ of the hadrons produced in (d → s) transitions is not the same as for (s → d) transitions, due to the difference in competition. In the first instance (d → s) competes with (d → d), and since the former produces the heavier meson it also obtains the higher p ⊥ . In the latter instance (s → d) instead competes with (s → s) and so gives the lighter meson and lower p ⊥ . Assuming that fragmentation is performed from the quark end inwards, K 0 = ds would thus obtain a harder p ⊥ spectrum thanK 0 = ds, which should not be the case. A simple solution for obtaining the same p ⊥ for both (d → s) and (s → d) transitions is to adjust the temperature in eq. (3.8) in case of initial s/s quarks such that ds and ds hadrons are produced with the same p ⊥ value, higher than dd, and ss becomes even higher than that.   Figure 6: The inclusive hadron p ⊥ spectrum for (d → s) and (s → d) transitions with the same temperature for both cases (left) and with the adjusted temperature (right).
In figure 6 we show the p ⊥ spectra for both types of transitions with the same temperature and with the adjusted temperature. Note that though both transitions end up with the same p ⊥ value, the shape of the p ⊥ distribution still differs somewhat.
Unfortunately this is a price to pay for working with a recursive model, where flavour is conserved locally. A traditional thermal model based on eq. (3.8) would not conserve flavour or momentum, however, so is not an option here.

The hadronic rescattering model
A close-packing of fragmenting strings also implies a close-packing of the produced primary hadrons, i.e. a dense hadronic gas. This gives the possibility for hadrons to rescatter on the way out, in particular at the earliest times after hadronization. A detailed simulation of this mechanism would require a knowledge of where in space-time each hadron is produced. For a single string, say stretched along the z axis, it is straightforward to translate between the (E, p z ) values of the primary hadrons and the (t, z) coordinates of the string breakups. For the more realistic case, when a string is stretched between several partons and the string motion is considerably more complicated [34], appropriate rules have not been worked out. To this should be added ambiguities in the transverse production coordinates, both as a consequence of the transverse distribution of the MPIs and of transverse fluctuations inside each string. The modelling of all of these aspects is an interesting task for the future. In addition, the cross section for the scattering of two hadrons against each other varies between hadron kinds, and depends on the relative energy of the two, adding a further layer of complexity.
Here we want to avoid such a detailed model, but still be able to explore whether hadronic rescattering effects could contribute to the resolution of some of the effects that we are attempting to explain. Collective flow -whether dictated by properties of the QGP or by hadronic rescattering -is well-established in heavy-ion collisions, see e.g. [79][80][81] and references therein. In particular, a common average radial velocity means that heavier particles have a higher p ⊥ than lighter. Of course we do not expect as dramatic effects in pp, but they may still contribute to the same kind of π/K/p p ⊥ separation as in the thermodynamic scenario above, so it should be interesting to compare the two possibilities.
The simple modelling we have in mind is applied to the primary hadrons produced directly from the string fragmentation, before secondary decays are considered. Furthermore, for strings stretched along the z axis there is a strong correlation between the rapidity y of a particle and its space-time production vertex. Therefore, for a given hadron, the density of other hadrons at around the same rapidity is a reasonable (and longitudinally boost invariant) measure of how close-packed particle production is. If there is a contribution from particles coming from the same simple string it has presumably already been absorbed in the tuned fragmentation parameters, so we should disregard such pairs. Unlike e + e − events, however, it is common with topologies where a string consists of pieces stretched back and forth across the same rapidity range, and then the above argument does not apply. In practice, it is therefore more relevant to exclude rescattering only between close neighbours in the fragmentation chains.
One should further note that the rapidity density of hadrons refers to low-p ⊥ particles. The hadronization of a scattered high-p ⊥ parton mainly occurs at larger p ⊥ scales, and these hadrons would be essentially unaffected.
The angular distribution of a rescattering, defined in the rest frame of the hadronic pair, should depend on the orbital angular momentum L. For simplicity, we restrict to s-wave isotropic scattering (i.e., L = 0) by requiring that the classical value of angular momentum L = b |p| < b |p max | ∼ 1, where b is the impact parameter and p max is the maximally allowed three-momentum of the hadrons in their rest frame, left as a (in principle) free parameter. We don't have access to b for each pair, but assume it is the same distribution for all combinations of hadron types. A common restriction on the three-momentum is thus introduced for all pairs, which is implemented as a cut on the invariant mass of the hadron pair, with m 1 and m 2 being the masses of the hadrons and m inv the physical invariant mass of the hadron pair. For all hadron pairs that are not excluded by eq. (3.10) we calculate the difference in rapidity, ∆y = |y 1 − y 2 |, and the rescattering probability. For hadrons that are not produced in the same string the latter is P ds (∆y) = P max ds 1 − ∆y ∆y max , (3.11) where the maximum scattering probability P max ds and the maximum rapidity difference ∆y max are left as free parameters. Eq. (3.11) simply means a probability of P max ds for zero rapidity difference of the hadron pair, linearly decreasing to zero at a rapidity difference of ∆y max . ⊥ π K η, η ′ ρ, ω K * φ p,n Λ, Σ Ξ ∆ Σ * Ξ * Ω w/wo (Thermal) Figure 7: The p ⊥ for different hadron species, with (blue triangles) and without (red dots) hadron scattering for the Gaussian p ⊥ and with (orange pentagons) and without (green squares) hadron scattering for the thermal p ⊥ in the toy model. The right plots show the ratio of the Gaussian model with to without hadron scattering in the upper panel and the same ratio for the thermodynamical model in the lower panel.
As an alternative to eq. (3.11), without the cut on the invariant mass in eq. (3.10), the probability can be chosen to be P ds (∆y, ∆ϕ) = P max with ∆ϕ being the difference in azimuth of the hadron pair and R max the maximally allowed value of the radius R = (∆y) 2 + (∆ϕ) 2 . For hadron pairs that are produced in the same string we introduce the difference in hadron index (called rank e.g. in [82]), ∆ ij = |i − j|, to denote how close two hadrons are, i.e. two neighbours have ∆ ij = ∆ i i+1 = 1, next-to-neighbours ∆ ij = ∆ i i+2 = 2 and so on. The scattering probability for same-string hadrons is P ss (∆y) = P ds (∆y) · 13) where P min/max ss is the minimum/maximum probability associated with the nearest/furthest neighbour, characterized by ∆ min/max ij , with a linear behaviour of the probability in between; zero probability for hadrons closer than ∆ min ij and maximum probability for those further apart than ∆ max ij . All four are left as free parameters. In the case where eq. (3.12) is applied, P ds (∆y) in eq. (3.13) has to be replaced by P ds (∆y, ∆ϕ).

Multi-string toy model
In order to test the hadron scattering a simple toy model is applied: five strings, each with energy m Z , are constructed along the z axis with different quark flavours for the endpoint quarks, and the primary produced hadrons are studied. As in section 3.1 the hadrons Gaussian p ⊥ wo hadron scattering Gaussian p ⊥ w hadron scattering Termal p ⊥ wo hadron scattering Termal p ⊥ w hadron scattering  and protons and neutrons (bottom right), with (blue) and without (red) hadron scattering for the Gaussian p ⊥ and with (orange) and without (green) hadron scattering for the thermal p ⊥ in the toy model.
containing the endpoint quarks are excluded. The following plots are obtained by making use of eqs. (3.10) and (3.11); using eq. (3.12) instead leads to similar results, and thus the same conclusions. Figure 7 shows the p ⊥ for different hadron species. For the Gaussian hadronic p ⊥ distribution without hadron scattering all hadrons receive the same p ⊥ spectrum. Including hadron scattering, the p ⊥ decreases for pions, the lightest hadrons, by about 20% and increases for heavy hadrons by up more than 40%. The same effect is present for the thermodynamical model, although changes only reach around 10%, as the p ⊥ is higher for heavier hadrons already without hadron rescattering. Figure 8 shows the normalized p ⊥ spectra for all hadrons and, to exemplify the difference between light and heavy hadrons, the spectrum for pions, kaons, and protons/neutrons. Comparing the inclusive p ⊥ spectrum for the Gaussian p ⊥ , we notice that the distribution gets broader with hadron scattering, i.e. we get more pions with small p ⊥ and more heavy hadrons with higher p ⊥ .
As the thermodynamical model without hadron rescattering comes with a different starting point, compared to the Gaussian model, the effect of the rescattering is not as large, and shifts the p ⊥ spectrum towards larger values. The exception is for pions only, where there is a slight broadening also towards smaller p ⊥ values.

The Effect of Decays
Hadron decays, such as ρ → ππ or η → π + π − π 0 , influence the p ⊥ spectra of the final state hadrons. As the decays are mostly dictated by kinematics, they constitute a limiting factor on the possibilities of modifying for instance the pion p ⊥ spectrum during the fragmentation process. Even though the primary hadrons follow a Gaussian or exponential p ⊥ distribution, the spectra obtained after decays do not, and become more similar. In addition, in realistic events the effects of perturbative jet production leads to a p ⊥ broadening and the emergence of a powerlike high-p ⊥ tail.
To investigate this smearing of the p ⊥ spectra, we consider realistic e + e − → jets events and inelastic pp collisions, where the previously discussed effects of string density and hadron rescattering are not taken into account. The normalized transverse momentum distributions of π ± and p,p are shown in figure 9 for the Gaussian and the thermodynamical model. Comparing the p ⊥ spectra to the previous plots (note the different range of x and y axis) reveals how much the p ⊥ distributions have moved to higher p ⊥ values, as mentioned above. Four different ratio plots are included for each histogram to investigate different effects: the ratio of distributions after hadronic decays with respect to before, for both models, and the ratio of the thermodynamical with respect to the Gaussian model, with and without decays.
Decays shift the p ⊥ spectra towards smaller values, where the Gaussian model shows a larger change, compared to the thermodynamical model. For LEP the difference between the predictions with and without decays is limited to around 50% at most, while for LHC the changes are rather large, especially for small and large p ⊥ values. Figure 9 also nicely  Figure 9: The p ⊥ spectrum for charged pions (left) and protons (right), with (blue) and without (red) hadronic decays for the Gaussian and with (orange) and without (green) hadronic decays for the thermal model in e + e − → jets (top) and inelastic pp collisions (bottom).  Figure 10: The p ⊥ spectrum for charged pions (left) and protons (right), with (blue) and without (red) hadronic decays for the Gaussian and with (orange) and without (green) hadronic decays for the thermal model in inelastic pp collisions (bottom) with hadron rescattering.
shows that the differences between the Gaussian and thermodynamical model become more than a factor of two smaller when hadronic decays are included, where, as before, the differences are more pronounced for LHC events. Unfortunately, this will limit the impact of the modifications previously discussed in this section.
Another question is how much of the hadron rescattering effect on the primary hadrons survives the decays. Note that some of the primary hadrons, such as the ρ meson, are so short-lived that some of their decay products could rescatter which would influence the p ⊥ spectra further. A realistic interleaving of rescattering and decays would require a detailed space-time picture, however, which is for the future.
In figure 10 the pion and proton p ⊥ spectra are shown again for LHC events with hadron rescattering. The same ratio plots as before are included. The ratio of predictions with to without hadron decays shows a similar behaviour as the plots before, where no rescattering was included. The difference between the two models without hadronic decays becomes smaller when hadron rescattering is included. This it not surprising since the effect of the rescattering, that of shuffling some p ⊥ from lighter to heavier hadrons, is smaller in the thermodynamical model where more massive hadrons obtain more p ⊥ already form the beginning. Including decays brings the predictions of the two models even closer together.

Adding more Hadrons
We now briefly investigate the effect of including additional hadrons in the flavour picking process in the thermodynamical model. e + e − → jets events at √ s = m Z are analyzed with the effects of string density and hadron rescattering not being used. As discussed in section 3.2, by default only hadrons with u/d/s quarks and no radial or orbital excitation are included. Firstly, consider including hadrons with charm quarks. To obtain a rough estimate of the suppression of c production in string breaks, compared to that of s quarks, the rates of D and K mesons and their ratios are analyzed; similar for vector mesons and baryons. The results in figure 11 show that the c hadrons are suppressed by more than an order of magnitude compared to s hadrons, although a bit less when only vector mesons are considered. In absolute numbers the amount of extra charm production is non-negligible, and probably inconsistent with both LEP and LHC observed rates. Recall that an additional suppression factor for s quarks was introduced for the hadron rates in section 3.2; we would therefore expect that a similar, even stronger factor is needed when including c quarks in the thermodynamical model. Given this, neither charm nor bottom production is included in the nonperturbative hadronization in the rest of our studies.
Secondly  mesons leads to an increase of the total meson multiplicity after decays. All L = 1 multiplets are suppressed by more than an order of magnitude with respect to the pseudoscalar multiplet, with the scalar mesons being suppressed the most due the combination of them being the heaviest of the considered hadrons and their smaller spin-state weight 2J + 1. The normalized p ⊥ spectra exhibit slight shifts towards smaller values, as the now included heavier mesons decay to more lighter hadrons. The excited mesons combined constitute a fraction of roughly 10% of the total meson multiplicity. Given that in addition those mesons and their decay channels are not very well understood, we consider it reasonable to not include those in further studies. In default Pythia the suppression of light vector mesons with respect to pseudoscalar mesons is ∼ 0.5. The thermodynamical naturally comes with a fairly similar value of ∼ 0.35.

Comparisons with Data
We now proceed to compare the models with data. Note that, in a first step, there is no ambition to obtain a better overall description than the one achieved in several of the standard tunes that come with Pythia. It is rather to explore how the modelling of the new mechanisms impacts selected distributions, notably the ones discussed in section 2.2. That is, whether the mechanisms have the potential to improve the agreement with data in some crucial respects. Only in a second step is there some attempt to combine the various mechanisms, but still without the ambition of a full-fledged tune. In section 4.1 we present a comparison of the different effects we have discussed so far, while section 4.2 gives an overview of the results obtained by combining the effects into a more complete picture. Note that the new mechanisms will be available in the next public Pythia release.

Impact of the Different Effects
Based on a limited set of LHC observables, this section presents the impact of the new mechanisms outlined in section 3 on the description of data. The observables have been chosen to illustrate the effects of the change of the Gaussian width or temperature, respectively, depending on the close-packing of strings as in eqs. (3.3) and (3.9), of hadron rescattering, and of colour reconnection. The latter has been included as it serves a similar purpose and shows a somewhat comparable behaviour. The baseline prediction, which serves as the main comparison for both models, is obtained by switching off all of the aforementioned effects. For a clear picture of the influence of the individual mechanisms, only one of them is switched on at a time. Note that the prediction of the Gaussian model with colour reconnection, labelled "Gaussian p ⊥ ColRec" in the plots, corresponds to default Pythia 8. Recall that the results presented in this subsection are not obtained with parameter settings that optimize the data description but rather illustrate their general effect. The average transverse momentum p ⊥ as a function of the hadron mass and the charged multiplicity are shown in figure 13, together with the charged particle p ⊥ spectrum.
For both models, the description of p ⊥ as a function of mass improves for each of the different mechanisms, compared to the baseline prediction, as heavier hadrons obtain larger p ⊥ values. The thermodynamical model provides a somewhat better description of this observable, compared to the Gaussian model, which comes naturally due to the exponential hadronic transverse-mass suppression.
The baseline prediction for p ⊥ (n ch ) plateaus at small multiplicities, therefore underestimating p ⊥ for values n ch 25. All of the effects investigated in this study have a somewhat similar effect, in the sense that they are able to push up the prediction, compared to the baseline settings. While including the n eff string -dependence significantly improves the description, it is still slightly worse than the prediction with colour reconnection. The hadron rescattering provides a fairly good description of p ⊥ for small n ch values, but clearly overshoots the distribution at high multiplicities.
Similar to the previous observable, the n eff string -dependence and colour reconnection improve the description of the inclusive p ⊥ spectrum. The Gaussian model without additional effects switched on produces a bump at p ⊥ ∼ 0.5 GeV/c and a broad dip at   Ch. p ⊥ vs. n ch at 7 TeV, p ⊥ track > 100 MeV, n ch ≥ 2, |η| < 2.5  Figure 13: Comparisons to ALICE [46] and ATLAS [47] data: p ⊥ as a function of the hadron mass (top), charged multiplicity (middle), and the charged particle p ⊥ (bottom). Predictions with the Gaussian (thermodynamical) model are shown in the left (right) plots. ColRec / HadScat / NrString means that only colour reconnection / hadron rescattering / n eff string -dependence is switched on, otherwise everything is switched off.
p ⊥ ∼ 2.5 GeV/c. While colour reconnection removes the dip almost completely, the bump is still clearly visible. The n eff string -dependence somewhat reduces both bump and dip, but at the cost of introducing another dip towards very small p ⊥ values. The baseline prediction of the thermodynamical model, compared with the Gaussian one, has the same dip at p ⊥ ∼ 2.5 GeV/c, while the bump is much less visible. Both colour reconnection and the n eff string -dependence reduce the dip quite substantially and provide a very good description of the data. The hadron rescattering, while somewhat improving the description in the low-p ⊥ region, overestimates mid-p ⊥ values by around 20% before undershooting the distribution.

Results
Using the information of the last subsection we adjust the parameters associated with the new mechanisms to obtain a good data description, with the main focus lying on p ⊥ spectra of pions, kaons, and protons. We begin with LHC data, as this is the motivation for the thermodynamical model, the n eff string -dependence, and the hadron rescattering, and continue with a cross-check of some LEP and SLC observables.

LHC
The new parameters are adjusted such that an improvement of the p ⊥ spectra of π ± , K ± and p,p, measured with ALICE [51], is achieved, while still giving a reasonable description of the charged particle p ⊥ distribution and p ⊥ as a function of the multiplicity, both measured with ATLAS [47]. The corresponding settings and values can be found in appendix A. The LHC data set presented here includes the aforementioned p ⊥ spectra and the Λ to K 0 S ratio shown in figure 14, p ⊥ as a function of the hadron mass and the charged multiplicity, both inclusive and for different hadrons, shown in figure 15, and the ratio of yields with respect to (π + +π − ) as a function of the charged multiplicity for different hadrons, shown in figure 16. The predictions of default Pythia are compared to the Gaussian and thermodynamical model with the modifications outlined in section 3.
Default Pythia describes the ATLAS charged particle p ⊥ distribution very well for values of p ⊥ > 1 GeV/c, but shows a bump at around 0.5 GeV/c. The Gaussian model with modifications gives a similar shape and reduces the bump somewhat, while undershooting the distribution large p ⊥ by a few %. The thermodynamical model improves the description quite substantially, especially for low-p ⊥ values, where the aforementioned bump is almost gone. The predictions for the CMS charged hadron p ⊥ spectrum behave mostly similar, with the same bump visible for default Pythia and the Gaussian model.
For default Pythia, pions obtain a too hard p ⊥ spectrum. The modifications to the Gaussian model improve the distribution slightly, but there is still no good overall description. With the thermodynamical model the spectrum improves for low-p ⊥ values quite a bit; however, it is still a bit too high in the large-p ⊥ region. The K ± p ⊥ spectrum shows the opposite behaviour: too many soft and too few hard kaons. The Gaussian model with modifications improves the description in the soft region somewhat, compared to default Pythia. Both the Gaussian and thermodynamical model change the shape of the spectrum slightly, but do not provide a better overall description of the K ± p ⊥ .   Figure 14: Inclusive (top), π ± , K ± , and p, p (middle and bottom left) p ⊥ spectra and the Λ to K 0 S ratio (bottom right). Predictions of default Pythia, the Gaussian and thermodynamical model with modifications, compared to ATLAS [47], CMS [52,83] and ALICE data [51].
Mean transverse momentum vs. mass at 7 TeV, |y| < 0.5  [46,48] and ATLAS [47] data. The data in the bottom plots is taken to be an estimate of the logarithmic fits in [48] and therefore no error bars are included. Ratio of yields to (π + + π − ) at 7 TeV, |η| < 0.5  Figure 16: Ratio of yields with respect to (π + + π − ) as a function of the charged multiplicity. Predictions of default Pythia, the Gaussian and thermodynamical model with modifications. The ALICE measurement can be found in [10].
The prediction of both models for the p,p p ⊥ spectrum are better compared to default Pythia, where especially the thermodynamical model improves the low-p ⊥ region quite substantially.
While the prediction of the thermodynamical model for the Λ/K 0 S ratio is somewhat flatter with respect to the data, especially in the low-p ⊥ region, the normalization is off by almost a factor of two due to the combination of producing slightly too many K 0 S and not enough Λ. The observable could be improved by adjusting the overall normalization factor of baryons with respect to mesons. The value of this parameter has been fixed using the proton p ⊥ spectrum, however.
All models give very similar predictions for p ⊥ as a function of n ch , with an extremely good description of the region n ch > 20, but too low p ⊥ for smaller multiplicities. It is quite obvious that the description of this observable could be further improved by choosing a larger value for the width or temperature respectively and simultaneously lowering the n eff string -dependence and hadron rescattering. This would however come hand in hand with worse descriptions of other observables. For default Pythia, pions obtain a too large p ⊥ and heavier hadrons a too small one. While the thermodynamical model improves the predictions for p ⊥ (m), there is still no full agreement with data. The Gaussian model lies in between default Pythia and the thermodynamical model. We observe a similar behaviour for the p ⊥ (n ch ) distribution for individual hadrons. The pion p ⊥ is described fairly well with a slope that is slightly too steep. The main difference of the other hadrons with respect to pions is that they obtain a too small p ⊥ over the whole n ch range. As for pions, the slopes tend to be too steep.
ALICE [10] found that the production of strange and multi-strange hadrons is enhanced with increasing multiplicity. While default Pythia is not able to reproduce such a behaviour, figure 16 shows that the thermodynamical model achieves an increase of strangeness with charged multiplicity for K 0 S , Λ, and Ξ, but not for Ω. Except for the latter, we therefore expect the thermodynamical model to give an improved description of the data presented in [10]. The Gaussian model with modifications shows the opposite effect, a decrease with growing multiplicity. These findings can be explained as follows: In default Pythia all (primary) hadrons are produced with a probability that is independent of the multiplicity or number of strings. In the thermodynamical model heavier hadrons are produced preferably at large p ⊥ values. Including the n eff string -dependence leads to potentially higher temperatures for events with large n ch , where heavy hadrons have a higher probability to be produced, compared to low-n ch events. With the modifications to the Gaussian model, all hadrons obtain more p ⊥ in events with large values of n eff string . Due to phase-space constraints heavier hadrons might be rejected more often compared to lower-mass hadrons, leading to the decrease with growing multiplicity. This might also be the reason for the slight drop towards large n ch for Ω in the thermodynamical model, as it eventually dominates over the effect of the n eff string -dependence.

LEP and SLC
While the main motivation for introducing the exponential p ⊥ distribution is arising from LHC data, the valid question of whether the same model is able to describe e + e − observables as well remains. The effect of the close-packing of strings and hadron rescattering are not included for e + e − data as we do not expect them to represent relevant physics here. Furthermore the string dependence relies on rapidity differences and an event axis aligned with the beam, which is not present in e + e − collisions.
The parameters of the thermodynamical model are adjusted using the charged particle momentum spectrum as well as the scaled momenta of π ± , K ± and p,p, measured with SLD [84], while the ALEPH event shapes [85] served as cross checks. For the Gaussian model the width and prefactors for strange and diquarks have been adjusted such that the mean charged multiplicities agrees with the value obtained with the thermodynamical model. The corresponding settings and values can be found in appendix A. The e + e − data set presented here includes the aforementioned momenta and mean multiplicities for different hadrons shown in figure 17, as well as the charged multiplicity distribution, scaled momentum and the inclusive p ⊥in and p ⊥out spectra shown in figure 18. The predictions of default Pythia are compared to the Gaussian and thermodynamical model as described above.
The only difference between default Pythia and the prediction labelled as "Gaussian p ⊥ " is an adjusted value for the Gaussian width and its prefactors for s and diquarks, i.e. there is no change of the flavour selection parameters. Therefore, the values of the mean multiplicities remain, leading to overlapping data and Monte Carlo histogram points in figure 17, which are thus not fully visible. With the thermodynamical model we obtain a fairly good description of most hadrons, with the notable exceptions of producing too many heavy baryons. Note however, that the Gaussian model comes with around 20 parameter for selecting the flavour of new hadrons, whereas the thermodynamical model makes use of only three parameters: the temperature, the overall normalization factor of baryons with    Figure 17: Mean hadron multiplicities (top), charged particle momentum (middle left), and scaled momenta x p = 2|p|/E cm of π ± (middle right), K ± (bottom left) and p,p (bottom right). Predictions of default Pythia 8, the Gaussian and thermodynamical model compared to PDG [72] and SLD data [84].  respect to mesons, and the additional suppression factor for hadrons with strange quarks, see section 3.2. Hence, the result is fairly acceptable. The predictions of the two models for the charged particle momentum agree very well with data in the soft region; there is only some small deviation for medium and large momenta, where especially the thermodynamical model predicts somewhat too many particles in the hard region. The same effect is even clearer visible in the scaled momentum spectrum of pions. For kaons and protons we observe the opposite effect: the new model predicts too few hadrons with large momenta. Similar to the charged particle momentum, the predictions of both models for the logarithm of the scaled momentum agrees well with data, with some small deviation for medium and large values. However, the sudden drop in the ratio of the Monte Carlo prediction to data at around ξ p = 4.7 remains almost unchanged. The description of the charged multiplicity distribution improves slightly, compared to default Pythia, towards having less events with small multiplicities and more events with larger ones. This is due to having an increased mean charged multiplicity. While both models describe the low-p ⊥ region of the inclusive p ⊥in and p ⊥out spectra very well, they underestimate the amount of events with larger p ⊥ values. The thermodynamical model provides a better description of especially the p ⊥out spectrum, compared to the Gaussian model.
To summarize we note that the thermodynamical model is able to provide predictions for event shapes and momentum spectra in e + e − events that are of a similar quality as those by the Gaussian model. Nevertheless, the hadron decomposition is not described well, a price to pay for reducing the amount of flavour selection parameters.

Summary and Outlook
The understanding of soft hadronic physics is changing under the onslaught of LHC pp data. Of course, there has never been an approach that could describe all aspects of pp physics perfectly, but before LHC it was often assumed that all the basic concepts were in place, and that what remained was successive refinements. Now we see that there is still much left to learn. There have already been several surprises, and further data analyses may well produce more.
In view of this we have revisited some of the basic soft-physics assumptions of the Pythia event generator, which has been quite successful in predicting and describing many aspects of the data, but now starts to show cracks. New approaches have here been studied for some areas, to understand how much room for improvements there would be, without any claim that either of them would necessarily be the one and only right way to go. A central pillar of Pythia has been the Lund string fragmentation model, where a tunneling mechanism for string breakups leads to a universal Gaussian p ⊥ spectrum. In this work a thermodynamical model is implemented as an alternative, where p ⊥ instead follows an exponential distribution. For an already selected p ⊥ , the hadron flavour is picked based on an exponential m ⊥ weight, with additional factors due to spin-counting rules and so on. This approach suppresses the production of heavier hadrons, and gives them a larger p ⊥ . Such a pattern is observed in data, and exists in the Gaussian approach mainly owing to particle decays, but there undershoots data.
Making the Gaussian p ⊥ width, or temperature in case of the thermodynamical model, dependent on the close-packing of strings allows for modelling the influence of strings on each other in a simple way. An effective number of density of strings is introduced for low p ⊥ 's, while high-p ⊥ fragmentation tends to occur outside the close-packed string region and is left unaffected. Such a mechanism could e.g. be used to explain a changing flavour composition at high multiplicities.
Finally we implemented a simple model for hadronic rescattering, applied to the primary hadrons, before decays. The probability of two hadrons to rescatter is based on how close they are in phase space. By favouring a shift towards equal transverse velocities, it should also give higher p ⊥ for heavy hadrons and lower for pions.
Not surprisingly we found the hadronic decays to limit the hoped-for effects. Specifically, most pions come from decays of heavier hadrons, and so the mechanisms intended to give less p ⊥ to pions and more to kaons and protons are largely nullified. The mechanisms are also not simply additive; starting out from the thermodynamical model with its already-existing mass differentiation, the further effects of varying temperatures or hadronic rescattering are smaller than corresponding effects in the Gaussian approach. Nevertheless the thermodynamical model is able to provide reasonable descriptions of observables such as the p ⊥ spectrum of charged hadrons, the average transverse momentum as a function of the hadron mass, or the recently measured enhanced production of strange and multi-strange hadrons with increasing multiplicity. These observables have so far been described rather poorly by Pythia. And, given the small number of flavour parameters in the thermodynamical model, it is able to describe a reasonable number of e + e − data rather well, even if it can not compete with the many-more-parameter tunes of default Pythia.
It should be noted that we have not compared with all relevant available data, by far. Notably, the ridge effect was not described by the existing Pythia model, and our current changes do not introduce any mechanism to induce it. The ridge was first observed in AA collisions [86][87][88][89], where nuclear geometry and hydrodynamical expansion offer natural starting points [90,91], although the range of detailed models is too vast to cover here [92]. In the field of pp physics [93], the EPOS model addresses the issue by having an inner core that can push strings in the outer corona [94], whereas a recent extension of DIPSY [95] provides a corresponding shove from the excess energy of central overlapping strings that form ropes. In a similar spirit, our higher string tension could introduce a push also without rope formation. A detailed modelling is not trivial, however, and we have not pursued it for now. To advance to the next level of sophistication within the line of research advocated here, it would be necessary to do a microscopic tracing of the full space-time evolution of the event, both for partons and for hadrons, and including both production and decay vertices. This is nontrivial beyond the simple one-string picture, even in the cleaner e + e − events, and the further complications of MPIs and CR in hadronic events will make it even worse. What it would allow is a more detailed understanding of the close-packing both of strings and of hadrons. Combined with a more detailed modelling of hadronic rescattering, a more realistic picture may emerge.
Some of the limitations encountered here are likely still to remain, so further mechanisms may be at play, in addition to the ones studied here. This would not be the first time where a cocktail of smaller effects combine to give a significant signal. What is less likely is actually the opposite, that one single mechanism does it all. Specifically, whatever else may be going on, the close-packing of strings and hadrons appears unavoidable in high-multiplicity pp events, and collective-flow effects are here to stay. In sum, we have an interesting and challenging time ahead of us, where some of the most unexpected new LHC observations may well come in the low-p ⊥ region rather than the in high-p ⊥ one.