The space–time structure of hadronization in the Lund model

The assumption of linear confinement leads to a proportionality of the energy–momentum and space–time pictures of fragmentation for a simple \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm{q}\bar{\mathrm{q}}$$\end{document}qq¯ system in the Lund string model. The hadronization of more complicated systems is more difficult to describe, and in the past only the energy–momentum picture has been implemented. In this article also the space–time picture is worked out, for open and closed multiparton topologies, for junction systems, and for massive quarks. Some first results are presented, for toy systems but in particular for LHC events. The density of hadron production is quantified under different conditions. The (not unexpected) conclusion is that this density can become quite high, and thereby motivate the observed collective behaviour in high-multiplicity \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm{p}\mathrm{p}$$\end{document}pp collisions. The new framework, made available as part of the Pythia event generator, offers a starting point for future model building in a number of respects, such as hadronic rescattering.


Introduction
The Standard Model of particle physics is solidly established by now, and has been very successful in describing all perturbatively calculable observables for LHC pp collisions, i.e. those dominated by large momentum transfer scales [1]. But at lower scales the perturbative approach breaks down, and phenomenological models have to be developed.
One of the underlying assumptions for these models has been that the nonperturbative hadronization process, wherein the perturbatively produced partons turn into observable hadrons, is of a universal character. Then relevant nonperturbative parameters can be determined e.g. from LEP data, and afterwards be applied unmodified to LHC pp collisions. The hadronizing partonic state is quite different in the two processes, however. Firstly, the composite nature of the incoma e-mail: silfer@nikhef.nl b e-mail: torbjorn@thep.lu.se ing protons leads to multiple semiperturbative parton-parton collisions, so-called MultiParton Interactions (MPIs) [2,3], and also to beam remnants and initial-state QCD radiation. Secondly, the high number of interacting partons leads to the possibility of nontrivial and dynamically evolving colour topologies, collectively referred to as Colour Reconnection (CR) phenomena. Both MPIs and CR need to be modelled, and involve further new parameters. (CR has been observed in the cleaner e + e − → W + W − process by the LEP collaborations [4], but that information is not easily transposed to the pp context.) The most successful approach to providing a combined description of all relevant phenomena, at all scales, is that of event generators. Here Monte Carlo methods are used to emulate the quantum mechanical event-by-event fluctuations at the many stages of the evolution of an event [5]. For pp physics the three most commonly used generators are Pythia [6,7], Herwig [8,9] and Sherpa [10]. Fragmentation here proceeds either via strings [11], for the former, or via clusters [12], for the latter two. A note on terminology: "fragmentation" and "hadronization" can be used almost interchangeably, but the former is more specific to the breakup of a partonic system into a set of primary hadrons, whereas the latter is more generic and can also include e.g. decays of short-lived resonances.
In spite of an overall reasonable description, glaring discrepancies between data and models have been found in some cases. Most interesting is that high-multiplicity LHC pp events show a behaviour that resembles the one normally associated with heavy-ion collisions and the formation of a Quark-Gluon Plasma (QGP). In particular, ALICE has shown that the fraction of strange baryons increases with multiplicity, the more steeply the more strange quarks the baryon contains, while the proton rate is not enhanced [13]. Long-range azimuthal "ridge" correlations have also been observed by both CMS [14,15] and ATLAS [16], as well as other signals of collective flow [17][18][19]. This is unlike conventional expectations, that QGP formation requires volumes and timescales larger than the one that can be obtained in pp collisions [20][21][22]. Nevertheless core-corona models have been developed, like the one implemented in EPOS [23], where a central high-density region can turn into a QGP, while the rest of the system remains as normal individual strings. Other mechanisms that have been proposed include rope formation [24] and shoving [25], or an environment-dependent string tension [26]. Common for all of them is that they introduce a space-time picture of the collision process.
In the traditional Lund string model [11] the linear confinement potential leads to a linear relationship between the energy-momentum and space-time pictures of a simple qq fragmenting system. Many of the above models are based on the approximation of a number of such simple strings, parallel along the pp collision axis but displaced in the transverse plane by the collision/MPI geometry.
For a generic multiparton system, like qg 1 g 2 . . . g nq , only an energy-momentum picture has been available until now [27]. The purpose of this article is to overcome that limitation, and provide a full space-time picture of the hadronization process, as part of the Pythia event generator. 1 This will offer a natural starting point for more detailed future studies of a number of collective effects. The models mentioned above deal with the space-time structure before (like core-corona or shove) or during (like ropes or QGP) fragmentation. To this we would also add a possibility for studies of what happens after the first stages of the hadronization, when hadronic rescattering and decays can occur in parallel. In addition to the already mentioned observables, Bose-Einstein correlations could also be used to characterize final states.
A warning is that we are applying semiclassical models to describe the quantum world. Formally the Heisenberg uncertainty relations impose limits on how much simultaneous energy-momentum and space-time information one can have on an individual hadron. Our approach should still make sense when averaged over many hadrons in many events, as will always be the case.
The plan of the article is as follows. Section 2 gives a brief summary of relevant earlier work, on the "complete" description of the simple qq system [11], and on the energymomentum picture of an arbitrary partonic system [27]. Section 3 then introduces the new framework that provides a space-time picture also in a general configuration. Several special cases need to be addressed, and technical complications have to be sorted out, with some details relegated to two appendices. Section 4 contains some first studies, partly Fig. 1 A simplified colour-field topology in a qq system and its further simplified string representation for toy systems but mainly for LHC events. This is without any of the collective effects that may be added later, but still provides an interesting overview of the overall space-time evolution of hadronization at the LHC. Finally, Sect. 5 concludes with a summary and outlook.
Natural units are assumed throughout the article, i.e. c = h = 1. By default energy, momentum and mass is given in GeV, and space and time in fm.

The linear force field in QCD
Confinement is one of the most fundamental properties of QCD. It can be viewed as a consequence of an approximately linear term in the QCD potential, between a quark and an antiquark in an overall colour singlet state, where r is the distance between them and α s is the strong coupling constant. The presence of a linear term was first inferred from hadron spectroscopy (Regge trajectories), from which a κ ≈ 1 GeV/fm can be extracted, and has later been confirmed by lattice QCD calculations. The linear term dominates at large distances, and in the Lund string model only this term is used to describe the breakup of a high-mass qq system into several smaller-mass ones. Then the full colour field can be approximated by a one-dimensional string stretched straight between the q and q, Fig. 1. This string can be viewed as parametrizing the center of a cylindrical region of uniform width along its full length, such that the longitudinal and transverse degrees of freedom almost completely decouple.

The two-parton system
The Lund model is easiest to understand in the context of a simple quark-antiquark pair created at the origin (e.g. by e + e − annihilation) and moving out along the ±z axis. Neglecting the transverse degrees of freedom, the Hamiltonian can then be written as [11] H = E q + Eq + κ|z q − zq|.
Here |z q − zq| is the distance between q andq, and E q and Eq are the energies of the q andq. With both assumed massless, it also holds that E q/q = |p q/q | = |p z,q/q |. From the Hamiltonian, the equation of motion gives rise to a linear relation between the space-time and the energymomentum pictures, The signs of the derivatives depend both on the direction of motion of the parton and on the direction the string pulls it in. When the parton moves out along the +z axis, e.g., the string pulls the parton in the −z direction, and all signs are negative.

Simple string motion
In the absence of string breaks, the motion of the simple qq system in its rest frame can be described as a "yo-yo" motion, where a string is alternatingly "reeled out" and "reeled in", Fig. 2a. In the first quarter of a period the q andq are moving apart from each other with the speed of light, z = ±t, such that the string length l string = 2t. Therefore, the fourmomenta of the q,q and the string evolve with time as where E cm is the center-of-mass energy of the full system. At time t = E cm /2κ all the energy is carried by the string, whose string tension then forces the q andq to turn around.
In the second quarter of the period the string length decreases like l string = 2(E cm /κ − t), and energy and momentum is transferred back to the q/q. At t = E cm /κ the string has vanished and the q/q are back at the origin, but now moving in the ∓z direction. The second half of the full period therefore becomes a repeat of the first half, only with the role of q and q interchanged. Normally string breaks are assumed to occur so rapidly that only the first quarter of the first period needs to be considered. The kinematics of the yo-yo motion can conveniently be rewritten in terms of light-cone coordinates, both in energymomentum,p ± = E± p z , and in space-time,z ± = t±z. For instance, for the quark in the first quarter period, such that dp + q /dz + q = −κ.p ± also obey the relatioñ which reduces top +p− = m 2 when p x = p y = 0. The simplest yo-yo system can be generalized as illustrated in Fig. 2b, where the quark and the antiquark have different initial energies, E q = Eq. Equivalently, this system can be viewed as a boosted copy of the restframe setup in Fig. 2a. The energy-momentum and spacetime coordinates suffer simultaneous transformations under a longitudinal boost, and Eq. (3) holds also after the boost. The transformation is especially easily formulated in light-cone coordinates, wherep ± = k ±1p± with k = √ (1 + β)/(1 − β) for a boost with velocity β, and similarly forz ± .
Note that a string piece with E = κl but p z = 0 in the original rest frame will obtain a p z = 0 after a boost to the frame with E q = Eq. This is in seeming contradiction with a description set up in a rest frame where E q = Eq from the onset, where one would again expect p z = 0. The solution is that a string piece is an extended object, so that the two ends of it, if originally simultaneous, will no longer be it after the boost. Only a string piece at constant time in the new frame will obey E = κl and p z = 0 there.

String breaking and hadron formation
As described in the previous section, the potential energy stored in the string increases with the separation between an original q 0 andq 0 . This makes the creation of a new q 1q1 pair in the string energetically favourable, if the invariant mass of the system is big enough. It is here assumed that colours are matched so that the original colour-singlet q 0q0 string breaks into two pieces, q 0q1 and q 1q0 , that separately are colour singlets. By local flavour conservation the q 1 andq 1 have to be created in the same vertex. They are created with vanishing energy, and are then pulled apart by the string tension. Naively, the probability for the string to break increases with time, because the string length increases. On the other hand, a break can inhibit later breaks, since each break fragments the string into two smaller systems, leaving an in-between region without a string. If on-mass-shell criteria for hadrons are ignored, as in the Artru-Mennessier model [29], a naive constant breakup probability per unit string area then is modified by an exponential-decay factor.
In general, several string breaks can occur between the q 0 andq 0 . Consider two adjacent ones, b 1 at (t 1 , z 1 ) and b 2 at (t 2 , z 2 ), as depicted in Fig. 3. The q 1 from the b 1 vertex combines with theq 2 from the b 2 vertex, forming a hadron q 1q2 with mass m h . Since q 1 andq 2 are created with no energy-momentum, the four-momentum of the hadron entirely comes from the intervening string piece, which can be read off like [11] Next consider the quantitiesx ± and x ± , illustrated in Fig. 3a. The former represent the light-cone coordinates of breakup vertices, scaled by the corresponding coordinates of the q 0 andq 0 turning points, so as to be restricted to a physical region 0 ≤ x ± ≤ 1. The latter represents the light-cone separation between two (adjacent) breaks, correspondingly scaled. For the q 1q2 hadron this translates to Defining the two fourvectors p + = p q 0 (t = 0) = E q 0 (1; 0, 0, 1) and p − = pq 0 (t = 0) = Eq 0 (1; 0, 0, −1), and using the proportionality between space-time and energy-momentum, the hadron four-momentum then becomes Although Eq. (7) has been derived for a system in which q andq are moving in opposite directions, it is valid in all frames, which makes thex ± coordinates and x ± fractions most useful. Since E 2 cm = ( p + + p − ) 2 = 2 p + p − , the hadron mass obeys Do note the factor of 2 for the p ± vectors in E 2 cm = 2 p + p − , as opposed to the relation E 2 cm =p +p− for the two scalar quantitiesp ± , and correspondingly for the hadronic subsystems.
Each breakup vertex is characterized by its invariant time τ . A convenient corresponding energy-momentum quantity is which geometrically corresponds to the string area in the backwards light cone of the vertex. Using the notation of Fig. 3a it can also be expressed as

Selection of breakup vertices
The breakup vertices are causally disconnected. That is, b 1 and b 2 in Fig. 3 have a spacelike separation. Which happens first then depends on the Lorentz frame in which the event is studied. It is therefore possible to describe the fragmentation process starting from the hadron closest to the q 0 end and then moving towards theq 0 one, or the other way around. Assuming e.g. that b 1 has already been selected, so thatx ± 1 are fixed, then the selection of the two x ± h values of the hadron defines the location of b 2 . But, assuming that the hadron and its mass are already specified, the mass constraint in Eq. (8) reduces it to one degree of freedom. For the fragmentation from the q 0 side this is conveniently chosen to be the x + h values. More specifically, z + fractions are introduced, as illustrated in Fig. 3b, as the hadron fraction of whatever system light-cone momentum that still remains after the production of previous hadrons. That is, the first hadron q 0q1 acquires a fraction z + 1 = x + 1 of the totalp + of the system, while the remnant-system is left with ap + fraction of 1 − z + 1 = 1 − x + 1 . The second hadron q 1q2 takes a fraction z + 2 from the leftoverp + , i.e.
Since the fragmentation process is iterative, the x ± fractions related to hadron i can be written as where the relation for x − i is given by Eq. (8). Alternatively the fragmentation could have been described from theq 0 end in terms of negative light-cone fractions z − and x − . Since the breakup points are causally disconnected, the two procedures should result in the same average particle distribution. This requirement, "left-right symmetry", gives a probability distribution [11,30] for the z value of each new hadron, where z = z + (z = z − ) for fragmentation from the q 0 (q 0 ) end. The a and b are parameters that should be tuned to reproduce the experimental data. Hence, f (z) determines how the individual vertices correlate in order to create a hadron of mass m h by taking a fraction z of the energy-momentum left in the system. Note that the form of f (z) does not depend on previous steps taken, which leads to a flat rapidity plateau of the inclusive hadron production.
The Γ values of breakup vertices can be obtained recursively by simple geometrical considerations, where Γ i and Γ i−1 are the scaled squared invariant times of the i and i − 1 breakups, respectively. The q 0 andq 0 turning points define Γ 0 = 0. The inclusive Γ distribution, after some steps away from the endpoint(s), is with the same a and b as in Eq. (12). The breaks of the string can be determined from Eq. (11) by iteratively picking z i values according to Eq. (12) for the hadrons with masses m i . This works well for the simple Fig. 4 Hyperbolae of constant Γ and m 2 h represented by dashed and full lines, respectively q 0q0 system, but Eq. (11) will not hold in systems with more than two partons. To this end an alternative procedure can be introduced [27] via Γ recursion. Here a z = z i is still selected by Eq. (12), and converted to a Γ i by Eq. (13). As illustrated in Fig. 4, each fixed Γ value corresponds to a hyperbola with the origin as its center. Correspondingly each fixed m i corresponds to a hyperbola with the i − 1 vertex as its center. Therefore a given (m i , Γ i ) pair corresponds to the unique crossing of two hyperbolae at the location of the next vertex.

The tunneling process
Up to this point, we have assumed that the q iqi pairs generated from string breakings are massless and have no transverse momenta. Both q i andq i are then created as real particles at a common space-time location, with vanishing energy-momentum. If the pair is massive or carries transverse momentum, the q i andq i still have to be created in the same space-time location, but as virtual particles. Each now has to tunnel out a distance l = m ⊥ /κ to acquire enough energy from the string to correspond to its transverse mass m ⊥ . This tunneling results in a Gaussian suppression factor A consequence of this mechanism is the suppression of heavy quark production in string breaks, approximately like uū : dd : ss : cc ≈ 1 : 1 : 0.3 : 10 −11 [11]. It is therefore assumed that c and b production only occurs by perturbative processes. The combination of a q i−1 and aq i gives the flavour of a meson but does not fully specify it. The quark spins can combine e.g. to produce a pseudoscalar or a vector meson, and flavour-diagonal mesons mix, and so on. All of these aspects are relevant for the model as a whole, but for the considerations in this article we only need to know the masses of the produced mesons. Similarly for baryon production, where the production mechanisms are less well understood, whether The cc system and the equivalent system formed by massless q andq "diquark" or "popcorn" [31,32]. In the latter approach actually three different production vertices are involved, one for each of the quarks, but also here an effective description in terms of two, as for mesons, is meaningful. A diquark is taken to be a colour antitriplet, just like an antiquark, and we thus use the notationq as shorthand for either of them.
Since the string itself has no transverse motion, it is assumed that the transverse momentum is locally compensated inside each q iqi pair. The transverse momentum of a hadron q i−1qi is then given by the vector sum of its constituent transverse momenta. The hadron masses in Sect. 2.2.2 have to be replaced by the corresponding transverse masses.

Massive quarks
Although massive quarks are not created from string breaking, they can be generated in the hard process and form a system that might fragment further. In this section, the yo-yo model is extended to account for massive quarks as the endpoints of the system. Since the massive q andq do not travel at the speed of light, their motion is described by hyperbolae instead of straight lines.
To study the motion of the massive yo-yo system, consider a cc system in the CM frame, in which c andc are moving along the z axis in opposite directions. The massive yo-yo system is depicted in Fig. 5a, along with the massless case for comparison. At time t = 0, The proper relativistic definition of force, d p z /dt = ±κ, then gives with the motion of thec its mirror image. Notice that the oscillation time is reduced by a factor p 0 /E 0 relative to the massless system with the same E 0 .
Although the motion properties of the massless and massive cases hold in every longitudinal boosted frame, the effects of boosts are simpler to address for massless quarks. A useful trick is to replace the effect of the quark mass by an extra string piece of length (E c (t) − p z,c (t))/κ at each endpoint. Its length is m q /κ at the turning point, see Fig. 5b, where the massive motion is illustrated by the hyperbolae, whose asymptotes are the straight lines of the massless case. Thereby time t = 0 is also offset to account for the reduced oscillation time. The extra string piece is purely fictitious and does not break during the fragmentation process. The physical region, between the hyperbolae, is highlighted in grey in Fig. 5b. Given that the hadron created from the endpoint is always heavier than the endpoint quark, all the hadrons are automatically created inside the physical region.
The four-momenta of the massless reference four-vectors have to be linear combinations of the massive quark fourmomenta for Lorentz covariance reasons. If p q and pq are the four-momenta of the massive quarks, while p 0q and p 0q are the massless four-momenta, the relation between them becomes where the k 1 and k 2 values are fixed by p 2 0q = p 2 0q = 0 [27].

Multiparton systems
Next, more complicated string topologies need to be considered. An example is the Z 0 decay into a pair of massless quarks, either of which can emit a gluon: Both such radiation and the hadronization can occur over widely varying time scales in high-energy events, but in a local context the radiation takes place at time scales shorter than those of the hadronization itself. As a reasonable first approximation all three partons can thus be assumed created at the origin. In the Lund model the colour flow is based on the limit of infinitely many colours [33]. Then there is one string piece from the q to the g and another from the g to theq, and the two do not interfere. The gluon thus can be viewed as a kink on a single string stretched from the q to theq. The motion of the qgq string system can conveniently be studied in a Lorentz frame where the q moves in the +z direction, g in the +x direction andq in the −z direction. Two string pieces are present initially, as illustrated in view 1 of Fig. 6. Each string piece defines a separate string region, which behaves similarly to the string piece of a qq system, except that it is now transversely boosted. The region formed by the qg string evolves with time as and correspondingly forqg, but with z → −z. Note the factor of two in the gluon four-momentum, which comes from the loss of energy-momentum to both string pieces attached to it. Unlike the simple qq system, the two string pieces are not at rest, but move in the transverse direction: the qg string piece has a velocity vector v x = v z = 1/2, while for the gq piece v x = −v z = 1/2. The energy per unit string length is higher than for a string at transverse rest, but the lower string length drawn out per unit time exactly compensates, such that the force acting on the endpoints is of the same magnitude [27]. After time t = E g (0)/2κ the gluon has lost all its energy and a new string piece is created by the inflowing momentum from the q andq, and is hence denoted as the qq region, see view 2. Later the q has also lost all its energy and starts to move in the +x axis as it absorbs g four-momentum. The q eventually gains and re-emits half of the gluon energy, views 3 and 4. Subsequently it absorbs originalq four-momentum and moves in the −z direction, views 5 and 6. A similar process occurs forq. As shown in view 7, the gluon will eventually reappear, and in view 8 the sequence starts to repeat, only with the momenta of q andq swapped.
Although Fig. 6 is useful to visualize the time evolution of the system, the parameter plane picture is most convenient when addressing the kinematics [27]. This is a diagram that Fig. 7 The parameter plane picture for the qgq system. The dash lines indicate the turnover regions, normally neglected displays the different string regions in terms of the light-cone four-vectors defining each region, i.e. p + q = p q , p − q = pq and p + g = p − g = p g /2 in the qgq case, whose parameter plane is displayed in Fig. 7. The low regions represent the states in which none of the partons have lost their energy, corresponding to the two string regions in view 1 of Fig. 6, the qg and the gq string pieces. The intermediate region corresponds to the new string piece created from the q andq momenta once the gluon has lost all its energy. Finally, the upper regions are related to the two string pieces formed when g re-appears. Although the complete parameter plane picture (for half a period) is the one shown in Fig. 7, the dashed upper regions are normally neglected, since the system is assumed to fragment before then. This reasonable assumption avoids a large number of complications for handling fragmentation in these regions. The three remaining regions are then formed by the combination of one + component and one − one, where ± no longer relates to motion along the ±z axis, but more generically denotes the reference vector directed towards (+) or away from (−) the q end of the system. From the parameter plane picture, the equations defining the hadron properties and the fragmentation process of Sect. 2.2 can be easily generalized. For the qgq system, the hadron momentum can generically be written as, The hadron mass enters via the constraint The level lines of constant m h and constant Γ again give hyperbolae inside each string region, where physically allowed, which connect at the borderline between regions. As before there will be (at most) one allowed solution to a given (m h , Γ ) pair, which can be found by starting in the current region and, if not found there, step by step move on to other possible regions. There are a number of complications that have to be overcome to do this [27]. The parameter-plane picture can be extended to a multiparton system, resulting in the most convenient approach to study the kinematics of any multiparton system. As an example, the parameter plane for a system consisting of three gluons, one quark and one antiquark is depicted in Fig. 8, where the turnover regions have been ignored. In such a system, there are four low regions, or initial regions, and six intermediate regions. The number of initial and intermediate regions can be generalized for any multiparton system formed by n partons, out of which n −2 are gluons, as n −1 initial regions and (n − 1)(n − 2)/2 intermediate regions. The expression for the hadron four-momentum can also be generalized to an n-parton system by accounting for the momenta taken from each parton as where usually most of the x ± vanish. Apart from these aspects, the rest of the properties are similarly determined as in previous cases.

Fragmentation implementation summary
The fragmentation process in Pythia is based on the fourmomenta of the partons created in the (semi)perturbative stages of the collision process, plus the partons in the beam remnants [34]. By the colour-connection between those partons, an LHC event is likely to contain several qg 1 g 2 . . . g n−2q systems, that can be handled separately.
The production of each new hadron begins with the selection at random of whether to split it off from the q end or from theq one of the system. The flavour of a new qq break of the string (where q may also represent an antidiquark), leads to the formation of a new hadron, as already described. Its mass is selected, according to a Breit-Wigner for short-lived particles with a non-negligible width. The transverse momentum is obtained as the vector sum of those of the hadron constituents, assuming that the old and new breakup vertices are in the same region. Then the longitudinal momentum fraction z is picked according to the probability distribution in Eq. (12), with the difference that the hadron mass has to be replaced by the transverse ditto, m h → m ⊥h . In a simple qq system, the new breakup vertex is easily obtained from the (m ⊥h , z) pair. Else the Γ i value of the new breakup is calculated using Eq. (13), again with m h → m ⊥h , and a solution is sought to the (m ⊥h , Γ i ) pair of equations. Vertex i may end up in the same string region as i − 1, or involve a search in other regions. Among technical complications of this search is that the transverse directions are local to each string region, which leads to discontinuities in the hyperbolae of constant m ⊥h at the borders between string regions, that would not be there for p ⊥ = 0.
The random steps from both string ends continue until the remaining invariant mass of the system is deemed so small that only two final hadrons should be produced. Details on this final step can be found in "Appendix A", along with the challenges encountered when implementing the space-time picture and the methods applied to solve them. Had the fragmentation always proceeded from the q end, say, the final step would always have been at thē q end, with the minor blemishes of this step concentrated there. Now these are instead smeared out over the whole event. Fig. 9 Hadron formation in a qq system. The blue, red and green dots represent the "early", "middle" and "late" definitions of hadron production points, respectively

The space-time description
So far, the fragmentation process in Pythia was developed in terms of the energy-momentum fractions x ± and z ± of breakup vertices and hadrons, presented in Sect. 2.2.2. Therefore, the location of the breakup vertices is only specified in the energy-momentum picture. In order to study the density of hadron production, this information should first be translated to the space-time one, which will be done in this section.

The two-parton system
To begin, consider a breakup point i in a simple qq system. Its location with respect to the origin of the energy-momentum picture, where q andq have been created, is given by thex ± fractions. Then, considering p + to be the q four-momentum and p − theq four-momentum, the location of breakup i in the energy-momentum picture is defined asx + i p + +x − i p − . Recalling the linear relation between space-time and energymomentum, Eq. (3), the space-time location of breakup point i thereby is defined as Note that the q i andq i generated by the string break are considered to be created in the same space-time vertex, even when quark masses and transverse momenta are included, such that q andq have to tunnel some distance apart before becoming on-shell. Such effective vertices in practice is the best one can do. Since hadrons are formed by two adjacent string breaks, the hadron production point should be related to these two. But the definition cannot be unique, since hadrons are composite and extended particles. For that reason, we propose three alternative definitions, illustrated in Fig. 9. Two breakup points, i and i + 1, with space-time coordinates v i and v i+1 , together define the q iqi+1 subsystem that forms hadron i. One obvious choice is then to define the hadron production point as the average of the two, red dot in the figure. Alternatively to this "middle" definition, the "late" hadron production point is where the two partons forming the hadron cross for the first time, green dot. Taking into account the hadron four-momentum p h , the "late" hadron production point is offset from the "middle" definition as Finally, an "early" definition, blue dot, is given by which is where the backwards light cones of theq i and q i+1 vertices cross, just like the "late" definition is where the forwards light cones cross. In a causal world, this would be the latest time for information to be sent out that can correlate the breakup vertices to give the correct hadron mass. Note that the two endpoint hadrons are situated on the light cone with this "early" definition. The different results obtained with the three alternative definitions can be used as a measure of uncertainty, see Sect. 4.1. If not stated otherwise, the choice in this article is the "middle" definition of Eq. (24).

More complex topologies
Multiparton systems are more complicated to address than a single qq string, as already demonstrated for the energymomentum picture. Their complexity also affects the spacetime implementation, which has to be extended to include several string regions. Initially consider a qgq system formed by massless partons, with a parameter plane as in Fig. 10, ignoring the turnover regions. Since each string region separately behaves like a simple qq system, Eq. (23) can be used. Nevertheless, the intermediate region is formed after the gluon has lost all of its energy, at a different location in space-time than the initial regions, so an offset has to be taken If the system is composed of more than one gluon, also more than one intermediate region has to be taken into account, as illustrated in Fig. 11. In such cases, more gluons have to be included when determining the space-time offset of some intermediate regions, such as the qg 3 one. This region is created when both g 1 and g 2 have lost their energies, giving an offset v reg = ( p g 1 + p g 2 )/2κ.
A general expression for the space-time offset of any intermediate region in any multiparton system can be easily defined, if all partons are numbered consecutively, starting from the q end, and region labels jk are for ones containing four-momenta from partons j and k, k ≥ j. The jk region offset is found to be where p m is the four-momentum of parton m, and for a breakup vertex in this region it thus holds that, While seemingly simple enough, there are a number of significant challenges to a robust implementation in a multiparton configuration, in part paralleling similar problems in the energy-momentum picture [27], in part going further. There are two main problems: the determination of the space-time Fig. 12 The string configuration and the corresponding parameter plane for a three gluon-loop topology. In the two initial endpoint regions the full lines indicate the "active area", and the dashed ones the complementary excluded one location of the final breakup in the system, and the nonphysical values of thex ± fractions that can arise when fragmentation moves to a new region. Those issues are further explained in "Appendices A and B", respectively, along with the solutions found to properly implement the space-time picture.

Gluon loops
So far, gluons have only appeared in open strings between a q and aq end, but it is also possible to have closed gluon loops, as exemplified in Fig. 12a for a ggg system. In order to reduce the problem to a familiar one, an initial qq is generated by string breaking in one of the string regions. This break should be representative of what ordinary fragmentation is expected to give. Thus the region is chosen at random, but with a bias towards ones with larger masses, where more ordinary string breaks are to be expected. Inside that region, the Γ value of the vertex is chosen according to Eq. (14), and a further random choice gives the longitudinal location of the breakup. Having taken this step, the n-gluon-loop topology is effectively mapped onto an (n +1)-parton open string with q andq as endpoints. The key difference is that, unlike open strings considered so far, Γ q = Γq = 0.
As an example, the parameter plane for a gluon-loop consisting of three gluons is displayed in Fig. 12b. In this case, the string between g 1 and g 3 has broken into two string pieces, generating two new string regions, g 1 q and g 3q . Although the full g 1 g 3 region is duplicated in the parameter plane, in the right endpoint region only the "active area" between q and g 1 is open to fragmentation, while the left endpoint region only uses the complementary area between g 3 andq. Apart from that, the fragmentation process can now play out in the same way as for an open string, with the same rules for the space-time locations of the breakups. Note that the q and q "endpoints" correctly will be assigned the same creation vertex in this procedure.

Smearing in transverse space
Strings can be viewed as the center of cylindrical tubelike regions of directed colour flow. So far we have assigned production vertices as if they all were in the very center of the string. A more realistic picture is to introduce some transverse smearing. For simplicity this is done according to a two-dimensional Gaussian where x and y are transverse spatial coordinates and σ is the width of the distribution.
The width of the string should be of typical hadronic scales, but related to confinement in two dimensions rather than three. Taking the proton radius r p ≈ 0.87 fm [35] as starting point, the default σ = r p / √ 3 then gives a, The smearing in transverse space might generate unwanted situations, such as negative values for the Γ parameter of the breakup points. Since the space-time location is first obtained from the fragmentation picture in the longitudinal direction, the squared invariant time should not change when introducing smearing. Therefore, the time coordinate is adjusted after including the smearing in transverse space, in order to retain the Γ value determined by the longitudinal scheme. Alternative procedures could be envisioned, in particular when the collision process itself does not happen in the origin, but for now this smearing possibility is good enough to indicate trends.

Massive quark implementation
As illustrated in Sect. 2.2.5, the origin of the massive and massless oscillations are displaced for technical reasons; correspondingly the initial point of the massive oscillation is offset from the origin of the space-time coordinate system. Since the fragmentation process is performed in the massless system, the space-time locations of the breakups have to be adjusted.
To determine the offset, consider the qq system in Fig. 13a, studied in the CM frame, with q/q moving in the ±z directions. In this case the q andq masses are different, with m q > mq. At time t = 0 we have p 0 = p z,q = −p z,q and E cm = E q + Eq, with p 0 , E q and Eq given by standard twobody decay kinematics. The massive oscillation in Fig. 13a is offset both in time and z-component of space, represented as Δt and Δz. The former can be determined from the difference between the time coordinates at which the massless and massive quarks lose their three-momenta, t massless and t massive in Fig. 13a, i.e.
The process to define the space offset is slightly different. In Fig. 13a, the distance of the massive q endpoint to the centre of the massive oscillation is denoted z 1 , while z 2 is the distance of the massive q endpoint to the centre of the massless system. The equation of motion then gives, The time and space offsets can be combined as, where p 0q and p 0q are the four-vectors of the equivalent massless system, cf. Eq. (18). Hence, for each vertex in a region formed by at least one massive quark, the spacetime location is defined as usual from the massless system, v massless , and then corrected to, For more complex topologies, such as multiparton systems consisting of a massive q and/orq and several gluons, the effect of the massive q orq is only non-negligible in the lowest respective endpoint region. Therefore, the massive correction is only performed in those regions. Also the space-time location of the massive endpoint quark "vertex" has to be offset, away from what it would have been for a massless quark. This is exemplified in Fig. 13b, for the same massive qq system as before. The three vertices v 1 , v q and v 2 correspond to the space-time location of the massless endpoint, the massive turning point and the closest breakup to the massless endpoint, respectively. The system can be studied in a Lorentz frame where the three vertices are simultaneous, v 1,t = v 2,t = v q,t . Then, linearity between energy-momentum and space-time gives, where m q is the mass of the heavy quark and m h the mass of the hadron formed from the vertices v 1 and v 2 . From this v z,q can be extracted. Recast in Lorentz-invariant four-vector notation, this gives, Note that, after the correct endpoint location has been determined, the offset correction of Eq. (35) has to be included. If a system is formed by a massless q and a massiveq, say, Eq. (37) has to be applied only to the massiveq, while the offset in Eq. (35) has to be used both for q andq. A final feature is that the oscillation period for a hadron composed of massive quarks is shorter than a same-mass one with massless quarks. This discrepancy only affects the estimation of the "late", v h l , and "early", v h e , definitions of hadron production points. The expression in Eqs. (25) and (26) where α red accounts for the reduced oscillation period. This parameter is determined in the hadron rest frame by the absolute three-momentum of the quarks forming the hadron where m h is the mass of the hadron and m q and mq the masses of the constituent q andq, respectively. Needless to say, these semiclassical estimates of oscillation periods cannot be taken too literally. It could be argued that all hadrons, light as heavy, have hadronic sizes of order 1 fm, and should have essentially common oscillation periods related to that. That would give us problems notably for pions, however, which are abnormally light in relation to their size.

Other implementation details
Up until now, only open qg 1 g 2 . . . g n−2q and closed g 1 g 2 . . . g n strings have been considered. A third possibility is junction topologies, wherein three string pieces meet in a common vertex [36], and whereby the junction effectively carries the baryon number of the system. Such topologies can arise e.g. when the three valence quarks are all kicked out of an incoming proton, but there are also scenarios in which further junctions and antijunctions may be formed [37].
A junction system consists of three different "legs", each stretched from an endpoint quark via a number of gluons in to the junction. In Pythia the fragmentation process is most conveniently defined in the rest frame of the junction. Here the total energy of each leg is determined, and the two legs with the lowest energies are fragmented from the respective q end inwards. The process stops when the next step would require more energy than left in the leg. Once the two initial legs have fragmented, the two leftover q from the respective last breaks are combined to create a diquark. Together with the third leg and its original endpoint q, this diquark defines a final string system, which now fragments as a normal open string.
The assignment of space-time locations in junction topologies introduces no new principles, but requires some extra bookkeeping. The three junction legs are considered as three different systems, to be dealt with in the same order as they fragmented, starting from the leg with the lowest energy.
Low-invariant-mass systems hadronize about as highmass ones, even if kinematics is more constrained. The exception is when the invariant mass of the system is so low that only one hadron can be formed. In such cases, the "early" hadron production point is at the origin of the qq system, i.e. v h e = (0; 0, 0, 0). Note that smearing in transverse space will give rise to negative squared invariant times in such cases. This is not a problem if the reason is that the collision of two Lorentz-contracted proton "pancakes" naturally would lead to a spread of x, y coordinates of collisions at t = 0. The "middle" and "late" definitions are calculated from the fourmomentum p h of the hadron as v h = p h /2κ and v h l = p h /κ, respectively.
Finally, many of the hadrons produced during the string fragmentation are unstable and decay further, a process known as secondary particle production. In such cases the invariant lifetime is selected at random according to an exponential decay, P(τ ) ∝ exp(−τ/ τ ), where τ is the tabulated average lifetime [35]. For short-lived particles it is rather the width Γ of the mass distribution that is known, and then one can use τ =hc/Γ . Given a known hadron production vertex, the decay one becomes for a hadron with four-momentum p h and mass m h . This equation can be used recursively through decay chains, also e.g. for leptons. Truly stable particles are only e ± , p,p, γ and the neutrinos. Also some weakly decaying particles with long lifetimes are effectively treated as stable by default: μ ± , π ± , K ± , K 0 L and n/n.

A comparison of time scales
In this article we only address the space-time picture of hadronization. In the context of a hard collision process, say qg → qg, also perturbative emission of further partons off the two scattered partons is extended in space and time. This is related to the regeneration time of the QCD field, mainly consisting of gluons, at typical time scales of order, for emitted partons of energy E and transverse momentum p ⊥ [38]. This expression is conveniently split into a "Heisenberg uncertainty" factor ( p ⊥ is a measure of the off-shellness of intermediate propagators) and a "time dilation" factor, as indicated. Similar relations hold for emissions off the two incoming partons. Typically, parton shower descriptions in event generators such as Pythia stop at scales of the order p ⊥min = 0.5-1 GeV, mainly because α s becomes so big that perturbation theory cannot be trusted below that. (The current default value for Pythia final-state radiation is 0.5 GeV, but that is the p ⊥ for each daughter of a branching with respect to the mother direction, meaning a separation of 1 GeV between the two daughters. Eq. (41) should not be trusted up to factors of 2 anyway.) This corresponds to a τ regen ≈ 0.25 fm, say, to be compared with the average hadronization time τ had ≈ 1.3 fm (see Sect. 4.2 below), i.e. about a factor five difference. To a good first approximation, the simulated perturbative activity can therefore be viewed as happening in a single point as far as the hadronization process is concerned. This is even more so for the hard perturbative activity that gives rise to separate jets, for which p ⊥ 1 GeV. The emissions that possibly are simulated below 1 GeV can only give small wrinkles on the strings stretched between the main partons.
The comparison of invariant time generalizes to hold everywhere in an event, since time dilation works the same way for showers and hadronization. That is, a perturbative splitting at high energy and low p ⊥ may occur at large time scales as measured in the rest frame of the event, when hadronization already started in the central region, but still well before it will begin in the part of the event that could be affected by the splitting.
At the end of the Pythia showers, the total number of partons in a typical LHC event is roughly half of the number of primary hadrons later produced. Given that the size, in each of three spatial dimensions, is only a fifth for the partonic system compared with the hadronic one, it might seem that the partonic density is much higher than the the hadronic one, and that partonic close-packing would be a more severe issue than hadronic ditto. Partons don't have a well-defined size, however. A newly created parton could be assigned a vanishingly small size, and then the colour field surrounding it would expand with the speed of light. Thus the partonic size could be equated with the time since creation, multiplied by a standard time dilation factor.
At early times the partonic system of a collision therefore expands in size at about the same rate as the size of partons, and any net effect comes from the rise of the total number of partons as the cascade evolves from early times. Here the colour coherence phenomenon enters, however [38]. It is the obervation that the two daughters of a q → qg or g → gg branching share a newly-created colour-anticolour pair, that cannot contribute to the radiation until the partons are more separated than the wavelength of the further radiated partons. This gives a mechanism for close-packing avoidance, in event generators implemented in terms of angular or p ⊥ ordering of radiation.
Had the parton shower been allowed to evolve further than the current cutoff, the partonic multiplicity and the partonic overlap would have increased as the QCD scale of ≈ 0.3 GeV is approached. By then the naive size of partons would be of the order of 0.7 fm, which is about the expected transverse size of strings, and soft partons emitted at this stage form part of the emergent strings. We do not know how to model these late stages of the cascade, but any effects coming from them are included in the tuned parameters of the string fragmentation framework.
The picture painted here is based on studying one partonic cascade. Since protons are composite object, however, several partonic subcollisions can occur when two protons collide -MPIs. One therefore also should worry about the overlap of cascades from different MPIs -partonic rescattering. In part this issue has been studied [39], and shown to give small effects. That study only included the effects of parton multiplication by initial-state radiation, as encoded in parton distribution functions, and thus did not address the effects of collisions between partons from two separate MPI subcollisions. In general, however, MPIs occur at different transverse locations when the two Lorentz-contracted protons collide, and the products move out in different rapidities and azimuthal directions. Also here it is therefore plausible with only minor overlap at early times and large perturbative scales. (In a relative sense; most MPIs do not have all that large p ⊥ values.) The overlap becomes relevant at later scales, where colour reconnection is the currently favoured mechanism for interactions between the emerging colour fields.
Another issue that we would like to comment on is the folklore that "fast particles are produced early". This would seem to be in contradiction with the string picture, where hadronization begins with slow particles in the central region and then spreads outwards to faster particles at later times, roughly along a hyperbola of constant invariant time. But it is all a matter of what comparison one has in mind, and what production time definition is used [40]. Consider the "first" ("leading") hadron, i.e. the one closest to the quark end of a qq string. For it Γ i−1 = Γ 0 = 0, such that Eqs. (13) and (9) together give The faster the hadron, the earlier the string break in invariant time: Γ 1 → 0 for z 1 → 1. Also the time in the string rest frame, with E q the quark energy, is decreasing for increasing z 1 . This reasoning generalizes: an event with few, fast particles can only be obtained when the Γ values and the breakup times are small. Conversely, events with high multiplicities of lower-momentum hadrons require high Γ values and late hadronization times. Whether early or late invariant times, however, the hadronization will still start in the middle and spread outwards.

Hadron density studies
We now proceed to study the implications of the model presented so far. Toy studies are reported for a simple qq string, but most results are for pp collisions at √ s = 13 TeV, for inclusive inelastic nondiffractive events. Although the pp modelling is not yet complete, enough is in place to perform some first semi-realistic studies that form the basis for future development. Notably we will estimate the hadronic density in a few different ways, as a means to highlight the close-packing of hadrons and the need to consider the consequences of that.

Longitudinal and transverse distributions
Three definitions of hadron production points were presented in Sect. 3.1, to allow estimates of the uncertainty in the description. Here the three resulting longitudinal and transverse space-time distributions are compared. For the former y τ is introduced as a space-time correspondent to ordinary rapidity y: while the latter is shown as a function of r = x 2 + y 2 . Note that the longitudinal variable is dimensionless while the transverse one is expressed in units of fermi (fm). Although formally unrelated, the dynamics of string fragmentation introduces a strong correlation between y and y τ , as illustrated in Fig. 14 for the default "middle" definition of production points. The spread from the diagonal comes from a number of effects, such as the probabilistic fragmentation process, given by Eq. (12), and hadronic decays. Figures 15 and 16 display the longitudinal and transverse spectra for pp collisions at √ s = 13 TeV given by the "early", "middle" and "late" definitions of hadron production points, represented in green, red and blue, respectively. In the same figures, the spectra for a single string, at the same CM energy, using the "middle" definition are also illustrated in black. Both primary and secondary hadrons are taken into account.
The longitudinal spectra for the different definitions are very similar, as can be seen in Fig. 15. The largest disagreement is visible around y τ ≈ 0, where the spectra of the "early" definition peaks more, but "early" also has more particles at the very largest y τ values. In short, the "early" alternative maximizes the extreme behaviour of hadron production, whereas the "late" one minimizes it. The differences are not bigger than that we can consider the "middle" definition a fairly reliable one.
Similar conclusions are drawn from the transverse spectra, shown in Fig. 16. The spectrum for qq events is a consequence of the transverse smearing, Sect. 3.4, and of particle decays; otherwise primary production would all be at r = 0. In contrast, pp events are constructed out of a large number of strings stretched between the partons from hard collisions, parton showers and beam remnants, all of them intrinsically with a transverse motion. Therefore the smearing is important for the spectrum at low r values, as can be seen in the difference between the two "middle" r distributions, while the distribution at larger r values is rather insensitive. The difference between the "early", "middle" and "late" production points is larger than for the longitudinal spectra, but still sufficiently close as to give confidence that meaningful results can be obtained. In the following, all plots will be for the "middle" definition.

Temporal and radial evolution of hadron production
The number of hadrons is shown as a function of time for a single string with √ s = 20 GeV in Fig. 17. The red curve corresponds to the number of primary hadrons, formed by the string fragmentation, that have not decayed at the time, while the green curve represents the number of secondary hadrons, from particle decays. The total number of hadrons, illustrated in blue, is the sum of primary and secondary hadrons. The brown curve represents the number of final (i.e. stable) hadrons, see Sect. 3.6. Finally, the black curve depicts the number of hadrons with |z| < 0.5 fm, to be discussed in Sect. 4.3.
For the 20 GeV simple qq system in its rest frame, the string can at most extend 10 fm in the ±z direction (for κ = 1 GeV/fm). This happens at t = 10 fm, since the massless quarks move with the speed of light. The primary hadron production therefore must stop at this time, as visible in Fig. 17. Decays make the number of hadrons continue to rise also beyond this time, but only slowly. Actually many hadrons, like the ρ ±,0 ones, are so short-lived that they decay within some fm of having been produced.
Note that there are almost no hadrons in the system up until t ≈ 0.5 fm, since the string has to have time to begin stretching out before it can begin to fragment. This is further illustrated in Fig. 18, with the invariant time distribution of primary hadron production points in the qq system. By default, the parameters a and b in Eqs. (12) and (14) are set to a = 0.68 and b = 0.98 GeV −2 [41], giving rise to a suppression of small Γ values of breakup vertices, and thereby also of small hadron production times. In detail, the relation between Γ and τ , Eq. (9), implies P(Γ ) ∝ Γ a dΓ ∝ τ 2a τ dτ = τ 2a+1 dτ for τ → 0. Further- Because those aspects are typical of the fragmentation process, a similar behaviour is also observed in pp collisions. The time evolution of hadron production in 13 TeV pp events is shown in Fig. 19 for t ≤ 20 fm. Although the qualitative behaviour is similar to the one in qq systems, the temporal evolution is smoother and the number of hadrons generated per unit time increases more rapidly in the pp case. These effects are direct consequences of the presence of several string systems in pp events, possibly extending all the way out to 6500 fm from the origin. Figure 20 extends the pp description up to 10 15 fm = 1 m. As in the case of the qq system, the total number of primary hadrons increases until fragmentation is over, which now is at t ≈ 10 3 fm owing to the higher energy. Decays deplete the number of remaining primary hadrons but increase the number of secondary ones. The significant drop in the number of hadrons at t ≈ 10 8 fm is from electromagnetic decays of the π 0 , mainly π 0 → γ γ . Although the lifetimes of s, c and b hadrons typically are at the mm to cm scales (more long-lived ones, like K ± , being considered stable here), their decays are still ongoing at 1 m, owing to time dilation of the frequently fast-moving hadrons. Most of the expansion of the system is along the z axis, i.e. the |z| distribution of hadron production would look similar to the t one in Fig. 20, except for the lack of a suppression at z = 0. It is therefore interesting to show the radial evolution separately, Fig. 21, for the same t range. Overall the two figures resemble each other, but all the relevant features have been compressed owing to the lower radial velocities. The π 0 → γ γ decay is shifted from t ≈ 10 8 fm to r ≈ 10 6 fm, for instance. The impact of weak s, c and b hadron decays are better visible in the range between 1 and 100 mm; beyond that scale essentially all relevant decays have already occurred. At the other end of the scale, note that around half of the hadron production occurs in r < 1 fm; there is no equivalent dynamical suppression of small r as there is of small t.

Close-packing of hadron production in the central region
One of the key objectives of this article is to assess the spacetime density of hadron production, dN /dV . Eventually we will need to use Lorentz invariant quantities, but these will then hide the time aspect of the evolution. To begin with, we will therefore study the density for |z| ≤ 0.5 fm as a function of r and t, giving a measure of the hadronic densities as a function of radius. The r -integrated number as a function of t is shown in Figs. 17 and 19. This number only increases up to t ≈ 2 fm, a time after which the longitudinal expansion leads to a steady decrease. Therefore, in Fig. 22a, the r distribution is only shown for a few different t ≤ 2 fm. The hadron density at times t = 0.5 fm is extremely low both for 20 GeV qq systems and for 13 TeV pp events, since they hardly have had time to start hadronizing yet. From this point on, hadrons are generated from fragmentation and particle decays, giving an increasing hadron density in the central region. The maximal value is at t ≈ 1.5 fm, a value that relates well with typical hadronization time scales, and where the density at r = 0 approaches 2 hadrons per fm 3 . A proton has a volume V h = 4πr 3 p /3 ≈ 2.76 fm 3 if we use r p = 0.87 fm [35] so, assuming the same volume for all hadrons and disregarding potential Lorentz contraction effects, this implies that five hadrons overlap in the center of the collisions. That number increases rather slowly with the collision energy; it is around four hadrons at 2 TeV and seven at 100 TeV. Also other measures of close-packing are expected to display only a mild energy dependence, so our results at 13 TeV should offer guidance for a wide range of collider energies. In order to extend the previous analysis to a Lorentz invariant measure of hadronic density, instead the volume element d 3 x/t will now be used: In the last step r m is introduced as the median radius of the hadron creation vertices in the event and Δy τ is the full width at half maximum of the dN /dy τ distribution. Together r m and Δy τ thus define a characteristic volume over which much of the production will occur, and relate it to a typical maximum density. For instance, the |y τ | distribution is roughly triangular in shape, cf. Fig. 15, so N /Δy τ is about the height of the dN /dy τ distribution at its maximum. Note that the hadronic multiplicity studied here is different from typical experimental definitions, e.g. the charged multiplicity in vertex detectors. Since we are interested in the hadronization process, only strong decays should be taken into account in our analysis. This excludes electromagnetic and weak decays, such as the π 0 one, but furthermore decays with r > 10 fm are not taken into account, since beyond that hadronic densities have fallen to modest levels anyway. In order to avoid double-counting of a hadron and its decay products, all secondary hadronic decay vertices enter with a weight one less than the hadronic multiplicity of the decay. Counted this way, the average multiplicity of inelastic nondiffractive 13 TeV pp events is n had = 169.
Inside this sample, ten multiplicity ranges are defined such that each of them corresponds to roughly 10% of the events. The resulting longitudinal y τ and transverse r spectra are presented in Figs. 23 and 24, respectively. For the sake of clarity, some intermediate multiplicity bins are left out of the figures. By energy-momentum conservation the y τ (and y) spectra are more peaked around the middle for increasing multiplicities. Not so for the r spectra, where the distribution shifts towards larger values for the higher multiplicities. It is here useful to remind that the basic MPI framework implies that high multiplicities primarily come from having more MPIs, rather than e.g. from a single hard interaction at a larger p ⊥ scale, and that therefore p ⊥ (n charged ) is expected to be reasonably flat. The experimental observation of a rising p ⊥ (n charged ) actually was the reason to introduce colour reconnection (CR) as a key part of a realistic MPI modelling [2].
The effect of CR on the median radii r m is shown in Fig. 25, as a function of the median hadronic multiplicity n had of each multiplicity range. The red and blue curves represent results with and without CR, respectively, and these match very well with expectations from the p ⊥ (n charged ) behaviour; also the rise of r m is driven by the CR mechanism. Note that switching off CR gives higher event multiplicities, well above data. To this end also a green curve is introduced, wherein the p ⊥0 parameter of the MPI framework [3] is increased for the no-CR alternative until the average multiplicity is the same as in the default with-CR scenario. This gives a slightly larger r m than the naive no-CR setup, since the p ⊥ of MPIs is increased in the process, but otherwise is in line with the original observation.  Figure 26 shows the hadron density, defined as in Eq. (46), for the three same scenarios as above. The n had , r m and Δy τ are calculated in each multiplicity range. The space-time hadron density increases with hadronic multiplicity, but significantly faster in the two scenarios without CR, as a direct consequence of the inverse quadratic dependence on r m . The lower values with CR on may be partly misleading, however; only because strings are spread across a bigger transverse area when CR is on, it does not mean that there are strings everywhere in that area. The typical average density of 5 hadrons per Lorentz invariant space-time element should therefore be viewed as a lower estimate.

Close-packing analysis in the hadron rest frame
As a final measure of close-packing we will next check how many hadrons overlap with each of the hadrons of an event, as defined in the rest frame of the hadron at the time when it is formed. In detail, consider a hadron h 1 generated at time t 1 , where t 1 is defined in the rest frame of hadron h 1 . The other hadrons in the system are boosted to the rest frame of h 1 , where only the hadrons created at times t ≤ t 1 and which have not decayed at t 1 are taken into account. Their location at t 1 is calculated from the respective production point and fourmomentum, from which the distance to h 1 can be calculated. If this distance is shorter than 2r p , r p being the proton radius, the hadrons are considered to overlap, implying that already the production of h 1 could be affected by the presence of these other hadrons. Note that Lorentz contraction is not taken into account, which would decrease numbers, but then neither is the possibility of closer distances at t > t 1 , which would increase them. The analysis is done including or excluding the adjacent hadron on each side along the string of the hadron studied. The reason for the latter scenario is that any effects of same-string-neighbours already effectively should have been taken into account in the tuning of the fragmentation process, e.g. in Eq. (12).  The number of overlapping hadrons is shown in Fig. 27 for different hadronic multiplicity ranges, as presented in Sect. 4.4. Although close-packing also takes place in lowmultiplicity pp events, the number of hadrons overlapping with a newly created one is not so high. For high-multiplicity events, on the other hand, close-packing often arises with a significant number of nearby hadrons, likely leading to collective effects that are not taken into account in Pythia.
The overlap can be differentiated further. Generally, particles produced at large transverse momenta are not expected to experience close-packing as much as those at small ones. The reason is that, even if parton showers can generate many partons from each initial highp ⊥ parton, these daughter partons are spread widely in momentum space. Therefore, the fragmenting strings stretched between them also will have a modest overlap, unlike the accretion of lowp ⊥ strings from multiple soft MPIs. In order to isolate this feature, we study the overlap as a function of the hadron transverse momentum, using the same analysis procedure as above, with exclusion of adjacent hadrons along the string, Fig. 28, for "soft" and "hard" QCD events in red and blue, respectively. The former is the standard inelastic nondiffractive event sample, whereas the latter is for the subsample where a hard 2 → 2 QCD process has p ⊥ > 100 GeV. In both cases the overlap peaks for hadrons around p ⊥ ≈ 0.5 GeV, and then falls off at larger p ⊥ values. The level is somewhat higher for the hard-QCD events, consistent with such events being biased towards smaller impact parameters and therefore more MPIs, but the trends are consistent.

Summary and outlook
The motivation for this article is the mounting evidence for several collective effects in high-multiplicity pp collisions, similar to those usually associated with the formation of a Quark-Gluon Plasma in heavy-ion collisions. Whether we are witnessing QGP also in pp or not remains an open question, but the need to allow for some kind of collective mechanisms can hardly be in doubt. It should not even come as a surprise, given that already order-of-magnitude estimates of the size of the fragmentation region told us that strings would be formed close-packed and fragment into close-packed hadrons within any realistic MPI-based scenario. Colour reconnection was introduced as a partonic-state mechanism to describe some signals of collectivity, notably the rise of p ⊥ (n charged ). But the rising fraction of multistrange baryon production with event multiplicity implies that collective effects are needed also in or after the fragmentation stage, or both. To be able fully to explore various such scenarios it becomes important to understand the space-time structure of hadronization in more detail than hitherto. The aim of this article has been to develop the necessary framework, and implement it as part of the public Pythia event generator. Specifically, we have determined the space-time location of the string breakup vertices and compared three alternative definitions for primary hadron production points. Although the implementation of the space-time picture in a simple qq string topology is straightforward, the picture gets much more intricate when more complicated topologies are addressed.
To illustrate the usefulness of the new framework, some simple first studies have been presented, notably exploring space-time hadron densities. Initially, inclusive longitudinal and transverse space-time distributions were shown, and the production and decay patterns from fm to m scales were traced. Next the density in a central slice |z| < 0.5 was studied as a function of t and r . While not explicitly Lorentz invariant, it gave some first hints of close-packing problems. Moving from a volume element d 3 x to d 3 x/t gave access to Lorentz-invariant density measures. It was shown that the median radius of the fragmentation region is increasing with multiplicity, but almost only because of the colour reconnection effects. The flip side is that density is increasing significantly with multiplicity without CR, whereas it remains at an average of about five hadrons overlapping with CR included.
The close-packing of hadrons was finally analysed by counting the number of hadrons overlapping with a newly generated one in its rest frame, again for different event multiplicities. In this case, the number of nearby hadrons does increase with multiplicity, with CR included, implying that close-packing becomes increasingly important with multiplicity also here. The overlap is largest for lowp ⊥ hadrons, in the MPI-dominated region, whereas it drops for larger p ⊥ scales, dominated by hard QCD jets.
A few corners have been cut in the current pp implementation. Notably no space-time vertices have been assigned to the individual MPI collisions, although such assignments are implicit in the MPI impact-parameter and matter-profile framework [3]. A sensible space-time picture of partonshower evolution would introduce offsets, although presumably not major ones. Similarly, the CR between different MPIs implies that the two ends of a string may start out from different space-time points. For now, all such effects have implicitly been made part of the generic smearing step in Sect. 3.4.
To these minor corrections should be added the potentially much larger dynamical ones that could generate collective effects, be it before, during or after the string fragmentation stage. The shove and rope mechanisms are two examples for the first two stages, but the immediate continuation of the current article would be to study the consequences of hadronic rescattering in a dense hadronic gas. Models for hadronic rescattering already exist [42], such as UrQMD [43] and SMASH [44], and could possibly be interfaced. For better control, however, it would be useful to implement relevant aspects of such a framework as an integrated part of the Pythia program.
The longer-term expectation is that continued experimental studies will provide further information on all kinds of collective phenomena in LHC pp events, and that model building will try to rise to the challenge. Especially interesting is to figure out which phenomena can be explained without invok-ing QGP, and which cannot. This would then reflect back on the LHC heavy-ion program.
Fragmentation of an open string is modelled by allowing hadron production from either string end, until the remaining invariant mass of the system is only sufficient to generate the last two hadrons (see Sect. 3.1). At that point, a final breakup is generated between the last previous breakup on either side. For the energy-momentum picture, the final breakup occurs in a fictitious final region, created from the combination of all the unused parts of all remaining regions. This region does not have a space-time correspondence, however; in particular there is no concept of a space-time offset where this region is created. In this case we therefore have had to depart from the energy-momentum picture, to develop an unfortunately rather complex procedure.
To set the stage, consider a simple qq string, where the flavours of the final breakup helps define the two final hadrons. The transverse mass constraint of each of those hadrons is represented by hyperbolae, see Sect. 2.2.3. The two hyperbolae either do not cross at all or else cross in two different points. No solution can be found in the former case, and the fragmentation process then has to be repeated. The latter case is illustrated in Fig. 29, where the remaining region is depicted in red and the blue dots represent the two points where the hyperbolae meet. Since the two possibilities have different Γ values, the relative probabilities for them to occur are given by Eq. (14). For simplicity, only the exponential part is retained, i.e. P(Γ i ) ∝ exp(−bΓ i ) is used to pick either point. That choice made, the kinematics is fully defined.
Major complications are found in systems with several intermediate gluons between the q andq ends, specifically when the two old breakups are located in different regions. In those cases, knowing the region in which the final breakup is located is not always possible. Since that aspect is essential to calculate thex ± fractions, several methods have been tested before settling on the one presented here.  For notational convenience, the old breakup closer to the q (q) end will be called the positive (negative) breakup, at location v pos (v neg ) in the positive (negative) region, together with the final breakup forming the positive (negative) hadron. The first implemented method consists in projecting the positive/negative hadron four-momentum on to the longitudinal and transverse direction vectors of the positive/negative region [27]. If the two old breakups are in the same region then that is also the region of the final vertex. This situation is exemplified in Fig. 30, where the green point is the final breakup, the red and blue points are the old breakups, which correspond to the endpoints of the final region, and the grey squares represent the two final hadrons created. As can be seen in the figure, the x ± fraction of the positive hadron are x ± p,Had and thex ± of the positive breakup arex ± p,Old . Then, thex ± f of the final breakup can be obtained as, The same procedure can be followed with the negative breakup and the negative hadron, giving,