Dipole evolution: perspectives for collectivity and $\gamma^*$A collisions

The transverse, spatial structure of protons is an area revealing fundamental properties of matter, and provides key input for deeper understanding of emerging collective phenomena in high energy collisions of protons, as well as collisions of heavy ions. In this paper eccentricities and eccentricity fluctuations are predicted using the dipole formulation of BFKL evolution. Furthermore, first steps are taken towards generation of fully exclusive final states of $\gamma^*$A collisions, by assessing the importance of colour fluctuations in the initial state. Such steps are crucial for the preparation of event generators for a future electron-ion collider. Due to the connection between an impact parameter picture of the proton structure, and cross sections of ep and pp collisions, the model parameters can be fully determined by fits to such quantities, leaving results as real predictions of the model.

In the research program at the Large Hadron Collider (LHC) and the Relativistic Heavy Ion Collider (RHIC), collisions of ultra relativistic heavy ions are hypothesized to result in the creation of a quark-gluon plasma (QGP) with partonic degrees of freedom. One of the main avenues for investigating and characterizing this plasma consists of measurements of azimuthal correlations between particle pairs separated in rapidity, connecting particle emission angles to the initial geometry of the collision. Non-trivial correlations reflecting collective properties were first observed in gold-gold and copper-copper collisions at RHIC [1], but has since been investigated also in lead-lead (PbPb) collisions at the LHC [2][3][4]. Such non-trivial azimuthal correlations had at that point already been hypothesized to be a signal for hydrodynamic behaviour [5], or, even earlier, to involve microscopic dynamics of overlapping "quark tubes" or strings [6].
Similar results have been obtained in smaller collision systems such as proton-lead (pPb) [7], deuteron-gold [8], and, perhaps most surprisingly, in proton-proton (pp) [9]. Attempts to observe similar behaviour in even smaller collision systems, e.g. e + e − , has, while carrying interesting prospects, so far not produced positive results [10]. Even though the discovery of collectivity in pp is almost ten years old, the origin of such correlations in small collision systems is still highly debated (see ref. [11] for a recent review), and its resolution is among the top priorities for the future heavy ion program at LHC [12]. One possibility is that the correlations in these small collision systems are due to coherence effects [13] or initial state correlations [14]. Another is a repetition of the argument from heavy ion collisions, where the observed collective behaviour is a hydrodynamic response to the initial partonic spatial configuration [15]. A picture where a hydrodynamic "core" coexists with a non-hydrodynamic "corona" has been shown by the EPOS model [16,17] to provide good descriptions of collectivity even in small collision systems.
The possibility of a hydrodynamic (or in fact any other) response to an initial geometric configuration of partons, poses a challenge to the traditional strategies for pp event generators, such as Pythia 8 [18] or Herwig 7 [19], both based on perturbative QCD (pQCD) with no obvious way to extract a spatial configuration for which to calculate a response. Attempts to calculate such a structure [20][21][22] generally involve assuming a certain spatial distribution of partons in the proton and, using the eikonal approximation, then transferring this structure to a spatio-temporal structure of the multiple partonic interactions (MPIs). The immediate drawbacks of such an approach are that (a) such models will in general contain parameters which need to be fitted to the same type of particle correlations as they wish to predict, and (b) assuming a spatial distribution of partons in a proton will generally contain many ad hoc elements.
Even though the spatial distribution of partons in a proton cannot be assessed ab initio, the evolution of said distribution can be calculated perturbatively in the formalism of Mueller [23,24]. At high energies, average properties will retain little dependence on the initial configuration, i.e. be mostly dependent on the evolution. Since the transverse substructure of the colliding protons (or virtual photons) can be linked to total or semi-inclusive cross sections, any model parameters can be tuned to such quantities, and leave any further estimation of collective effects as real predictions of the model. Attempts to predict the elliptic flow in pp collisions using an implementation of Mueller's model was provided in 2011 [25], showing v 2,3 comparable to values found from PbPb at RHIC and LHC energies.
This paper is concerned with presenting a new Monte Carlo implementation of Mueller's model, study its description of cross sections in pp and γ * p collisions, in order to provide estimates on parton level geometries in pp, proton-ion (pA) and ion-ion (AA) collisions, linked to collective phenomena. Mueller's model has been implemented as a Monte Carlo several times before, as it is not only useful for calculating spatial distributions of partons, but in fact has much wider applications due to the equivalence of the Mueller formalism with B-JIMWLK (Balitsky, Jalilian-Marian, Iancu, McLerran, Weigert, Leonidov and Kovner) [26][27][28][29][30][31][32][33] evolution (see section 2). Such an implementation makes direct introduction of effects beyond the leading logarithmic approximation possible, e.g. conservation of energy and momentum without imposing kinematical constraints on the splitting kernel [34]. This makes the implementation attractive for estimation of basic quantities dominated by small x processes (e.g. cross sections) in cases where little guidance from data exists. In this paper (see section 7) we will also apply the formalism to extract Glauber-Gribov (GG) colour fluctuations in γ * p collisions, in order to take the initial steps towards a generation of electron-ion (eA) collisions within the Angantyr framework [35,36] -a possibility which is foreseen to aid the preparation of an eA program currently being planned [37].
Earlier implementations of the Mueller dipole model include the public Oedipus [38] and Dipsy [39,40] codes, as well as a private implementation by Kovalenko et al. [41]. All implementations treat only gluons in the evolution, as will this work. The implementation in this paper is similar to the implementation in Dipsy in some respects, but differs in other, while bearing less resemblance to the other two. The key differences between most of the used approaches, lies in the treatment of effects beyond leading order. Worth mentioning already here, is the treatment of sub-leading N c (number of colours) effects in the evolution, leading to saturation in the cascade. In Dipsy, this is addressed through so-called swing mechanisms [42,43], which suppresses the contribution from large dipoles in dense environments by replacing them with small dipoles. In this paper we consider only sub-leading N c effects in the collision frame, by including multiple interactions in a way consistent with unitarity. Thus we make no attempt at treating saturation in the cascade, as the focus is rather to study how well one can do with an approach that includes only a minimal set of sub-leading corrections. Effects included in this paper is energy-momentum conservation and recoil effects (which are beyond leading log) and confinement (which is a non-perturbative effect). This also separates our approach from the IP-Glasma approach [44], which includes gluon saturation effects in the initial configuration explicitly, and evolve using B-JIMWLK.
On a more technical note, the approach presented in this paper is implemented within the larger framework of the Pythia 8 Monte Carlo event generator. This first of all means that the implementation will become publicly available, 1 and to aid reproducibility and transparency, a large part of the manuscript, as well as appendix A, are devoted to the details of the implemented model. Our approach is simplistic in the sense that only a minimal amount of corrections to Mueller's original model has been added, and where ambiguities have arisen, the simplest possible choice has been taken.
The structure of the paper is as follows: After this introduction, the pQCD model of Mueller is introduced. Then follows a description on how observables are calculated within the Good-Walker framework as well as a definition of the observables related to the substructure of protons. The next section describes the overall features of the Monte Carlo implementation, before we proceed to the results on cross sections, eccentricities and colour fluctuations in processes with incoming virtual photons. Lastly, a section is devoted to conclusions and forthcoming work.

Proton substructure evolution
In this section we will outline the theoretical basis of the initial state evolution approach used in subsequent sections, and briefly review its relation to other approaches. The theoretical basis is the well known dipole QCD model by Mueller et al. [23,24].

Dipole evolution in impact parameter space
We consider in general a picture with a projectile with a dipole structure incident on a target. In the simplest case, the projectile is just a single dipole r 12 , spanned between the coordinates r 1 and r 2 , in impact parameter space. The probability at leading order for this dipole to branch when evolved in rapidity (y), is dP dy = d 2 r 3 N c α s 2π 2 r 2 12 r 2 13 r 2 23 ≡ d 2 r 3 κ 3 . (2.1) Here r 3 is the transverse coordinate of the emitted gluon and κ 3 is used as a short-hand for the splitting kernel. An observable O known initially, will after an infinitesimal interval dy have the expectation value (denoted by a bar), assuming unitarity: where ⊗ denotes the evaluation of the observable O in the two dipole system r 13 , r 23 . In the limit dy → 0 this becomes: Remarkably, eq. (2.3) allows for the evolution of any observable calculable in impact parameter space. In the case of S-matrices in impact parameter space, the evaluation in the two dipole system reduces to a normal product in the eikonal approximation. Thus O(r 13 ) ⊗ O(r 23 ) → S(r 13 )S(r 23 ). Changing to scattering amplitudes, T , by substituting T ≡ 1 − S, one obtains: This is the B-JIMWLK equation hierarchy [26][27][28][29][30][31][32][33] in impact parameter space, which, as shown already by Mueller [45], can be generated directly from eq. (2.1). Equation (2.4) includes a non-linear term, T 13 T 23 , and the treatment of this term is defining for many of the various approaches dealing with initial state evolution at low x.
Removal of the non-linear term yields the BFKL (Balitsky, Fadin, Kuraev and Lipatov) equation [46,47], which correctly sums all of the leading logarithms in energy (or, more precisely in rapidity (α s · y) n ) to all orders. Other than simply neglecting it, the simplest treatment of the non-linear term is by a mean-field approach, where it factorizes as: T 13 T 23 → T 13 T 23 . This approximation yields the BK (Balitsky and Kovchegov) equation [26,48].

The Mueller dipole model
Several approaches have been proposed to utilize the simple, but powerful evolution equation introduced in eq. (2.4). Here we will focus on the Mueller dipole model that neglects the non-linear term completely, but is particularly suitable for calculation of geometric quantities. Usually, eq. (2.4) is solved as an initial value problem: given a scattering matrix at small initial rapidity (y 0 ), it determines the resulting scattering matrix at any y ≥ y 0 . Note however, that eq. (2.3) is applicable for any type of observable calculable in impact-parameter space, notably observables linked to the geometry of the partonic initial state. As an example, consider the average vertex coordinate position, z , where z is either the x or y coordinate of a dipole. For a single dipole z = (z 1 + z 2 )/2, for the two dipole system z = (z 1 + z 2 + z 3 )/3, where the two dipoles has a common point z 3 , and directly: For more complicated geometric observables, such as eccentricity (see section 3.2), the analytic expressions become quite involved, and must be handled observable by observable. They are, however, quite easy to handle in a Monte Carlo, where any O can be evaluated event by event, and the expectation value extracted from a large statistics sample.
The starting point for the model, is the evolution of an Onium (or γ * → qq) state in transverse space and rapidity, following eq. (2.1). Instead of calculating average quantities directly from the evolution equation, Monte Carlo events are generated, by performing a probabilistic evolution of a given initial state, corresponding to a collision event performed by an experiment. The calculational details of performing such an evolution are deferred to section 4 and appendix A. It is, however, important to note here the approximation of this  Figure 1. Schematic view of two colliding gluon dipoles. The initial dipoles denoted r 12 and r 34 are allowed to interact via two-gluon exchange. This results in the creation of two new dipoles, r 14 and r 23 and a connection of the two dipole chains. The lines r 13 and r 24 are not drawn, but enters in eq. (2.6).
evolution, namely that all dipoles in the dipole-chain radiate independently, removing the non-linear effect from the cascade itself.
After a full evolution in rapidity, a single dipole will have evolved to a chain of dipoles, each of which are allowed to interact with dipoles from another evolved system through gluon exchange. The lowest order interaction between two dipoles, at amplitude level, is single gluon exchange, resulting in two gluon exchange at cross section level. This cross section can be related to the elastic amplitude (cf. section 3.1) through the optical theorem. The dipole-dipole cross section depends on the distances between the interacting dipoles (the enumeration of dipoles follows figure 1) as [54]: where the arrow indicates that the 't Hooft large-N c limit 2 is taken to reach line 2 of eq. (2.6), which then defines f ij . The 't Hooft large-N c limit is taken in order to ensure consistency with the leading logarithmic approximation in the (BFKL) evolution. The distances r ij are indicated in figure 1, except for r 13 and r 24 , the distances between (anti-)colour-(anti-)colour pairs (1,3) and (2,4). Where the dipole evolution comes with a factor of α s N c , the dipoledipole cross section is proportional to α 2 s . Thus in the 't Hooft limit where α S ∼ 1/N c , the dipole evolution is of order N 0 c ∼ 1, while the dipole-dipole interaction is of order 1/N 2 c . This means the dipole-dipole interaction is formally N c -suppressed compared to the dipole evolution [54].
A single collision can contain several dipole-dipole scatterings, equivalent to MPIs in a standard parton language. Assuming that the individual scatterings are uncorrelated, the contribution from each scattering exponentiates, resulting in the unitarized scattering ampli-tude for a single event (see section 3.1): (2.7) As each f ij comes with a factor of 1/N 2 c , the unitarized scattering amplitude correctly resums 1/N 2 c -suppressed terms in the interaction. In Regge terminology, each scattering f ij can be viewed as a Pomeron exchange. The first term in the expansion corresponds to single Pomeron exchange, and the latter terms to multi-Pomeron exchanges. The unitarized scattering amplitude can thus also be viewed as a resummation of all possible Pomeron exchanges in the collision frame.
An expansion of the exponent in eq. (2.7) into a power series results in factors of ( ij f ij ) n . To second order this results in a term quadratic in ij f ij , which corresponds to the mean-field approximation of the non-linear term from eq. (2.4). As this non-linear term corresponds to saturation, we note that the dipole framework does not include saturation in the evolution, but, when using the unitarized scattering amplitude, non-linearity is included in the interaction frame, and only there.

Dipole evolution beyond leading order
Significant formal progress has been made in the pursuit of systematic next-to-leading order (NLO) in α s corrections to the BK equation [55] and the full B-JIMWLK hierarchy [56][57][58]. Numerical studies of NLO BK [59] have, however, shown that the equation becomes unstable for some values of the initial conditions, making it yet unsuitable for a full Monte Carlo implementation. Recent work by Ducloué et al. [60] have shown that, for a specific choice of the initial scattering matrix, some problems of unphysical results can be overcome in the dilute-dense limit, by reformulating the NLO evolution equation w.r.t. rapidity of the dense target. This gives hope that a future improvement of the model, implemented as a Monte Carlo in this paper, could include formal improvements beyond leading order, but at this point it is not deemed feasible.
An approach for going beyond leading colour in the cascade, which is also suited for Monte Carlo implementation, is the so-called "swing" mechanism, introduced by Avsar et al. [43,61]. This can be understood as an extension of the identification of multiple interactions in the collision frame with Pomeron loops, as presented in the previous section. Since loops cannot be formed during the BFKL-like evolution, only loops cut in the collision frame are included. The problem is then posed as equivalent to forming 1/N 2 c suppressed dipole configurations in the evolution, by allowing dipoles to reconnect in such a way that the formalism becomes frame independent. This is another viable path for future extensions beyond leading order. Further work on the formalism is needed, however. Currently only a 2 → 2 dipole swing has been thoroughly studied, which is not enough to make the formalism fully frame independent. Going beyond 2 → 2 is a full study by itself, and not considered in the present paper.
In this paper we instead choose to include corrections beyond (formal) leading-log arising from energy-momentum conservation. It is well known that the leading-log BFKL equation, derived in the high-energy limit, will get sizable corrections at collider energies [62]. From studies of the full next-to-leading log BFKL [63,64], it is shown that contributions beyond leading log are very large, and a sizable amount are related to energy-momentum conservation [65]. In a Monte Carlo such corrections can be implemented directly, see details in appendix A. Related are non-eikonal corrections. Non-eikonal corrections arise due to the large but finite energy available during the cascade. In the CGC approach this can be understood as sub-leading effects to infinite Lorentz dilation of the projectile, which are troublesome but manageable analytically [66]. In a Monte Carlo implementation of the dipole model, the finite energy can be treated as recoil effects in the dipole splittings.
A non-perturbative effect from confinement is also included in our simulation. This must be done both in the cascade, where large dipoles must be suppressed, and in the interaction, where the range of the interaction must be limited to take confinement into account. Following ref. [42], this is done by replacing 1/p 2 g in the Coulomb propagator implicitly entering eq. (2.6) by 1/(p 2 g +M 2 g ), where M g can be taken as a confinement scale, or a fictitious gluon mass. This changes the expressions for the splitting kernel and the dipole-dipole interaction probability, the full expressions are written in section 4.

From substructure to observables
The following section is dedicated to the introduction of the framework used for linking partonic substructure to physical observables, such as cross sections and flow coefficients. First, we describe the Good-Walker formalism for calculating cross sections for particles with an inner structure, from obtained scattering amplitudes, and secondly, the apparent scaling of flow coefficients with initial state eccentricity seen in heavy ion collisions is explained.

The Good-Walker formalism and cross sections
The Good-Walker formalism is a method of calculating cross sections of particles with a welldefined wave function. It includes a normalised and complete set of eigenstates {|ψ i } of the imaginary part of the scattering amplitude (neglecting the real part, which is vanishing at high energies), denotedT ( b) (related to theŜ-matrix throughT ≡ 1 −Ŝ), with eigenvalueŝ T ( b)|ψ i = T i ( b)|ψ i . These scattering states have equal quantum numbers, but differ in masses. The wave function of the incoming beams can thus be expressed in terms of the above eigenstates, and written in short-hand notation as with |ψ p,t denoting the projectile and target wave functions, respectively, and c p,t the expansion coefficients. The scattered wave function is found by operating with the transition matrix on the incoming wave function, and from these definitions, the profile function for elastic scattering (at fixed Mandelstam s) can be defined: where we have defined an average over projectile and target states in the last equality and suppressed indices on T inside the average (which is done in all the following, unless specifically noted otherwise). Thus we obtain the cross sections and elastic slope in the eikonal approximation (again also at fixed Mandelstam s), In eqs. (3.4-3.6) we have implicitly assumed a particle wave function ψ|ψ = 1. In cases where the wave function is not normalizable, one has to take into account the wave function in the above cross sections. This includes processes with photons, where the wave function is well-defined in pQCD for high virtualities. The total γ * p cross section would thus require an additional integration over wave function parameters: with z the fractional momentum carried by the quark, r the distance between the quark and anti-quark, ψ L,T the longitudinal and transverse parts of the photon wave function and σ(z, r) the dipole cross section calculated from the elastic profile function, eq. (3.4). The photon wave function implemented in our approach is given in eqs. (4.1-4.2) and the discussion for γ * A is continued in section 7.

Eccentricity scaling of flow observables
Anisotropic flow is measured as momentum space anisotropies and quantified in flow coefficients (v n ), obtained by a Fourier expansion of the azimuthal (φ) spectrum: with Ψ n the symmetry plane of the nth harmonic. For a hydrodynamical expansion, it has been shown that v 2 and v 3 are proportional to the initial state eccentricity in the corresponding harmonic, v n ∝ n , to a very good approximation [67], with the constant of proportionality depending on the properties of the QGP transporting the initial state anisotropy to the final state. A similar relation may be expected when a pressure gradient is obtained without a thermalized or hydrodynamized plasma [22,68]. In the following, the eccentricities will therefore be taken as a proxy for flow observables, noting that the model imposed for the response may deviate from this linear scaling behaviour. In pp and pA collisions this type of behaviour becomes very apparent, due to the dominance of non-flow effects, 3 in particular at small event multiplicities. Non-flow mechanisms aside, it is clear that no matter what the actual response is, measurable observables will be affected by large deviations in predicted eccentricities.
We follow the usual definition of the initial anisotropy or participant eccentricity [69,70]: Here r and φ are usual polar coordinates, with the origin shifted to the center of the distribution. From eq. (3.9), higher order cumulants can be calculated: In nuclear collisions, the normal participant nucleon eccentricity is used as as a baseline. The notion of "participating" is, however, a model dependent statement. We use the definition from Angantyr [35,36], which defines participating nucleons as either "inelastically" or "absorptively" (inelastic non-diffractively) wounded, see appendix B for a brief review. For pp collisions, and for fluctuations in nuclear collisions, we follow Avsar et al. [25], and define a participant parton eccentricity (though somewhat modified from the cited exploratory work). Assuming that the hydrodynamic evolution takes place at the end of the perturbative parton cascade, the participant parton eccentricity should be evaluated at this point in the evolution. In section 6.1 this participant parton eccentricity will be compared to a more purist initial state approach, where the final state parton cascade is not included. This is meant to inform a discussion about what the notion of an "initial state" really ought to entail.
Parton level eccentricities are, however, not infrared safe. Consider the simple example of a soft gluon emission at the same impact parameter point as its mother. Such an emission will double count this spatial point at parton level, but disappear after hadronization, which will place two such partons inside the same hadron. To improve this, all contributions are weighted by a factor p ⊥ /(p ⊥ + p ⊥min ), where p ⊥min = 0.1 GeV ensures that considerably soft gluons will not double count.
Normalised symmetric cumulants will also be studied. Such quantities eliminate the dependence on the magnitude of the flow coefficients, and should thus remove the response factor between flow harmonics and eccentricities, and directly probe the substructure [71]. They are defined as: where the last approximate equality indicates the removal of the response. Especially interesting for this study is N SC (3,2), it being sensitive to initial-state fluctuations, namely the geometric correlation between 2 and 3 , the elliptical and triangular parts of the Fourier expansion.
Finally it is noted that, since the model is implemented in a full event generator able to generate full final states for pp, pA and AA collisions, it is possible to investigate the event geometry as a function of final state multiplicity with the same acceptance as the experiment.

Monte Carlo implementation
In this section, the Monte Carlo implementation of Mueller's model is briefly described. The full details are given in appendix A. First, the details of the various initial states are described, then some assumptions on the cascade and the interaction are described, and lastly, some geometric properties of the evolution are presented.

The initial states
The new implementation is applicable for both virtual photon and proton beams. A photon state is represented by a single dipole, with a wave function given as, where we include the three lightest (massless) quarks. Here z is the longitudinal momentum fraction carried by the quark, (1 − z) the longitudinal momentum fraction carried by the antiquark, r is the distance between them, Q 2 the photon virtuality and K i the modified Bessel functions. For protons, the wave function is not known. Instead it is represented by three dipoles in an equilateral triangle configuration and normalized to unity, shown previously to give the best description of data [61]. The lengths of the initial dipoles are allowed to fluctuate on an event-by-event basis, chosen from a Gaussian distribution with mean r 0 , and width r w .

The dipole evolution
To implement eq. (2.1) as a parton shower, it is modified by a Sudakov factor: allowing for a trial emission from each dipole in the cascade. The strategy of "the winner takes it all" is then employed, such that only the trial emission with the lowest rapidity is chosen as a true branching. This lowest rapidity then becomes the minimal rapidity in the next (trial) emission. The process is reiterated until none of the trial emissions are below a maximal rapidity, governed by the energy of the collision, with m p the proton mass, m 0 a reference scale set to 1 GeV, and W, √ s the γ * p and pp center-of-mass (CM) energies, respectively. Note that eq. (4.5) is an approximation to the actual rapidity available for the dipole formed by a virtual photon. The "true" rapidity is not well-defined for virtual photons, as it depends not only on W , but also on Q 2 and momentum fractions carried by the quark and anti-quark ends of the dipole. This introduces different rapidity ranges available for either end of the dipole, complicating the evolution further. Equation (4.5) was chosen as the simplest possible rapidity range.
If confinement is taken into account (as described in section 2.3), the evolution equation is modified accordingly: with K 1 the modified Bessel functions of the first kind and r max a maximal radius of the initial dipole, left as a tunable parameter.

Geometric properties of the dipole evolution
Given a specific parameter set, table 1, the probability distribution in rapidity for the first emission, dP/dy, is shown in figure 2. This distribution has a mean at around two units of rapidity. Thus, on average, a new emission is assigned a rapidity of roughly two units larger than the mother (or emitter) dipole. It is worth noting that the inclusion of confinement effects slightly increases the mean as compared to the unconfined distribution. This is caused by the additional suppression of large dipoles, requiring large dipoles to be discarded in the evolution and an emission at a larger rapidity to be tried.   In each step of the dipole evolution a mother dipole is split into two daughters. Figure  3 shows the distribution in sizes of the smaller and larger dipole, scaled w.r.t. their mothers' size for three different evolutions (y max = 4, 8,12). Here, it is evident that on average the larger dipole retains the size of the mother, while the distribution of the smaller is much broader. At lower y max there is a bump in the distribution at around 30-40% of the mothers size, while this bump is less pronounced at larger maximal rapidity. Figure 4 shows the corresponding average and standard deviation in the lengths of all the daughter dipoles scaled w.r.t. the length of their mothers', as a function of maximal rapidity of the evolution. As stated above, it is clear that while the larger of the two daughter dipoles can be taken to be identical to the mother dipole, the size of the smaller dipole has larger fluctuations. The average size of the smaller dipole is, however, fixed at roughly a third of the mother dipole for all y max .
After a full evolution, an initial proton consisting of three dipoles will have evolved to a   larger set of dipoles of mostly smaller sizes than the initial dipoles, cf. figure 5. From these two figures it is evident that the effect of confinement plays a large role in the evolution, effectively reducing the number of large dipoles in the final configuration. Thus, as σ dip ∼ r 2 , confinement is expected to play a large role when evaluating the cross sections. Confinement also introduces more activity -or hot spots -around the endpoints of the dipoles.

Pascal approximation for dipole evolution
The full dipole evolution can be approximated based on the geometric observation above. On average one dipole splitting happens per two units of rapidity, and the lengths of the two resulting dipoles after the splitting, are approximately equal to and one third the size of the mother dipole respectively. This behaviour is tabulated in table 2 for four generations of  Figure 4. The scaled lengths of the daughter dipoles w.r.t. the mother dipole as a function of maximal rapidity for unconfined (a) and confined (b) dipole-evolution. The parameters used in the dipole evolution are the same as presented in table 1. Figure 5. An initial state proton consisting of three dipoles in an equilateral triangle configuration after a full evolution at 7 TeV (corresponding to y max = 8.86). (a) has been evolved without confinement, while (b) has been evolved with confinement. The parameters of table 1 have also been applied in this evolution. evolution. Similar results have been observed within the Dipsy framework, although their dipole swing slightly increases the size of the smaller dipole in a branching [72] to half of the size of the mother dipole.
The number of dipoles in table 2 follows the coefficients of the binomial theorem, with the number in column n row k being equal to n k , and can thus be arranged to form Pascal's y = 0 y = 2 y = 4 y = 6 y = 8 r 1 Table 2. Approximate behaviour of dipole evolution for four generations of dipoles. The number of dipoles in column n row k is equal to the binomial coefficient n k .
triangle. The total number of dipoles after a given number of generation, as well as the number of dipoles of a certain size, can be quickly approximated this way. Knowing the positions of the initial dipoles and the emitted dipole sizes, positions of all dipoles can also be inferred.
To exploit further these simple relations in the dipole evolution, we have created an alternative toy-model denoted the Pascal approximation. Here, the step size in rapidity (∆y) and the size of the smaller dipole (r s = f r m ) in a branching are implemented as tunable parameters, with r m the size of the mother dipole and f a tunable fraction. The number of steps taken in total is calculated from the step size, N steps = y b max /∆y with b = p, γ * and y max given in eqs. (4.4-4.5). To mimic the recoil effects in the full dipole evolution, a Gaussian smearing of the daughter lengths is introduced with mean µ = r m , r s for the larger and smaller daughters, respectively. Knowing the lengths of the mother and daughter dipoles, they are placed in transverse space by calculating the angles of the triangle spanned by the endpoints of the three (connected) dipoles.
This simple approximation is useful for introducing toy-models for sub-leading effects, such as confinement and saturation, as basic quantities like total number of dipoles after a given evolution, can be calculated analytically. A crude model for confinement is introduced by requiring the length of the emitted dipoles to not exceed a tunable maximally allowed cutoff, r max . If this occurs, the branching is discarded and the next step in rapidity is tried. Once the full evolution has occurred, each of the dipoles are allowed to interact using the dipole-dipole scattering amplitude given in eq. (2.6) (or eq. (4.7) for the confined version).
In fig 6(a) the average number of dipoles in a single proton after a full evolution to maximal rapidity y max is shown. The same parameters as given in table 1 are used, while the parameters f, ∆y are extracted from figures 2 and 3. The unconfined Pascal model follow the same trend as the full dipole model, albeit having a slightly larger average number of dipoles at small maximal rapidity. It is evident that the confined Pascal model has a different slope than the full dipole model and the unconfined Pascal model, with the effect of the crude model for confinement clearly seen at large y max . Here, the confined Pascal model results in a smaller average number of dipoles as compared to the full dipole model.
In figure 6 (b) the dipole configuration of an evolved proton for y max = 8.86 is shown.  This has more features in common with full, unconfined dipole evolution than full, confined dipole evolution, with dipoles being more randomly distributed, than focused around hot spots.
A practical advantage of the Pascal approximation, besides being a toy model for testing models for sub-leading effects, is its computational speed. For simple cascade-quantities like numbers of dipoles, results can be calculated analytically. With inclusion of geometry, eventby-event results can be generated approximately a factor of 1000 faster, for large maximal rapidity (y max ≥ 10). It thus serves as a decent replacement for the full dipole evolution model for calculation of cascade properties with limited computational resources. For full calculations of amplitudes and cross sections, the efficiency gain is not nearly as large, as in that case the bottleneck is the calculation of all f ij in eq. (2.6).

Dipole-dipole interactions
The dipole-dipole interactions are defined to occur at rapidity zero and given by eq. (2.6). If confinement is introduced in the splitting kernel (eq. (4.6)), one also has to change the interaction probability in order to make the event generation consistent. This modifies eq. (2.6) to where K 0 is the modified Bessel function and r max the maximally allowed size for a dipole in the evolution.
The choice of collision frame, however, is not trivial. Obviously, no observables should depend on the frame-choice of the collision. In practice, the choice does matter, as no subleading color corrections are included in the dipole evolution. Previous studies have shown that for symmetric systems, e.g. pp collisions, the optimal frame choice is the center-of-mass (CM) frame [73]. This is also utilized in our approach, cf. eq. (4.4), where both beams are evolved the same distance in rapidity. In asymmetric systems, such as γ * p or pA systems, the CM frame lies more towards the heavier of the two objects, and it has been previously argued that the optimal frame here would be the rest frame of the heavier beams [73]. This, however, is not the choice we have taken. The maximal rapidity chosen in eq. (4.5) is found in what we call the center-of-rapidity frame. Here, both beams are evolved the same distance in rapidity, similarly to what is chosen in symmetric collision systems. As already stated, this work does not attempt to include sub-leading colour effects in the evolution, thus frame-independence is not possible to obtain. Hence the simplest choice has been made to use the same frame for all systems, i.e. the center-of-rapidity frame given in eqs. (4.4-4.5).

Assigning spatial vertices to MPIs
In order to utilize the formalism developed so far in real pp, pA and AA events, the dipole cascade is matched to the Pythia 8 MPI model [74]. This allows for evaluation of geometric initial state quantities, such as eccentricities (see section 3.2), at fixed number of charged hadrons in the final state, using a similar definition of charged particles as the experiments. The Pythia 8 MPI model considers pp collisions, treating all partonic sub-collisions as separate 2 → 2 QCD scatterings, which are uncorrelated up to momentum conservation. Other factors present in the MPI model is a rescaling of the parton density between each scattering, preservation of valence quark content and a sophisticated treatment of beam remnants [75].
In the MPI framework, the sub-collisions and their kinematics are selected using the normal 2 → 2 QCD cross section. But since this cross section diverges at low p ⊥ , the expression is regulated using a parameter, p ⊥0 : (4.8) For matching of vertices to each individual partonic sub-collision, it is also useful to note that MPIs are generated in decreasing order of p ⊥ , starting from a (process-dependent) maximal scale. This decreasing order is generated from a Sudakov-like expression of the form: with dσ/dp ⊥i given by eq. with the need to select the impact parameter consistently. 4 Furthermore, new partons are generated by initial-and final-state radiation.
Recently, a method of assigning space-time information to the MPIs in Pythia 8 was introduced [22]. Here, the transverse coordinates are sampled from a two-dimensional Gaussian distribution defined by the overlap of the mass distributions of the two colliding protons. The width of the Gaussian is a free parameter (which should not be too far from the proton radius) and a mean equal to the impact parameter chosen in the MPI framework. Initialand final-state radiation are then treated as small displacements of the selected anchor points of the MPIs. This introduces and additional smearing of an MPI vertex whenever a parton is radiated off from the partons involved in the MPI. The smearing is done using another Gaussian with a width of σ ⊥ /p ⊥ , where σ ⊥ is a parameter with default value 0.1 GeV·fm.
In this work, we utilize the dipole framework to generate the space-time vertices, instead of the default two-dimensional Gaussian distribution. We currently do not use the dipole model to generate the p ⊥ -spectrum for the MPIs, but retain the p ⊥ -distribution obtained internally with Pythia 8. This means that the dipole model is only used to obtain information on the spatial location of the MPIs. Using the dipole framework to generate space-time vertices requires (as with the Gaussian model) some assumptions, as this matching can not be derived from first principles. In order to obtain a reasonable matching, the following is noted: • Each branch of the (projectile) dipole cascade can be identified as a virtual emission, which goes on shell if, and only if, it collides with a corresponding virtual emission from the target.
• Each proton-proton collision has many potential sub-collisions between all combinations of virtual emissions. We order the sub-collisions in terms of contribution to total cross section, thus the MPI with largest p ⊥ is identified with the dipole-dipole scattering with the largest f ij .
The concrete matching is done by first generating two dipole cascades, and allowing them to collide with the same impact parameter used in the generation of the MPIs. This produces a list of possible dipole-dipole collisions, each with an interaction strength f ij . As the MPIs are generated (from hardest to softest), they are each assigned a vertex sampled from this list with a weight equal to f ij , normalised to the summed dipole-dipole interaction strength (and not the unitarized interaction). The vertex is simply given as the mean of the transverse coordinates of the dipoles in the interaction. Once a set of dipoles have been assigned to an MPI, they are both flagged as used, and cannot initially be re-used to ensure a reasonable spread. In cases where the list runs out of interactions containing only unused dipoles, the dipoles are allowed to re-interact, though not with the same dipole as initially.
As opposed to the default model, vertices are now selected from a distribution which event-by-event is asymmetric, and contains "hot spots" with large activity, as shown in figure 5 (b) for the full evolution including confinement effects.
The matching of largest f ij to hardest MPI requires further discussion, as one could argue the opposite. The dipole-dipole scattering amplitude is driven by the distances between the endpoints of the interacting dipoles, as indicated in figure 1. One can argue that small f ij corresponds to small distances, which in turn corresponds to large p ⊥ of the gluons emitted in the interaction. Hard MPIs would with this reasoning correspond to small f ij . This is indeed the choice made in the Dipsy event generator for exclusive final states [40], but opposite to the choice made above. We do, however, also note that the exclusive final states generated by Dipsy describes p ⊥ spectra of charged particles poorly, in particular the high-p ⊥ part of the distributions vastly overshoots data. We therefore currently refrain from associating the dipole sizes directly to the p ⊥ of emerging partons, but rather give larger attention to the cross section. We note that large f ij interactions dominate the cross sections. A guiding principle is therefore to ensure that such interactions are always identified with an MPI, by assigning it first.
There are several possible future improvements of the matching technique. A small improvement of the existing technique, could include to also identify initial state radiation with emissions going on shell, and assign them vertices from the cascade as such. Going beyond improvement of matching techniques, would be a full re-evaluation of the MPI model, with the dipole cascade and interactions as a starting point. The consequences of such an approach could be studied in a toy-model where the p ⊥ obtained with the dipole formulation could be utilized instead of the p ⊥ 's obtained within the Pythia 8 model. It would then be possible to study if this method gives rise to similar problems as the Dipsy MPI description has in the high p ⊥ tail.
Instead of creating a completely new model like Dipsy, it should be possible to use the dipole model to improve the existing Pythia 8 model. A first step would be to replace σ ND in eq. (4.9) with a dynamically calculated cross section, event-by-event. Secondly, the parameter p ⊥0 in eq. (4.8) could be re-evaluated in terms of the dipole model. The physical interpretation of p ⊥0 in the MPI model, is that of a colour screening scale. The perturbative treatment of eq. (4.9) would naively break down at some minimal scale ∼ /r p ∼ Λ QCD , where r p is the (colour screening) size of a proton, left as a free parameter. In the dipole model, this colour screening length could be identified as either the transverse size of the cascade after the evolution, or the length of the largest colour connected dipole chain. In that way the energy dependence of p ⊥0 would also come for free, instead of having to assume a power-law dependence, as is the default assumption in Pythia 8.

Heavy ion collisions
The method described above can be directly applied to heavy ion collisions as they are modelled in the Angantyr framework for heavy ion collisions in Pythia 8 (see appendix B for a brief review, and refs. [35,36] for a full description). In the Angantyr model, sub-collisions are chosen using a Glauber-like approach. Sub-collisions are in turn associated with one out of several types of pp events, depending on the properties of the sub-collision. Since all these events are generated using the MPI model described above, the generalization is only a matter of generating vertices for each sub-collision in its local coordinate system, and then moving them to the global coordinate system defined by the Glauber calculation.

Results I -comparing cross sections
In this section we present results on integrated cross sections for pp and γ * p collisions. For pp we present results for both the dipole evolution model and for the Pascal model, while for γ * p we focus our attention on the dipole model. The main purpose of this section is tuning: the model parameters have to be estimated by comparisons with data, preferably data that we do not aim to make predictions for in later sections.
It is thus not the aim of this section to be able to describe the cross sections perfectlybut more generally, to get an overall agreement between model and data, especially at LHC energies, where we aim to make predictions for the substructure observables.
More dedicated models are available to describe the cross sections at all energies, from the GeV range to the TeV range, results of which are shown alongside results from the dipole model in the pp section. The most widespread model is based on the 1992 total cross section fit by Donnachie and Landshoff (DL) [76] and the models for elastic and diffractive cross sections by Schuler and Sjöstrand (SaS) [77]. Another, more recent model by Appleby et al. (ABMST) [78] is more complex than SaS, and able to describe latest LHC data better. The models are both implemented in Pythia 8, with some additions to the original models [79]. In this paper we compare to the original models, and not those adapted to Pythia 8.

Results for γ * p
We begin with the results on photon-proton total cross sections. Here, we compare the dipole evolution model to data obtained from H1 [80] at different energies and virtualities. We note that the photon wave function implemented only includes the three lightest quarks, and none of the vector meson states present at low Q 2 . Thus we expect the results to be less precise at low virtualities, where the probability for the photon to fluctuate into a hadronic state becomes non-negligible. Similarly, the masses of the quarks should be taken into account if the argument of the Bessel functions become close to the squared quark masses, i.e. if occurring in the limits z → 0, 1 or if Q 2 small. The contribution from c-quarks are neglected for simplicity, the uncertainty arising from this approximation is discussed at the end of the section.
The H1 data presents results on the proton structure function F 2 (x, Q 2 ) at a large range of virtualities and energies. This is translated into a photon-proton total cross section as The total photon-proton cross section, σ γ * p tot , as a function of squared photon-proton center-of-mass energy, W 2 , for several virtualities (a). Note that the distributions for the two highest virtualities (Q 2 = 60, 120 GeV 2 ) have been scaled with a factor of 0.3, 0.1, respectively, for better visibility. (b) shows the ratio MC/data as a function of squared center-of-mass energy, W 2 , for the five different virtualities. follows: with the CM energy given as W 2 = Q 2 (1 − x)/x and c a unit conversion factor. It is evident from figure 7 that we undershoot data at low Q 2 . At intermediate virtualities the model does a fairly good job, while at the highest virtuality probed the prediction overshoots data with roughly 50%. In order to quantify the performance of the models, a χ 2 test has been performed, taking into account the errors of the measurement: with D denoting the cross section measured in data at a given squared energy W 2 , M the model prediction for that squared energy and σ 2 D,M the variance of the data and model predictions, respectively.
The model has been tuned with the Professor2 framework [81], and the parameters are shown along with the χ 2 /N dof in table 3. The parameters of the tune are reasonable, giving a initial dipole size roughly of order 1 fm with a width of the Gaussian fluctuations at around 0.1 fm. Adding confinement allows for a slightly larger initial dipole size, as the largest dipoles in the evolution will be suppressed as compared to the unconfined model, while also the upper integration limit on the photon is allowed to increase when turning on confinement. The width of the fluctuations and the strong coupling appear not to be affected  Table 3. The parameter values obtained when tuning to the σ γ * p tot data set and the χ 2 obtained for the two models.
by the confinement effect. Taking the full H1 data set into account, the confined model gives a reasonable χ 2 /N dof , and performs slightly better than the unconfined model.
Since the charm contribution to the γ * p cross section has been neglected, an assessment of the uncertainty arising from this approximation should be made. Adding massless charm quarks shifts the total γ * p cross section upwards by 67%, estimated by the ratio: This rise in cross section can be tuned away in a way similar to the procedure described above. Adding quark masses (lighter quark masses neglected) reduces the contribution. The reduction is larger for smaller Q 2 . The quantitative effect of adding masses was studied in ref. [72]. For small Q 2 the decrease compared to the massless charm case is ∼ 15% and for large Q 2 the decrease is ∼ 40%. Both represent un-tuned values. A conservative, un-tuned estimate of the uncertainty in figure 7 from neglecting (massive) charm quarks is therefore up to ∼ 25%. Retuning will allow for shifting the cross section upwards in the low Q 2 region where the values in figure 7 undershoots, improving the overall agreement.

Results for pp
In figure 8 we show the total cross section as a function of CM collision energy for both the full dipole model (a) and the Pascal model (b). Both figures show results using the confined (solid blue lines) and unconfined (dashed red lines) models as well as the ABMST (solid green lines) and SaS+DL model (solid magenta lines). It is evident that the full dipole model undershoots data at low √ s, whereas it agrees with data at roughly √ s ≥ 10 2 GeV, with the confined model having a smaller χ 2 /N dof (cf. table 4) than the unconfined model using only this data set. The Pascal model, figure 8 (b), shows an overall shift towards higher cross sections as compared to the full dipole model, thus describing the lower energies well while slightly overshooting the higher energies. With only this data set, both Pascal models have a lower χ 2 /N dof than the dipole models. As explained in section 4.4, the key difference between  the models, is the treatment of confinement as a hard cutoff. In both figures, it is evident that both the SaS+DL and ABMST models perform better, not surprising as these models have been created to reproduce (a subset of) this data.
In figure 9 we show the elastic pp cross section for the full dipole model (a) and the Pascal model (b). Neither of the dipole models are able to describe this cross section, being roughly 50% below data in the entire energy range, except for the very last bins, i.e. at LHC energies. The Pascal model, however, agrees with data at lower energies better than the full model. Also here, the two dedicated models describe the elastic data better than the dipole and Pascal models, with the SaS+DL model deviating from the data at LHC energies, while ABMST describes data in the entire energy range. This is partly due to a modification of the elastic slope in the SaS model, and partly due to the additional trajectories included in the ABMST model: where SaS+DL only contains a single Pomeron in the description of the elastic cross section, ABMST has two -along with additional terms not dominating at these energies. This of course introduces more freedom to the model, thus a better agreement with data at high energies.
The last result is the elastic slope at t = 0, shown in figure 10. Second to the total cross section, this is the most important distribution for us to describe, as this is sensitive to the internal structure of the proton, i.e. the actual impact parameter value used in the calculation, while e.g. the elastic cross section is only sensitive to the average impact parameter. Figure  10 again shows the results for the full dipole model (a) and the Pascal model (b). Here, both models are undershooting the data by roughly 50% in the entire energy range, except that the dipole model is able to describe data in the very last bin. Also here, the ABMST and SaS+DL models predictions are closer to the data than the dipole and Pascal models.  We expect that the introduction of a running strong coupling will aid in the description of the data. This introduction appears in two places: in the dipole evolution and in the dipole-dipole scattering. A larger strong coupling in the evolution decreases the average step size in rapidity and increases the average size of the emitted dipoles, thus allowing for a larger number of larger dipoles at the end of the evolution. This, along with the increased dipoledipole scattering cross section with increased strong coupling, would essentially increase all the cross sections, and thus also the elastic slope. The scale choice in such a running coupling would not be obvious, however, and we thus postpone the inclusion of a running coupling to future work.
The combined results on σ el , σ tot and the elastic slope deserves a further comment. From the optical theorem the differential elastic cross section is: Neglecting the real part of the amplitude puts ρ = 0. The left hand side is often approximated by an exponential: dσ el /dt = exp (B el · t), giving σ el = σ 2 tot /(16πB el ) when integrated over t. The results in figures 8, 9 and 10 are not in agreement with this simple proposition. This can either mean that the exponential approximation is not a good one (which is manifestly true for large |t|), or that the shape of T (b) in our model simply is too narrow, such that the elastic cross section and hence B el is not well described. If the latter is the case, a solution to the problem could be to include more dipoles than three in the initial proton state. This would increase the total and elastic cross sections at low √ s, while the effect at higher √ s could be tuned away by a slightly smaller value of r max or α s . Other studies of proton substructure [71] has indicated that the number of hot-spots required for a satisfactory qualitative description -25 - of LHC data is larger than three, but at the same time, previous studies of the Mueller dipole formalism in the DIPSY event generator [61] indicated that a triangle configuration of initial dipoles is the most suitable choice for a good description of the cross sections. A study of this sort will therefore likely have to rely on attemps of simultaneous description of both sub-structure and cross sections, and will be referred to a future study. Table 4 shows the parameters obtained when tuning to all three observables (σ tot , σ el , B el ) using Professor2. We also show the χ 2 /N dof for three data sets of various sizes. It is striking that the inclusion of the elastic cross section to the χ 2 -calculation swaps the behaviour of the full dipole model -without the elastic data set, the confined model has a lower χ 2 /N dof than the unconfined model, while the opposite is true with the inclusion of the elastic cross section. This swap is caused by the first two data points in the elastic cross section, where the unconfined version of the full dipole describes data slightly better than the confined version. The parameters of the dipole model obtained with the tunes show the behaviour also observed in γ * p: adding confinement allows for an increased initial dipole size and slightly larger fluctuations around this size. The initial dipole size seems reasonable for both the confined dipole model and the unconfined Pascal model, giving sizes of the order r 0 ∼ 0.7 − 1. fm also confirmed by Dipsy (r 0 ∼ 0.7 fm) and proton charge radii measurements (giving roughly r 0 ∼ 0.9 fm).
As already stated, the inclusion of a running strong coupling is expected to improve results for both the dipole and Pascal model. As we currently are aiming to describe proton substructure at the TeV scale, we can, however, ignore the small deviations from σ tot and B el at lower energies for the moment.  Table 4.
The parameter values obtained when tuning to the σ tot , σ el , B el data set and the χ 2 obtained for the different models.

Results II -eccentricities in small and large systems
In this section we turn our attention to predictions related to the geometry of an event. The parton-level eccentricities of both small and large systems are examined using the matching between the dipole model and the MPI framework described as in section 4.6. Results from the dipole model 5 are shown along with the default models of Pythia 8: in pp collisions, the default scheme of Pythia 8 is a transverse placement according to a 2D Gaussian, while for larger systems two default Pythia 8 methods are available -the usual Glauber approach and the 2D Gaussian pp model extended to larger systems. In order to compare to data, all events are hadronized with Pythia 8 after the parton-level eccentricities are calculated. Results are presented and in a single case compared to data from ALICE [82]. Parton level eccentricities were calculated with a p ⊥ weighting, cf. section 3.2, and events accepted if they passed the ALICE high-multiplicity trigger. Eccentricities and normalized symmetric cumulants are presented as a function of average central multiplicity (|η| < 0.8).
Recall from section 4.6 that Pythia 8 includes a p ⊥ -dependent Gaussian smearing of parton vertices in the initial-and final-state shower. It is not clear from first principles whether such effects should be included in the calculation of geometric quantities or not. Consider, on one hand, creation of a QGP at early times, right after the collision. Here a parton shower will not be able to influence the geometry of the event, before a hydrodynamic response should be taken into account. On the other hand, one can imagine a system with large gradients (such as a small collision system) which will take time to hydrodynamize, and will therefore be influenced by geometric fluctuations from the final state shower as well. It is important to note, however, that no QGP is assumed in any results presented below as no QGP is assumed in neither Pythia 8 nor Angantyr.
Opening up for a discussion, we show results in figure 11 with and without shower. It is evident that the eccentricities are vastly affected by the models. First consider the simplest case, i.e. placing all MPIs in the proton center and not introducing any shower smearing. This gives no eccentricity as expected, cf. solid black line in figure 11. Symmetric distributions, such as the 2D Gaussian shower smearing and the MPI vertex assignment, should in principle give no eccentricity. But, as we are sampling only a finite number of MPIs from such symmetric distributions, an eccentricity does appear for these models, cf. dashed red and dashed black lines of figure 11. The two methods overlap, thus the exact same effect can be introduced with either (a) no MPI vertex assignment with a Gaussian smearing from the shower or (b) a Gaussian MPI vertex assignment and no shower smearing. That the two overlap is not so surprising as both are Gaussian smearings, and applying such a smearing during the shower or assigning it to the MPIs should make no difference: both methods give rise to a more lenticular overlap region of the two colliding protons.
Applying Gaussian smearing twice, i.e. both in the MPI vertex assignment and during the shower smears the lenticular shape from the MPI assignment slightly, thus causing the eccentricity to drop, cf. the solid red line in figure 11. The largest effect on the eccentricity is seen when purely considering MPI vertex assignment with the dipole model, cf. the dashed blue line in figure 11. The eccentricity with the dipole model is approximately twice as large as with the Gaussian model, thus indicating that event-by-event asymmetries in the initial state gives rise to larger fluctuations and thus larger eccentricities. Adding the Gaussian shower smearing on top of the dipole model, solid blue line of figure 11, washes out some of these features -i.e. makes the almond shape of the overlap region rounder. Figure 12 shows the eccentricities 2,3 {2} in three different systems. Beginning with 2 we observe in pp that the dipole evolution gives rise to a larger eccentricity than the 2D-Gaussian. In the dipole evolution, the asymmetry is built in at the cascade level, where in the 2D-Gaussian, where MPIs are sampled from a symmetric distribution, asymmetry only arises due to the sample size. Proceeding to larger systems, pPb and PbPb, it is evident that the same trend is seen: the dipole model gives rise to larger 2 than the symmetric model. The Glauber model, however, is consistently larger than the other two models at low multiplicity, while all three models appear to approach the same eccentricity at higher multiplicities. Thus it becomes evident that the proton initial state does have an effect on eccentricities, and that it is especially evident in low-multiplicity events, e.g. peripheral PbPb collisions.
Unfortunately, the low-multiplicity events are often marred by large non-flow effects. Measuring the eccentricities with higher-order cumulants can remove some of the contributions from non-flow, thus making it easier to compare data to models. We present results for  The second (b) and third (c) order eccentricities using two-particle cumulants for pp, pPb, PbPb collisions (solid, dashed, dotted, respectively) using the Glauber (black), Gaussian (red) and dipole (blue) models.
Another feature seen in figure 12 is that the dipole model gives roughly the same results for 2,3 in both pp and pPb collisions. If one assumes that the response functions are the same for the two systems (however one may have obtained these response functions, QGP or by string-string interactions), the ratio of pPb to pp eccentricities should thus be comparable to the ratio of flow coefficients measured with the ALICE detector. This ratio is shown in figure  13 for the second-order eccentricity. Both the Gaussian and dipole models are compatible with the ALICE data, however, so we cannot presently discriminate between the two. Additional measurements of the flow coefficients in low-multiplicity events are thus required in order to discriminate between models in this observable. Figure 13 (b) shows the normalized symmetric cumulant, N SC (3,2). This has been constructed to study the correlations between the eccentricities and normalized to the uncorrelated eccentricities in order to remove the effects of the response function. ALICE reports that all three systems have the same N SC (3,2) at the same average multiplicity, indicating that the correlation between the flow coefficients are the same in different collision systems. We observe no such effect. Focusing on the dipole model, the correlations appear equal in magnitude for pp and pPb, but PbPb results are consistently below the smaller systems.
Results for the Gaussian model shows no similarities at all between systems, as the pPb N SC(3, 2) is positive, while pp and PbPb are negative. Thus the normalized symmetric cumulants for pPb systems would be an ideal place to discriminate between the symmetric and asymmetric initial state. PbPb results for all three models are in agreement with IP-Glasma predictions presented in the ALICE paper. The main difference between the dipole model and the IP-Glasma approach is the inclusion of saturation in the cascade of the latter. As the two approaches give similar results, we do not find that saturation plays a large role in this observable.

Flow fluctuations in pPb collisions
Recently, CMS presented results on multi-particle correlations using higher-order particle cumulants in pPb collisions [83]. Ratios of the flow-coefficients based on these cumulants were presented, including the first measurements of the ratio of v 3 {4}/v 3 {2} in pPb. In figure  14 we show the predictions for the ratios with the confined dipole model and the default Gaussian model as a function of multiplicity. Both models reasonably reproduce the shape seen in the elliptical ratio, figure 14 (a) showing v 2 {4}/v 2 {2}, while the normalisation of the dipole model is slightly better than with the Gaussian model. For the triangular ratio, figure  14 (b), both models appear to undershoot data at high multiplicities, where data is available. As opposed to model predictions presented in the CMS paper [84,85], our predictions have not been applied a 10% ad hoc increase in normalisation. And where the model predictions presented in [84,85] predicts roughly the same ratio for both 2 and 3 , neither the dipole nor the Gaussian model predicts the same normalisation for the two ratios, cf. the height of figure 14 (a) and (b) differs. Figure 15 shows the higher-order cumulant ratios for elliptic flow as a function of the lower-order ratio presented in figure 14 (a). For higher order cumulants, the Gaussian model predicts purely imaginary values for even powers of the cumulants, hence it has been left out of the figures. The dipole model, however, is able to describe data reasonably well. The dipole predictions decrease with decreasing v 2 {4}/v 2 {2} ratio in figure 15 (a), while being roughly constant at unity in figure 15 (b). This is in accordance with the model predictions presented by CMS [86], assuming a non-Gaussian model for the initial state. We note that the eccentricities presented with the dipole model here are (a) based on a pQCD model, and (b) related to final state multiplicities calculated in the same acceptance as the experiment.

Results III -dynamic colour fluctuations in Glauber calculations
A general feature of several models describing both collisions of protons and of nuclei, is the notion of interacting nucleons and nuclear sub-collisions, calculated in the formalism of Glauber [87,88]. The basic formalism is mainly concerned with calculating the full AA  Figure 15. Correlations between higher order flow harmonics as measured by the CMS experiment, compared to correlations between higher order eccentricity ratios calculated in the dipole model.
scattering matrix or amplitude from knowledge of the nucleon-nucleon amplitude and spatial positions. Multiple interactions between nucleons factorize in transverse coordinates, so in the eikonal limit the S-matrix for scattering between two nuclei A and B becomes: where i and j denote the individual nucleons, b is the nucleus-nucleus impact parameter and b ij is the nucleon-nucleon impact parameter. We will here consider the simplifying case where only one projectile (either p or γ * , called n below) collides with a nucleus (A), which reduces eq. (7.1) considerably: If no fluctuations in the interaction are included, the projectile-nucleon elastic amplitude can be inserted, and the total and elastic cross sections can be calculated directly from eqs. (3.4-3.5). If fluctuations for projectile and target are included, as calculated for example in the dipole model, the amplitude will depend on the states of target (t i ) and projectile (p) respectively. As shown in section 3.1, the elastic amplitude can be calculated as an average over all states. In ref. [89] it was pointed out that in the evaluation of such an average, the projectile must remain frozen in the same state throughout the passage of the target. Similar to eq. (3.3) the elastic profile function (at fixed Mandelstam s) for a fixed state (k) of the projectile scattered on a single target nucleon (all states) becomes: where previously suppressed indices k and t on T are spelled out for clarity. For a projectilenucleus collision, with the projectile frozen in the state k, the relevant projectile-nucleon (nN i ) amplitude becomes: T In the short hand notation on the right hand side, the average over the repeated index t is suppressed. This is the amplitude used to determine which nucleons are "wounded" in a collision. If the purpose is to determine which nucleons participate in the collision either elastically or inelastically, the differential wounded cross section can be calculated with the normal differential pp total cross section as an ansatz, dσ tot /d 2 b = 2 T p,t from eq. (3.4). Since the projectile should be frozen in the state k, the expression for T from eq. (7.4) is inserted to the differential pp total cross section. This just recovers the normal total projectile-nucleon cross section: dσ tot In a Monte Carlo, the number of wounded nucleons can then be generated by assigning each projectile or nucleon a radius of σ tot /2π, where the expression in eq. (7.5) has been integrated over d 2 b to give σ tot . Normally one is not interested in the number of wounded nucleons including elastic interaction, but rather those that contribute to particle production (i.e. where there is a colour exchange). A usual approach is to just use the inelastic cross section in place of σ tot in the Monte Carlo recipe. This does, however, not account fully for colour fluctuations, as the inelatic cross section is modified when averaging over target states with a frozen projectile. Instead of directly using the inelastic cross section in the Monte Carlo, the modified cross section should be used. This cross section was dubbed the "wounded cross section" in ref. [35], and can be constructed by generalizing the inelastic cross section, using eq. (7.4). The inelastic cross section can from eqs. (3.4-3.5) be directly written down as: dσ inel When the frozen projectile is taken into account by inserting T from eq. (7.4), the usual expression is now not recovered, but the average over targets must be made before squaring the second term: with internal indices again suppressed in the last equality. In a Monte Carlo this can be generated as above, now only by inserting σ w in place of σ tot .
Generalizing this procedure to γ * A collisions requires additional considerations. Starting from the elastic profile function for γ * p, a contribution from the photon fluctuating to a dipole state must be included. Examining only the hadronic (non-VMD) components of the photon state, gives: |γ * ∼ c 1 |qq + c 2 |qqg + higher order Fock states (7.8) where quark helicities have been neglected. We keep only the first (leading order) term, as the higher order Fock states are included in the dipole evolution. Thus with a photon wave function given in eqs. (4.1-4.2), we obtain: with |ψ I a dipole state. The elastic profile is now: The wounded cross section for γ * A collisions can now be defined. The first interaction is calculated using the photon wave function in the elastic profile function, leading directly to: This first interaction has now turned to photon from a superposition of all dipole states into a single, specific dipole (or vector meson). This is the state that the projectile should be frozen to throughout the passage of the nucleus: the first interaction chooses a specific dipole state |ψ I z, r with given z and r. This reduces the elastic profile function for the secondary interactions to the well known eq. (3.3), from which a differential wounded cross section has already been calculated (eq. (7.7)). Thus, in a Monte Carlo, the number of wounded nucleons can be generated with the following method: • First by selecting, for each event, a dipole with r and z corresponding to the wave function weight, w γ in eq. (A.35) • Secondly, testing if any nucleons are hit including the photon wave function normalization proportional to α em (i.e. according to eq. (7.10)) • If any nucleons are hit, then subsequently testing all (other) nucleons, w.r.t. the dipoletarget weight (i.e. eq. (3.3)) In the following section, colour fluctuations from the introduced dipole model (where T ( b) can be evaluated directly from eq. (2.7)) are compared to a parameterized approach for fluctuations in pp collisions and γ * p collisions, and finally for γ * A. and a log-normal fit (eq. (B.2)). (b) Fluctuating cross section in γ * p at W 2 = 5000 GeV 2 and various Q 2 , calculated with the dipole model (the double peak structure for Q 2 = 20 GeV 2 is a statistical fluctuation). The cross section is shown on a logarithmic horizontal axis, to assess the log-normal approximation (cf. eq. (B.2)).

Colour fluctuations in pp, γ * p and γ * A collisions
Fluctuations in the pp cross sections, to estimate the influence of fluctuations in pA collisions, are often parametrized using [90][91][92]: where σ 0 and Ω are parameters, and ρ is a normalization constant. In ref. [35] is was found that a log-normal distribution (see eq. (B.2) in appendix B) describes fluctuations generated by a dipole approach better. In figure 16 (a) both parametrizations are compared to the fluctuating total cross section in pp at √ s = 5 TeV, integrated over d 2 b. While the log-normal distribution does better in capturing the skewness of the distribution, none of the two parametrizations fully describes the distribution. The problem increases in γ * p for several reasons. First of all, any parametrization must include the correct dependence on DIS kinematics, which changes the average cross section, cf. figure 16 (b). Here is shown the cross section distributions for three values of Q 2 all with W 2 = 5000 GeV 2 with a logarithmic first axis. This allows for a by-eye assessment of the validity of a log-normal fit, as a log-normal distribution is Gaussian with such choice of axes. It is seen directly that fluctuations in the high-σ tails are too large to be described by such a parametrization.
Instead of the parametrization approach, the wounded cross section can be calculated directly from the dipole evolution. This also allows for simultaneous calculation of both the part including electromagnetic contributions, and the pure dipole part (given z and r), as introduced in the previous section. This allows directly for a calculation of the distribution of wounded nucleons in a γ * A collision, as shown in figure 17. In the figure, the nucleus is taken to be Au-197, colliding with a virtual photon with W 2 = 5000 GeV 2 for a range of Q 2 values, compatible with projected EIC design [37]. Two methods of calculating wounded nucleons are presented: the full treatment using a frozen wave function, where the photon wave function has collapsed to a dipole state when probed by the first collision, and the naive black disk approach, where the photon re-forms after the full collision and no fluctuations are included.
In such a treatment, the cross section for additional collisions has an additional factor α 2 em compared to the frozen treatment, from the normalization of the wave function. It is directly visible that a full treatment is necessary in order to provide reasonable phenomenological projections for a new collider.

Conclusion and outlook
One of the main challenges for the understanding of collective effects, is to grasp how the wellknown understanding of flow results from heavy ion collisions can be transferred to collision of protons with protons and heavy nuclei. In this paper we have presented a Monte Carlo implementation of Mueller's dipole model with several sub-leading corrections, and with all parameters of the model fixed to total and semi-inclusive cross sections calculated within the Good-Walker formalism. This model thus allows for the calculation of proton substructure without tuning any model parameters to observables sensitive to said substructure. The current implementation of the model includes: • BFKL evolution of projectile and target states, be it protons or photons, in rapidity and impact parameter space.
2. Non-eikonal corrections in terms of dipole recoils.
3. Confinement effects by the introduction of a fictitious gluon mass.
• Projectile-target interactions using the unitarized amplitude, which in a Regge field theory language corresponds to multi-Pomeron exchange and Pomeron loops.
• Matching to the Pythia 8 MPI model, in order to assign spatial vertices to produced partons in pp collisions.
• Generalization to heavy ion collisions through the Angantyr framework.
Besides the implementation of the dipole model, a simpler version has been provided, based on the geometric properties of the dipole evolution. This model, denoted the Pascal approximation, allows for easy insertion of toy-models of sub-leading effects, thus giving a handle on the importance of such effects.
We have shown that given simple, but reasonable, assumptions of a final-state response (from e.g. hydrodynamics or interacting strings), the eccentricities produced with the implementation provides a reasonable description of flow data from the ALICE and CMS experiments. This includes non-trivial observations such as ratios between pA and pp flow coefficients at fixed event multiplicity, normalized symmetric cumulants in different systems, and ratios between different order flow harmonics in pA collisions. All are signatures which cannot be described in a simpler model, where the spatial structure of MPIs are assumed to be distributed according to a rotationally symmetric distribution. We want to stress that even though we have here chosen flow-type observables to illustrate the effect of the space-time structure of the initial state on observations of collective effects, effects linked to enhancement of strangeness and baryon production [93][94][95] and even modifications of jets in high multiplicity pp collisions [96,97] are expected to be influenced as well.
Lastly, we have provided the initial steps towards the generation of fully exclusive final states in electron-ion collisions, by determining the importance of colour-fluctuations in the collisions with virtual photons. We have shown that previous parametrizations from pp collisions do not fully capture the colour-fluctuations predicted by the dipole formulation of BFKL evolution, and thus argue that it is better to calculate the cross sections directly from the dipole model -which has not been possible in the Angantyr model before this work. Secondly, we stress that the collapse of the photon wave function at first interaction provides a larger number of wounded nucleons as compared to the black disk approximation. Each of the wounded nucleons are expected to give rise to final state activity, thus more complicated final states are expected with the proper treatment as opposed to the naive expectations. The implemented dipole model can be improved in several ways, including: • Running α s in the dipole evolution and in the scattering, which will capture some of the NLO corrections in α s .
• On longer term, an inclusion of full NLO-BK should be the goal, though further theoretical development is needed first.
• Gluon saturation effects in the cascade such as those included in the CGC formalism.
To maintain the current treatment of the effect of gluon branchings in the cascade (as opposed to CGC), this could be included by the introduction of a simple swing mechanism, e.g. a mock 2 → 1 dipole recombination.
• Several improvements are expected w.r.t. the initial dipole configuration in protons and photons, as well as in the wave functions of these particles. This includes adding the VMD contribution to the photon wave function, to be able to study lower Q 2 and vector meson production in various processes.
• New ways of treating MPIs in pp collisions by fully merging the dipole approach with more traditional approaches are foreseen. It is our hope that this could provide new tools to improve understanding of particle production mechanisms across collision systems.
Detailed understanding of the interplay between the proton geometry and the response of final state interactions in hadronic and heavy ion collisions, is crucial for the understanding of collectivity and particle production mechanisms. Since detailed understanding requires tools which are both accessible and transparent, it is our hope that the detailed treatment presented here, and the accompanying open Monte Carlo implementation, can help facilitate this process. We here work with light cone momenta, and can thus define the rapidity as with the latter equality valid for massless particles. Hence we can express the lightcone momenta in terms of dipole p ⊥ and rapidity, The p ⊥ of a dipole can be related to its size through p ⊥ ∼ /r. The dipole-dipole scatterings are defined to occur at rapidity zero. Thus the evolution of the beams begin at rapidity y = ±y max and evolve to zero, i.e. with negative rapidity steps. For technical reasons, the actual evolution is easier to implement with positive steps in rapidity. Thus the internal rapidity used in the code (and in the next section) is negated w.r.t. the rapidity defined in eq. (A.2): where in the forthcoming sections we will skip the subscript MC.

A.1 Mueller's dipole branching
We begin by examining the dipole splitting function, where r is the transverse location of the emitted parton 3 from the original dipole spanned by partons 1, 2 and r ij the length of the dipoles, also shown in figure 18. In order to turn this into a dipole evolution, a Sudakov factor, ∆(y min , y), restricting emission between y min and y, has to be introduced. For event generation to proceed, we need to find an overestimate for the above splitting probability. The Sudakov factor is included via the veto algorithm, and is thus neglected in the expressions below. First we sample a transverse location of the emitting dipole. Assuming partons 1 and 2 located in the (x, y)-plane at r 1 = (0, 0) and r 2 = (1, 0) with length r 12 = 1 fm, while the emitted parton is located at r 3 = (r x , r y ), we can write the splitting probability as, where in the second step we have inserted the values for r 2,x = 1 fm and r 2 12 = 1 fm 2 , but suppressed dimensions. These dimensions are suppressed throughout the section. This distribution is symmetric around r x = 1/2 fm and r y = 0 fm, so the limits of integration can be changed from r x/y ∈] − ∞, ∞[ to r x ∈ [−∞, 1/2] and r y ∈ [−∞, 0]. The above splitting probability can be overestimated by the function, dP 1 dy dr x dr y = N c α s 2π 2 2 (r 2 x + r 2 y )(r 2 x + r 2 y + 1 4 ) .

(A.9)
Changing to cylindrical coordinates we obtain, dP 1 dy r dr dφ = N c α s 2π 2 2 r 2 (r 2 + 1 4 ) , (A.10) which can be integrated from a minimal dipole size, ρ. Thus we obtain, Without energy ordering, the minimal dipole size ρ has to be fixed to a number larger than zero to avoid the distribution from blowing up. Here, energy the ordering is introduced by ordering of positive lightcone momenta [39]. Again relating the transverse momentum of the dipole to its size, gives an expression for ρ related to the kinematics of the parent dipole (p), This expression is then used in eq. (A.11), This overestimate cannot trivially be integrated, so we find yet another, which is both integrable and invertible. We take into account the Sudakov factor by using the Veto algorithm, and thus the rapidity can be sampled from this distribution by where R 1 is a uniformly distributed random number. From eq. (A.11) we can sample both r and φ, with R 2 , R 3 two new random numbers. Here we should note that we've changed the integration limits, such that any r x = r cos(φ) > 1/2 must be rejected in the event generation. Half of the remaining events should be mirrored to r x → 1 − r x , and this should be taken into account in the overestimate dP 1 /dy dr x dr y as well, such that dP 1 dy dr x dr y = N c α s 2π 2 2 (r 2 x + r 2 y )(r 2 x + r 2 y + 1 4 ) → N c α s 2π 2 1 (r 2 x + r 2 y )(r 2 x + r 2 y + 1 4 ) The events are weighted to the correct distributions with, such that if w r w y < R 4 the event is rejected and the process is reiterated. The evolution of an initial dipole thus goes as follows. Firstly, a trial emission from the initial dipole is performed according to eq. (A.6). If the rapidity y 0 of this emission is below the maximally allowed rapidity, then the trial branching is accepted, thus two new dipoles are created. Trial emissions are then allowed from each of these dipoles using y min = y 0 in eq. (A.6). This creates two new emissions with rapidities y 1,2 . But here only the dipole with the smallest rapidity is accepted. Thus after the second iteration we have three dipoles, from each of which trial emissions are created and only the emission with the smallest rapidity is accepted, thus creating an additional dipole. The process is reiterated until no trial emissions are produced below the maximally allowed rapidity. The process is visualized in figure 18.
The choice of p ⊥ of the emitted parton is not obvious. Here we assign the parton the largest p ⊥ of the system, . (A.21)

A.1.1 Ordering of lightcone momenta
We here rely on approximate energy conservation through ordering of p + . This has already been discussed in the above, where we found the cutoff for small dipoles in the event generation of r, eq. (A.12). Thus we have implemented energy conservation as which implies a rapidity-dependent cutoff for smaller dipoles. Momentum conservation is introduced through the ordering of p − , where it should be noted that this requirement is applied after the recoils have been taken into account. This choice also sets an upper bound for the dipole size through

A.1.2 Recoil effects
The recoil of the emitted parton is shared equally between the partons spanning the emitting dipole. Energy conservation requires that the energy of the emitter after the emission of a new dipole equals the energy of the emitter before the collision minus the recoil, The recoil cannot be determined from first principles thus have to make an ansatz. The choice here is also from [39],

A.2.2 Photons
The wave function used in this work is presented in eqs. (4.1-4.2). The full cross section for γ * p is then given as dφ |ψ L (z, r)| 2 + |ψ T (z, r)| 2 σ(z, r), (A. 30) with σ(z, r) the dipole-dipole scattering cross sections given in equations (3.4-3.6). The dipole-dipole scattering cross section goes roughly as the square of the size of the largest dipole, σ(z, r) ∼ r 2 , thus an overestimate of the γ * p cross section can be found by sampling the parameters from the following distributions, we obtain The maximal value obtained in the sum is kept to accept or reject the integrand in the algorithm, where first and z i , r i are chosen and then accepted w.r.t.
If this weight is less than a new random number, w γ < R 4 , the event is rejected. If kept, the event is given a weight w = σ O γ * p /r 2 i to take into account the overestimation of the dipole-dipole scattering cross section.

B Appendix: The Angantyr model for heavy ion collisions
The Angantyr model for heavy ion collisions is based on the following four components: • Firstly, the position of the nucleons inside the nuclei has to be determined.
• Secondly, the number of interacting nucleons and binary N N collisions has to be calculated within the Glauber-Gribov (GG) formalism.
• Thirdly, the contribution to the final state of each interacting nucleon has to be determined. Here Angantyr uses the wounded nucleon model by Bia las, Bleszyński and Czyż [98].
• Lastly, any hard partonic subcollision has to be modeled, thus introducing the concepts of primary and secondary absorptive interactions.
Each of the four components will here be shortly reviewed. For the full explanation, see [35,36].
The nucleon distribution is generated using a Woods-Saxon potential: ρ(r) = ρ 0 1 + wr 2 /R 2 1 + exp ((r − R)/a) (B.1) with ρ(r) the radial density of the nucleons, R the radius of the nucleus, a the skin width and w the Fermi parameter, introducing a varying density but set to zero in Angantyr. The A nucleons are thus generated randomly according to P ( r i ) = ρ( r i )d 3 r i , assuming isospin invariance, such that p = n. Angantyr uses the hard core assumption, such that a new position for a nucleus is tried if the distance to its neighbours falls below twice the hard-core radius R h . Once the nuclear distributions are set up, the impact parameter of the collision is sampled using a Gaussian distribution. This information is then passed the GG framework, which determines the fluctuations of the target and projectiles. The fluctuations arise because of fluctuations in the proton wave function. Because the wave function enters in the cross section calculations and because it is assumed that the projectile state is frozen during its interaction with the target, these fluctuation are then translated into fluctuating cross sections. In the dipole model, the probabilistic nature of the dipole evolution gives rise to different dipole configurations before the collisions, thus giving rise to different dipole-dipole interactions and hence integrated cross sections. Angantyr uses a probability distribution for the cross section in pA extracted from Dipsy: T 0 (r p + r t ) = 1 − exp − π(r p + r t ) 2 σ t α , (B.6) where P (r) determines fluctuations in the nucleon radius r p and r k . T ( b, r p , r t ) again describes a slightly modified elastic amplitude with an opacity T 0 depending on the radii of both the target and projectile. Here, the parameters σ t , α, k, r 0 are tuned to data. The number of wounded target nucleons in pA collisions is then determined by dσ Wt with S pt the S-matrix for a given target (t) and projectile (p) state. Subscript on the brackets determines averages over one side only. In AA collisions Angantyr distinguishes between absorptively and diffractively wounded nucleons, with the former dominating given by, dσ abs 8) and the latter determined by generating the auxiliary states p , t for both target and projectile, and from these determining the number of wounded target states with either t or t from eq. (B.7), i.e. using either S pt or S pt in the derivation. Non-negative probabilities are ensured by shuffling when to use t, t . Once the number of wounded target and projectile states has been determined, the wounded nucleon model is used to create final-state partons, dN ch dη =w p F (η) + w t F (−η), (B.9) with the functions F (±η) determined from the MPI framework of Pythia 8. Each nucleon in the target (and projectile) is allowed to interact several times, similar to an ordinary pp collision containing several MPIs. Thus the pairs of projectile-target nucleons are ordered w.r.t. their impact-parameter b µν . The list is iterated over several times in order to determine which pairs give rise to a primary absorptive scattering, and which are secondary. Once a pair has been selected, the MPI framework of Pythia 8 is used to generate an event, and the pair is marked as having interacted in a primary interaction. If the pair is again chosen to interact, it will be marked as a secondary interaction. After the determination of the absorptive interactions, the diffractive ones are chosen by iterating the list several times, thus creating primary and secondary diffractive interactions. An already wounded nucleon cannot be further excited, but an unwounded nucleon can participate in several diffractive interactions, until itself becomes wounded. After the determination of the absorptive and diffractively primary and secondary interactions, each of the events are passed to Pythia 8 and the parton-level events are stacked on top of each other. The Pythia 8 description of single-diffractive events are modified to look like non-diffractive ones, to describe the secondary absorptive events, while diffractive primary and secondary events remain unmodified. We are thus left with a large set of partonlevel events that can be passed to the hadronisation framework of Pythia 8 and further analysed.

C Appendix: Additional eccentricity figures
In this section, we show additional eccentricity figures not presented in the main body of the text. Figure 19 (a-c) shows 2 using higher-order cumulants in the evaluation. It is   evident that the eccentricities are the same regardless of the number of particles used in the calculation, except for the effects from lack of statistics in the high-multiplicity tail for both the pp and pPb figures. Figure 19 (d) shows the normalized symmetric cumulant N SC (4,2). This cumulant is positive in the entire multiplicity range, consistent with measurements in ALICE. Here, it is evident that discrimination between models would be possible in both pp and pPb collisions, as opposed to N SC (3,2) where discriminatory power was not evident in pp collisions.
For completeness, we also show the eccentricities 1,3 obtained in pp collisions with and without shower smearing in figure 20. Both are shown to give an estimate of the effects on the size of the additional terms in the Fourier expansion of the flow coefficients.