Theory, phenomenology, and experimental avenues for dark showers: a Snowmass 2021 report

In this work, we consider the case of a strongly coupled dark/hidden sector, which extends the Standard Model (SM) by adding a non-Abelian gauge group. These extensions generally contain matter fields, much like the SM quarks, and gauge fields similar to the SM gluons. We focus on the exploration of such sectors where the dark particles are produced at the LHC through a portal and undergo rapid hadronization within the dark sector before decaying back, at least in part and potentially with sizeable lifetimes, to SM particles, giving a range of possibly spectacular signatures such as emerging or semi-visible jets. Other, non-QCD-like scenarios leading to soft unclustered energy patterns or glueballs are also discussed. After a review of the theory, existing benchmarks and constraints, this work addresses how to build consistent benchmarks from the underlying physical parameters and presents new developments for the PYTHIA Hidden Valley module, along with jet substructure studies. Finally, a series of improved search strategies is presented in order to pave the way for a better exploration of dark showers at the LHC.


Introduction
As the experimental program of LHC searches for physics beyond the Standard Model (SM) matures, the community has started devoting significant effort to investigating alternative models and their associated phenomenology, especially those providing exotic signatures which have not yet been directly addressed in existing searches. Of interest here is the case of a strongly coupled dark sector or hidden sector, which extends the SM with an additional non-Abelian gauge group. Considering the non-trivial structure of SM QCD, we should be open to the idea of a potentially complicated dark sector with non-Abelian gauge groups. These extensions generally contain matter fields, much like the SM quarks, and gauge fields similar to the SM gluons. There are no a priori expectations on the gauge group dimension (number of colors) or on the number of matter fields (number of flavors) that the theory may have.
When the dark sector confines below some confinement scale ($\Lambda_D$), dark hadrons are formed and, depending on the symmetries of the theory, some of them could be stable, leading to dark matter candidates. To allow for the production of dark states at the LHC, which could be either dark quarks or hadrons, the dark sector is coupled to the SM via a portal. The realisation of the associated LHC phenomenology of such a dark/hidden sector is, however, very much dependent on the details of the model. Nevertheless, some generic expectations can be set. For example, at the LHC, cases where the dark quark masses ($m_{q_D}$) and the corresponding confinement scale are much smaller than the collider centre-of-mass energy ($m_{q_D} \lesssim \Lambda_D \ll \sqrt{s}$) lead to spectacular signatures in terms of emerging or semi-visible jets [1][2][3]. Increasing $\Lambda_D$ implies heavier bound states, which in turn decreases the final state multiplicities for a given $\sqrt{s}$, as the allowed phase space decreases. This means that as the limit $\Lambda_D \sim \sqrt{s}$ is approached, depending on the relevant production mechanisms, $2 \to 2$ processes from a SM initial state to a dark meson final state become prevalent and resonance-like searches for dark bound states may prove useful [4][5][6][7]. Finally, cases where $m_{q_D} \gg \Lambda_D$, $m_{q_D} \lesssim \sqrt{s}$ lead to unusual signals known as quirks [8][9][10]. If the strongly interacting sector is non-QCD-like, other signatures such as Soft Unclustered Energy Patterns are also possible [9,10]. For a review pertaining to this discussion see [6].
These hidden valley models differ from most other Beyond the Standard Model scenarios because the infrared (IR) parameters of the theory cannot be computed from ultraviolet (UV) definitions using perturbative techniques. This is in contrast to many other models, e.g. the MSSM, which have well-defined relationships between UV and IR, even if their parameter space is high dimensional. The other well-known approach to characterising new physics is the use of simplified models, which do not rely on such top-down priors but are defined by their minimality; these may not be effective for hidden valley scenarios either, due to the inherent dependence on UV parameters in strongly interacting theories. Therefore, neither of the principles otherwise used to analyse new physics scenarios applies to hidden valley models.
While these considerations motivate avenues for model-building with applications to shortcomings of the SM such as dark matter, LHC searches for hidden valleys are primarily motivated by the exotic phenomenology described above. Concretely, this means that we do not have a strong theory prior on, e.g., the number of colors and flavors, the mass hierarchies amongst the matter fields, or the possible patterns of flavor breaking. Out of the multiple choices at hand, it is nevertheless possible to build internally coherent models and to develop tools to predict their phenomenology and guide the searches.
Despite the complexity and the challenges in analysing such non-Abelian new sectors, there has been increased activity in recent years focused on understanding the signature parameter space of such models, both on the theory and the experimental sides. This report presents some of these developments, particularly concentrating on jet-like signatures, and puts in perspective the efforts necessary to make systematic progress in understanding, classifying and searching for such non-Abelian scenarios. Throughout this report we will consider dark sector scenarios where the dark quarks are uncharged under any SM group and the dark sector communicates with the SM via an additional mediator.
The report is organised as follows. QCD-like dark-sector scenarios are first reviewed in Sect. 2, addressing the theories in the s- and t-channels, and the existing benchmarks and limits for such models. Section 3 will address two possibilities for dark sectors beyond the QCD-like scenarios: soft unclustered energy patterns (SUEP) and glueballs. Section 4 will be devoted to simulation tool limitations and to how to build consistent benchmarks from the underlying physical parameters for semi-visible jets. After a discussion of consistent parameter setting, some improvements to the pythia Hidden Valley module and their validation will be presented, followed by some phenomenological studies on the jet substructure effects of varying the physical parameters. Finally, a series of improved search strategies will be discussed in Sect. 5, based on event-level variables, deep neural networks, autoencoder-based anomaly detection or better triggering algorithms.

QCD-like scenarios of dark sector
In SM QCD, the strong coupling constant $\alpha_s$ becomes weaker as the energy increases. This is known as asymptotic freedom. New non-Abelian sectors which display such asymptotic freedom fall into the category of QCD-like scenarios.
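As a rough numerical illustration of asymptotic freedom, the sketch below evaluates the standard one-loop running coupling, $\alpha(\mu) = 2\pi / [b_0 \ln(\mu/\Lambda)]$ with $b_0 = \tfrac{11}{3} N_c - \tfrac{2}{3} N_f$, for a generic SU($N_c$) theory with $N_f$ fundamental flavors. The function name and the numerical inputs are illustrative assumptions and do not correspond to any benchmark of this report.

```python
import math


def alpha_one_loop(mu, lam, n_c=3, n_f=3):
    """One-loop running coupling of an SU(n_c) gauge theory with n_f flavors.

    mu  : scale at which the coupling is evaluated (same units as lam)
    lam : confinement scale, where the one-loop expression diverges
    """
    b0 = 11.0 * n_c / 3.0 - 2.0 * n_f / 3.0
    if b0 <= 0:
        raise ValueError("theory is not asymptotically free for this n_c, n_f")
    if mu <= lam:
        raise ValueError("the one-loop formula is only meaningful for mu > lam")
    return 2.0 * math.pi / (b0 * math.log(mu / lam))


if __name__ == "__main__":
    # Illustrative (assumed) confinement scale of 1 GeV.
    for mu in (10.0, 100.0, 1000.0):
        print(f"mu = {mu:7.1f} GeV  alpha = {alpha_one_loop(mu, 1.0):.4f}")
```

The coupling decreases monotonically with the scale, which is the defining property of the QCD-like scenarios considered here; close to the confinement scale the one-loop expression is no longer reliable.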
This section outlines possible exotic signatures that QCD-like dark sector scenarios may exhibit, including a discussion of benchmark models that have been or are currently being employed by the community. We also discuss existing limits in the context of these benchmarks. These signatures and results in turn provide a strong motivation for detailed studies of such scenarios and for further theory effort, as will be discussed in Sect. 4.

Contributors: Timothy Cohen and Christiane Scherb
As an organizing principle, we will assume that the dark sector communicates with the Standard Model via a so-called portal. The Standard Model admits three renormalizable portals, in that there are three Standard Model gauge singlet operators with mass dimension less than four: the dark sector can couple to the field strength of the hypercharge gauge boson $B_{\mu\nu}$ [49], to the Higgs bilinear $|H|^2$ [50], or to the neutrino via $HL$ [51]. In this paper, we will focus on introducing a new mediator particle that serves as the portal. This could be a $Z'$, which would mediate s-channel production [52][53][54][55][56][57][58][59][60][61][62][63], as was proposed in the original hidden valley paper [36], or it could be a new scalar bi-fundamental, which would mediate t-channel production [20,21,64–67]. In both of these cases, in the limit that the mediator mass is large, it may also be appropriate to integrate it out, which would induce a contact operator. There are of course many options beyond these two examples, but to keep our scope finite we will only discuss these s- and t-channel production models.
Dark sector particles can then be produced at hadron colliders via the portal. Similar to SM quarks, dark quarks shower and hadronize and form dark jets. The properties of dark jets are determined by the dynamics of the dark sector, namely the coupling strength, the ratio of unstable to stable dark hadrons inside the dark jets, and the mass scale of the dark hadrons. Production of dark sector particles at hadron colliders leads to a broad class of exotic signatures: depending on the lifetime of the dark hadrons, final states can contain semi-visible jets, lepton jets, emerging jets, soft bombs, quirks, etc. [1,2,9,63,68–75].
Our focus will be on characterizing exotic signatures that could result at the LHC. The space of possible dark sector models is vast, and furthermore many models can yield essentially the same LHC phenomenology. For this reason, we work with a simplified-model-like parameterization of the dark sector. The phenomenology can be largely determined by specifying the dynamics of the dark sector shower (the number of dark colors, dark quark flavors, and the dark confinement scale), the mass spectrum, and the decay patterns of the dark mesons. We will largely frame the phenomenological implications of having a strongly coupled dark sector in terms of these variables.

s-channel
As discussed above, we take as an organizing principle that the dark sector communicates with the visible sector via a portal. If the portal is heavy, then one can describe it by integrating out the mediator to obtain a so-called contact operator [76][77][78], for example

$$\mathcal{L} \supset \frac{c_{ij\alpha\beta}}{\Lambda^2} \left(\bar{q}_i \gamma^\mu q_j\right) \left(\bar{q}_{D\alpha} \gamma_\mu q_{D\beta}\right), \tag{1}$$

where $q$ are SM fermions, $q_D$ are dark sector quarks, $c_{ij\alpha\beta}$ are $O(1)$ couplings encoding a possible flavor structure, and $\Lambda$ is the scale of the operator. Generally we use Roman indices as SM flavor indices and Greek indices for the dark sector flavor indices. We are assuming that the portal couples the dark sector to the Standard Model quarks. Therefore, the observables of interest will be jets (which are expected to have non-QCD-like features) and missing energy that is likely to be aligned with the jets. One way to organize thinking about the possible signature space is in terms of the average fraction of invisible particles that are contained within a final state jet,

$$r_\text{inv} \equiv \left\langle \frac{\#\,\text{stable dark hadrons}}{\#\,\text{dark hadrons}} \right\rangle. \tag{2}$$
We will present results in terms of this variable in what follows.
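To make the definition of the invisible fraction concrete, the following minimal sketch estimates it from a generator-level list of dark hadrons per jet. The input format (one list of booleans per jet, with `True` marking a stable dark hadron) and the function name are assumptions made purely for illustration.

```python
def invisible_fraction(jets):
    """Average over jets of (# stable dark hadrons) / (# dark hadrons).

    jets: list of jets; each jet is a list of booleans, one per dark hadron,
          where True marks a hadron that is stable on collider scales.
    Jets without any dark hadron are skipped.
    """
    fractions = [sum(jet) / len(jet) for jet in jets if len(jet) > 0]
    if not fractions:
        raise ValueError("no jet contains dark hadrons")
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Toy input (assumed): three dark jets with four dark hadrons each.
    toy_jets = [
        [True, False, False, True],
        [True, True, False, False],
        [False, False, False, False],
    ]
    print(invisible_fraction(toy_jets))
```

For the toy input the per-jet fractions are 0.5, 0.5 and 0, so the estimate is 1/3. In a realistic study the stable/unstable flag would be read from the event record rather than set by hand.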
One option for UV completing Eq. 1 is to introduce a so-called s-channel mediator. In such models, pairs of dark quarks can be produced via a heavy resonance $Z'$ that also couples to SM quarks via [52][53][54][55][56][57][58][59][60][61][62][63]

$$\mathcal{L} \supset -Z'_\mu \left( g_q\, \bar{q}_i \gamma^\mu q_i + g_{q_D}\, \bar{q}_{D\alpha} \gamma^\mu q_{D\alpha} \right), \tag{3}$$

where $g_{q}$ and $g_{q_D}$ are the respective coupling constants. In general, the $Z'$ can also couple to other SM particles, which would lead to many possibilities in the final state. For concreteness, we will focus here on dark showers that result in SM jets + missing energy signatures. Therefore, we limit ourselves to coupling the $Z'$ to quarks, see Fig. 1. We will also simply give the $Z'$ a mass, and will not worry about the associated Higgs mechanism or related effects. We will not discuss the additional particle content needed to cancel anomalies, nor the $Z$-$Z'$ mixing structure needed for $g_{q_D} = g_q$ of models with a heavy $Z'$ here (cf. e.g. [79]), but will simply focus on the phenomenology. Heavy resonances $Z'$ are produced at a hadron collider in Drell-Yan processes and will have a non-trivial branching ratio to decay to two dark quarks, which shower and hadronize in the dark sector. Some of the dark sector hadrons are then assumed to decay back to Standard Model quarks, which subsequently shower and hadronize as usual. Consequently, the phenomenology of s-channel models is governed by the following parameters: the $Z'$ mass $m_{Z'}$, its couplings to visible and dark quarks $g_q$ and $g_{q_D}$, the dark sector shower (governed by the number of dark colors, dark flavors, and the scale of dark sector confinement $\Lambda_D$), the characteristic scale of the dark hadrons $m_D$, and the average fraction of stable hadrons that are aligned with the visible jet $r_\text{inv}$. While the coupling to SM quarks determines the $Z'$ production cross section, the other parameters determine the final state.
The details of the shower and mass scale of the dark hadrons determine how many dark sector particles are produced, and r inv determines the amount of missing energy in a dark jet. Depending on these parameters several interesting signatures and search methods can be defined, e.g. semi-visible jets and searches for dark matter in the jet substructure.
There are several strategies to search for the signatures of these models, and the most sensitive one largely depends on the choice of $r_\text{inv}$. When $r_\text{inv}$ is small, most of the final state associated with the resonance is visible, and so a normal bump hunt strategy can be employed. As $r_\text{inv}$ gets larger, it becomes advantageous to perform a bump hunt using the standard transverse mass variable $M_T$. Once $r_\text{inv}$ approaches unity, the final state is essentially dominated by missing energy, and so a "mono-jet" style strategy is most sensitive. This is illustrated in Fig. 2 [63], where we show the projected limits on the s-channel model for these different strategies. This approach relies on very simple criteria, and so there is clearly much room for improvements that rely on additional characteristics of these models. Existing limits will be discussed in Sect. 2.3, while strategies for improvements will be addressed in Sect. 5.
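For orientation, the transverse mass of a dijet plus missing-momentum system can be sketched as follows. The snippet assumes the commonly used definition $M_T^2 = M_{jj}^2 + 2\left(\sqrt{M_{jj}^2 + p_{T,jj}^2}\, E_T^\text{miss} - \vec{p}_{T,jj} \cdot \vec{p}_T^{\,\text{miss}}\right)$, with the missing momentum treated as massless; the function name and the four-vector layout `(E, px, py, pz)` are illustrative choices, not taken from any analysis code.

```python
import math


def transverse_mass(jet1, jet2, met_x, met_y):
    """Transverse mass of the (jet1 + jet2) system and the missing momentum.

    jet1, jet2   : four-vectors as (E, px, py, pz) tuples, in GeV
    met_x, met_y : components of the missing transverse momentum, in GeV
    """
    e = jet1[0] + jet2[0]
    px = jet1[1] + jet2[1]
    py = jet1[2] + jet2[2]
    pz = jet1[3] + jet2[3]
    # Invariant mass squared of the dijet system (clipped against rounding).
    m_jj2 = max(e * e - px * px - py * py - pz * pz, 0.0)
    # Transverse energy of the dijet system and magnitude of the MET vector.
    et_jj = math.sqrt(m_jj2 + px * px + py * py)
    met = math.hypot(met_x, met_y)
    mt2 = m_jj2 + 2.0 * (et_jj * met - (px * met_x + py * met_y))
    return math.sqrt(max(mt2, 0.0))


if __name__ == "__main__":
    # Toy event (assumed): two massless jets recoiling against 60 GeV of MET.
    print(transverse_mass((100.0, 100.0, 0.0, 0.0), (40.0, -40.0, 0.0, 0.0), -60.0, 0.0))
```

With vanishing missing momentum the expression reduces to the dijet invariant mass, which is why the ordinary bump hunt is recovered in the small-$r_\text{inv}$ limit.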

t-channel
Another simple option for UV completing the contact operator is to introduce a so-called t-channel mediator. The t-channel UV completion is determined by a Yukawa-type interaction (Eq. 4) with couplings $\kappa_{\alpha i}$ between the dark quarks $q_{D\alpha}$ and the right-handed SM quarks $q_{Ri}$ [20,21,64–67]. It can be realized either in Minimal Flavor Violating models, where in addition to the SM flavor symmetry the dark flavor symmetry $U_D(3)$ of the dark quarks $q_D$ is introduced [80][81][82], or by enhancing the SM gauge group by a dark flavor symmetry $SU_D(N_{c_D})$ and introducing $N_{f_D}$ dark quarks $q_D$ [1,21,83]. The visible and dark sectors then communicate via a scalar bi-fundamental mediator charged under both the SM and the dark flavor symmetry, and $q_R$ represents right-handed up-type and down-type quarks. We consider the case where the mediator is an $SU(2)$ singlet, and so it will not have any couplings to the left-handed quark doublets. (We note that, more generally, $SU(2)$-doublet mediators coupling to left-handed SM quark doublets are also possible.) Depending on the hypercharge of the mediator, the dark sector communicates with either the up-type ($Y = 1/3$) or the down-type quarks ($Y = -2/3$). In the following we will always use $N_{c_D} = N_{f_D} = 3$. In addition, we assume $m_{q_D} < \Lambda_D$, so that the pseudo-Nambu-Goldstone bosons of the spontaneously broken dark chiral symmetry (denoted dark pions in the following) are parametrically lighter than other dark hadrons. Consequently, heavier dark sector states, e.g. heavier dark hadrons or glueballs, will decay into dark pions, and the dark pions will govern the phenomenology of such models. It is also worth pointing out that for $N_{f_D} > 3$ an unbroken $SU(N_{f_D} - 3)$ symmetry leads to one or more stable dark pions [83].

Fig. 2 Estimates of the projected limits for the s-channel model. The 'contact' limit uses a mono-jet search strategy, where φ is the angle between the missing transverse energy and the closest jet. Figure taken from [63]
A particularly interesting feature of such models is the fact that the dark sector inherits the SM flavor structure via κ αβi j . The coupling κ αi can generally be expressed as and V and U Hermitian 3×3 matrices. For m Q αβ = δ αβ m Q αβ the resulting dark flavor symmetry U d (3) can be used to rotate V away. Finally, U can be decomposed into with U i j the rotational matrices for i j.
Here, we will focus on mediator pair production. The diagrams for this process are shown in Fig. 3. Both mediators subsequently decay to a visible and a dark quark, which undergo showering and hadronization, forming a SM jet and a dark jet. Heavier dark hadrons decay promptly into the lightest dark hadrons, the dark pions. Dark pions decay back into SM particles. Depending on the lifetime of the dark pions, three different final states are possible:
• The dark pions decay promptly and the final state consists of four prompt jets,
• The dark pions have intermediate lifetimes ($c\tau \sim 0.001$–$1$ m) and form emerging jets,
• The dark pions are stable on collider scales and are recorded as missing energy.

Fig. 4 Emerging jet signature for t-channel dark QCD models in comparison to the displaced dijet signature. Figure taken from [1]

While in the first and last case the final state consists of typical SM objects, emerging jets provide a very distinct signature: each pion of a dark jet will decay at a different distance, because the boost of each dark pion depends on its individual momentum and because the actual decay points are exponentially distributed for a given lifetime. Therefore, from a radial perspective, a dark jet deposits very little energy at the interaction point and then emerges with every dark pion decay into visible particles. The topology of an emerging jet is shown in Fig. 4 in comparison to the displaced dijet signature.
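The origin of the emerging pattern can be illustrated with a short toy calculation: each dark pion travels a lab-frame distance drawn from an exponential distribution with mean $\beta\gamma\, c\tau = (|\vec{p}|/m)\, c\tau$. The sketch below is only an illustration under this assumption; the function name, the momenta and the lifetime are hypothetical inputs and not benchmark values.

```python
import random


def sample_decay_lengths(momenta, mass, ctau, rng=None):
    """Sample a lab-frame decay distance for each dark pion.

    momenta : list of momentum magnitudes |p| in GeV (must be positive)
    mass    : dark pion mass in GeV
    ctau    : proper decay length (the result is in the same length unit)
    rng     : optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    lengths = []
    for p in momenta:
        if p <= 0:
            raise ValueError("momentum magnitudes must be positive")
        mean_length = (p / mass) * ctau  # beta * gamma * c * tau
        # expovariate takes the rate, i.e. the inverse of the mean.
        lengths.append(rng.expovariate(1.0 / mean_length))
    return lengths


if __name__ == "__main__":
    # Toy dark jet (assumed): 5 GeV dark pions with c*tau = 0.05 m.
    toy_momenta = [40.0, 25.0, 12.0, 8.0, 3.0]
    for p, length in zip(toy_momenta, sample_decay_lengths(toy_momenta, 5.0, 0.05)):
        print(f"|p| = {p:5.1f} GeV  decay distance = {length:.3f} m")
```

Because both the boost and the exponential draw differ from pion to pion, the visible energy of the jet appears gradually with increasing distance from the interaction point.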
The emerging jet signature was first studied in [1] for a dark sector coupling to right-handed down-type quarks, and a first search for this signature was performed by CMS [88]. In [87] this search has been combined with recasts of four-jet and two-jet plus missing energy searches. (For more details on the recast cf. Sect. 2.3.5.) It was found that the dark pion mass does not change these bounds in a significant way. The obtained limits can be shown in the usual plane of dark matter mass versus mediator mass. To do so, assumptions about the ratio of the dark pion mass to the mass of the dark matter candidate, here taken as the dark proton ($p_D$), must be made. In Fig. 5 the exclusion limits, combined with the constraints from direct detection experiments, are shown as a function of the dark matter mass and the mediator mass for $m_{p_D} = 10\, m_{\pi_D}$ for two different choices of couplings: The left panel corresponds to $\kappa = \text{diag}(1, 1, 1)$,
Due to the connection to the SM flavor structure, such models also contribute to flavor processes such as neutral meson mixing and flavor-violating kaon, B and D decays. The impact on flavor physics for dark sectors coupled to the down-type quarks has been studied in [80,82,83], and for couplings to up-type quarks in [81,82], as well as for models with CP violation [89] and for a simplified model in [86]. In both cases the parameter space is largely unconstrained for dark pion masses above a few GeV. For the case of couplings to the up-type quarks this region could for example be probed via emerging jets from flavor-violating top decays (cf. [90]).

Contributors: Elias Bernreuther, Florian Eble, Alison Elliot, Giuliano Gustavino, Simon Knapen, Benedikt Maier, Kevin Pedro, Jessie Shelton, Daniel Stolarski
Several attempts have been made in the literature to parametrize QCD-like dark sector theories. These are partly motivated by observations of parametric relationships between the confinement scale and the rho and pion masses in the SM, and partly by inferred relationships between, e.g., quark masses and meson masses within the pythia 8 HV module. We list below some of the efforts in this context. It is important to note that these do not necessarily imply consistent UV and IR parameters of the underlying non-Abelian dynamics itself; however, they have been useful in providing first phenomenological insight into the behaviour of such theories to guide the experiments.

CMS emerging jet search
The CMS emerging jet search [88] follows the class of models introduced in Ref. [1]. The specific process investigated is pair production of a bifundamental scalar mediator, with each mediator decaying to a SM quark and a dark quark, as depicted in Fig. 3. The mediator has Yukawa couplings $\kappa_{\alpha i}$ between dark quarks $q_{D\alpha}$ and SM quarks $q_i$, as shown in Eq. 4.
The parameters of these models are briefly summarized here: The first two parameters are inspired by the dark matter model from [21]. These two along with the next two are pythia parameters that control the shower but are not directly observable. The mass of the $\rho_D$ is set such that $\rho_D \to \pi_D \pi_D$ is kinematically allowed, and that decay will be dominant. The last three parameters in this list are treated as free parameters with their values varied as indicated.
The production cross section is controlled by the mediator mass. The distinct emerging jets phenomenology is controlled by the dark pion decay length $c\tau_{\pi_D}$, treated as a free parameter encapsulating variations of both the dark pion decay constant $f_{\pi_D}$ and the Yukawa coupling $\kappa_{\alpha i}$. Because only pair production of the mediator is considered, $\kappa_{\alpha i}$ does not influence the production cross section. The event generation and hadronization are done with the pythia 8.212 Hidden Valley module, modified to allow running of the dark coupling constant $\alpha_D$. Only the Yukawa couplings to down quarks are non-zero. The dark rho mesons decay promptly to pairs of dark pions with branching fraction 0.999 or directly to pairs of down quarks with branching fraction 0.001, while the dark pions decay exclusively to pairs of down quarks with decay length $c\tau_{\pi_D}$. The parameter $P_{\rho_D}$, the probability of producing a vector rather than a pseudoscalar meson during hadronization, is set to its default value of 0.75. Dark baryons are expected to be stable and can act as dark matter candidates as in the model of [21], but are expected to be produced rarely compared to dark mesons (∼10% for SM QCD [1]), so their presence is not simulated.

Flavored emerging jet model
The class of models described in Sect. 2.2.1 was extended to the case where the dark sector has a flavor structure related to SM QCD [83]. The result is multiple scenarios in which different dark mesons have different lifetimes, varying over a wide range of values. In particular, the following parameters are considered: The last condition implies that heavier dark mesons decay promptly to lighter dark mesons, so only the latter influence the final state kinematic behavior. The spectrum of lighter dark mesons can be understood in analogy to SM pions and kaons: $\pi_D^{"0"}$ ($\bar{q}_{D\alpha} q_{D\alpha}$), $\pi_D^{"\pm"}$ ($\bar{q}_{D1} q_{D2}$, $\bar{q}_{D2} q_{D1}$), $K_D^{"0"}$ ($\bar{q}_{D2} q_{D3}$, $\bar{q}_{D3} q_{D2}$), $K_D^{"\pm"}$ ($\bar{q}_{D1} q_{D3}$, $\bar{q}_{D3} q_{D1}$); the quotation marks indicate that the superscripts do not represent actual charges. The mass splittings between these different dark meson species are taken to be negligible.
In the simplest version of this flavored model, called the "aligned" scenario, there is no mixing of neutral dark mesons. This scenario can be generated using a custom modification of pythia 8.230 (available at https://github.com/kpedro88/pythia8/tree/emg/230) that gives the correct proportions of the dark meson species listed above. The dark pion and kaon decay widths can be calculated as follows, for dark quark content $\bar{q}_{D\alpha} q_{D\beta}$ and SM quark products $\bar{q}_i q_j$:

CMS semi-visible jet search
The CMS search for semi-visible jets [91] is based on the class of models introduced in Refs. [2,63]. The specific process investigated is $pp \to Z' \to \bar{q}_D q_D$, where $Z'$ is a leptophobic vector mediator with couplings to SM quarks $g_q$ and couplings to dark quarks $g_{q_D}$, as shown in Eq. 3. The parameters of the models used in the CMS search are summarized below: The last four parameters in this list are treated as free parameters with their values varied as indicated.
The unstable dark pions, as pseudoscalars, decay via a mass insertion preferentially to the most massive allowed SM quark species (bottom quarks unless $m_{\pi_D} < 2 m_b$). The branching fractions for the mass insertion decays are calculated with quark mass running included. The unstable dark rho mesons, as vectors, decay democratically to pairs of any allowed SM quark. All decays to SM quarks are assumed to be prompt.
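The preference for the heaviest kinematically allowed quark can be made explicit with a simple estimate. The sketch below assumes that the partial width to a quark pair scales as $m_q^2 \sqrt{1 - 4 m_q^2 / m_{\pi_D}^2}$, a common form for helicity-suppressed pseudoscalar decays, and it uses fixed approximate quark masses rather than running masses, so it is not the calculation used in the CMS search. All names and numerical inputs are illustrative assumptions.

```python
import math

# Approximate, fixed quark masses in GeV (assumption: no mass running).
QUARK_MASSES_GEV = {"u": 0.0022, "d": 0.0047, "s": 0.095, "c": 1.27, "b": 4.18}


def mass_insertion_branching_fractions(m_pi_dark, quark_masses=QUARK_MASSES_GEV):
    """Toy branching fractions of a dark pion into SM quark pairs.

    Assumes width proportional to m_q^2 * sqrt(1 - 4 m_q^2 / m_pi_dark^2)
    for every channel with m_pi_dark > 2 m_q.
    """
    weights = {}
    for name, m_q in quark_masses.items():
        if m_pi_dark > 2.0 * m_q:
            phase_space = math.sqrt(1.0 - 4.0 * m_q**2 / m_pi_dark**2)
            weights[name] = m_q**2 * phase_space
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no quark channel is kinematically open")
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    for mass in (5.0, 20.0):
        fractions = mass_insertion_branching_fractions(mass)
        print(mass, {q: round(f, 3) for q, f in fractions.items()})
```

In this toy estimate a 20 GeV dark pion decays mostly to bottom quarks, while a 5 GeV dark pion, which lies below the bottom pair threshold for the assumed masses, decays mostly to charm quarks.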
The treatment of the dark coupling constant $\alpha_D$, or equivalently the dark scale $\Lambda_D$, follows a relationship that is derived based on the behavior shown in Fig. 6, in which the former can be expressed in terms of the latter. The benchmark value for the scale is chosen according to the empirical fit $\Lambda_D^\text{peak} = 3.2\,(m_{\pi_D})^{0.8}$. This maximizes the production of dark hadrons in the dark showers, because the effect of $\Lambda_D$ depends on the dark hadron mass $m_{\pi_D}$. The corresponding value $\alpha_D^\text{peak}$ is then varied by ±50% to give $\alpha_D^\text{low}$ and $\alpha_D^\text{high}$. It is important to note that the results generated from pythia for $\Lambda_D < m_{q_D}$ may not be physically accurate. This is discussed further in Sect. 4.1. The event generation and hadronization are done with the pythia 8.230 Hidden Valley module. The invisible fraction $r_\text{inv}$ is implemented by reducing the branching fractions of dark hadrons to SM quarks, so the "decay" of a dark hadron to invisible particles represents a stable dark hadron. A $Z_2$ symmetry filter is enforced to reject events with an odd number of stable dark hadrons, as they would always be produced in pairs in a complete model. The exact pythia settings used in Ref. [88] can be found on HEPData [92].

Further semi-visible jet models under study by CMS
Studies are in progress to extend the CMS semi-visible jet program to low-mass boosted resonances and to t-channel production via the bifundamental scalar mediator. To generate these processes, depicted in Fig. 7, MadGraph5 2.6.5 is used. (MadGraph5 can also generate the processes in Figs. 1 and 3.) The FeynRules definitions are obtained from Ref. [93], associated with Ref. [63]. The pythia Hidden Valley module is still used for hadronization. To obtain accurate results, several additional steps are needed:
• Ensure that mediator decay widths are properly computed (set param_card decay [id] auto)
• Increase the number of events when making gridpacks from 2000 to 10,000 (to overcome instability in t-channel phase space integration)
• Convert PDG IDs to pythia conventions in LHE output.
Both MadGraph5 and pythia have some limitations for handling dark shower generation and hadronization. For pythia, some useful future additions would be:
• Add processes like qq → q D → q D qq D , gg → q D q D (non-resonant) that are currently only available via FeynRules, to facilitate generator-level studies.
• Include more theory uncertainties as event weights: PDF variations, renormalization and factorization scale variations, and Hidden Valley-specific parameters such as those that control the hadronization models.
• Allow user control over the dark hadron spectrum for studies of flavored models (Sect. 2.2.2, Ref. [83]).
• Add dark baryons for completeness (currently only dark mesons are explicitly produced).
Updates to the pythia Hidden Valley module made in the context of this Snowmass project, and described in Sect. 4.2, are intended to address the last two points. For MadGraph5, some useful updates and improvements would include:
• Better fixes for the items mentioned in the previous paragraph.
• Central support for common processes such as t-channel production, to reduce the need for manual FeynRules implementations.
• Ability to add new $SU(N)$ gauge groups in a complete and consistent way, in order to model Hidden Valley radiation explicitly. A Hidden Valley jet matching procedure would also be needed to avoid overlap with radiation from pythia during the hadronization process.

Semi-visible jet models under study by ATLAS
The semi-visible jet models under study are based on those introduced in Refs. [2,63]; the fully visible jet models described in [72] are also considered. The $Z'$ and bifundamental scalar portals introduced in Sects. 2.1.1 and 2.1.2 are considered, with the following parameter settings: The $r_\text{inv}$ and $m_{Z'}$ parameters are varied through the values displayed in the above list. The $\alpha_D$ coupling is chosen to be a running coupling, as it is in SM QCD.
The generated models differ between the s-channel and the t-channel in terms of the jet matching applied. For the t-channel, an MLM matching scheme is used, while for the s-channel, CKKW-L matching is used.
The semi-visible jet models are generated in MadGraph5, and pythia is used for showering. For the s-channel, MadGraph5 2.9.3 and pythia 8.245 are used. For the t-channel, MadGraph5 2.8.1 and pythia 8.244 are used. The fully visible jet models are entirely generated with pythia 8.

Aachen model
This model was introduced in Ref. [61] and subsequently used in Refs. [73,94]. It is designed to satisfy cosmological constraints and reproduce the observed dark matter relic abundance. The dark sector is connected to the SM by a heavy vector mediator $Z'$ arising from a new $U(1)$ gauge group.
The baseline parameters of this model are summarized below: The particle masses $m_{q_D}$, $m_{Z'}$, $m_{\pi_D}$, and $m_{\rho_D}$ are treated as free parameters that are set to the benchmark values indicated above. In the benchmark model, only the $\rho_D^0$ mesons decay (where 0 indicates the $U(1)$ charge), while all other dark mesons are stable on the scale of the detector. Dark baryons and other dark bound states, such as dark eta and dark omega mesons, are assumed to be too heavy to be produced frequently. The study in Ref. [73] includes variations $m_{\pi_D} = m_{\rho_D} = 5$–$20$ GeV and $r_\text{inv} = 0.1$–$0.9$. MadGraph5 2.6.4 with FeynRules 2.3.13 is used to generate the studied events, with pythia 8.240 used for hadronization. For the benchmark case with $r_\text{inv} = 0.75$, the pythia Hidden Valley default settings are modified to $P_{\rho_D} = 0.5$ in order to obtain the correct proportion of unstable dark mesons. When $r_\text{inv}$ is varied, it is implemented in the same manner described in Sect. 2.2.3. In this case, all unstable dark hadrons are assumed to decay democratically to pairs of u, d, s, or c quarks. (The assumption that only $\rho_D^0$ mesons decay is relaxed.)

Decay portals
Hidden valley models differ from most other models in two crucial aspects. The first is the lack of a very well-defined theory prior on what the model should look like, as already mentioned earlier in this document. The second complication is that, even if we somehow had a strongly preferred model, it would still be very challenging to extract accurate predictions for all observables due to the non-perturbative dynamics in the dark sector.
Both these challenges indicate that we may be best served by a suite of searches that is as model-independent as possible. This is easier said than done, however, and some theory priors are always needed to design an experimental analysis, especially in the initial phases of this program. Reference [95] has advocated injecting these theory priors in the decay portals that allow the dark sector to decay back to the Standard Model. This approach has the following advantages:
1. The number of plausible options is relatively limited once one restricts to the set of decay portals that do not introduce dangerous flavor-changing neutral currents. A systematic survey is therefore very feasible.
2. We have good theoretical control over these decay portals, in terms of both the dark particle's decay length and its allowed branching ratios. This is in contrast to the process of dark sector hadronization, where we must resort to parameterizing our ignorance as best we can.
3. If one also insists on a relatively minimal UV completion and/or no more than moderate fine tuning, one can derive an approximate lower bound on the lifetime of the dark mesons. This gives the models a little more predictive power.
The resulting lifetimes and branching ratios of the visibly decaying dark particle are crucial for the design of experimental search strategies, and a systematic survey of models featuring a minimal suite of decay portals can therefore be a good starting point for a comprehensive experimental search program.
The portals considered in Ref. [95] are summarized in Table 1. To maintain compatibility with the pythia Hidden Valley module at the time, a simplified dark sector was considered with a single (pseudo)scalar η D and vector ω D meson. For the dark photon portal an additional elementary dark photon A was added. For each of these portals the branching ratios and lower bound on the lifetime of the visibly decaying particle were computed (see Fig. 8).

Table 1 Overview of the decay portals considered in [95]. The η D and ω D represent respectively the lightest spin-zero and spin-one meson in the dark sector, and F μν is the field strength for an elementary dark photon A . The decay portal column indicates the operator(s) that allow the unstable dark meson to decay, defining the model. (Columns: Decay portal; Decay operator.)

Fig. 8 Approximate lower bound on the proper lifetime of the visibly decaying particle (VDP) in the dark sector, for the decay portals in Table 1
For the gluon portal one assumes that the dark sector pseudoscalar meson (η D ) decays through a dimension-5 coupling to the Standard Model gluons, leading to very hadron-rich final states. This coupling requires a low-scale UV completion, especially because η D itself is a composite state and the η D GG coupling should be suppressed by dimensional transmutation in models with perturbative parton showers. Such a UV completion moreover requires the presence of new, colored particles, which could be produced at the LHC. The bound in Fig. 8 was obtained by assuming that any such states must have a mass ≳ 2 TeV. This assumption can be relaxed by devising an elaborate extension of the model to hide the new colored particles from existing searches at ATLAS and CMS.
The photon portal works in the same way as the gluon portal, except that the η D couples to the Standard Model photons instead of the gluons. This portal leads to very photon-rich final states. Its lifetime bound is informed by collider constraints on new charged particles, which are required in the UV completion of this portal.
The vector portal assumes that the Standard Model photon mixes with the dark vector meson ω D , similar to the photon-ρ mixing in the Standard Model. In this portal it was assumed that the η D was absolutely stable, leading to a semivisible jet phenomenology. The UV completion of this portal requires the introduction of an additional, elementary Z vector field. The lifetime bound in Fig. 8 was informed by the bounds on the mass and coupling of the Z from direct searches and electroweak precision observables.
In the Higgs portal scenario, the η D decays by mixing with the Standard Model Higgs, leading to a phenomenology rich in heavy flavor. In this scenario, no new particles are needed in the UV completion. The H † H operator does, however, contribute directly to the masses of the constituents of the η D mesons, which leads to the lower bound in Fig. 8. This bound can be evaded by fine-tuning the masses of the constituents of the η D .
Finally, the dark photon portal is inspired by the Standard Model η → γ γ decay, as it assumes a light, elementary vector field (A ) which couples to the η D through a chiral anomaly. The A itself can then mix with the Standard Model photon, resulting in fairly lepton-rich final states. With this portal, the lifetime constraint is very mild and comes exclusively from direct searches for the A itself.
All assumptions and branching ratios are encoded in the public Python script https://gitlab.com/simonknapen/dark_showers_tool, which can generate pythia configuration cards for all five decay portals.

Contributors: Giuliano Gustavino, Steven Lowette, Kevin Pedro, Pedro Schwaller, Andrii Usachov, Carlos Vázquez Sierra
The discussion so far has established not only that QCD-like dark sectors are theoretically interesting, but also that they can lead to exotic signatures for the experiments. In light of this observation, several searches at the LHC have already been carried out, with many other studies also underway. The search results are currently reported using one or more of the benchmarks discussed in Sect. 2.2. In this section, we illustrate some of the public search results and the existing constraints on the theory landscape. Given that the searches rely on generic signal characteristics, it may be possible to reinterpret the results in terms of other, UV/IR coherent models.

ATLAS search program
While no direct constraints have been published so far by the ATLAS Collaboration on these models, a broad set of semivisible jet scenarios in both t- and s-channel production processes is being studied by the collaboration, as mentioned in the previous section. Besides possible dedicated searches, the recasting of previously published results in other channels might prove useful in constraining the parameter space. Indeed, existing exclusion limits obtained in analyses looking at di-jet [96] and E T miss +jet [97] final states should already be able to constrain the phase space predicted by dark QCD models in different r inv ranges. Furthermore, some studies also focus on scenarios with emerging jets, where dark hadrons have a non-negligible lifetime.
The search requires four high-p T jets and triggers on such events using the scalar sum of their transverse momenta, H T . Per-jet quantities indicating displaced tracks (including the 2D impact parameter, the 3D impact parameter significance, and the fraction of the p T from prompt tracks associated with the primary vertex) are used to identify or "tag" jets as emerging. The signal region definition requires either two jets tagged as emerging, or one jet tagged as emerging along with substantial missing transverse momentum. The latter option increases sensitivity to models with larger cτ π D values where many dark hadrons decay outside of the detector. The misidentification rate for this tagging procedure is measured and used to estimate the QCD multijet background. Heavy flavor jets from B hadrons are found to be misidentified more frequently, as expected because of their non-negligible decay lengths. Multiple signal regions with different selection requirements are defined, and for each model, the signal region with the highest expected sensitivity is used.
The search results are shown in Fig. 9. Using a 13 TeV dataset with 16.1 fb −1 of integrated luminosity, no significant excess above the SM prediction is observed. This search excludes models with mediator masses 400 < m < 1250 GeV for 5 < cτ π D < 255 mm at 95% confidence level (CL). It is found that the limits are similar over the range of m π D values explored. Work is ongoing to incorporate the remainder of the LHC Run 2 dataset, up to 138 fb −1 , and to improve the sensitivity to other models, such as Ref. [83], which is summarized in Sect. 2.2.2.

CMS search for semi-visible jets
The CMS search for semi-visible jets [91] is based on the class of models introduced in Refs. [2,63], as described in Sect. 2.2.3.
Fig. 9 95% CL limits on the cross section for mediator pair production leading to emerging jets, in the plane of cτ πD versus m (here called cτ πDK and m XDK ) where m πD (here called m πDK ) is set to 5 GeV. Reproduced from Ref. [88]

The search requires two high-p T wide jets and triggers on the jet p T and the H T . This dijet system is combined with the missing transverse momentum to compute the transverse mass M T , which has a falling spectrum for the SM backgrounds, while the signal has a kinematic edge at the Z mediator mass. The QCD multijet background is rejected by requiring high values of the transverse ratio R T = E T miss /M T , while the electroweak backgrounds (t t̄, W(→ ℓν)+jets, Z(→ νν)+jets) are rejected by vetoing identified and isolated leptons (e, μ) and requiring a small minimum angle between the jets and the E T miss . Various sources of instrumental background and misreconstruction, which introduce artificial E T miss , are also rejected. Two signal regions are defined in terms of the R T variable to provide additional sensitivity.
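The two kinematic variables used in this selection can be illustrated with a short sketch. It assumes the usual definition of the transverse mass of a visible system plus missing momentum, M T ² = (E T,jj + E T miss )² − |p T,jj + p T miss |², with E T,jj ² = m jj ² + p T,jj ². The function and argument names are illustrative and are not taken from the analysis code of Ref. [91].

```python
import math


def transverse_mass(jj_pt, jj_phi, jj_mass, met, met_phi):
    """Transverse mass (GeV) of a dijet system plus missing transverse momentum.

    Assumes M_T^2 = (E_T,jj + MET)^2 - |pT_jj + pT_miss|^2, which expands to
    m_jj^2 + 2 * MET * (E_T,jj - pT_jj * cos(dphi)).
    """
    # Transverse energy of the dijet system, including its invariant mass.
    jj_et = math.sqrt(jj_mass**2 + jj_pt**2)
    dphi = jj_phi - met_phi
    mt2 = jj_mass**2 + 2.0 * met * (jj_et - jj_pt * math.cos(dphi))
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(mt2, 0.0))


def transverse_ratio(met, mt):
    """R_T = MET / M_T; large values disfavor QCD multijet events."""
    return met / mt


# Illustrative numbers only: a 400 GeV dijet system with pT = 300 GeV,
# and 200 GeV of missing momentum aligned with it in azimuth.
mt = transverse_mass(jj_pt=300.0, jj_phi=0.0, jj_mass=400.0, met=200.0, met_phi=0.0)
rt = transverse_ratio(200.0, mt)
```

A signal region would then be defined by a lower cut on R T ; the threshold values are analysis choices and are not reproduced here.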
The full 13 TeV dataset of 138 fb −1 is analyzed and no indication of a resonance in the M T spectrum is observed. Both model-independent and model-dependent results are obtained, as shown in Fig. 10. The former excludes models with 1.5 < m Z < 4.0 TeV and 0.07 < r inv < 0.53 at 95% CL, depending on the other model parameters. The latter uses a boosted decision tree (BDT) that combines jet substructure variables to tag jets as semi-visible. It extends the exclusions to 1.5 < m Z < 5.1 TeV and 0.01 < r inv < 0.77 for the specific models described in Sect. 2.2.3 that were used to train the BDT.

CMS search for SIMPs as a link to signatures of trackless jets
Dark sector models could give rise to experimental signatures where jets are formed with visible regular hadrons arising from decays of hidden-sector particles that are long-lived, and thus make these hadrons appear displaced within the jet. With sufficient displacement, jets can arise in which none of the tracks of the constituent charged hadrons can be reconstructed, thus making the jets appear neutral. Such neutral jets are extremely rare among high-momentum jets from regular standard-model quarks or gluons, and can thus be a very sensitive probe of physics beyond the standard model from dark sectors. The CMS collaboration has performed a search for a pair of such trackless jets [98] using 16.1 fb −1 of integrated luminosity recorded in 2016. The signature probed consisted of a pair of back-to-back high-momentum trackless jets, where the trackless nature was established experimentally by requiring the ratio of the jet energy carried by charged particles to the energy carried by neutral particles to be less than 5%. To illustrate the effectiveness of this requirement in suppressing standard model QCD background jets, a background rejection of over 10^5 in data was reported for this 5% requirement.
This CMS search for trackless jets was inspired by and interpreted in a model proposing a new interaction through a low-mass mediator with a new dark matter fermion [3]. The interaction leads to very high interaction cross sections, which are not necessarily excluded, though many model assumptions need to be made to avoid the many cosmology, particle physics and astrophysical constraints. The model considered is not a dark sector model per se, as indeed there is no decay back from a dark sector leading to missing charged hadrons; rather, the jets consist solely of SIMP particles interacting in the calorimeters, thus generating neutral jets. The aspect of a displaced decay is thus missing, though the similar experimental signature does potentially impact dark sector searches.
The interpretation of the CMS trackless jets search in the SIMPs model has been found to be difficult at large SIMP masses, above ∼100 GeV. At such high masses, the modeling of the SIMP-nucleon interaction is complicated by the SIMP mass, and the approximation used in the CMS analysis, which involved treating the SIMP in the Geant simulation as a massive neutron-like object, becomes exploratory. As such, the strong exclusion limits on the SIMP pair production cross section as a function of the SIMP mass, obtained by CMS in the absence of an excess of trackless jets in the analyzed data and shown in Fig. 11, are reported with this caveat above 100 GeV.

Collider constraints on t-channel models
A plethora of new physics searches are ongoing at the LHC and have pushed the limits on the masses of new particles above the TeV scale in many cases. While dark showers are a spectacular and unique signature, existing new physics searches still retain some sensitivity in regions of parameter space where the signal looks SM-like. For t-channel dark sectors [1,21,83], often the leading production mode is pair production of the heavy mediator, which gives rise to a signature with two ordinary QCD jets and two dark showers, as shown in Fig. 3. It is clear that missing energy searches should become efficient in the limit where the dark sector particles become very long lived, while searches for prompt multijet signals can probe the regime of very short lifetimes. Due to the stochastic nature of particle decays, these searches also retain some sensitivity in the intermediate lifetime regime.
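The statement about intermediate lifetimes follows from the exponential decay law, and a minimal sketch makes it concrete. It assumes a single dark hadron with a fixed boost βγ travelling along a straight line, and two hypothetical radii: one below which a decay is treated as prompt and one beyond which the particle counts as having escaped the detector. None of these names or values comes from the searches cited here.

```python
import math


def decay_fractions(ctau_mm, beta_gamma, r_prompt_mm, r_detector_mm):
    """Split decays into prompt, displaced, and escaping fractions.

    The lab-frame mean decay length is beta*gamma*c*tau, and the probability
    of surviving beyond a flight distance r is exp(-r / decay_length).
    """
    decay_length = beta_gamma * ctau_mm
    prompt = 1.0 - math.exp(-r_prompt_mm / decay_length)
    escape = math.exp(-r_detector_mm / decay_length)
    displaced = 1.0 - prompt - escape
    return {"prompt": prompt, "displaced": displaced, "escape": escape}


# Hypothetical geometry: decays within 1 mm look prompt, beyond 1 m are lost.
short = decay_fractions(ctau_mm=0.01, beta_gamma=5.0, r_prompt_mm=1.0, r_detector_mm=1000.0)
medium = decay_fractions(ctau_mm=10.0, beta_gamma=5.0, r_prompt_mm=1.0, r_detector_mm=1000.0)
long_lived = decay_fractions(ctau_mm=1.0e5, beta_gamma=5.0, r_prompt_mm=1.0, r_detector_mm=1000.0)
```

For intermediate cτ the prompt and escaping fractions are both small but nonzero, which is why prompt multijet and missing energy searches keep some residual sensitivity there.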
A recast of a di-jet plus MET search [99], of a search for paired prompt di-jet resonances [100], and of the dedicated emerging jets search [88] was performed in Ref. [87]. As can be seen in Fig. 12, the dedicated emerging jets search performs best in the intermediate lifetime regime, while the recast searches can probe the remaining range of lifetimes. As expected, mediator masses below the TeV scale are already strongly constrained. In the short lifetime regime, the constraints are weaker. This is partially because the published search uses only a limited amount of data, but also because fighting the QCD multi-jet background is hard. Another option to constrain the t-channel mediators in the regime of short dark sector lifetimes is through their contribution to angular correlations in dijet events [101], when the mediator is exchanged in the t-channel instead. This constraint, however, will depend on the magnitude of the Yukawa coupling of the mediator.

Fig. 12 Constraints on the mass of a t-channel mediator (here called X ) as a function of the dark sector particle lifetime. See text for details. Figure taken from [87]
The t-channel mediators naturally allow couplings that break SM flavor symmetries. While most flavor violating couplings are constrained by low-energy flavor observables [83,86], the LHC has the potential to constrain flavor changing couplings of the top quark [90]. Such scenarios could also be of interest for searches for dark showers produced in association with tops or even stemming from exotic top decays.

Existing constraints and projections from LHCb
The LHCb experiment, originally designed for heavy-flavor physics, has shown its potential as a general purpose detector in recent years. An excellent secondary vertex resolution, low momentum thresholds and particle identification capabilities make LHCb a natural candidate to search for dark QCD signatures in the low-mass region. These searches are now becoming part of the LHCb physics program, where world-leading constraints have already been set for hidden valley scenarios, with reinterpretations in other dark sector models as well as a large number of very encouraging sensitivity projections, described in the following paragraphs.
LHCb has published a search for low-mass resonances decaying into pairs of muons, using 5.1 fb −1 of data collected at 13 TeV [102]. In this article, a model-independent search for both prompt and displaced resonances X is performed. For the displaced case, the secondary X → μ + μ − decay vertex is required to be transversely displaced from the primary vertex in the range 12 < ρ T < 30 mm, allowing for the resonance to appear as a long-lived signature in the detector. Limits on the cross-section σ (X → μ + μ − ) are then placed and interpreted in various production models. One of these models, relevant to dark QCD, is a hidden valley scenario, where constraints are set on the kinetic mixing strength γ − Z H V between a heavy hidden valley boson Z H V with photon-like couplings and a photon, fixing the average multiplicity of hidden valley hadrons to a value of 10. These are the most stringent constraints placed to date for masses of a composite hidden valley vector boson, X , up to 3 GeV, as presented in Fig. 13.
These results have also been reinterpreted in the context of Z-initiated dark showers, assuming various benchmark scenarios [103]. In Fig. 14, projections for one of the scenarios are shown, demonstrating the capability of LHCb to probe Z branching fractions down to 10^-7 during the high-luminosity phase.
In a more general sense, the capabilities of LHCb to probe other dark QCD models are summarized in Ref. [104], in the context of benchmark scenarios featuring a range of dark hadron and mediator masses, for different assumptions on the average dark hadron multiplicity in the dark sector. The projections described in this major report show the outstanding potential of the LHCb experiment to place very stringent constraints in the low-mass range, in complementarity with ATLAS and CMS.

Contributors: Cari Cesarotti, Carlos Erice, Karri Folan DiPetrillo, Chad Freer, Luca Lavezzo, Christos Papageorgakis, Christoph Paus, Matt Strassler
Dark showers produced in hidden valley models need not result in collimated jets like in SM QCD. Events with large multiplicities of low-momentum charged particles, distributed with spherical symmetry, are also a possible phenomenology of strongly coupled hidden valley models. This section discusses the motivation for these so-called soft unclustered energy patterns (SUEPs), the tools available for simulation, and the typical phenomenology. Experimental challenges for SUEP searches at LHC general purpose detectors are also summarized.

Theoretical motivation
Although the production of quarks in a QCD-like confining theory leads inevitably to jets of hadrons, the details are not always the same. The width of the jets, the hadron multiplicity per jet, and the jet multiplicity all depend on the value of the running coupling. More specifically, jets arise from the partonic shower, which depends on the running 't Hooft coupling λ ≡ α D N c D (α s N c in QCD), evaluated at and somewhat below the energy scale at which the jets are produced.
In QCD-like theories, like in QCD, jets at lower energy, closer to the confinement scale and thus at larger λ, are broader, because gluon radiation at larger angles is more common with the large coupling. One might then imagine that if λ could somehow be taken large without reducing the energy of the collisions, then the jets produced might become so broad and numerous as to blur together, creating a smooth distribution and a non-jetty final state. In QCD-like theories this question is almost academic, since λ only becomes large near the confinement scale. In e + e − collisions near the confinement scale, only a small handful of hadrons are produced, making it hard to define jets in the first place. Jets were only identified in e + e − collisions at scales ∼ 10 GeV, where λ < 1, and gluon jets only at 27 GeV.
However, in 1998 it was shown [105] that certain classes of supersymmetric conformal field theories (roughly speaking, these are theories whose beta functions are all zero and which are thus scale-invariant) are equivalent to string theories on certain curved spaces. These theories can have arbitrary values of α D (now constant) and N c D . The duality can be used to compute many properties of these theories when α D N c D ≫ 1 ≫ α D . It was noted in [106] that rapid pdf evolution in this regime leads to an absence of hard partons inside hadrons. Since similar dynamics controls jet evolution, this naturally suggests that jets will be absent in this regime as well. Several groups [107][108][109] argued that there should be no jets in this limit, and it was proven convincingly in [109] that the correlation function of energy operators indicates that a partonic shower in this regime, allowed to proceed over a wide range of energies, will approach a spherically symmetric distribution on average. The distribution of momenta of these partons is not determined, however.

Fig. 14 Projections at 90% CL of LHCb sensitivity to muon-rich dark showers initiated by the decay of a Z into dark quarks (here called ψ ). In this benchmark scenario, a dark pion mass (here called π ) of 650 MeV, an average multiplicity of 7 and a branching fraction of the dark pion into two muons of 96% are assumed. More details can be found in Ref. [103]
Note, importantly, that a spherically symmetric shower is not a consequence of conformal symmetry. A conformally invariant theory at small λ will exhibit jets much like QCD itself; indeed, QCD at high energy is nearly conformally invariant, with scaling violated only by small logs (a fact which played an important role in the discovery of quarks). Only when λ ≫ 1 ≫ α D are roughly spherical showers expected, and corrections to the spherical shape are of order 1/√λ. Thus, for events that typically differ from spherical by < 10% (which requires 1/√λ < 0.1, i.e. λ > 100), one probably must have N c D ≳ 100. A conformally-invariant hidden sector is generally observable only as E T miss (except for small rare processes discussed in the unparticle literature) as the energy will be shared down to massless partons. Interesting hidden valley signatures arise only if the conformal invariance is broken at some scale Q c much lower than the production scale M. The shower that follows production of hidden partons is converted at Q c to a large number of hidden particles of small mass m D ≲ Q c . If some of these are able to decay to the SM, then this can lead to a signature of many particles which are roughly spherically distributed in some frame of reference, not necessarily the lab frame [9,107]. This signature is defined as a Soft Unclustered Energy Pattern, or SUEP.
We note that there have been disputes in the theoretical literature concerning whether the SUEP arising in this context is related directly to cascades of 5d KK states, with different points of view taken by [68,110] versus [107,111]. From the phenomenological point of view this may not matter very much in the near term, but the issue could arise if in future KK-cascade-based generators are created and assumed (perhaps incorrectly) to describe the same phenomenon as SUEP.
If the conformal symmetry breaking involves the complete Higgsing of a gauge group, it is clear that a near-spherical shower leads to a near-spherical SUEP. If the conformal symmetry breaking involves confinement, this is less clear and has not been proven; it may be that it is somewhat model-dependent. We will nevertheless assume it is the case, and that SUEPs can arise from the SM decay products of dark hadrons produced from a spherical shower of dark gluons.
We will further assume that those dark hadrons can resemble those from a QCD-like spectrum, with low-lying spin-0 and spin-1 mesons and with limited roles for heavier mesons. Baryons should have mass of order N c D ≫ 1 and will play no role.
Since techniques to calculate SUEPs, even in the few models where they are known to occur, do not yet exist, the simulations [9,112] to study them are necessarily somewhat ad hoc. The assumption is made that particles are produced spherically according to a thermal distribution with an unknown temperature T D of order Q c , the conformal breaking scale; see Sect. 3.1.2. (Note m D , the typical hadron mass, may be of order Q c or may be much smaller, especially for pion-like states; in the latter case, multiplicities may be surprisingly low.) The value of T D /Q c should be varied and treated as a quantifiable uncertainty.
Less clear is how to treat the uncertainties of the thermal approximation in the first place. A thermal model for hadronization in QCD works moderately well; perhaps this reflects the tendency for complex statistical systems to approach thermal ones. However, there is no proof that it would work for all or even most confining theories. One way to approach the uncertainty might be to vary the thermal spectrum in one or another way, using insights from studies of how systems equilibrate.
Finally, an uncertainty arises from the fact that in realistic theories the spherical approximation will suffer corrections, and at the current time those corrections have not been characterized theoretically nor incorporated into the simulation tools. Some effort to quantify this uncertainty ought to be undertaken.
Since SUEPs and the simulation packages used to simulate them represent a certain idealized situation, real signatures may differ from this idealized model. In this regard, the following should probably be kept in mind:
• Only a small number of theories of this class have been proven to exist, all of them supersymmetric and with N f D ≪ N c D , where N f D is the number of quarks in the fundamental representation.
• The statement that expected distributions are spherical receives substantial corrections, of order 1/ √ λ at the confinement scale. For an observably spherical SUEP, one may need N c D ∼ 100. Because the particles of the hidden sector can appear in loops, coupling a mediator to a theory with N c D ∼ 100 or more can have large consequences for the mediator or even the Standard Model sector; care must be taken in defining a consistent model.
• Event-to-event fluctuations can lead to large deviations from the SUEP idealization. When the multiplicity of visibly-decaying hadrons is low, Poisson fluctuations are large. For instance, if sixty dark hadrons are produced near-spherically but only ten decay visibly, the observed event will be far from spherical and potentially very asymmetric.
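The last point can be made quantitative with a toy estimate. The sketch below assumes that every visible particle carries the same momentum magnitude and that directions are drawn independently and uniformly on the sphere; both are simplifications introduced here for illustration only. Under these assumptions the root-mean-square momentum imbalance per particle scales as 1/√N, so roughly 0.32 for ten visible particles and 0.13 for sixty.

```python
import numpy as np


def rms_momentum_imbalance(n_visible, trials=20000, seed=1):
    """RMS of |sum of unit momenta| / n_visible for isotropic toy events.

    Each toy event has n_visible unit vectors drawn uniformly on the sphere
    (normalized Gaussian triplets). A perfectly balanced event would give 0.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(trials, n_visible, 3))
    directions /= np.linalg.norm(directions, axis=2, keepdims=True)
    imbalance = np.linalg.norm(directions.sum(axis=1), axis=1) / n_visible
    return float(np.sqrt(np.mean(imbalance**2)))


few = rms_momentum_imbalance(10)   # only ten hadrons decay visibly
many = rms_momentum_imbalance(60)  # all sixty hadrons are visible
```

The toy ignores the additional Poisson fluctuation in the number of visible decays itself, which would widen the spread further.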

Simulation tools
In order to search for these novel signatures at high energy colliders, it is essential to generate events that capture the phenomenological characteristics of the energy pattern. Perturbation theory breaks down in the large coupling regime, and standard approaches to event simulation are unreliable. Novel methods are necessary to generate SUEP events. Several simulation methods that can generate quasispherical energy patterns exist, such as black hole generators [113][114][115] or simplified 5d models [111]. However, these tools are not ideal for developing new analyses or triggering strategies. LHC experiments already have extensive search programs for black holes [116][117][118][119][120]. The latter tool does not have an obvious portal to connect to the SM. We will therefore discuss the utility of another tool in this section: the SUEP generator [112].
The simplified model used by this generator is described in Ref. [9]. In this framework, a hidden valley (HV) of new physics with confining dynamics is accessed via a heavy scalar mediator. A wide class of mediators can be used to connect the SM and HV, but a scalar portal from gluon fusion was chosen both to explore a triggering 'nightmare scenario' and to study a potential rare Higgs decay mode. The scalar then decays into a high multiplicity of light HV mesons of a single flavor, of mass m D , that follow a Maxwell-Boltzmann distribution. An illustration of this process is shown in Fig. 15. The user is free to select the temperature, T D , and m D , but for reasonable results the two parameters should satisfy T D /m D > 1 such that the final state is high multiplicity. Note that in a confining theory like QCD, the mass of the lightest meson can be much less than the confinement scale Λ D ∼ T D , so T D > m D is a motivated and physical choice. For a fixed scalar and meson mass, a higher temperature will correspond to fewer particles with a more significant boost. The tool is only intended to work in the high multiplicity limit, which means that the user must set m S ≫ T D , m D for consistent results. A numerically small amount of momentum conservation violation may be observed in some events.
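The interplay between temperature, boost and multiplicity can be explored with a small numerical sketch. It assumes the relativistic Maxwell-Boltzmann form dN/dp ∝ p² exp(−√(p² + m D ²)/T D ) for the meson momentum spectrum and estimates the multiplicity as m S divided by the mean meson energy. This is an illustration written for this text, not the implementation of the SUEP generator [112], and all function names are hypothetical.

```python
import numpy as np


def _spectrum_grid(m, T, n_points=20000, p_max_over_T=30.0):
    """Momentum grid, energies, and unnormalized Boltzmann weights."""
    p = np.linspace(0.0, p_max_over_T * T, n_points)
    E = np.sqrt(p**2 + m**2)
    # Subtracting m in the exponent only changes the overall normalization.
    w = p**2 * np.exp(-(E - m) / T)
    return p, E, w


def mean_energy(m, T):
    """Mean single-particle energy of the assumed thermal spectrum."""
    _, E, w = _spectrum_grid(m, T)
    return float(np.sum(E * w) / np.sum(w))


def sample_momenta(m, T, size, rng):
    """Draw momentum magnitudes by inverting the tabulated cumulative sum."""
    p, _, w = _spectrum_grid(m, T)
    cdf = np.cumsum(w)
    cdf /= cdf[-1]
    return np.interp(rng.random(size), cdf, p)


def rough_multiplicity(m_S, m, T):
    """Energy-conservation estimate: mediator mass over mean meson energy."""
    return m_S / mean_energy(m, T)


rng = np.random.default_rng(0)
momenta = sample_momenta(m=2.0, T=2.0, size=200000, rng=rng)
n_cool = rough_multiplicity(125.0, 2.0, 2.0)
n_hot = rough_multiplicity(125.0, 2.0, 4.0)  # hotter: fewer, more boosted mesons
```

With the mediator and meson masses held fixed, raising the temperature in this sketch raises the mean energy per meson and therefore lowers the estimated multiplicity, in line with the statement above.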
After production and showering, the dark mesons must decay back to SM for a visible signal. In this benchmark model, it is assumed that the dark mesons couple to a new U(1) gauge boson A γ that kinetically mixes with SM hypercharge. The dark mesons decay into an A γ A γ pair. Each A γ then decays into SM particles, for example dilepton (e + e − and μ + μ − ) or hadronized final states. The A γ mass and branching ratios to Standard Model particles are configurable by the user.

Fig. 15 A visualization of a generic SUEP event. A heavy scalar S accesses a confining HV and showers into light mesons (e.g. π D , here denoted φ) before decaying to SM particles. Figure adapted from Ref. [9]
This tool has the potential to inform and test diverse future analysis techniques. The phenomenology of the model depends on the choices of scalar mediator mass m S , the dark meson mass m D and temperature T D , as well as the decay branching ratios of the new particles. Depending on the signal of interest, the user can configure these values to achieve different multiplicity or species final states. Simple extensions of the package could include different possible mediators and decay portals.

Phenomenology
The classification of 'SUEP' is based on the final state signature rather than a specific type of model. A SUEP (previously called 'soft bomb' [9]) is usually an event with a high multiplicity of soft particles distributed quasi-isotropically in their rest frame. The underlying physics that produces such events can be varied. As discussed earlier, if the new particles interact strongly over a wide energy window, the shower develops by soft and isotropic emissions [36,106,109]. However, SUEPs can also develop from kinematics due to phase space arguments. If a new physics model includes many unstable particles with small mass splittings [110,111], subsequent decays are unboosted and can approximate a spherical energy distribution for sufficiently large particle multiplicity. A common example of such an event is R-parity violating SUSY [121,122].
Since many different new physics scenarios can produce a SUEP signature, it is compelling to generically search for such events at colliders. As their common underlying feature is their global radiation pattern, event shape observables can serve as useful analysis tools. By studying the global event shape, it is possible both to quantify new physics and to distinguish signal from background. Event shape observables have been used to study QCD and measure α s [123][124][125][126][127][128][129][130][131][132][133][134][135][136][137]. Several observables have been developed to quantify the degree to which a collider event is isotropic versus jet-like, including thrust [123,138,139], sphericity [140,141], spherocity [142], and the C- and D-parameters [143][144][145]. While these observables have provided indispensable insights into QCD, they are most sensitive to deviations from dijet events rather than being a robust probe of isotropy. To capture the phenomenology of a SUEP event it is necessary to define new observables which probe the opposite regime.
A new observable called event isotropy aims to study deviations from truly isotropic events [146]. This observable is defined using the Energy Mover's Distance (EMD) [147,148], the particle physics application of the Earth Mover's Distance [149][150][151][152][153]. Event isotropy quantifies how 'far' an event is from isotropic, with smaller values indicating an event is more isotropic. Figure 16 shows event isotropy for SUEP benchmark models with different mediator masses and temperatures. Figure 17 shows results from a related study that demonstrates how event isotropy is not strongly correlated with final state multiplicity, or reconstructed number of jets. Additionally, it correlates with canonical event shape observables much less than they correlate with each other in the quasi-isotropic regime. While there can be correlation with traditional event shapes, both types of observables can be used to better characterize the underlying physics.
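As a point of comparison for event isotropy, one of the canonical event shapes can be computed in a few lines. The sketch below assumes the standard quadratic sphericity tensor S ab = Σ i p i a p i b / Σ i |p i |², whose eigenvalues λ 1 ≥ λ 2 ≥ λ 3 sum to one, and the usual sphericity S = 3/2 (λ 2 + λ 3 ). It does not compute the EMD-based event isotropy of Ref. [146], and the function names are illustrative.

```python
import numpy as np


def sphericity_eigenvalues(momenta):
    """Eigenvalues (descending) of the normalized quadratic momentum tensor."""
    p = np.asarray(momenta, dtype=float)  # shape (n_particles, 3)
    tensor = p.T @ p / np.sum(p**2)
    return np.sort(np.linalg.eigvalsh(tensor))[::-1]


def sphericity(momenta):
    """S = 3/2 (lambda_2 + lambda_3): 0 for pencil-like, 1 for isotropic."""
    lam = sphericity_eigenvalues(momenta)
    return float(1.5 * (lam[1] + lam[2]))


# A back-to-back "dijet" along z and a maximally spread six-particle event.
dijet = [[0.0, 0.0, 50.0], [0.0, 0.0, -50.0]]
spread = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
s_dijet = sphericity(dijet)
s_spread = sphericity(spread)
```

The six-particle example already saturates S = 1, which illustrates why sphericity saturates quickly and is of limited use for separating different quasi-isotropic radiation patterns.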

Experimental aspects
The diffuse, low-momentum nature of SUEP events strongly resembles soft-QCD backgrounds at the LHC. Without high momentum final state particles, SUEP events with all-hadronic final states can easily be mistaken for pile-up collisions, and pose extreme challenges for general purpose detectors such as ATLAS and CMS.
The first and most challenging step for any SUEP analysis is to identify a trigger strategy. The trigger systems of ATLAS and CMS operate in a two-step process to determine which events are saved for analysis. The first stage, Level 1, makes a fast decision incorporating coarse calorimeter and muon information. The second stage, the High Level Trigger (HLT), makes use of refined calorimeter and muon information and adds limited tracking. SUEP events typically have low efficiency for traditional triggers, which are designed to reject pile-up. However, there remain several possible analysis strategies utilizing data already collected during Run 2 of the LHC.
In order to characterize the difficulty of observing lower mass mediators, several benchmark points with different mediator masses are used. Here, the choice of the 125 GeV benchmark is motivated by the observed Higgs boson, which may itself serve as a portal to the hidden sector. The mediator is assumed to be produced via gluon fusion (GF). For the Higgs portal benchmark, associated production (ZH) is also considered, where the Z boson decays leptonically. All signals are produced assuming dark mesons have a mass of 2 GeV, and the system has a temperature of 2 GeV. These dark mesons subsequently decay into a pair of dark photons. Multiple dark photon decay scenarios are explored for GF production using different dark photon branching ratios.
[Figure: Correlations between event isotropy and other canonical event shape observables calculated on quasi-isotropic events generated using the framework of Ref. [111], such as particle multiplicity (top left), jet multiplicity (top right), and the maximum eigenvalue of the sphericity tensor (bottom left), which is closely related to the C and D parameters in quasi-isotropic radiation patterns. For reference, the correlation of the same samples in thrust and maximum eigenvalue is also included (bottom right). The contours enclose 99% of the events for all of the samples. Figures adapted from [154].]
Trigger strategies based on the scalar sum of hadronic activity (H T ) in the event can be used to target mediators produced via GF. In order to prevent extremely high background rates from QCD, ATLAS and CMS typically required H T > 500 GeV at Level 1, and H T > 1 TeV at HLT throughout Run 2. The vast majority of signal events which pass these requirements involve a mediator recoiling against an initial state radiation jet with high p T . Trigger efficiency decreases significantly as the mediator mass, and with it the total energy deposition, decreases. Figure 18 shows the H T distribution as well as the number of tracks for different SUEP mediator masses.
Depending on the production mechanism and branching fraction of the dark photon, SUEP events may contain multiple final state muons. In these cases, muon triggers designed to target vector boson or b-physics processes can provide much higher trigger efficiency than H T triggers, especially for lower mass mediators. Figure 19 shows the number of muons and the leading muon p T for several Higgs portal SUEP scenarios. Signals with larger leptonic branching ratios result in large multiplicities of moderate momentum muons, and can be targeted with tri-muon triggers. Leptons produced via associated production can be targeted with single and double-muon triggers.
The upcoming Run 3 at the LHC offers an opportunity to design new triggers that specifically target SUEP signatures. One possibility is to use a standard H T or multi-jet trigger at Level 1, with a large multiplicity of tracks in the High Level Trigger. SUEP events are likely to be boosted after passing the Level 1 trigger, with SUEP decay products recoiling against Standard Model jets from initial state radiation. Figure 20 shows how events become less isotropic with increasing p T requirements. As a result, using event shape observables would be suboptimal at HLT. In contrast, it is possible to choose an HLT track multiplicity requirement that is highly efficient for SUEP events, and also reduces backgrounds such that the H T requirement can be kept as low as Level 1 thresholds.
A trigger which requires H T > 500 GeV and at least 150 tracks per event would yield a QCD efficiency one order of magnitude smaller than the currently employed H T > 1050 GeV trigger, while recovering nearly all the SUEP events that already pass the H T > 500 GeV selection. Note that this study assumes tracks can be reconstructed with nearly full efficiency and associated to the primary vertex.
[Figure: The muon with the highest p T is shown on the right for different SUEP production mechanisms and dark photon decay branching fractions.]
If track reconstruction is too computationally expensive to run in the High Level Trigger or has sub-optimal performance, it is possible to design a trigger which counts the number of hits in the innermost layers of the ATLAS and CMS tracking detectors. This approach was studied in Ref. [9]. In addition to being less computationally expensive, hit counting is sensitive to even the softest particles in SUEP events. Charged particles with p T > O(10) MeV can reach the innermost layer of the tracker and produce a hit. One trade-off is that hits cannot be associated to a primary vertex, and pile-up collisions increase the number of hits per event for background.
Hits from SUEP events are likely to be localized in z near the primary vertex, and this can be used to discriminate against pile-up.
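The combined H T and track-multiplicity requirement discussed above can be sketched as a simple selection. This is a toy illustration only: the function names and the event format (pairs of H T in GeV and track count) are assumptions made here, and the default thresholds are simply the values quoted in the text, not an official trigger definition.

```python
def passes_suep_trigger(ht_gev, n_tracks, ht_min=500.0, n_tracks_min=150):
    """Toy HLT decision: H_T above threshold AND enough reconstructed tracks."""
    return ht_gev > ht_min and n_tracks >= n_tracks_min


def efficiency(events, **cuts):
    """Fraction of (ht_gev, n_tracks) events passing the toy trigger."""
    events = list(events)
    if not events:
        return 0.0
    n_pass = sum(passes_suep_trigger(ht, n, **cuts) for ht, n in events)
    return n_pass / len(events)
```

Evaluating `efficiency` on signal and QCD samples with the default cuts, and again with `ht_min=1050.0, n_tracks_min=0`, gives the kind of comparison between the proposed trigger and a pure H T trigger described in the text.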
There are also several potential strategies to trigger on SUEP events directly at Level 1. When SUEP events are not boosted, final state particles create a band which spans all φ and is centered around a definite value of η. It may be possible to reduce thresholds by designing a trigger which looks for high H T in one η-slice compared to the rest of the event.
Additional Level 1 strategies depend on the final state particles in the SUEP event. An all-electron or all-photon signal would create an abnormally high ratio of energy deposited in the Electromagnetic Calorimeter compared to the Hadronic Calorimeter. This method could be further refined by requiring that the electromagnetic energy be centered around a single η-slice. Alternatively, SUEP events with dark matter particles in the final state (semi-visible SUEP) could potentially be accessed via a missing-transverse-momentum trigger.
Both ATLAS and CMS will upgrade their detectors and trigger schemes at the High Luminosity LHC, offering further opportunities to design new SUEP triggers. The projected CMS trigger at the HL-LHC will reconstruct charged particles with p T > 2 GeV at Level 1. This new scheme would enable even more optimization of the trigger design at the L1 level. One could optimize a selection on the number of tracks at L1 to recover an even greater quantity of low H T events. This strategy would be most effective for higher temperature scenarios.
Once a trigger strategy has been determined, it is essential to reconstruct the low-momentum charged particles associated with SUEP signatures. Standard ATLAS and CMS track reconstruction is highly efficient for charged particles which traverse roughly 8 layers of the silicon trackers [155,156]. In CMS, reconstruction is roughly 90% efficient for charged particles with p T ≥ 1 GeV, and roughly 60% efficient for particles with p T ∼ 300 MeV. In ATLAS, nominal track reconstruction has a minimum requirement of p T ≥ 500 MeV. Standard reconstructed tracks typically have impact parameter resolutions on the order of 100 µm [157], which enables tracks to be associated to the primary vertex (the proton-proton collision of interest). This primary vertex requirement ensures that the computed track multiplicity is not biased by nearby pile-up collisions.
There are additional possibilities to reconstruct charged particles with even lower momenta. In ATLAS, tracks with p T > 100 MeV can be reconstructed for a small subset of events [158]. These events would be processed with an additional pass of tracking, using leftover hits, in a region of interest of z ∼ 1 mm around the primary vertex. To access even lower momentum charged particles, p T ∼ 10 MeV, it is possible to count the number of hits in the innermost layers of the detector. While this strategy would increase the acceptance for extremely low-momentum SUEP particles, it is impossible to associate hits to a particular primary vertex. This strategy also requires access to low-level data, which may not be possible due to storage constraints.
For SUEP signatures with higher temperatures, a significant number of final state Standard Model particles will have moderate momenta, p T > 3 GeV. For these scenarios, it is possible to identify final state particles as muons, electrons, or hadrons. Events with a large multiplicity of low momentum leptons would be a particularly striking signal, and this information could be used to further improve signal to background discrimination.
After reconstruction, the unique properties of SUEP events can then be used to separate signal from QCD background. The two most powerful observables are the characteristic high track multiplicity and the isotropic distribution of those tracks.
For analyses using data that were collected during Run 2, trigger strategies will likely rely on the presence of additional objects produced in association with the SUEP shower, either a large amount of QCD ISR or a massive gauge boson decaying to triggering objects. In both cases, a simple procedure to recover the SUEP shower can be followed. First, the triggering object can be identified with standard reconstruction techniques and subtracted from the overall event representation. Second, the remaining tracks can be boosted against the triggering object. If the latter is produced back-to-back with the mediator, this allows one to recover both the track multiplicity and the spherically symmetric distribution of the SUEP shower.
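The boosting step of this procedure can be sketched with an explicit Lorentz transformation. The sketch below makes one simplifying assumption that is not taken from the text: the four-momentum of the SUEP candidate is approximated by the sum of the remaining track four-vectors, and the tracks are boosted into the rest frame of that sum. An analysis could instead define the boost from the recoiling triggering object. All function names are hypothetical.

```python
import math


def boost_to_rest_frame(p4, frame4):
    """Boost p4 = (E, px, py, pz) into the rest frame of frame4."""
    e_f, fx, fy, fz = frame4
    m2 = e_f ** 2 - fx ** 2 - fy ** 2 - fz ** 2
    if m2 <= 0.0:
        raise ValueError("frame four-vector must be time-like")
    m = math.sqrt(m2)
    e, px, py, pz = p4
    p_dot_f = px * fx + py * fy + pz * fz
    # E' = gamma * (E - beta.p), with gamma = E_f / m and beta = f / E_f
    e_new = (e_f * e - p_dot_f) / m
    # p' = p + f * [ (p.f) / (m (E_f + m)) - E / m ]
    coef = p_dot_f / (m * (e_f + m)) - e / m
    return (e_new, px + coef * fx, py + coef * fy, pz + coef * fz)


def boost_tracks_to_candidate_frame(tracks):
    """Boost all tracks into the rest frame of their summed four-momentum.

    Assumption of this sketch: the triggering object has already been
    removed, so the summed tracks approximate the SUEP candidate.
    """
    total = tuple(sum(t[i] for t in tracks) for i in range(4))
    return [boost_to_rest_frame(t, total) for t in tracks]
```

In the boosted frame the summed three-momentum of the tracks vanishes by construction, so event shape observables computed there probe the intrinsic isotropy of the shower rather than its recoil.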
Recently, anomaly detection techniques, a generic way to search for new physics, have been incorporated into SUEP searches. A proposal based on autoencoders [159] shows the strength of such techniques for low mediator masses and temperatures. Additional details on this approach are presented in Sect. 5.4.

Glueballs
Contributors: David Curtin, Caleb Gemmell, Christopher B. Verhaaren
In the N f D = 0 limit for SU (N c D ) Yang-Mills theories, the only hadronic states that form below the confinement scale are glueballs, composite gluon states. This limit is unique because there are no light degrees of freedom below the confinement scale. In the absence of such light states, color flux tubes cannot break via the creation of quark-antiquark pairs. This process is essential to present QCD hadronization models [160,161], so the usual understanding of hadronization does not directly apply to the N f D = 0 limit. This qualitative difference has hindered efforts to study dark glueball showers in scenarios with pure Yang-Mills dynamics.
Sectors with N f D = 0 SU (N c D ) Yang-Mills descriptions commonly appear in neutral naturalness theories, such as Twin Higgs models [11,12], Folded Supersymmetry [162] and many others [163][164][165][166]. The glueballs of a hidden confining sector have also been considered as dark matter candidates [23,26,167,168]. Thus, N f D = 0 SU (N c D ) Yang-Mills models are motivated by possible solutions to the Little Hierarchy problem and the unknown nature of dark matter. Current ignorance of the pure-glue hadronization process has left these models in a largely unstudied corner of motivated parameter space. The hadronization process determines the final state multiplicity and energy distribution of dark glueballs, which are essential for both collider and indirect detection studies.
The properties of the glueballs themselves are relatively well-known, having been studied in lattice gauge theory [169][170][171][172][173]. In the absence of external couplings, these studies have established a spectrum of 12 stable glueball states characterized by their J PC quantum numbers. When considered as part of a dark sector, these glueballs can be stable or decay through a variety of portals to the SM [43,45], with possibly long lifetimes on collider or cosmological time scales. This spectrum can be parameterised entirely by the confinement scale of the theory, or equivalently the lightest glueball mass, m 0 ∼ 6 Λ D . These glueball properties are also known for several N c D ≠ 3, which paves the way for studying exotic dark sectors outside the standard dark SU (3) case [23,167,174,175].
Recently, efforts have been made to enable quantitative studies of pure Yang-Mills parton showers and hadronization [75]. This includes the creation of a new public python package, GlueShower. 2 This package allows users to simulate dark glueball showers produced from an initial pair of dark gluons. GlueShower combines a perturbative pure-glue parton shower with a self-consistent and physically motivated parameterization of our ignorance regarding the unknown glueball hadronization behaviour. Two qualitatively different hadronization possibilities are included: a more physically motivated jet-like assumption, and a more exotic plasma-like option that accounts for the possibility that color-singlet gluon-plasma states are created by hypothetical non-perturbative effects far above the confinement scale. Each such plasma-ball then decays isotropically to glueballs in its rest frame, somewhat akin to dark hadron production in SUEP scenarios [9,107,159]. Within each hadronization option, two nuisance parameters set the hadronization scale and the hadronization temperature, which mostly control the glueball multiplicity and the relative abundance of different glueball species, respectively.
The study [75] defines a set of 4 benchmark points for these nuisance parameters in each option to represent the range of physically reasonable glueball hadronization possibilities. For phenomenological studies, the range of predictions spanned by these hadronization benchmarks can be interpreted as a theoretical uncertainty on predictions for glueball production. Despite the wide range of possibilities for hadronization that are considered, most glueball observables are predicted within an O(1) factor. This is in large part due to the modest hierarchy between the glueball mass and the confinement scale, which makes most inclusive glueball observables dominantly dependent on the physics of the perturbative gluon shower. However, exclusive production of certain glueball states can vary by up to a factor of 10 depending on hadronization assumptions, a range that accurately represents our current theoretical uncertainty.
In summary, to support the increasing interest in dark showers, efforts are being made to ensure the possibility space is covered. Until recently the N f D = 0 limit has been largely ignored due to difficulties in studying pure-glue hadronization. However, such hidden sectors are motivated both by naturalness concerns and as a possible source of dark matter. The GlueShower tool aims to facilitate studies of the N f D = 0 limit, even without a full understanding of the underlying non-perturbative hadronization physics. Despite the current theoretical limitations, this demonstrates that quantitative studies of and searches for glueball signatures can be reliably conducted if the underlying uncertainties are accurately accounted for.
2 github.com/davidrcurtin/GlueShower.

Contributors: Suchita Kulkarni, Seán Mee, Matt Strassler
The non-perturbative nature of QCD-like strongly interacting scenarios makes it impossible to set consistent UV and IR parameters based purely on perturbative analysis. In this section, we address this problem, going beyond existing efforts in the literature. First, we sketch the importance of lattice calculations to set low energy bound state masses given UV parameters. Second, we illustrate the importance of portal phenomenology and associated symmetry breaking patterns, and use chiral Lagrangian techniques to set the interactions of the low energy bound states among themselves and to the SM final states. Finally, we comment on the hadronization parameters necessary for LHC phenomenology and make some observations for a subset of them. After this, we turn to the simulation of benchmark models consistent with the above observations. We describe the recent improvements to the pythia 8 Hidden Valley module, and present a few benchmarks which are used in later sections for studies.

UV scenarios: SM extension with non-Abelian gauge groups
We suppose the Standard Model is extended with a new sector, consisting of an additional non-Abelian gauge group SU (N c D ) with N f D degenerate Dirac fermions q Dα in the fundamental representation, with current mass m q D . We will refer to the new sector as the "dark sector," though we note this is fully equivalent to a "hidden valley" as defined in [36]. This sector has a global symmetry that is broken by the mass term to a diagonal SU (N f D )×U (1). 3 We will assume N c D , N f D are such that the theory confines and has a chiral q̄ Dα q Dα condensate, which in the absence of m q D would spontaneously break the chiral symmetries and lead to Nambu-Goldstone bosons. Instead, as in QCD itself, we have pseudo-Nambu-Goldstone bosons with masses that are proportional to √ m q D .
If the only connection between this sector and the SM is through a mediator (or "portal") which is either massive or ultra-weakly coupled, then typical confining Hidden-Valley-type phenomenology inevitably results. Specifically, production of the "dark quarks" leads to production of dark hadrons. These will be collimated in jets if the theory is QCD-like and the invariant mass of the produced q D q̄ D pair is far above the dark confinement scale. We will specifically consider a mediator in the form of a heavy U (1)′ leptophobic Z′ between SM quarks and the dark quarks. These are the same models introduced in Sect. 2.1.1, which are widely used in semi-visible jet searches and are also considered by the dark matter working group [176]. The process q q̄ → Z′ → q D q̄ D allows dark hadrons to be produced, while the Z′ also allows some dark hadrons to decay to SM quark-antiquark pairs, leading to an all-hadronic signal. We will assume throughout that m Z′ is much larger than the confinement scale, by a factor of at least ∼ 30, so that the physics in the dark sector actually leads to jets of dark hadrons. 4 Because a fraction of the dark hadrons are typically stable, at least on LHC-detector time scales, the observable signal usually consists of at least two relatively fat jets with considerable substructure, and often a high multiplicity of SM hadrons, along with roughly collinear missing transverse energy. These "semi-visible jets" (SVJ), introduced in Sect. 2.1.1, are the target of the searches in question here.
The mediator's couplings to the dark sector will break the SU (N f D ) × U (1) flavor symmetry to a smaller subgroup G f . Without this breaking, the majority of the dark hadrons would be charged in the adjoint of SU (N f D ) and would be unable to decay to a SM final state, which would be a singlet under this symmetry. (Note that decays with both dark and SM particles in the final state would still be permitted.) If the couplings are vector-like, we assign the dark quarks charges Q α under the U (1)′, and define the charge matrix Q as a diagonal matrix with eigenvalues Q α ; the group G f is the subgroup of SU (N f D ) × U (1) which commutes with Q. (In chiral models the left- and right-handed quarks have different charges Q α and Q̃ α , and get their masses from a Higgs field with charge Q α − Q̃ α . We will not consider such models in detail here.) The precise choice of Q has a significant impact on the phenomenology.
In the ultraviolet Lagrangian for the hidden sector (Eq. 11), G D,μν is the gauge field strength tensor and m q D is the current mass of the dark quarks. This part of the theory has two discrete parameters, N c D and N f D , and two continuous parameters: the running gauge coupling α D (μ), with μ a renormalization scale, and the "current" quark mass m q D .
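For orientation, the textbook form of the Lagrangian for an SU (N c D ) gauge theory with N f D mass-degenerate Dirac fermions in the fundamental representation is sketched below. This is the standard expression written with the symbols defined in the text; the sign and normalization conventions are assumptions and may differ from those adopted by the authors.

```latex
\mathcal{L}_{\rm dark} =
  -\frac{1}{4}\, G^{a}_{D,\mu\nu}\, G_{D}^{a,\mu\nu}
  + \sum_{\alpha=1}^{N_{f_D}} \bar{q}_{D\alpha}
    \left( i \gamma^{\mu} D_{\mu} - m_{q_D} \right) q_{D\alpha},
\qquad
D_{\mu} = \partial_{\mu} - i g_D\, T^{a} A^{a}_{D,\mu}
```

Here g D is related to the running coupling by α D = g D ² /(4π), and T a are the SU (N c D ) generators in the fundamental representation.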
Since neither of these continuous parameters has direct contact with the observable phenomena, we replace them with the confinement scale Λ D , or some proxy for it, such as the one-loop dimensional transmutation scale, and the mass m π D of the light pseudoscalar mesons π D .
The interaction of this sector with the SM via a Z′ mediator takes the form of Eq. (12) if the Z′ couplings are vector-like and the q Dα have charge Q α . (If the couplings to the hidden sector are chiral, separate charges Q α and Q̃ α must be assigned for left- and right-handed hidden quarks.) We will assume all SM quarks have the same charge under U (1)′ for simplicity, though in realistic models one must account for the differences, which can affect observables such as the SM heavy flavor fractions in the SVJs.
One other issue of importance is whether the Z′ and the q D obtain their masses from the same source, in such a way that the longitudinal polarization of the Z′ mixes with the π D from chiral symmetry breaking. Similar mixing of the SM charged pion with the W bosons allows the classic decay π → μν. By analogy, the Z′ mixing with the π D affects the decays of the latter to the SM.

From ultra-violet theories to infrared parameters
The SU (N c D ) theory confines at around the scale Λ D and various dark hadronic bound states are produced. In the exact SU (N f D ) limit, the lightest hadrons consist of the spin-0 flavor-adjoint π D , which are pseudo-Nambu-Goldstone bosons (PNGBs), the spin-1 flavor-adjoint ρ D , the spin-1 flavor-singlet ω D , and the spin-0 flavor-singlet η D . In general m π D < m ρ D ≲ m ω D , while the η D mass depends on the anomaly, which scales like N f D /N c D when it is dominant. Thus for N f D flavors, the theory contains N f D ² − 1 mass-degenerate pions and an equal number of degenerate rho mesons, along with an omega which will be slightly heavier than the rhos, and an eta-prime which may be near-degenerate with the pions for N f D ≪ N c D but is much heavier for N f D ∼ N c D . In addition there are baryons and antibaryons with mass of order N c D Λ D (bosons for N c D even, fermions otherwise), except for N c D = 2 in which case they are exactly degenerate with the pions and are themselves PNGBs. We note from these remarks that N f D = 1 and N c D = 2 are special cases, which must be treated with care.
There are ambiguities in defining the scale Λ D . One way to define it is via the running gauge coupling α D (μ) at one loop. As pointed out by 't Hooft, physics in the large N c D limit depends mainly on α D (μ)N c D , up to 1/N c D corrections. Lattice results show that N c D = 3 is already close to N c D = ∞, and moreover the physics of a QCD-like shower has very small 1/N c D corrections, a fact that the pythia showering routines take advantage of. With this in mind, the one-loop running coupling can be written in a form familiar from QCD (Eq. 13). This form emphasizes that the physics of this theory is really a function of the ratio N f D /N c D . To this end, in Fig. 13 (left) we show the running of α D at one loop for several values of N c D /N f D for a fixed Λ D = 1 GeV.
Although the one-loop formula for Λ D is currently used in the pythia 8 HV module and by us throughout this document, this situation should be viewed as temporary. Inevitably, this method indexes non-perturbative hadronic masses to a scale which is perturbative, and this may pose challenges for interpreting results from lattice gauge theory, in which Λ D is often defined non-perturbatively, for instance through the string tension. Moreover, the connection between this one-loop estimate of Λ D and the physical confinement scale becomes less and less accurate as N f D /N c D increases; two-loop effects become important, with the effect that non-perturbative definitions of Λ D will be much smaller than the one-loop definition. It seems likely that a two-loop perturbative definition would be significantly closer to non-perturbative ones. To illustrate this effect, in Fig. 13 (right) we show the two-loop running of α D (solid lines) in comparison with the corresponding one-loop running (dashed lines), for two different values of N f D = 3, 6 with N c D = 3 fixed and Λ D = 1 GeV. In order to derive this result, we have used the procedure described in [177] with the beta function as defined in [178].
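A minimal numerical sketch of one-loop running is given below. It uses the textbook one-loop expression α(μ) = 2π / (β0 ln(μ/Λ)) with β0 = (11 N c − 2 N f )/3, which is the standard QCD-like form; whether the normalization of Λ matches the convention of Eq. 13 or of the pythia 8 HV module exactly is an assumption of this example, and the function names are hypothetical.

```python
import math


def beta0(n_c, n_f):
    """One-loop beta-function coefficient for SU(n_c) with n_f Dirac flavors."""
    return (11.0 * n_c - 2.0 * n_f) / 3.0


def alpha_one_loop(mu, lam, n_c, n_f):
    """One-loop running coupling at scale mu for confinement scale lam.

    mu and lam must be in the same units (e.g. GeV).
    """
    b0 = beta0(n_c, n_f)
    if b0 <= 0.0:
        raise ValueError("theory is not asymptotically free at one loop")
    if mu <= lam:
        raise ValueError("one-loop formula is only meaningful for mu > Lambda")
    return 2.0 * math.pi / (b0 * math.log(mu / lam))
```

In this form the 't Hooft coupling α D N c D at fixed μ/Λ D depends only on the ratio N f D /N c D : for example, `3 * alpha_one_loop(mu, lam, 3, 3)` and `6 * alpha_one_loop(mu, lam, 6, 6)` coincide.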
In fact for sufficiently large N f D /N c D the theory will no longer confine because its running coupling reaches an infrared fixed point. The values for N c D , N f D where this occurs are not precisely known. For various N c D , estimates of the value of N f D above which the theory is believed not to confine are computed in e.g. [179] and tabulated in Table 2. These results suggest that theories with N f D < 3N c D likely confine, and we will refer to them as "QCD-like" theories.
Again, we have at this time only used the one-loop formula above to define Λ D , and we use the same definition for the parameter Lambda in pythia 8, which is perhaps reasonable for N f D /N c D ∼ 1 or below. But it is important to note that for N f D /N c D → 3, the one-loop running is inaccurate and at a minimum a two-loop formula (within which the fixed points at large N c D , N f D can be observed) ought to be used (Fig. 21).
The masses and couplings of the low-lying bound states are a direct consequence of UV parameters, but are not calculable analytically. Rough estimates of these quantities can be obtained by combining lattice gauge theory calculations with the chiral Lagrangian for spin-0 mesons, extended by including spin-one mesons as though they were flavor-symmetry gauge bosons. It is convenient to set the overall scale of the dark hadrons using a non-perturbative definition of the confinement scale, which we will call Λ̃ D and specify later, and to express all other dimensionful dark-hadronic quantities in terms of this parameter. Once we have fixed Λ̃ D as the overall hadronic scale, and specified the ratio m π D /Λ̃ D (where the relation m π D ∼ √ m q D follows from the chiral Lagrangian), everything else should be computable in principle. Such lattice calculations for mass-degenerate fermions in the fundamental representation of SU (N ) gauge theories are available in abundance, albeit in the quenched approximation, see e.g. [180][181][182][183]. A particularly useful resource is [184], which summarises the spectrum of mesons in the large-N limit of QCD-like theories. These calculations can be used to determine the ratios of the dark hadron masses as a function of the hidden sector parameters.
Using lattice calculations and fits plotted in Fig. 19 of [185], we can relate the dark quark mass to the dark pion mass and the dark rho mass. We express these in terms of Λ̃ D , defined as the chiral limit (m π → 0) of the ρ mass divided by 2.37. (In terms of the physical units used in Fig. 19 of [185], this puts the analogue of Λ̃ D for physical QCD at 300 MeV.) These relations are shown concretely in the accompanying figure and in Eq. 14. The fit functions and coefficients shown in Eq. 14 are appropriate for small m π D /Λ̃ D , though they work far beyond this expectation, and begin to differ from lattice computations by > 10% only for m π D /Λ̃ D > 2.3. The relation between the perturbative Λ D of Eq. 13 (or its higher-loop version) and the non-perturbative Λ̃ D , defined in terms of the chiral limit of m ρ D (or some other similar definition), is not established. Although they are proportional, the proportionality depends on N c D and N f D (mostly on N f D /N c D ). In what follows, we will assume they are the same. The uncertainties and inaccuracies that result from this choice can only be reduced in future through more careful matching between perturbative and non-perturbative quantities.
[Figure: Fits given in Eq. 14 for the ρ D mass (left) and π D mass (right) to results from lattice simulations [185]. The left panel also indicates the kinematic thresholds for ρ D to decay to π D π D and π D π D π D .]
The spin-1 singlet is expected to be nearly degenerate with the spin-1 adjoint hadrons, so there is little importance in giving it a different mass. 5 By contrast, the spin-0 singlet is expected to be heavier than the spin-0 adjoint, possibly by a large amount if N f D ∼ N c D , due to the axial anomaly. Some analysis of QCD hadrons using the chiral Lagrangian, which we will present elsewhere, suggests an estimate of this splitting. Thus for N f D ∼ N c D , as in SM QCD, the splitting of the spin-0 singlet from the adjoint is large.
Note that the factor of 3 in front of Λ̃ D depends on the precise definition of the corresponding Λ̃ QCD , and will retain some uncertainties until this definition is handled more carefully. In any case, our current benchmark models presented below do not account for the anomaly term, and instead treat the η D as degenerate with the π D . However, the new version of pythia 8 includes a parameter HiddenValley:separateFlav, described in Sect. 4.1.4 below; when it is set "on," the η D mass can be set separately from that of the π D states.
For the scenarios we are interested in, a hard process leads to production of dark quarks which shower over a wide energy range; the shower then subsequently hadronizes. Hadronization in the dark sector is a far more challenging problem, because neither lattice calculations nor effective field theory methods are applicable to this process. All we know of it arises from studies of QCD data, and in particular through the use of phenomenological models (such as the Lund string model used in pythia, or the clustering model used in Herwig) whose parameters are tuned to fit experimental results. Currently there is no theoretical insight into how these models or their parameters should be adjusted for different values of N c D , N f D , or m π D /Λ D . Consequently we take existing parameterizations from data as a starting point. One must vary these parameters within reason to obtain a sense of uncertainties.
In the present context, we are using pythia 8's HV module, whose four main parameters HiddenValley:aLund, HiddenValley:bmqv2, HiddenValley:rFactqv, HiddenValley:sigmamqv parallel those of the corresponding QCD hadronization routine. The dimensionless HV parameters are set to exactly the same values as the dimensionless QCD parameters, while those with dimensions are scaled by the ratio of the constituent quark mass parameter mqv = 4900101:m0 (which is not the current quark mass m q D but rather a phenomenological parameter, of order Λ D for small m q D ) to the constituent quark mass in QCD, which for u, d quarks is 330 MeV. There are other parameter tunes proposed by pythia 8 experts, see for example the Monash tune [186]. It is probably wise to try two or more tunes that are known to work in QCD as a means of estimating a minimum systematic error from this source. However, we have not studied this, and so further investigation is needed before an informed recommendation can be made.
There are three other hadronization parameters that are currently in use in the HV module. About the parameter HiddenValley:probVector, which gives the probability that a new meson formed in the hadronization should be assigned to spin-1 rather than spin-0, we have two pieces of information. Were spin-0 and spin-1 mesons mass-degenerate (appropriate for m π D /Λ D ≫ 1 and bordering on unphysical for the Lund model), we would expect probVector = 0.75 based on spin counting (three spin-1 states versus one spin-0 state). Data from QCD, with m π /Λ QCD ∼ 0.5, suggest use of probVector = 0.5, downweighting spin-1 presumably because of phase space. 6 From this we learn that the appropriate probVector is a slowly increasing function of m π /Λ QCD . It would be reasonable to choose a phase-space-motivated functional form for this function, with a smooth m π D → 0 limit, but we have not made an effort to do this. Little is known about the limit m π D → 0; it is not even clear that the Lund model is accurate there.
When the parameter HiddenValley:separateFlav (included in the new version of pythia 8 and described in Sect. 4.1.4) is set "on", the parameter HiddenValley:probKeepEta1 downweights the probability of producing a singlet η D meson relative to other diagonal mesons. This should be set to 1 when N c D ≫ N f D , since in the large-N c D limit (with N f D fixed) the axial anomaly is negligible and the η D is like the adjoint-flavor bosons. Conversely it should be set to a small value when N f D is of order or greater than N c D and the η D is heavy, as it is in QCD; in pythia 8, the corresponding QCD parameter StringFlav:etaPrimeSup is set by default to 0.12.
Finally, the option of allowing baryons in hadronization for $N_{c_D} = 3$ can be controlled with the parameter HiddenValley:probDiquark; this determines the likelihood of pair-producing diquarks, which, for $N_{c_D} = 3$ only, combine with a quark to form a baryon. We have not validated this parameter and recommend that baryons (at most a 10% effect, which is probably smaller than hadronization uncertainties) not be used for now.

Decays of dark hadronic bound states
The dark hadronic bound states are either stable, undergo decays within the dark sector, or decay to final states that include SM particles. The decay patterns depend on the charge matrix Q and the dark hadron mass hierarchy. In particular, when the mass $m_{\rho_D}$ is larger than twice $m_{\pi_D}$, the $\rho_D$ decay to $\pi_D \pi_D$. In the regime where such decays are not allowed, some of the $\rho_D$ may decay back to the SM via mixing with the $Z'$. The details of these decay modes, however, are determined by the group algebra and need careful treatment. We outline below the salient considerations in setting such decay modes.

Region 1: Dark sector decays, $2 m_{\pi_D} < m_{\rho_D}$

When the decay channel $\rho_D \to \pi_D \pi_D$ is open, it dominates all other decays since $g_{\rho_D \pi_D \pi_D}$ is large compared to any other coupling (Fig. 23 left). The width, where $f^{abc}$ are the structure constants of $SU(N_{f_D})$, is nonzero for all $\rho_D$ mesons, and large unless $N_{c_D}$ is enormous.$^{7}$ Without any mixing between the $Z'$ and the $\pi_D$, the latter is stable and invisible, so we assume that we are considering a model where such mixing occurs, allowing the decay $\pi_D \to (Z')^* \to q\bar{q}$ (Eq. 17). This mixing typically arises because a Higgs field (whose scalar is assumed too heavy to be of interest here) gives mass to both the $Z'$ and the quarks $q_D$, typically along with some additional flavor violation. The exact details of the mixing and the corresponding lifetimes depend on precise model-building. Because the decay in Eq. 17 is helicity-suppressed, the width for this process is of order $|y_q|^2 m_{\pi_D}^5 / m_{Z'}^4$ or smaller, where $y_q$ is the Yukawa coupling of the Standard Model quark; the heaviest kinematically accessible quarks dominate. Note that this width is parametrically small, so low-mass $\pi_D$ will have displaced decays. To determine whether a particular $\pi_D$ decays promptly, its lifetime needs to be calculated in a consistent leptophobic model, but to our knowledge the relevant model building has not been done.
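The parametric estimate above can be turned into a rough decay-length scale. The following is a minimal sketch, assuming the unknown O(1) coefficient and all mixing factors are set to 1 (an assumption, not a model calculation); the function names and the numerical inputs are hypothetical. Because the text quotes the width as "of order or smaller", the resulting length should be read as a lower-scale estimate only.

```python
# Rough scale of the dark-pion width and proper decay length.
# ASSUMPTION: width = |y_q|^2 * m_pi^5 / m_Z'^4 with the O(1) prefactor and all
# mixing/coupling factors set to 1. A real model can only make the width smaller.

HBAR_C_GEV_M = 1.973269804e-16  # hbar * c in GeV * m


def pion_width_scale(m_pi, m_zprime, yukawa):
    """Parametric width scale in GeV for masses given in GeV."""
    return yukawa**2 * m_pi**5 / m_zprime**4


def proper_decay_length(width_gev):
    """Proper decay length c*tau in metres for a width given in GeV."""
    return HBAR_C_GEV_M / width_gev


# Illustrative (hypothetical) inputs: 6 GeV dark pion, 1 TeV Z', charm-like Yukawa.
width = pion_width_scale(m_pi=6.0, m_zprime=1000.0, yukawa=0.007)
print(f"width scale ~ {width:.2e} GeV, c*tau ~ {proper_decay_length(width):.2e} m")
```

The steep $m_{\pi_D}^5$ dependence is the point of the sketch: halving the dark pion mass lengthens the estimated decay length by a factor of 32.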
Furthermore, we do not treat the flavor singlet $\eta_D$ in detail here as our analysis is not yet complete. For $N_{f_D} \gtrsim N_{c_D}$ it is heavy, as in QCD, and rarely produced. Similarly, since baryons are only available for $N_{c_D} = 3$, where they are a small effect, we do not discuss them here.
We conclude by noting that the $Z'$ charge assignments also determine the decay branching fractions of the $Z'$. Especially when dark hadron multiplicities are small in $Z'$ decays, this introduces a small but significant correlation in the flavors of the dark hadrons. This will become more important in future studies with nondegenerate quark masses.

Region 2: Dark sector decays, $2 m_{\pi_D} > m_{\rho_D}$

In this case we will assume that there is no mixing between the $Z'$ and the $\pi_D$, that is, that the quarks and the $Z'$ get their masses from separate sources. The hidden pions charged under $G_f$ are then stable and invisible. The precise fate of the other singlets needs further investigation. Some may be stable due to a discrete symmetry. Others are obviously unstable due to standard flavor anomalies, though the details depend on the matrix Q. Decays to one or two SM $q\bar{q}$ pairs have small widths because of factors of $1/m_{Z'}^2$ along with loop factors or phase space factors. In general these particles will be very long-lived on LHC detector time-scales, and thus a source only of $\slashed{E}_T$. This statement may however be model-dependent, and so one must be careful to compute the lifetimes for these states in a particular model. We assume here that all $\pi_D$ are LHC-detector stable.
The decays of the $\rho_D$, however, can be observed. First, there can be mixing between the $Z'$ and the $\rho_D$ mesons which are singlets under the group $G_f$. These decays $\rho_D \to (Z')^* \to q\bar{q}$ are not helicity suppressed and are thus faster than the corresponding $\pi_D$ decays that we discussed in the previous section (Fig. 23 middle).
For those $\rho_D$ that are non-singlet under $G_f$, flavor symmetry would not prevent the decay (Fig. 23 right) $\rho_D \to \pi_D + (Z')^* \to \pi_D\, q\bar{q}$.
The $\rho_D$ and the $\pi_D$ in this decay have the same $G_f$ quantum numbers, while the $q\bar{q}$ are a flavor singlet. This decay would be prohibited by the naive symmetry $\pi_D \to -\pi_D$ in the chiral Lagrangian, but this symmetry is violated by the usual chiral anomaly that mediates $\pi^0 \to \gamma\gamma$ decay in QCD, and allows a $\rho_D \rho_D \pi_D$ coupling in this context. Mixing between a $G_f$-singlet $\rho_D$ and a $Z'$ then induces a $\rho_D Z' \pi_D$ coupling, which permits this decay to proceed. Specifically, the coupling involves $d^{abc}$, which appears in the anti-commutator $\{T^a, T^b\}$ [187]. Importantly, however, $\mathrm{Tr}(\{T^a, T^b\}\, Q)$ can vanish. If this occurs, then this decay channel is not available. For instance, if $T^a$ is the matrix whose $(\alpha, \beta)$ entry is 1 and whose other entries are all zero, then the above trace is proportional to $Q_\alpha + Q_\beta$. Equal and opposite eigenvalues in Q then ensure that the corresponding $\rho_D$ does not decay via the anomaly. Although this does not guarantee that this particle is stable against decay via higher-order processes, it does mean that it has a very long lifetime and is likely LHC-stable.
Because this decay has a 3-body phase space and because by assumption $m_{\rho_D} - m_{\pi_D} < \frac{1}{2} m_{\rho_D}$, this decay is heavily suppressed, and will lead to displaced vertices if $m_{\rho_D}/m_{Z'}$ is too small. The width is evaluated in the limit $\Lambda_D \ll m_{Z'}$, as we have assumed in this section, using two relations, the second of which is $g_{\rho_D \pi_D \pi_D} = m_{\rho_D}/(\sqrt{2} f_{\pi_D})$. The former relationship, and in particular its factor of $N_{c_D}$, arises from $SU(N_{c_D})$ symmetry [188], while the latter is the KSFR relationship [189,190]; $e_D$ and $g_q$ are defined in Eq. 12.$^{8}$ For these decays to be prompt, the ratio $m_{\rho_D}^{11}/m_{Z'}^{4}$ must not be too small. It should further be noted that $f_{\pi_D}$ also includes a mild $N_{c_D}$ dependence [181].
As above, we do not discuss the $\eta_D$, the $\omega_D$ or baryons here.

Updates and inputs for PYTHIA 8 hidden valley module
The pythia 8 hidden valley module has received an update in version 8.307, after having been stable for some years. One update is substantive: the previous versions were overproducing very soft hidden hadrons (mainly pions) at low $p_T$. This bug fix slightly affects many plots, as we will see in Sect. 4.3; for example, it affects the total multiplicity of hadrons. Fortunately, the methods used in previous SVJ analyses are not very sensitive to this effect, which leaves the total visible energy in a jet, the $\slashed{E}_T$ aimed in its direction, and most substructure variables roughly unchanged. The possible exception is for variables which are not infrared safe, most notably $p_T D$; see Sect. 4.3. The other main change has been to increase the flexibility of the module. Depending on a newly introduced flag separateFlav, the simulation in each regime may proceed in two ways. An imperfect but often sufficient simulation, which was already available in pythia 8.150, is available with separateFlav=off; in this case the full adjoint multiplets of spin-0 and spin-1 mesons are each simplified into two states, one flavor-diagonal and one flavor-off-diagonal. This division is not consistent with most choices of Q, as it requires that $G_f = U(1)^{N_{f_D}}$. However, as long as all dark hadrons are stable (on LHC timescales) or decay promptly, it is possible to mock up other choices of Q, where for instance only a fraction of the flavor-diagonal states would decay visibly, by assigning the flavor-diagonal meson a probability to decay to a visible SM state and a corresponding probability to decay invisibly.
Alternatively, the setting separateFlav=on allows full control over all the spin-0 and spin-1 states; separate lifetimes and decay modes can be assigned to each. The particle ID number for the spin-0 (spin-1) meson with quark i and anti-quark j is 4900ij1 (4900ij3), at least for $i \neq j$. For $i = j$ the situation is more complicated since the diagonal mesons are flavor mixtures; for example, with $N_{f_D} = 3$, the pion is a $u\bar{u} - d\bar{d}$ state and the $\eta$ is a $u\bar{u} + d\bar{d} - 2 s\bar{s}$ state (ignoring normalizations). Typically one may order the diagonal flavor-adjoint mesons in a canonical way, through the increasing number of quark flavors appearing in their wavefunctions (or equivalently by the increasing number of non-zero entries in the corresponding diagonal $SU(N_{f_D})$ generator). The flavor-singlet state is always $(1/\sqrt{N_{f_D}}) \sum_\alpha q_{D\alpha} \bar{q}_{D\alpha}$ and is always assigned particle ID 4900FF1 (4900FF3), where $F \equiv N_{f_D}$. This is important because this state has special status; see below.
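The numbering scheme just described can be sketched in a few lines. This is an illustrative helper, not part of pythia 8: the function names are hypothetical, the upper limit of eight flavors is taken from the module's stated maximum, and the sign convention for anti-mesons is deliberately left out.

```python
# Compose dark-meson particle IDs of the form 4900ij1 (spin-0) and 4900ij3 (spin-1).
# Hypothetical helper names; anti-meson signs are not handled here.

MAX_FLAVORS = 8  # largest number of dark flavors supported by the HV module


def meson_pid(i, j, spin):
    """Return 4900ij1 for spin 0 or 4900ij3 for spin 1 (i: quark, j: anti-quark)."""
    if spin not in (0, 1):
        raise ValueError("spin must be 0 or 1")
    if not (1 <= i <= MAX_FLAVORS and 1 <= j <= MAX_FLAVORS):
        raise ValueError("flavor indices must lie between 1 and 8")
    last_digit = 1 if spin == 0 else 3
    return 4900000 + 100 * i + 10 * j + last_digit


def singlet_pid(n_flav, spin):
    """Flavor singlet: 4900FF1 or 4900FF3 with F equal to the number of flavors."""
    return meson_pid(n_flav, n_flav, spin)


print(meson_pid(2, 1, 0), meson_pid(1, 1, 1), singlet_pid(3, 0))
```

For three flavors this places the spin-0 singlet at 4900331 and the spin-1 singlet at 4900333, which are the codes whose masses and production suppression need separate attention.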
This setting then requires the user to create a full decay table for of order $N_{f_D}^2$ dark hadrons. Although we will comment on the settings for this below, we have not yet automated this task, so at this time we have no benchmarks for separateFlav=on. The setting also permits the hidden quarks to have different masses, but we have not yet validated this capability and more studies are needed.
Other changes, not utilized below, are in the treatment of the flavor singlets, especially for spin-0, and of baryons for $N_{c_D} = 3$. Since the flavor singlets can be given different masses with separateFlav=on, this allows for a more accurate spectrum. The masses of the singlets should be assigned to the spin-0 and spin-1 particle ID codes 4900FF1 and 4900FF3. This is especially important for spin-0 because the singlet can have a much larger mass than the adjoint due to the axial anomaly, as for the $\eta'$ in QCD. As we mentioned above, an additional parameter probKeepEta1, which can be chosen between 0 and 1, has been added; this reduces the probability of producing the $\eta_D$ relative to other spin-0 mesons in the hadronization process. Meanwhile the routines for producing baryons in the SM sector have been activated for the HV sector as well, but only work for $N_{c_D} = 3$. (For $N_{c_D} > 3$ this is not a concern since baryon production would be highly suppressed. For $N_{c_D} = 2$ a special routine must be written, because baryons, antibaryons and mesons are all degenerate; this is why the current HV module should not be used for $N_{c_D} = 2$, at least not without careful consideration of how to reinterpret its results.) For now, only one type of diquark is produced, that of $q_{D1} q_{D1}$. All the baryons produced are assumed to have spin 3/2 and to have one of the $N_{f_D}$ quark flavors i combined with a single flavor of diquark, with particle ID code 490i114. For separateFlav=off, all of these states are conflated into the state with i = 1.
We have mentioned that pythia 8's hadronization routine cannot simulate a theory with $N_{f_D} = 1$ or $N_{c_D} = 2$, but it may fail for other reasons. For any choices of $N_{f_D}$ and $N_{c_D}$, one should avoid overly small or large values of $m_{\pi_D}/\Lambda_D$. At small values approaching the chiral limit, theoretical understanding of hadronization is lacking, and the Lund string model used in pythia 8 may not function in any case; meanwhile at large values other hadrons (glueballs, in particular) will become as important as pions or rhos, but are not included in the Lund string model. To be conservative, we suggest limiting studies to $0.25 < m_{\pi_D}/\Lambda_D < 2$ until there has been further theoretical work on this issue.
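These restrictions are easy to encode as a sanity check before generating events. The sketch below is illustrative only; the function name and message wording are hypothetical, while the limits are the ones quoted in the text.

```python
# Sanity check of a dark-sector parameter point against the limits quoted above.
# Hypothetical helper; it returns a list of human-readable problems (empty if none).

def check_hv_point(n_c, n_f, m_pi, lambda_d):
    problems = []
    if n_c <= 2:
        problems.append("N_c must be greater than 2 (N_c = 2 is not simulated correctly)")
    if n_f <= 1:
        problems.append("N_f must be greater than 1 (N_f = 1 is not simulated correctly)")
    ratio = m_pi / lambda_d
    if not 0.25 < ratio < 2.0:
        problems.append(f"m_pi / Lambda = {ratio:.3g} lies outside the suggested range (0.25, 2)")
    return problems


print(check_hv_point(n_c=3, n_f=3, m_pi=6.0, lambda_d=10.0))   # acceptable point
print(check_hv_point(n_c=2, n_f=1, m_pi=1.0, lambda_d=10.0))   # three problems
```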
We now turn to the pythia 8 parameters that must be set to simulate the models discussed above. We begin with those that are independent of whether separateFlav=off or separateFlav=on.
• HiddenValley:Ngauge, HiddenValley:nFlav. These are $N_{c_D}$ and $N_{f_D}$; the former should always be set greater than 2 and the latter should always be set greater than 1. (For $N_{c_D} = 2$ or $N_{f_D} = 1$, pythia 8 is currently missing essential dark hadrons and gives an inaccurate simulation.)
• Constituent dark quark mass 4900101:m0. The quark mass defined in pythia 8 is the constituent quark mass, not the current quark mass. This quantity has never been given a theoretical definition, but may be roughly defined by $m_{q_{\text{const}}} \approx m_{q_D} + O(1) \times \Lambda_D$. For definiteness we will use this relation with the coefficient fixed to 1, namely $m_{q_{\text{const}}} \equiv m_{q_D} + \Lambda_D$.
• Confinement scale HiddenValley:Lambda. This can be defined in multiple ways, but we take it for now to be the scale at which the running gauge coupling constant diverges at 1-loop order, since currently the pythia 8 HV module has implemented the running coupling at one loop. As we have mentioned above, further consideration of this definition is warranted. The associated behaviour is illustrated in Fig. 21.
Next, if separateFlav=off, only two additional parameters must be defined.
• Dark pion mass 4900111:m0, 4900211:m0. These are the masses of the bound state spin-0 multiplets; they should always be taken equal. (The spin-0 states also include the flavor-singlet $\eta_D$, which, as discussed in Eq. 15, can be relatively heavy. However, there is no way to take this into account for separateFlav=off.) Within the chiral regime, these may be related to the confinement scale $\Lambda_D$ and current quark mass $m_{q_D}$ via Eq. 14. However, we advise taking this observable as an input parameter and viewing $m_{q_D}$, which is scheme-dependent, as an output.
• Dark rho mass 4900113:m0, 4900213:m0. These are the masses of the bound state spin-1 multiplets, and should always be taken equal.$^{10}$ Within the chiral regime, these are related to the confinement scale $\Lambda_D$ and the dark pion mass 4900111:m0 using Eq. 14.
In addition, decay channels and lifetimes for these four states must be defined by the user. If instead separateFlav=on, then even for the mass-degenerate case, all spin-0 and all spin-1 mesons must have separately defined masses, 4900ij1:m0 and 4900ij3:m0 for $N_{f_D} \geq i \geq j \geq 1$. Note that the flavor singlets have particle ID codes 4900iis with $i = N_{f_D}$ and s = 1, 3; the user may wish to change probKeepEta1, which can be used to suppress the spin-0 singlet production. Again the user must define all lifetimes and decay channels, now for a much larger set of particles. Depending on the model, it may be very important to ensure that the flavor structure of the decays is precisely specified, as is emphasized in the earlier Eqs. 16 and 18.
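For the separateFlav=off case, the settings listed above can be collected programmatically. The following is a minimal sketch: the setting names are the ones quoted in the text, but the helper function, its argument names and the example numbers are hypothetical, and the current quark mass is treated as a user-supplied input because the relation of Eq. 14 is not reproduced here. Decay channels and lifetimes still have to be added by hand.

```python
# Build pythia 8 command strings for the mass-degenerate, separateFlav=off setup.
# Hypothetical helper; masses and Lambda are in GeV.

def hv_commands(n_c, n_f, lambda_d, m_pi, m_rho, m_q_current):
    # Constituent mass convention adopted in the text: m_const = m_q + Lambda.
    m_q_const = m_q_current + lambda_d
    return [
        f"HiddenValley:Ngauge = {n_c}",
        f"HiddenValley:nFlav = {n_f}",
        f"HiddenValley:Lambda = {lambda_d:g}",
        "HiddenValley:separateFlav = off",
        f"4900101:m0 = {m_q_const:g}",   # constituent dark quark mass
        f"4900111:m0 = {m_pi:g}",        # diagonal spin-0 meson
        f"4900211:m0 = {m_pi:g}",        # off-diagonal spin-0 meson (same mass)
        f"4900113:m0 = {m_rho:g}",       # diagonal spin-1 meson
        f"4900213:m0 = {m_rho:g}",       # off-diagonal spin-1 meson (same mass)
    ]


# Illustrative numbers only.
for line in hv_commands(n_c=3, n_f=3, lambda_d=10.0, m_pi=6.0, m_rho=25.0, m_q_current=0.5):
    print(line)
```

Each string can then be written to a command file or passed to the generator's settings interface; keeping the two spin-0 masses (and the two spin-1 masses) tied to a single argument enforces the requirement that they be taken equal.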

Proposed benchmarks
We have created benchmarks for the purpose of the studies in Sect. 4.3. We are implicitly assuming that dark hadron lifetimes are short enough to be considered prompt, as appropriate for the SVJ signatures. For low-mass dark hadrons, this is far from obvious. Lifetimes need to be calculated in the context of complete models, but constructing such models is no simple matter in the context of a leptophobic $Z'$ because of potential $U(1)$ gauge anomalies that would make the theory inconsistent. We are not aware of any complete calculations of dark hadron lifetimes in this context, so we must warn the user that some of these benchmarks, especially those with light $\pi_D$, may not be realizable theoretically.
Let us first note what all the benchmarks have in common. In each case Versions of the benchmarks with separateFlav=on would be more accurate in their treatment of flavor-singlets, but will have to be created at a later time.
We have several benchmarks with $m_{\pi_D} < \frac{1}{2} m_{\rho_D}$.
• All have $N_{c_D} = N_{f_D} = 3$.
• Because of this choice, the parameter probVector is taken to be 0.5, since $m_{\pi_D}/\Lambda_D$ is similar to its value used in real-world QCD.
• Three choices of $\Lambda_D$ are considered: 5 GeV, 10 GeV and 50 GeV.
• For each of these, the number of stable diagonal spin-0 mesons is k = 0, 1 or 2, with 3 − k decaying to the SM; since the six off-diagonal pions are stable in this model, this gives $r_{\text{inv}} = (6 + k)/9$.
• The dark pions are assumed to decay promptly and only to $c\bar{c}$ (charm being the heaviest kinematically allowed SM quark for the smaller values of $\Lambda_D$).
The choice of k depends on mixing among the singlet and diagonal adjoint pions and the $Z'$. The details, especially the interplay between mixings and lifetimes, require careful model-building. We are not aware of any papers in which this has been done. Note that the use of separateFlav=off means that we do not treat the $SU(N_{f_D})$ flavor singlets separately from the other mesons. For $N_{c_D} = 3$ the $\eta_D$ is much heavier than the other states, and this is not correctly modeled. In particular, it leads to a small correction to $r_{\text{inv}}$. For $N_{c_D} \gg N_{f_D}$, the splitting between the flavor adjoint and singlet states becomes small, so the use of separateFlav=off is less problematic there.
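The counting behind $r_{\text{inv}} = (6 + k)/9$ can be made explicit. The sketch below assumes, as in these benchmarks, that all nine spin-0 states (singlet included) are produced with equal weight and that every spin-1 meson decays to spin-0 mesons; the generalisation to other flavor numbers is an illustrative assumption, and the function name is hypothetical.

```python
# Expected invisible fraction when k of the diagonal spin-0 mesons are stable.
from fractions import Fraction


def r_inv_benchmark(k_stable_diagonal, n_f=3):
    """Stable off-diagonal states plus k stable diagonal ones, over all n_f^2 states."""
    if not 0 <= k_stable_diagonal <= n_f:
        raise ValueError("k must lie between 0 and n_f")
    n_offdiagonal = n_f * (n_f - 1)   # six for n_f = 3, all stable
    return Fraction(n_offdiagonal + k_stable_diagonal, n_f**2)


for k in (0, 1, 2):
    print(k, r_inv_benchmark(k))
```

For three flavors the values are 2/3, 7/9 and 8/9 for k = 0, 1 and 2, respectively.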
For $m_{\pi_D} > \frac{1}{2} m_{\rho_D}$, we have so far defined only one benchmark.
• All spin-0 mesons are assumed to be stable on LHC-detector time-scales.
• All diagonal spin-1 mesons (including the singlet, which we do not treat carefully) decay to all available SM $q\bar{q}$ pairs.
• All off-diagonal spin-1 mesons decay to SM $q\bar{q}$ plus an invisible spin-0 meson.
Table 3 summarises our current benchmarks. To compose benchmarks with separateFlav=on, there are a number of additional steps needed. For $m_{\pi_D} < \frac{1}{2} m_{\rho_D}$, the decays $\rho_D^a \to \pi_D^b \pi_D^c$ need to be correctly programmed. For example, for $N_{f_D} = 3$, $\rho_D^3$, the diagonal member of the rho isotriplet (particle ID 4900113), decays to spin-0 bosons $\pi_D^{ij}$ (particle ID 4900ij1) in the following pattern: $\rho_D^3$ (4900113) $\to$ $\pi_D^{12} \pi_D^{21}$ (66%), $\pi_D^{13} \pi_D^{31}$ (16%), $\pi_D^{23} \pi_D^{32}$ (16%), the 4:1:1 branching ratios reflecting the relative isospins-squared of these spin-0 states. All of these details need to be correctly laid out in the pythia decay table in order that spin-0 mesons be produced in the right abundances. It is also important to decide how to treat the singlet states, especially the spin-0 singlet, whose mass and production rate in hadronization may be quite different from the others. Finally, all the spin-0 meson decays and lifetimes must be separately entered into the pythia decay table.
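The quoted 4:1:1 pattern translates into branching fractions by simple normalisation. The sketch below is illustrative; the helper name and the channel labels are hypothetical, and exact fractions are used so that the rounding in the quoted percentages is visible.

```python
# Normalise relative decay weights into branching fractions.
from fractions import Fraction


def branching_fractions(weights):
    """Map {channel: relative weight} to {channel: exact branching fraction}."""
    total = sum(weights.values())
    return {channel: Fraction(w, total) for channel, w in weights.items()}


# Relative weights for the diagonal isotriplet rho with three flavors (4:1:1).
rho3_weights = {"pi12 pi21": 4, "pi13 pi31": 1, "pi23 pi32": 1}
for channel, bf in branching_fractions(rho3_weights).items():
    print(f"{channel}: {bf} = {float(bf):.1%}")
```

The exact values are 2/3, 1/6 and 1/6, which the text quotes as 66%, 16% and 16%; entries written into a decay table should be checked to sum to one.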
For $m_{\pi_D} > \frac{1}{2} m_{\rho_D}$, similar efforts are required to ensure that the flavor structure of the diagonal and off-diagonal $\rho_D$ decays is correctly implemented in the decay table.

Final remarks
Before we proceed with phenomenological studies using the benchmarks proposed in this section, we would like to emphasize that we have only laid out an initial road for defining consistent phenomenology in the context of semi-visible jets. We have considered the leptophobic $Z'$ SM-DS portal widely used in the semi-visible jets literature, and pointed out the crucial role of charge assignments in determining the phenomenology, though we have not worked out the details. Dark hadron masses may potentially be extracted from a combination of lattice simulation of the hidden sector and general theoretical considerations. But their decay channels and lifetimes are highly model-dependent, and calculating them involves careful consideration of the detailed charge assignments of the $Z'$, its mixing with various dark hadrons, and the spectrum and interactions of the dark hadrons (including anomalies) as obtained from symmetry considerations and the chiral Lagrangian. These sometimes intricate calculations must be performed in each model, unless an over-arching theoretical treatment, covering all models in this class, can be given.
In the context of semi-visible jets, the lifetimes of the various states are particularly important. This signature is defined to be one in which all objects either are stable, producing $\slashed{E}_T$, or decay promptly to SM-hadronic final states. Long-lived particles with lifetimes greater than a few centimeters and less than 10 meters (in the lab frame) would move the signature into a different regime, outside the semi-visible jet framework. It is therefore imperative to identify all unstable dark hadrons and calculate their lifetimes correctly. We have estimated lifetimes and have moderate confidence that all particles in our benchmarks decay promptly or are stable on LHC detector scales, but we have not by any means done a thorough analysis.
We would also like to note that there are still significant issues with hadronization that we have not begun to address. We have made a few observations about the hadronization

Table 3 Current benchmarks for the $m_{\pi_D} > m_{\rho_D}/2$ and $m_{\pi_D} < m_{\rho_D}/2$ regimes. In the former case all $\pi_D$ are stable and a source of $\slashed{E}_T$, while for the latter the $\rho_D$ mesons decay to $\pi_D$, which further decay to $c\bar{c}$ final states at the LHC. The benchmarks assume that the decays of $\rho_D$, $\pi_D$ are prompt.

Sample generation
The signal process considered for the validation of the new Hidden Valley (HV) module of pythia 8 [192,193] consists of semi-visible jets [2] produced in the s-channel via a heavy $Z'$ mediator. A set of signal samples has been produced with different versions of pythia 8 for proton-proton collisions (and also electron-positron collisions for completeness) at the benchmark centre-of-mass energy of 13 TeV (1 TeV). Namely, in order to test the new implementation of the HV module we have produced three main groups of samples, as illustrated in Table 4, for different choices of the dark sector color charge $N_{c_D}$ and the number of dark quark flavors $N_{f_D}$.
In particular, the first type of sample has been produced with an older pythia 8.245 release [194],$^{11}$ while the second has been generated with the new pythia 8 version 8.307 [195].
In the new pythia 8 release, it is now possible to set the masses of all 8 dark quarks and the associated 64 mesons for each pseudo-scalar and vector multiplet individually. Although this makes it possible to consider mass-split scenarios, we consider only mass-degenerate dark quarks, since a consistent treatment of UV to IR settings in the mass-split scenario is not yet available. As an outcome of this choice, flavor symmetry leads to mass-degenerate pseudo-scalar and vector multiplets. However, it is still crucial to be able to set all dark meson properties individually, since the lifetimes of these different states can differ according to the model and mediator. For this reason, compared to the pythia 8.245 (8.306) release, the newest version allows a more detailed handling of dark hadrons with the setting HiddenValley:SeparateFlav = on. As shown in Table 4, a third sample has been added in order to test this new option. In particular, using the flavor splitting option, each of the quark and meson flavors is shown explicitly. The quark names now are $q_{Di}$, with $i \in \{1, \ldots, N_{f_D}\}$. Similarly, meson names are $\pi_{Dij}$ and $\rho_{Dij}$, where $i = j$ are the flavor-diagonal mesons, and otherwise $i > j$, with j representing the anti-quark. The identity codes then are 4900ij1 for pseudo-scalars and 4900ij3 for vectors. An anti-meson comes with an overall negative sign, and here i gives the anti-quark. The data tables by default contain identical properties for all diagonal mesons in a multiplet. All non-diagonal mesons of a multiplet are also assumed to be identical and stable by default.
An advantage of the SeparateFlav=on option is the possibility of setting the masses (as well as the decay modes) of the spin-0 and spin-1 flavor singlets differently from those of the corresponding multiplets. As discussed in Sect. 4.1, the exact computation of the flavor singlet mass with respect to the flavor multiplets, especially for spin-0 states, is an open question. There are indications that spin-0 singlets tend to be heavier than their multiplet counterparts, and therefore a suppression of the production rate is also expected for these states. For this reason, the option HiddenValley:probKeepEta1 can be set in pythia 8.307 in order to specify the suppression factor for the production rate of the spin-0 flavor singlets. This feature has been tested in the validation procedure, but we do not report plots related to it. Fixing the color charge to $N_{c_D} = 3$ using the option HiddenValley:Ngauge = 3, two configurations for the number of flavors, $N_{f_D} = 3, 8$, have been considered in this study, using the setting HiddenValley:Nflav. $N_{f_D} = 3$ corresponds to the smallest possible configuration with more particles than the triplet representation used in pythia 8.245, while $N_{f_D} = 8$ is the maximal number of flavors implemented in the pythia HV module. Choosing these two values, we thus test the extremes of the flavor configurations. The hidden valley partners $F_D$ of the SM particles (charged under both the SM and the hidden valley group) are assumed to be decoupled in our case, such that they will not produce interleaved showers between the hidden sector and the SM [192,193]. While in the pythia 8.245 release only dark mesons originating from string fragmentation are implemented, in pythia 8.307, tested in this study, an option to produce dark baryons has been added through the parameter HiddenValley:probDiquark. This sets the probability that a string breaks by "diquark-antidiquark" production rather than quark-antiquark production, which then leads to an adjacent baryon-antibaryon pair in the flavor chain. Currently only one kind of diquark is implemented, implying at most eight different Delta baryons $\Delta_{Di}$ if HiddenValley:SeparateFlav = on. In the validation procedure of the pythia 8.307 HV module we have considered decoupled Delta baryons.
A minimal number of input parameters have to be specified in pythia 8 when the Hidden Valley module is called with the option HiddenValley:fragment = on. In particular, the masses of the dark hadrons have to be fixed, as well as the dark sector hadronization scale $\Lambda_D$ (set to 10 GeV in this study). Furthermore, the masses of the pseudoscalar states are set to 6 GeV, and the masses of the vector mesons are chosen to be 25 GeV. These settings correspond to $m_{\pi_D}/\Lambda_D = 0.6$, the same value as that considered in the benchmarks in Sect. 4.1. The final-state configuration that we chose for our study is simply a fully invisible signature where all the dark hadrons are considered to be stable. A further relevant setting which must be specified is the running of the dark sector coupling $\alpha_D$, which can be switched on with the option HiddenValley:alphaOrder = 1.
For the purposes of this study, and for efficient MC generation, we consider a simplified scenario where the $Z'$ mediator decays only to dark quarks, even if in a real physics case the non-vanishing coupling to SM quarks contributes to the branching ratios. By default the nominal width $\Gamma_{Z'}$ of the $Z'$ boson is set to 20 GeV and the mass to $m_{Z'} = 1$ TeV. Figure 24 shows the invariant mass distribution for the $Z'$ boson using the dark quarks before parton shower and hadronization in the hidden sector. The distribution deviates from the Breit-Wigner shape, showing an excess of events in the low mass tail. This effect can be explained from the factorisation theorem, considering that the parton distribution functions blow up at low momentum fractions for the incoming SM partons. Since we are only interested in typical events where a $Z'$ boson is created, and in order to have a consistent comparison between the different samples, we choose to cut away the low mass tail by requiring the generated invariant mass of the $Z'$ to be within the range [800, 1200] GeV.

Validation plots
PYTHIA8 triplet implementation

In a two flavor theory there are 4 spin-0 states (and 4 spin-1 states): 1 diagonal and 2 off-diagonal states, which make up the triplet, and an additional singlet. In the current pythia 8 release there are only 3 PIDs for dark pions (3 PIDs for $\rho_D$ mesons), which signify the positive and the negative off-diagonal and the diagonal dark pion. However, the singlet is still produced in pythia 8 and shares the same PID as the diagonal dark pion. With the HiddenValley:SeparateFlav=off option, it is thus impossible to separate out the singlet: it is produced with the same probability as the diagonal dark pion. As the singlet is considered to be another diagonal dark pion in pythia 8, the ratio of diagonal to off-diagonal dark pions is 1:1 for $N_{f_D} = 2$. In other words, pythia 8 will create equal numbers of diagonal and off-diagonal dark mesons, and hence the PID for the diagonal dark pions ($\rho_D$ mesons), 111 (113), is as likely as the PIDs for off-diagonal dark pions ($\rho_D$ mesons) when considered together, 211 (213) and −211 (−213). This is clearly illustrated in Fig. 25a. Similarly, in a theory with $N_{f_D} = 3$ there is an octet and a singlet, or 3 diagonal and 6 off-diagonal dark mesons. In the current pythia 8 release there are also only 3 PIDs for these 9 dark pions (and 3 PIDs for 9 $\rho_D$ mesons), following the same logic as for $N_{f_D} = 2$. The ratio of diagonal to off-diagonal dark mesons is now 1:2, so pythia creates twice as many off-diagonal dark mesons as diagonal ones. In this situation, 111 (113) is only half as likely as 211 (213) and −211 (−213) together; see Fig. 25b.
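The diagonal to off-diagonal ratios quoted above follow from counting flavor combinations. The sketch below is illustrative (the function names are hypothetical) and assumes, as described in the text, that the singlet is counted among the diagonal states and that all flavor combinations are produced with equal probability.

```python
# Count diagonal and off-diagonal mesons in one spin multiplet (singlet included).
from fractions import Fraction


def meson_counts(n_f):
    """Return (number of diagonal states, number of off-diagonal states)."""
    return n_f, n_f * (n_f - 1)


def diagonal_fraction(n_f):
    """Fraction of produced mesons carrying the diagonal PID, assuming equal rates."""
    n_diag, n_off = meson_counts(n_f)
    return Fraction(n_diag, n_diag + n_off)


for n_f in (2, 3, 8):
    print(n_f, meson_counts(n_f), diagonal_fraction(n_f))
```

The counts are (2, 2), (3, 6) and (8, 56) for two, three and eight flavors, so the diagonal fraction is simply one over the number of flavors.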
The pythia 8.307 release includes individual PIDs for all the multiplets and singlets, as well as a new parameter called HiddenValley:probKeepEta1, which determines the probability of creating the singlet state. This probability is set relative to the probability of producing spin-0 multiplets. The default setting is 1, but it can be set to 0 such that the singlet is not produced at all.
The handling of dark PIDs affects the expected value of $r_{\text{inv}}$. In the pythia 8.245 release it is not possible to turn off the production of the singlet state, and so this must be taken into account in the calculation of $r_{\text{inv}}$. Take as an example an $N_{f_D} = 2$ model with diagonal $\rho_D$ mesons (113) promptly decaying to the SM through a vector portal. Firstly, the setting probVector = 0.75 dictates that 3/4 of the dark mesons will be $\rho_D$ mesons, of which half are diagonal. This means that 3/8 of the dark mesons will be unstable, while the remaining 3/8 off-diagonal $\rho_D$ mesons and 1/4 dark pions are stable, resulting in a ratio of stable dark mesons to all dark mesons of 5/8 or 0.625. The value of $r_{\text{inv}}$ can be calculated at the generator level by counting separately the final-state, stable dark mesons and all dark mesons (including decayed $\rho_D$ mesons) in the event and taking the ratio of these two sums. The distribution of $r_{\text{inv}}$ for such a model with flavor splitting turned off can be seen in Fig. 26.
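The arithmetic of this example, and the generator-level counting, can be sketched as follows. The function names and the toy event are hypothetical; the expected-value formula assumes that only diagonal spin-1 mesons decay and that a fraction $1/N_{f_D}$ of the spin-1 mesons is diagonal, as in the example above.

```python
# Expected stable fraction and a simple per-event r_inv estimator.
from fractions import Fraction


def expected_stable_fraction(n_f, prob_vector):
    """1 - (spin-1 probability) * (diagonal share), for decaying diagonal spin-1 mesons."""
    return 1 - Fraction(prob_vector) * Fraction(1, n_f)


def r_inv_from_event(dark_mesons):
    """dark_mesons: list of (pid, is_stable) for all dark mesons, decayed ones included."""
    if not dark_mesons:
        return None
    n_stable = sum(1 for _pid, is_stable in dark_mesons if is_stable)
    return n_stable / len(dark_mesons)


print(expected_stable_fraction(2, Fraction(3, 4)))

# Toy event: one decayed diagonal rho and three stable mesons.
toy_event = [(4900113, False), (4900213, True), (4900111, True), (-4900211, True)]
print(r_inv_from_event(toy_event))
```

Individual events fluctuate around the expected value because the number of dark mesons per event is finite, which is why the text refers to a distribution of $r_{\text{inv}}$ rather than a single number.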

PYTHIA full n-plet implementation
The validation of the new pythia HV module was performed through a phenomenological analysis of the distinct variables obtained for three different cases: the HV module in pythia 8.245, and in pythia 8.307 with either HiddenValley:SeparateFlav = off or HiddenValley:SeparateFlav = on. All dark mesons were set to be stable. The distributions of angular and kinematic variables of the different final-state particles were compared for those three cases. Although the most important variables are related to the dark pions and $\rho_D$ mesons, the missing transverse energy and the produced jets were also considered for this validation study. The jets are reconstructed by clustering the generator-level objects obtained after parton shower and hadronization, using the radius parameter R = 1.4. As the dark mesons are set to be stable, the jets we study in this section are not a result of hadronization in the dark sector. They originate, for example, from initial state radiation and subsequent hadronization in the SM sector. As mentioned before, the validation of the pythia 8.307 HV module is carried out for two specific models: $N_{f_D} = 3$ and $N_{f_D} = 8$ (with $N_{c_D} = 3$ for both). For simplicity, only the results from the pp analysis are shown, as the same conclusions were obtained for the $e^+ e^-$ study.
Changing the $N_{f_D}$ value from 2 to 3 results in additional PIDs being produced, as can be seen by comparing the dark pion and $\rho_D$ meson particle IDs shown in Fig. 27a, b, respectively, to the ones shown in Fig. 25a. Concerning the specific case of the dark pions, the corresponding multiplicity and transverse momentum can be found in Fig. 29a, b. With similar conclusions as for the dark pions, the distributions of the same variables corresponding to the $\rho_D$ mesons are shown in Fig. 30a, b. From Figs. 29a and 30a, it can be concluded that the pseudo-scalars have a lower multiplicity than the vector mesons. The difference between the distributions for the diagonal and off-diagonal dark pions and $\rho_D$ mesons was studied. The multiplicity and the transverse momentum of the diagonal and off-diagonal dark hadrons were consistent with the previous conclusions, with an agreement between the new and old HV modules with the different HiddenValley:separateFlav options. For completeness, the distributions of the missing transverse energy and the minimum azimuthal angle between jets and missing transverse energy can also be found in Fig. 31a, b. The latter shows that the missing transverse energy is recoiling against jets, as expected in the fully invisible scenario investigated here. The use of the new HV module does not have any impact on the event kinematics, as expected.

$N_{c_D} = 3$, $N_{f_D} = 8$ model

Setting $N_{f_D} = 8$ brings a whole new set of PIDs both for dark pions and $\rho_D$ mesons, as confirmed in Fig. 32a, b. For the case with HiddenValley:separateFlav = on, additional PIDs from 311 (313) and −311 (−313) to 811 (813) and −811 (−813) are produced, with a total of 64 dark pions or $\rho_D$ mesons and the same production rate for all states in each multiplet. The multiplicity and transverse momentum of the dark pions are shown in Fig. 33a, b and similarly, for the $\rho_D$ mesons, in Fig. 34a, b. In agreement with the previous model analyzed, a lower multiplicity and a softer transverse momentum of the dark hadrons are observed with the new pythia 8 HV module. The same conclusions stand when looking at diagonal and off-diagonal dark pions and $\rho_D$ mesons separately. The missing transverse energy and the minimum azimuthal angle between jets and missing transverse energy can be found in Fig. 35a, b. Once again, these distributions agree for the three cases considered.
Through this validation, we thus highlight some of the differences between the Hidden Valley modules of pythia 8.245 and pythia 8.307. We also demonstrate that switching HiddenValley:separateFlav on or off does not lead to physics differences in the production rates or kinematics of the events, but gives access to additional meson PIDs whose masses and branching ratios can be set according to theory predictions.

Phenomenological studies of jet substructure observables
Contributors: Cesare Cazzaniga, Florian Eble, Aran Garcia-Bellido, Nicoline Hemme, Nukulsinh Parmar

In this section we exemplify the kinematic distributions resulting from the benchmarks proposed in Sect. 4.1, focusing on the benchmark with Λ D = 10 GeV and N f D = 3, which belongs to the regime m π D < m ρ D /2 for which the ρ D → π D π D decay mode is open. The mass of the Z′ boson is set to 1 TeV. We then consider either 1, 2 or 3 diagonal pions decaying to SM particles. The π D mesons decay to c c̄ as this is the heaviest allowed fermion pair. We simulate this signal using pythia 8.307 with HiddenValley:separateFlav = off and pass it through Delphes 3 using the HL-LHC card. Jets are clustered with FastJet [196,197] using the anti-k T algorithm [198]. We produce 50k events for the distributions shown in Sect. 4.3.1, and 500k events for those shown in Sect. 4.3.2.

Basic kinematic distributions
As the ρ D mesons all decay within the dark shower in this benchmark, they are not included in the calculation of r inv . Figure 36 shows the r inv parameter distribution. As expected, the 1-π D decay model has an average r inv of 8/9, as all ρ D mesons decay to π D and only 1 of the 9 π D is unstable. The 2-π D decay model has a mean r inv of 7/9, and for the 3-π D model a mean r inv of 2/3 is obtained.

Fig. 36 Comparison of r inv for 1-, 2- and 3-π D decay models

In Fig. 37, some basic kinematic distributions are compared for the 3 dark pion decay models. These generator-level distributions are computed with jets of radius R = 0.4 and p T > 25 GeV. The p T distribution in Fig. 37a shows that more dark pion decays result in a higher average leading jet p T , as expected when more dark particles decay to SM particles that can be detected. The E̸ T distribution shown in Fig. 37b reveals very similar values for the 3 models. This may seem contrary to expectation, since more SM-decaying pions might be expected to result in lower E̸ T ; however, while more stable dark pions do give more missing or invisible energy in the system, the additional invisible particles may be evenly distributed between the two back-to-back jets and therefore not appear in the detector as additional E̸ T . Figure 37c shows the distributions of the transverse mass, M T , of the leading and sub-leading jet and the E̸ T . As can be seen, having more SM-decaying dark pions generally yields a higher transverse mass. As the E̸ T remains relatively stable but the jet p T increases with the number of unstable dark pions, this results in higher M T values.
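The mean r inv values quoted for the three decay models follow from simple counting. The sketch below is only an illustration of that arithmetic, assuming (as in the text) that the N f D = 3 benchmark has 9 dark pions produced at equal rates and that all ρ D mesons decay to π D ; the function name `expected_r_inv` is hypothetical and not part of any tool used in the study.

```python
from fractions import Fraction


def expected_r_inv(n_unstable: int, n_flavours: int = 3) -> Fraction:
    """Mean invisible fraction if n_unstable of the N_f^2 dark pions decay visibly.

    Assumption: all dark pion states are produced at the same rate, so the
    invisible fraction is just the fraction of stable pion species.
    """
    n_pions = n_flavours ** 2  # 9 dark pions for N_f = 3, as counted in the text
    if not 0 <= n_unstable <= n_pions:
        raise ValueError("n_unstable must lie between 0 and N_f^2")
    return Fraction(n_pions - n_unstable, n_pions)


for k in (1, 2, 3):
    print(f"{k} unstable dark pion(s): mean r_inv = {expected_r_inv(k)}")
```

The event-by-event r inv fluctuates around these means, which is why Fig. 36 shows distributions rather than single values.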

Jet substructure consistency
Experimental searches and phenomenological studies for dark showers exploit jet substructure (JSS) observables to tag jets as dark jets [72,74,91,199,200]. Comparisons of jet substructure variables of interest, between the former and the new Hidden Valley pythia modules, between different dark vector meson production probabilities, and between different numbers of unstable dark pions π D , are presented in this section. In the pythia 8 Hidden Valley module [201,202] the probability to produce a dark vector meson can be changed by setting the parameter HiddenValley:probVector. There is no precise theoretical prediction for the fraction of dark vector mesons produced after string fragmentation in the hidden sector. Assuming a mass degeneracy between vector and pseudo-scalar states, it is reasonable to fix HiddenValley:probVector = 0.75, as pseudo-scalars have 1 degree of freedom while vector mesons have 3 degrees of freedom. However, the ρ D mesons and dark pions are generically not mass degenerate, hence the production rate of pseudo-scalars is enhanced compared to mass-degenerate scenarios due to the larger phase space available for lighter states. In this specific case, a reasonable value is HiddenValley:probVector = 0.5, very much like in QCD.
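The spin-counting argument behind the mass-degenerate value can be written out explicitly. The snippet below is a minimal sketch of that arithmetic only; the helper `prob_vector_from_spin_counting` is a hypothetical name, and the only pythia setting it refers to is HiddenValley:probVector as named in the text.

```python
def prob_vector_from_spin_counting(spin_vector: int = 1, spin_pseudoscalar: int = 0) -> float:
    """Vector meson fraction from counting spin states (2s + 1) only.

    Assumption: vector and pseudo-scalar mesons are mass degenerate, so no
    phase-space suppression distinguishes them.
    """
    dof_vector = 2 * spin_vector + 1              # 3 spin states
    dof_pseudoscalar = 2 * spin_pseudoscalar + 1  # 1 spin state
    return dof_vector / (dof_vector + dof_pseudoscalar)


prob_vector = prob_vector_from_spin_counting()
# Setting string in the form accepted by pythia configuration cards.
settings = [f"HiddenValley:probVector = {prob_vector}"]
print(settings)
```

The non-degenerate value of 0.5 discussed above is not derived from this counting; it is a phenomenological choice motivated by the analogy with QCD.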
For this study, generator-level jets have been clustered with the inclusive anti-k T algorithm [198], choosing a cone size R = 0.8 and a minimum p T of 200 GeV. Jets were clustered from all visible SM particles, and the jet constituents were used for computing the jet substructure. The JSS observables studied here are the generalized angularities λ κ β , the N-subjettiness τ N [203], and the jet major and minor axes.

Fig. 38 Visualization of the space of the generalized angularities λ κ β . Adapted from [205]
Generalized angularities are presented in Fig. 38 and are defined from the constituents i ∈ {1, . . . , N } carrying momentum fraction z i inside a jet of cone size R as:

$$\lambda^{\kappa}_{\beta} = \sum_{i=1}^{N} z_i^{\kappa} \left( \frac{\Delta R_i}{R} \right)^{\beta},$$

where ΔR i is the distance between the ith constituent and the jet axis.

The N-subjettiness variables τ β N are designed to count the number of subjets inside a jet. Specifically, N-subjettiness is defined, up to normalization, as:

$$\tau^{\beta}_{N} \propto \sum_{i} p_{T,i} \, \min\left\{ \Delta R^{\beta}_{1,i}, \ldots, \Delta R^{\beta}_{N,i} \right\},$$

where the sum is over the jet constituents, and ΔR β N,i is the distance between the Nth subjet and the ith constituent of the jet. τ β N measures the departure from N-parton energy flow: if a jet has N subjets, τ β N−1 should be much larger than τ β N . Originally, τ β N was introduced in order to identify hadronically decaying boosted objects and reject QCD background. In the studies presented here, the angular parameter β has been fixed to 1, as done in previous studies of boosted object discrimination [204].
The shape of the jet can be approximated by an ellipse in the η − φ plane. The major and minor axes are the two principal components of this ellipse and are defined from the following symmetric matrix M:

$$M = \begin{pmatrix} \sum_i p_{T,i}^2 \, \Delta\eta_i^2 & -\sum_i p_{T,i}^2 \, \Delta\eta_i \Delta\phi_i \\ -\sum_i p_{T,i}^2 \, \Delta\eta_i \Delta\phi_i & \sum_i p_{T,i}^2 \, \Delta\phi_i^2 \end{pmatrix},$$

where the sum runs over all constituents of the jet and Δη, Δφ are the differences in η and φ with respect to the jet axis. The major and minor axes are defined from the eigenvalues λ 1 ≥ λ 2 of M as:

$$\sigma_{\text{major}} = \sqrt{\frac{\lambda_1}{\sum_i p_{T,i}^2}}, \qquad \sigma_{\text{minor}} = \sqrt{\frac{\lambda_2}{\sum_i p_{T,i}^2}}.$$

Generalized angularities belong to the category of jet shape variables and were originally built to measure the quantity of radiation inside a jet in order to discriminate between jets initiated by quarks and those initiated by gluons [205,206]. Indeed, for gluon jets the values of the generalized angularities are usually expected to be larger, since gluons are expected to radiate more due to their larger color factor. In the same way, these observables have been used in analyses to discriminate between SM jets and dark jets [98]. In particular, the dark jets are expected to be wider than SM jets due to the double hadronization process and the mass splitting between the dark bound states and the SM quarks.
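The observables introduced above can be computed directly from the jet constituents. The following is a minimal sketch, not the code used for the figures: it assumes the commonly used conventions in which (p T D)² is the generalized angularity with (κ, β) = (2, 0) and the girth is the one with (κ, β) = (1, 1), takes the jet axis as an input, and uses hypothetical function names. Normalization conventions for the girth differ in the literature (with or without the factor 1/R).

```python
import numpy as np


def wrap_delta_phi(dphi):
    """Map an azimuthal difference into [-pi, pi)."""
    return (np.asarray(dphi, dtype=float) + np.pi) % (2.0 * np.pi) - np.pi


def generalized_angularity(pt, eta, phi, axis_eta, axis_phi, kappa, beta, R=0.8):
    """lambda^kappa_beta = sum_i z_i^kappa (DeltaR_i / R)^beta for one jet."""
    pt = np.asarray(pt, dtype=float)
    z = pt / pt.sum()  # momentum fraction carried by each constituent
    deta = np.asarray(eta, dtype=float) - axis_eta
    dphi = wrap_delta_phi(np.asarray(phi, dtype=float) - axis_phi)
    delta_r = np.hypot(deta, dphi)
    return float(np.sum(z**kappa * (delta_r / R) ** beta))


def jet_axes(pt, eta, phi, axis_eta, axis_phi):
    """Major and minor axes from the pT^2-weighted second-moment matrix M."""
    w = np.asarray(pt, dtype=float) ** 2
    deta = np.asarray(eta, dtype=float) - axis_eta
    dphi = wrap_delta_phi(np.asarray(phi, dtype=float) - axis_phi)
    off_diag = -np.sum(w * deta * dphi)
    m = np.array([[np.sum(w * deta**2), off_diag],
                  [off_diag, np.sum(w * dphi**2)]])
    lam = np.clip(np.linalg.eigvalsh(m), 0.0, None)  # ascending eigenvalues
    minor, major = np.sqrt(lam / w.sum())
    return float(major), float(minor)


# Toy jet: two equal-pT constituents placed symmetrically in eta around the axis.
pt, eta, phi = [50.0, 50.0], [0.1, -0.1], [0.0, 0.0]
ptd = np.sqrt(generalized_angularity(pt, eta, phi, 0.0, 0.0, kappa=2, beta=0))
girth = generalized_angularity(pt, eta, phi, 0.0, 0.0, kappa=1, beta=1)
major, minor = jet_axes(pt, eta, phi, 0.0, 0.0)
print(ptd, girth, major, minor)
```

For this toy jet the minor axis vanishes because both constituents lie on a line in the η − φ plane, while the major axis equals their distance from the jet axis.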
We start by comparing JSS observables between the old and new pythia Hidden Valley modules. Comparisons of quark-gluon discriminant variables and N-subjettiness variables are shown in Figs. 39 and 40. Some systematic differences are observed for the jet transverse momentum dispersion p T D, due to the different number and different p T spectrum of the dark mesons π D and ρ D in the new pythia Hidden Valley module. The N-subjettiness values are smaller with the new module when decreasing r inv and when considering a high number of subjets. No large systematic difference is observed for the other substructure variables.
Next, we studied the differences in the JSS observables between two dark vector meson production fractions: 50% and 75%. Comparisons of quark-gluon discriminant variables, N-subjettiness and number of constituents are shown in Figs. 41, 42 and 43. Some systematic differences are observed for all variables. It is clear that the number of constituents in jets is lower for the higher vector meson fraction. Jets with a large number of soft constituents are characterized by low p T D, while p T D is higher for jets where just a few constituents carry most of the momentum. The fact that p T D is higher for higher vector dark meson fractions is certainly an effect of the lower number of constituents. Jet girth, axes and N-subjettiness are all smaller in the case of probVector=0.75 compared to probVector=0.5. This indicates that jets are narrower, since with larger vector meson fractions we observe a harder p T spectrum for the dark hadrons decaying visibly.
We then studied how the number of unstable diagonal π D mesons affects the jet substructure. Plots of quark-gluon discriminant, number of constituents and photon energy fraction for different number of unstable diagonal dark pions are provided in Fig. 44. Multiplicity is higher for lower r inv , which is expected as the multiplicity is directly related to the number of unstable dark pions. Major and minor axes as well as girth are higher for lower r inv , suggesting that the jet is wider.
In conclusion, we have noticed that the variation of the hidden sector parameters such as probVector can impact JSS distributions at generator level leading to a harder spectrum for the dark hadrons and consequently narrower jets. Notably, only two benchmark points for the vector meson fraction have been investigated, and further studies are encouraged to understand better the impact of the parameters of the hidden sector on the observable JSS distributions.

Infrared-collinear safety of JSS observables
Traditional calculations in perturbative quantum chromodynamics are based on an order-by-order expansion in the strong coupling α s . Observables that are calculable in this way are known as "safe" [207]. As is well known, divergences of different nature can appear in the perturbative series. The ultraviolet divergences appearing in loop diagrams can be consistently cured, since QCD is a renormalisable theory. Moreover, real-emission diagrams exhibit singularities in particular corners of the phase space. More specifically, the singular contributions have to do with collinear splittings of massless partons and emissions of soft gluons off both massless and massive particles. Virtual diagrams also exhibit analogous infrared and collinear (IRC) singularities, and theorems [208][209][210] ensure that such infinities cancel at each order of the perturbative series when real and virtual corrections are added together, thus leading to physical transition probabilities that are free of IRC singularities.

An observable O({ p i }) calculated from a system of particles with momenta p i is defined to be infrared safe if, when adding a soft particle with momentum ε, the following relation holds:

$$\mathcal{O}(\{p_1, \ldots, p_n, \epsilon\}) \to \mathcal{O}(\{p_1, \ldots, p_n\}) \quad \text{for } \epsilon \to 0.$$

Instead, if we consider a particle p 1 splitting into 2 particles with collinear momenta x p 1 and (1 − x) p 1 , the observable is collinear safe if:

$$\mathcal{O}(\{x p_1, (1-x) p_1, p_2, \ldots, p_n\}) = \mathcal{O}(\{p_1, p_2, \ldots, p_n\}).$$

To check the IRC safety of JSS observables, we computed them at different stages of the shower/hadronization going from the dark sector to the SM sector. For IRC unsafe observables, large fluctuations in the showering process are expected, while IRC safe observables should be more stable during the evolution. Therefore, for collinear splittings or soft emissions happening during the parton shower, the IRC unsafe observables will tend to depart from the value calculated in previous stages of the showering. Due to this feature, IRC unsafe observables, if not validated on data, can introduce important model dependence in analyses exploiting them in supervised classifiers.
This is particularly relevant in the case of dark shower studies, where the MC-data agreement for signal cannot be assessed, and therefore there is no real control on IRC unsafe observables due to the unknown details of the hidden sector (for example the dark hadronization scale Λ D ). Specifically, given an observable O({ p i }), changes in the hidden sector parameters such as Λ D are expected to produce a power-law scaling for IRC safe observables given the jet transverse momentum p T j : $\delta\mathcal{O}_{\text{safe}} \sim (\Lambda_D / p_T^{j})^{\alpha}$. On the other hand, the scaling is logarithmic in the case of IRC unsafe observables: $\delta\mathcal{O}_{\text{unsafe}} \sim \log(\Lambda_D / p_T^{j})$. This means that, without knowing the hadronization scale of the dark sector, the IRC unsafe observables can undergo large, uncontrolled fluctuations for Λ D ≪ p T .

In this study we test the IRC safety of JSS observables by calculating them at 3 levels in the evolution of the shower: dark sector hadrons, SM quarks and SM hadrons. As previously mentioned, we expect the IRC safe observables to fluctuate less during the evolution. For the test we consider two generalized angularities, namely p T D, which is an IRC unsafe observable, and the jet girth, which is IRC safe. The collinear unsafety of p T D is due to its dependence on the square of the transverse momenta of the jet constituents. Taking a particle with transverse momentum p 1,T , if the particle splits into 2 collinear particles with transverse momenta x p 1,T and (1 − x) p 1,T , the sum of squares changes: $[x^2 + (1-x)^2]\, p_{1,T}^2 < p_{1,T}^2$ for 0 < x < 1. Therefore, we expect p T D to fluctuate more during the showering compared to the jet girth. Our results for the test of IRC safety of JSS observables are presented in Fig. 45. The plots show the following ratios for the tested JSS observable: unstable dark hadrons vs SM quarks, SM quarks vs SM hadrons and unstable dark hadrons vs SM hadrons. We expect the distributions of the ratios for the collinear unsafe observable calculated at different steps of the shower to differ from unity.
For a fair comparison between the same observable calculated at different stages of the showering, we consider only jets with a multiplicity of SM quarks which is twice that of the dark hadrons. Moreover, because the girth of jets with one constituent is a special case, being close to 0, we consider only jets with a number of unstable dark hadrons strictly larger than one. The main result of this study is that, even if IRC unsafe observables are expected to be described quite well by the parton shower, their application in the context of dark shower searches should be carefully validated in control regions by comparing Monte Carlo and data. Secondly, as the dark hadronization scale is unknown, the effect of changing Λ D on JSS observables must be evaluated. The usage of such variables, especially in Hidden Valley searches, can lead to important limitations in terms of interpretability of the results due to their strong dependence on the unknowns of the hidden sector.
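The different behaviour of p T D and girth under a collinear splitting can be checked on a toy jet. The sketch below is an illustration only and is not the procedure used for Fig. 45: it assumes p T D = sqrt(Σ p T,i ²)/Σ p T,i and a girth normalized by the jet radius, and the helper `collinear_split` is a hypothetical name. One constituent is replaced by two exactly collinear ones sharing its transverse momentum.

```python
import numpy as np


def ptd(pt):
    """Jet pT dispersion: sqrt(sum pT^2) / sum pT (collinear unsafe)."""
    pt = np.asarray(pt, dtype=float)
    return float(np.sqrt(np.sum(pt**2)) / np.sum(pt))


def girth(pt, delta_r, R=0.8):
    """pT-weighted mean angular distance to the jet axis (IRC safe)."""
    pt = np.asarray(pt, dtype=float)
    delta_r = np.asarray(delta_r, dtype=float)
    return float(np.sum(pt * delta_r) / (np.sum(pt) * R))


def collinear_split(pt, delta_r, index, x):
    """Replace constituent `index` by two collinear ones carrying x and 1 - x of its pT."""
    pt, delta_r = list(pt), list(delta_r)
    p, d = pt.pop(index), delta_r.pop(index)
    return pt + [x * p, (1.0 - x) * p], delta_r + [d, d]


pt0, dr0 = [100.0, 50.0], [0.05, 0.20]
pt1, dr1 = collinear_split(pt0, dr0, index=0, x=0.5)

print("girth before/after:", girth(pt0, dr0), girth(pt1, dr1))
print("pTD   before/after:", ptd(pt0), ptd(pt1))
```

The girth is linear in the constituent momenta and is therefore unchanged by the splitting, while p T D decreases because the sum of squared momenta is not conserved.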

Study of JSS observables after jet reconstruction in Delphes
After checking how the different parameters of the model affect the generator-level jets, we perform a similar study at reconstructed level using Delphes output. This is important to understand the impact of detector effects on the JSS observables that can be used by the experiments to tag dark jets efficiently. Delphes was configured for a CMS-like detector at the HL-LHC, and in particular Particle Flow candidates have been clustered with four different distance parameters, R = 0.4, 0.8, 1.0 and 1.2, using FastJet [196,197]. Jets with larger radius help in containing more of the radiation of the dark jet. Jets are required to have at least two tracks, |η| < 2.5, and a minimum p T for clustering of 25 GeV. Figures 46 and 47 show the difference between the samples with probVector=0.5 and 0.75 when the jets are clustered with R = 0.8. Figures 48 and 49 show the effect of varying the number of unstable diagonal pions on the JSS observables, for different distance parameters and probVector=0.5. The variables p T D and the N-subjettiness ratios show the most discrimination between the different samples.

Conclusion
Setting the IR parameters in accordance with the UV physics in general leads to a more cohesive modelling of the signal. This modelling however necessarily suffers from uncertainties due to a lack of knowledge of the precise hadronization parameters. These parameters can be varied to understand their effect on the resulting kinematic observables. In this section we have considered several jet substructure variables. We illustrated that changes in probVector can lead to changes in the observed jet substructure variables. It should be noted that this study concentrates only on one specific benchmark point and two values of probVector settings. It nevertheless shows the importance of understanding the effects of hadronization uncertainties. We also discussed the importance of Infrared and Collinear (IRC) safety when using substructure variables and in particular demonstrated that p T D is not IRC safe. Our studies thus highlight the need of a more detailed analysis of widely used jet substructure techniques in the light of dark showers phenomenology.

Improved search strategies
The wide variety of signatures coming from the dark/hidden sector scenarios considered throughout this work also motivates advanced techniques which may enable us to distinguish between signal and background at the LHC. These techniques may involve new kinematic variables, jet substructure information (as briefly discussed in Sect. 4.3), machine learning, or advanced triggering strategies. In this section, we illustrate some of the avenues which have been explored in the literature, using the dark/hidden sector parametrizations presented in Sect. 2.2. It would be of great interest to also perform such analyses in the light of the new developments presented in Sect. 4.1.

Contributors: Hugues Beauchesne, Giovanni Grilli di Cortona
Semi-visible jets are a characteristic signature of many confining dark sectors and consist of jets of visible hadrons intermixed with invisible stable particles. Up to now, two main search strategies have been pursued: tagging semi-visible jets (see e.g. Refs. [72-74,199,200,211-213]) and exploiting the special relation between the azimuthal direction of the semi-visible jets and the missing transverse momentum E̸ T (see e.g. Refs. [2,63,214]). In Ref. [215], it was shown that these two approaches can be combined to define new event-level variables that considerably increase the sensitivity of semi-visible jet searches. The central idea is that semi-visible jets are responsible for most of the E̸ T in signal events and that tagging specifies which jets are semi-visible. The tagging information then predicts the direction and magnitude of E̸ T , which can be compared to its measurement. In this section, we present a summary of Ref. [215] and refer to it for technical details.

For illustration purposes, consider the following benchmark model. Assume a new confining group G. Introduce a dark quark q D that is a fundamental of G and neutral under the Standard Model gauge groups. Introduce a scalar mediator S that is an antifundamental of G and has a hypercharge of −1. These fields allow a Lagrangian in which the mediator couples to the dark quark and the Standard Model leptons E i with couplings λ i . Assume for simplicity that the only non-negligible λ i is the one corresponding to the electron. If the mediators are pair-produced, they will each decay to an electron and a dark quark. The experimental signature will then be two electrons and two semi-visible jets. This is similar to the signature of leptoquark pair production, and as such preselection cuts are applied based on typical leptoquark cuts. The event is also required to contain two jets tagged as semi-visible. We focus on the t t̄ background. Events are generated using MadGraph5 [216], pythia 8 [217] and Delphes 3 [218], with the dark sector handled by the Hidden Valley module of pythia; we refer to Ref. [215] for the parameter values.

Fig. 47 Comparison of reco-level variables between probVector=0.5 and probVector=0.75 for different numbers of unstable diagonal dark pions. The first row shows the minor axis, the second row the major axis, the third row the N-subjettiness ratio τ 21 , and the fourth row τ 32 . The plotted ratio is the ratio of probVector=0.75 to probVector=0.5

[Figure: the first row is the minor axis, the second row the major axis, the third row the N-subjettiness ratio τ 21 and the fourth row τ 32 ]

Consider the decomposition E̸ T = a 1 p D 1 T + a 2 p D 2 T , where p D 1 T and p D 2 T are the transverse momenta of the two jets tagged as semi-visible. The coefficients a 1 and a 2 should then peak at ∼ r inv /(1 − r inv ) and can be combined in a single test statistic. This could be done in multiple ways, but a simple and powerful one is to train a fully supervised neural network on the a 1 and a 2 of both the signal and the background. Alternatively, one can encode much of the same reasoning in a single variable. Define

$$\Delta\phi = \left| \phi_{p^{D}_{T}} - \phi_{\not{E}_T} \right|,$$

where φ p D T (φ E̸ T ) is the azimuthal angle of p D 1 T + p D 2 T (E̸ T ). This quantity should peak at 0 for the signal, but unfortunately contains no information on the norm of E̸ T . We introduce two comparisons. First, the standard procedure up to now has been to compute the minimal difference in azimuthal angle between E̸ T and the leading jets [63],

$$\Delta\phi_{\text{CLLM}} = \min_{i \le 4} \left| \phi_{j_i} - \phi_{\not{E}_T} \right|,$$

where in this case four jets are considered. Second, we consider a fully supervised neural network. This is only meant as a comparison, as fully supervised neural networks are susceptible to simulation artefacts and sculpting.
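The decomposition of the missing transverse momentum onto the two tagged jets amounts to solving a 2×2 linear system. The following is a minimal sketch under stated assumptions, not the implementation of Ref. [215]: the jet momenta are taken to be the visible transverse momenta of the two tagged jets, the toy event assumes that each jet loses exactly the same invisible fraction r inv , and the function names are hypothetical.

```python
import numpy as np


def decompose_met(met, p_d1, p_d2):
    """Solve met = a1 * p_d1 + a2 * p_d2 for (a1, a2) in the transverse plane.

    Raises numpy.linalg.LinAlgError if the two jets are exactly collinear.
    """
    basis = np.column_stack([p_d1, p_d2]).astype(float)  # columns are the jet pT vectors
    a1, a2 = np.linalg.solve(basis, np.asarray(met, dtype=float))
    return float(a1), float(a2)


def delta_phi(met, p_d1, p_d2):
    """Azimuthal separation between p_d1 + p_d2 and the missing transverse momentum."""
    total = np.asarray(p_d1, dtype=float) + np.asarray(p_d2, dtype=float)
    dphi = np.arctan2(total[1], total[0]) - np.arctan2(met[1], met[0])
    return float(abs((dphi + np.pi) % (2.0 * np.pi) - np.pi))


# Toy event: visible jet momenta (px, py) in GeV, each jet with invisible fraction r_inv.
r_inv = 0.3
p_d1 = np.array([200.0, 0.0])
p_d2 = np.array([-150.0, 80.0])
met = r_inv / (1.0 - r_inv) * (p_d1 + p_d2)

a1, a2 = decompose_met(met, p_d1, p_d2)
print(a1, a2, delta_phi(met, p_d1, p_d2))
```

In this idealized event both coefficients equal r inv /(1 − r inv ) and the azimuthal separation vanishes; in realistic events the coefficients fluctuate jet by jet, which is why they are used as inputs to a test statistic rather than as a direct measurement.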
Receiver Operating Characteristic (ROC) curves are shown in Fig. 50 for different values of r inv . As can be seen, the coefficients a 1 and a 2 typically provide the strongest results. They sometimes exceed the fully supervised neural network by exploiting information on the magnitude of the momenta, which is not provided to the neural network. The coefficients outperform the standard approach of Δφ CLLM by an order of magnitude for a signal rejection rate of 0.5. The variable Δφ also generally outperforms Δφ CLLM .

Contributors: Elias Bernreuther
To increase the sensitivity to dark shower signals consisting of promptly decaying dark hadrons, it is crucial to reduce the large QCD background. While backgrounds from mismeasured QCD jets mimic the signal with regard to event-level observables, such as Δφ, differences are expected at the level of jet substructure. These can arise from differences in the shower evolution between QCD and the dark sector, the presence of visibly decaying heavy dark mesons in the jets, or invisible dark hadrons that are interspersed with visible particles. See e.g. Refs. [74,200] for recent studies of dark shower signals in terms of classic jet substructure variables. In contrast, advances in tagging jets with modern machine learning techniques make use of low-level properties of jet constituents. Here, we summarize the results of Ref. [73], which studies the potential of deep neural networks for identifying semi-visible jets from dark showers.
As a benchmark, dark showers of nearly mass-degenerate GeV-scale dark mesons, produced at the LHC via a heavy Z′ vector mediator with mass at the TeV scale, were considered. The underlying dark sector is the Aachen model summarized in Sect. 2.2.6 and motivated by cosmological and experimental constraints [61]. The dark quark production process pp → q D q̄ D was simulated with MadGraph5 2.6.4 [216] using a UFO file generated with FeynRules [219] and performing MLM matching with up to one additional hard jet. Showering and hadronization, both in QCD and in the hidden sector, were carried out using pythia 8.240 [46,47,220]. The settings used in pythia's Hidden Valley module for a signal with dark meson mass m D are summarized in Table 5. The parameter probVector was set to 0.5 such that 25% of dark mesons are unstable, flavor-diagonal vector mesons as predicted by the benchmark model. Jet clustering is performed by FastJet [196] using the anti-k T algorithm with jet radius R = 0.8.
A priori, it is not clear which jet representation and neural network architecture best distinguish dark shower jets from QCD jets. In Ref. [73] it was shown that a dynamic graph convolutional neural network (DGCNN) [221,222] operating on particle clouds outperforms convolutional neural networks (CNNs) based on jet images [223] and a network operating on ordered lists of Lorentz vectors [224]. While a standard CNN carries out convolutions over neighboring pixels in a jet image, a DGCNN performs convolutions over edges of a graph constructed from jet constituents that are neighbors in feature space. While graph networks also represent the state of the art in tagging boosted top jets [225], their advantage over a CNN or a Lorentz Layer network is considerably larger in identifying semi-visible jets. A comparison of ROC curves showing the QCD jet background rejection 1/ε B as a function of the dark shower signal efficiency ε S for m D = 5 GeV is shown in Fig. 51.
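The figure of merit used in such ROC comparisons, the background rejection at a fixed signal efficiency, can be computed from classifier scores as sketched below. This is a generic illustration with made-up scores, not the evaluation code of Ref. [73]; the function name `background_rejection` and the convention that larger scores are more signal-like are assumptions.

```python
import numpy as np


def background_rejection(sig_scores, bkg_scores, sig_eff):
    """Return 1 / eps_B at the score threshold that keeps a fraction sig_eff of signal.

    Assumption: larger scores are more signal-like.
    """
    sig_scores = np.asarray(sig_scores, dtype=float)
    bkg_scores = np.asarray(bkg_scores, dtype=float)
    threshold = np.quantile(sig_scores, 1.0 - sig_eff)  # cut keeping sig_eff of the signal
    eps_b = np.mean(bkg_scores >= threshold)            # background efficiency at that cut
    return float("inf") if eps_b == 0 else float(1.0 / eps_b)


# Made-up tagger scores for ten signal and ten background jets.
sig = np.linspace(0.1, 1.0, 10)
bkg = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.6, 0.2, 0.1])
print(background_rejection(sig, bkg, sig_eff=0.5))
```

Scanning `sig_eff` over a grid of values and plotting the returned rejection traces out a ROC curve of the type compared between architectures.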
Since the parameters of the dark sector are a priori unknown it is a crucial question how well the classification performance of the DGCNN generalizes to dark showers with different parameter values than were used for training. Varying r inv and m D , the performance continuously degrades the further the parameters of the dark showers in the test sample are from those in the training sample. While the effect is modest for r inv , it is much more substantial for the dark meson mass. For example, for a network trained with m D = 5 GeV, the background rejection rate for signal efficiencies between 0.1 and 0.3 is reduced by nearly an order of magnitude when tested on samples with m D = 20 GeV. This suggests that the network learns to reconstruct this mass from the jet constituents. Importantly, this behavior can be mitigated by training the network on mixed samples which contain jets with a range of different dark meson masses. This yields a much more general classifier as reflected in the ROC curves in Fig. 51.
Finally, it was investigated how much the sensitivity of an experimental search for dark showers can be improved by applying a DGCNN as a semi-visible jet tagger. As an example, an ATLAS search for mono-jet events with a luminosity of 36.1 fb −1 [226] was considered, which is sensitive to signal events where one of the two dark showers remains invisible and, thus, Δφ ≈ π. For an event to be accepted, it had to fulfil the original selection criteria of the search and contain at least one fat jet that is classified as a semi-visible jet by the network. The training sample consisted of jets from a dark shower signal with the benchmark parameters stated in Sect. 2.2.6 and from the dominant Z+jets background. The expected number of background events with and without the DGCNN tagger is shown in Table 6 for the signal region EM4, which is the region most sensitive to the signal when m Z′ = 1 TeV. In addition, the table compares the resulting expected 95% CL limit S 95 exp on the number of signal events in the region with and without the tagger and shows the corresponding improvement of the projected limit on the dark quark production cross section. In the benchmark scenario shown in Table 6, a DGCNN for tagging semi-visible jets can improve the sensitivity of the search to dark showers by more than one order of magnitude.

Table 6 Number of background events B with systematic uncertainty in the signal region EM4 of the search with and without the dark shower tagger and corresponding expected 95% CL limit S 95 exp on the number of signal events. In addition, the improvement in the limit on the dark quark production cross section for the benchmark scenario described in the main text is shown relative to the search without a tagger.

Contributors: Annapaola de Cosa, Jeremi Niedziela, Kevin Pedro
Semi-visible jets arise from Hidden Valley models of dark matter, which include strong interaction in the dark sector. They constitute a challenging experimental signature in which a fraction of jet constituents is invisible to the detector, leading to missing transverse energy E T being aligned with the jet.
The details of the kinematics are mainly affected by the following theory parameters: m Z′ (the mass of the mediator), m D (the mass of the dark hadrons) and r inv (the fraction of stable, invisible dark hadrons). However, the large total number of unknown theory parameters leads to a vast model space with a huge number of possible scenarios that can easily evade any constraints from e.g. cosmological measurements. Since it is impractical to perform dedicated searches for all possible model variations, we propose to use autoencoders (AE) as anomalous jet taggers instead [199].
The autoencoder-based anomaly detection strategy is robust against both detector effects and details of the model implementation. AEs are designed to detect objects significantly different from the training sample, without prior knowledge of signal characteristics. For reference, the AE introduced here is compared to a Boosted Decision Tree (BDT) trained on the QCD background and a mixture of different signals. For completeness, we have also studied alternative anomaly detection techniques, namely Variational Autoencoders (VAE) and Principal Component Analysis (PCA).
All architectures mentioned above were trained on high-level properties of jets: η and φ coordinates and invariant mass m j , as well as jet substructure variables: jet p T dispersion p T D, jet ellipse minor and major axes, EFP 1 , and ECF ratios C 2 and D 2 . We have also considered including the four-momenta of jet constituents in the training, but they were ultimately discarded since no improvement was observed.
The performance of different approaches is quantified by comparing the area under the ROC (receiver operating characteristic) curve (AUC), shown in Fig. 52. It was demonstrated that an AE-based jet tagger can provide satisfactory performance compared with the fully supervised BDT approach. The PCA proved to be less efficient than the other approaches. The VAE was found to give the best results when trained exclusively on reconstruction loss, leading its variance to collapse to zero and therefore becoming equivalent to a regular AE. Robustness against unknown model parameters was also assessed. As shown in the rightmost panel of Fig. 52 and in Fig. 53, in certain cases the AE can outperform the BDT when the latter was trained on an incorrect signal hypothesis. Another interesting observation that can be made in the right panel of Fig. 52 is that a BDT trained on r inv = 0.3 and tested on r inv = 0.7 performs better than the one trained on a mixture of different signals (left panel). This is caused by the fact that the low r inv signal is more similar to the background, and therefore the BDT has to learn how to distinguish between the two more precisely. This results in a performance boost when tested on the easier case of large r inv .

Searching for hadronic SUEP
The theoretical motivation and experimental phenomenology of SUEPs are described in Sect. 3.1. Strategies to overcome the experimental challenges of searches for SUEP at the LHC are still being developed. For the nightmare scenario of prompt, hadronically decaying SUEP, a search strategy was proposed in [159], employing an autoencoder neural network as an anomaly detector.
An autoencoder is an unsupervised neural network trained on background events, which attempts to minimize the difference between its output and input. Ideally the autoencoder learns to do this efficiently only for inputs that are similar to its training data, so that when evaluated on an event from outside the background distribution, a high reconstruction error flags the event as anomalous. In the case of a search for SUEP, the background events are soft, highly isotropic QCD events. The unsupervised nature of this analysis avoids the model dependence that comes from using signal simulation to develop a classifier.
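The logic of using the reconstruction error as an anomaly score can be illustrated with a deliberately simple stand-in. The sketch below is not the neural network of Ref. [159]: it swaps in a linear autoencoder with tied weights (equivalent to a PCA projection) fitted on toy three-dimensional "background" data, and every name and number in it is hypothetical.

```python
import numpy as np


class LinearAutoencoder:
    """Linear encoder/decoder with tied weights, fitted by SVD (a stand-in for a neural AE)."""

    def __init__(self, n_latent):
        self.n_latent = n_latent

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.mean_ = x.mean(axis=0)
        _, _, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_latent]  # directions retained in the bottleneck
        return self

    def reconstruction_error(self, x):
        xc = np.asarray(x, dtype=float) - self.mean_
        recon = xc @ self.components_.T @ self.components_  # encode, then decode
        return np.mean((xc - recon) ** 2, axis=1)


# Toy background: events lying close to a single direction in a 3D feature space.
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
background = t * np.array([1.0, 1.0, 0.0]) + 0.01 * rng.normal(size=(500, 3))

ae = LinearAutoencoder(n_latent=1).fit(background)
bkg_err = ae.reconstruction_error(background)
anomaly_err = ae.reconstruction_error(np.array([[0.0, 0.0, 5.0]]))
print(bkg_err.max(), anomaly_err[0])
```

The event far from the background direction is reconstructed poorly and receives a much larger error than any training event, which is the property exploited when cutting on the autoencoder loss.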

Signal generation
While the use of unsupervised machine learning techniques removes the need for signal events in the training dataset, a simulated signal dataset is still necessary to evaluate the autoencoder's performance as an anomaly detector. For this purpose, SUEP events were generated using a statistical toy model of the dark shower.
The highly isotropic hadronic SUEP toy-model events were generated starting from the production of a Higgs boson in association with a W or Z boson, simulated at a center-of-mass energy of 14 TeV in pythia 8 [220]. The vector boson was required to decay leptonically, and the resulting hard lepton(s) were used to sidestep the issue of how to trigger on SUEP for this analysis. The decay of the Higgs to a shower of dark mesons was performed with the SUEP_Generator plugin in pythia 8.243, which models the dark shower as a completely isotropic cloud with Boltzmann-distributed momenta, as presented in Eq. 10. The parameter T D controls the energy distribution of the dark mesons and represents the Hagedorn temperature of the dark sector. Only one flavor of dark meson is assumed, with mass m D . Each dark meson was then forced to decay hadronically to a uū quark pair. From this point the parton showering and hadronization were performed by pythia as normal. Signal samples were generated for m D from 0.4 to 8 GeV, and for T D /m D from 0.25 to 4. Detector simulation was performed with Delphes 3 with CMS detector settings [218]. Due to the difficulty of disentangling the highly diffuse energy depositions of SUEP from pile-up, only charged track information was used for the analysis.
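As an illustration of what "isotropic with Boltzmann-distributed momenta" means in practice, the sketch below samples individual dark-meson four-momenta. It is not the SUEP_Generator code: it assumes a relativistic Boltzmann form, with the momentum magnitude distributed as p² exp(−E/T D), and it ignores the overall energy and momentum conservation that a real generator must enforce. All function names are hypothetical.

```python
import math
import random


def sample_dark_meson(m_d, t_d, rng):
    """Return (px, py, pz, E) for one dark meson of mass m_d at temperature t_d.

    |p| is drawn from p^2 * exp(-E/T) by rejection sampling: propose from
    p^2 * exp(-p/T) (a Gamma distribution with shape 3 and scale T) and accept
    with probability exp(-(E - p)/T), which never exceeds 1.
    """
    while True:
        p = rng.gammavariate(3.0, t_d)
        energy = math.sqrt(p * p + m_d * m_d)
        if rng.random() < math.exp(-(energy - p) / t_d):
            break
    # Isotropic direction: uniform in cos(theta) and in phi.
    cos_theta = rng.uniform(-1.0, 1.0)
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (
        p * sin_theta * math.cos(phi),
        p * sin_theta * math.sin(phi),
        p * cos_theta,
        energy,
    )


rng = random.Random(12345)
# Example point with m_D = 1 GeV and T_D / m_D = 1.
mesons = [sample_dark_meson(1.0, 1.0, rng) for _ in range(5000)]
mean_pz = sum(m[2] for m in mesons) / len(mesons)
print("mean p_z (should be close to zero):", mean_pz)
```

Lowering T D relative to m D concentrates the mesons at small momenta, which is why the ratio T D /m D is the natural parameter to scan.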

Background generation
The simulated QCD background events necessary for the training and test datasets were created by generating dijet plus lepton(s) events with a reduced jet p T threshold of 15 GeV in MadGraph5_aMC@NLO 2.6.6 with hadronization by pythia 8 and detector simulation by Delphes 3 [216,227].

Analysis
A trigger-level selection was applied to all simulated events, requiring at least one charged lepton with p T > 40 GeV, or two oppositely charged leptons with p T > 30 (20) GeV, as well as hadronic H T > 30 GeV. A further set of pre-selection cuts was then applied: before feeding events into the autoencoder as training data, 98% of the initial simulated background events were discarded by cutting on three high-level observables that encode the essential features of SUEP.
First, the multiplicity of charged tracks was required to be N charged ≥ 70. Second, the event ring isotropy variable introduced in [146], which measures the Wasserstein distance between a given event and a uniformly isotropic distribution of energy, was required to be I < 0.07. Finally, the inter-particle distance ΔR i j , averaged over all pairs of tracks in the event, was required to satisfy ΔR > 3. The signal efficiency of these cuts varied from 1% to 30%, depending on m D and T D .
A fully connected autoencoder with five layers was trained using QCD background events that passed the pre-selection cuts as training data. Each event was represented using a modified inter-particle distance matrix ΔR i j of the 70 highest-p T charged tracks in the event.
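The three pre-selection requirements can be sketched as follows. The sketch assumes that the averaged distance is the usual ΔR = √(Δη² + Δφ²) and that each track is stored as a hypothetical (p T, η, φ) tuple. The ring isotropy is taken as a precomputed input, since evaluating it requires an optimal-transport calculation that is not reproduced here.

```python
import math
from itertools import combinations


def delta_r(track_a, track_b):
    """Angular distance between two tracks given as (pt, eta, phi)."""
    d_eta = track_a[1] - track_b[1]
    # Wrap the azimuthal difference into [-pi, pi).
    d_phi = (track_a[2] - track_b[2] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)


def mean_delta_r(tracks):
    """Average of delta_r over all distinct pairs of tracks."""
    pairs = list(combinations(tracks, 2))
    return sum(delta_r(a, b) for a, b in pairs) / len(pairs)


def passes_preselection(tracks, isotropy, min_tracks=70,
                        max_isotropy=0.07, min_mean_dr=3.0):
    """Apply the multiplicity, isotropy and mean-distance cuts in turn."""
    if len(tracks) < min_tracks:
        return False
    if isotropy >= max_isotropy:
        return False
    return mean_delta_r(tracks) > min_mean_dr


# Two back-to-back tracks: their separation is pi in azimuth.
back_to_back = [(1.0, 0.0, 0.0), (1.0, 0.0, math.pi)]
print("mean Delta R:", mean_delta_r(back_to_back))
```

Applying the cheap multiplicity cut first avoids computing the pairwise distances, which scale quadratically with the number of tracks, for events that would be rejected anyway.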
A modified mean-squared-error loss quantified the reconstruction error for each event.

Results
After training on background events, the autoencoder was fed test data including both background events and signal events across the range of simulated m D and T D points. Using the reconstruction loss as an anomaly score, ROC curves were constructed for each parameter point. To estimate the physical sensitivity of the model, the minimal excludable branching ratio of Higgs to SUEP for which $S/\sqrt{B + u_{\mathrm{sys}} B^{2}} > 2$ was computed. Statistical uncertainties due to the limited size of the simulated background sample became dominant as the cut threshold was increased before the classifier's performance began to deteriorate, indicating that the sensitivity of a real search using this method could be even higher than we report here.
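The sensitivity criterion can be turned into a branching-ratio limit with a few lines of arithmetic. The sketch below assumes the criterion S/√(B + u sys B²) > 2 as written in the text and assumes that the signal yield scales linearly with the branching ratio. The yields and the value of u sys are hypothetical placeholders, not numbers from the study.

```python
import math


def significance(signal, background, u_sys):
    """S / sqrt(B + u_sys * B^2) for given signal and background yields."""
    return signal / math.sqrt(background + u_sys * background ** 2)


def min_excludable_br(signal_at_unit_br, background, u_sys, threshold=2.0):
    """Smallest branching ratio at which the significance reaches the threshold.

    Assumes S = BR * signal_at_unit_br, so the condition can be solved for BR.
    """
    return threshold * math.sqrt(background + u_sys * background ** 2) / signal_at_unit_br


# Hypothetical yields after a cut on the anomaly score.
signal_at_unit_br = 2000.0  # signal events if BR(H -> SUEP) were 100%
background = 100.0
u_sys = 0.01

br = min_excludable_br(signal_at_unit_br, background, u_sys)
print(f"minimal excludable branching ratio: {br:.4f}")
```

In practice this calculation would be repeated for each anomaly-score threshold and each (m D, T D) point, keeping the threshold that gives the smallest excludable branching ratio.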
As Fig. 54 illustrates, this autoencoder-based analysis could exclude Higgs branching ratios to SUEP down to 1% for dark meson mass m D < 1 GeV and T D /m D < 1. If the dark shower temperature T D is < 0.5m D , branching ratios down to 5% could be probed for m D up to ≈ 8 GeV.
Using a neural network architecture and event representation tailored to the essential characteristics of the SUEP signature, but without relying on the details of any signal simulation model, this study demonstrates that even the maximally challenging scenario of entirely prompt and hadronic SUEP can be probed at the HL-LHC.

Contributors: Daniel Stolarski
The original Emerging Jets (EJ) theory paper [1] as well as the CMS search [88] (see also Sect. 2.1.2 of this white paper) considers a model with a colored mediator that is pair produced at the LHC, which leads to a final state with two QCD jets and two EJs. Those works also consider mediator masses in the regime of m 600 GeV. Given those assumptions, the vast majority of EJ events have substantial H T and thus the trigger efficiency is very high. In this section we consider relaxing both of the above assumptions and explore how one can still trigger on Emerging Jets.
In [228], an s-channel mediator was considered (see also Sect. 2.1.1 of this report), focusing on a Z′ that couples to the quark current in the SM and to the dark quark current. Such a mediator produces events that typically do not include additional hard jets. That work also considered the possibility of a relatively light Z′, down to masses of ∼ 50 GeV. The typical H T of such events, particularly in the light Z′ regime, is considerably lower than typical trigger thresholds at the LHC experiments, and other techniques are needed to increase the trigger efficiency.
The events were generated using a modified spin-1 mediator model [63] (available at https://github.com/smsharma/SemivisibleJets) implemented using the FeynRules [219] package. The hard process is generated with MadGraph5_aMC@NLO [216] at a centre-of-mass energy of 13 TeV. This output is interfaced to the Hidden Valley [46,47] module of pythia 8 [220], which simulates showering and hadronization in the dark sector as well as decays of dark hadrons to either other dark hadrons or SM states. The Z′ mass is varied and a Z′ width of Γ Z′ = m Z′ /100 is used. The remaining dark sector parameters are varied across a few benchmark models shown in Table I of [228].
Initial state radiation (ISR) in QCD or EW is included at leading order in the hard processes. The resulting hadrons are clustered into jets using the anti-k t algorithm [229] implemented in FASTJET [196] with a jet radius parameter R = 0.4 and a maximum pseudorapidity of |η| < 2.49, to be compatible with the ATLAS inner tracker. The MLM matching and merging procedure [230] is employed for extra QCD radiation, with an XQCut of m Z′ /10. A crude detector volume cut is implemented at the pythia 8 stage: particles that are outside a cylinder of (r = 3000 mm, z = 3000 mm) are considered stable.
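The crude volume cut can be illustrated with a short sketch. It assumes that the quoted z value is the half-length of the cylinder (|z| < 3000 mm), that the particle is produced at the origin, and that its decay distance follows an exponential law with mean βγcτ. The function names are hypothetical and this is not the code used in [228].

```python
import math
import random

CYLINDER_RADIUS_MM = 3000.0
CYLINDER_HALF_LENGTH_MM = 3000.0  # assumption: the quoted z is a half-length


def is_inside_volume(x_mm, y_mm, z_mm):
    """True if the point lies inside the cylindrical detector volume."""
    return (math.hypot(x_mm, y_mm) < CYLINDER_RADIUS_MM
            and abs(z_mm) < CYLINDER_HALF_LENGTH_MM)


def sample_decay_vertex(momentum_gev, mass_gev, ctau_mm, rng):
    """Decay position of a particle produced at the origin.

    The flight distance is exponentially distributed with mean
    beta * gamma * c * tau, where beta * gamma = |p| / m.
    """
    px, py, pz = momentum_gev
    p = math.sqrt(px * px + py * py + pz * pz)
    mean_distance = (p / mass_gev) * ctau_mm
    distance = rng.expovariate(1.0 / mean_distance)
    return (distance * px / p, distance * py / p, distance * pz / p)


def treated_as_stable(momentum_gev, mass_gev, ctau_mm, rng):
    """A particle decaying outside the volume is kept as stable."""
    vertex = sample_decay_vertex(momentum_gev, mass_gev, ctau_mm, rng)
    return not is_inside_volume(*vertex)


rng = random.Random(1)
# Hypothetical dark pion: 10 GeV along x, mass 5 GeV, c*tau = 150 mm.
vertex = sample_decay_vertex((10.0, 0.0, 0.0), 5.0, 150.0, rng)
print("decay vertex [mm]:", vertex, "inside:", is_inside_volume(*vertex))
```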
Two main strategies are explored to increase the trigger efficiency. The first exploits the possibility of SM radiation from the initial state. While electroweak (W/Z/γ) radiation was explored, the most effective strategy was to use additional QCD radiation. This radiation can increase the trigger efficiency in two complementary ways. First, the additional hard jet(s) can be triggered on directly. Second, the emerging jets tend to be boosted and carry more energy, which in turn increases the H T (missing E T ) if the dark pion states are short (long) lived.
Using ATLAS trigger thresholds from [231], we estimate the improvement in rate achieved by including radiation; the results are shown in Fig. 55. Although extra radiation increases the trigger efficiency, events with extra radiation also have reduced rates, so Fig. 55 shows the ratio of the cross section times trigger efficiency for events with radiation to that for events without. We see that the largest improvement comes from additional radiation of two extra jets (green line). The left panel is a benchmark with a dark pion lifetime of 150 mm (Model A) and uses the missing energy trigger. We see that for a light Z′, more than an order of magnitude improvement in rate is possible. The right panel is a model with a dark pion lifetime of 5 mm (Model B) and uses the H T trigger. In that benchmark, the efficiency of the leading order process is below what was simulated for m Z′ 350 GeV, and the improvement is potentially even larger.
The first method considered above uses existing triggers, but [228] also considers implementing new triggers using modern machine learning techniques. As ISR is no longer relevant, pythia 8's Hidden Valley production process f f̄ → Z′ is employed to generate events. Regardless of the lifetime of the dark pions, the detector subsystem with the largest number of decays is the inner tracker. Therefore, the strategy employed (which is similar to that proposed in [232] for b-tagging) is to use the tracker information without reconstructing tracks. Rather, [228] proposes to use hit patterns in different layers of the tracker as input to a support vector machine (SVM) from the TMVA toolkit [233]. A proper detector simulation of the inner tracker is outside the scope of this study, so a crude detector simulation is performed with the code used in [9], which encompasses the ATLAS tracker from the Insertable B-Layer (IBL) to the Transition Radiation Tracker (TRT). This detector simulation assumes simple models of energy loss through each thin layer of the detector.
When proposing new triggers, backgrounds must also be considered. The main background for this strategy is b b̄ jets, as they have a very large rate and also produce displaced hadrons. The background is simulated using gg → b b̄ with pythia 8's hard heavy-flavor b b̄ processes, and the inclusive background cross section is taken from pythia 8. Pileup is added to both signal and background events using pythia 8's minimum bias events: for each signal or background event, a number of minimum bias events randomly sampled from a Poisson distribution with mean μ = 50 is added, mimicking the Run 2 conditions.
The left panel of Fig. 56 shows the ability of the SVM to distinguish signal (in blue) from the dominant b b̄ background. In the right panel we show the ROC curve as a function of lifetime. A background rejection of ∼ $10^{-2}$ to $10^{-3}$ is needed for a novel high-level trigger, and we see that signal efficiencies of O(10%) are achievable, with larger efficiencies at lower dark pion lifetimes. It is also found that an SVM trained on one signal benchmark can give good acceptance for other signal benchmarks, showing great promise for such a new trigger.
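The pile-up overlay used in this study, in which each hard-scatter event receives a Poisson-distributed number of minimum bias events with mean μ = 50, can be sketched as follows. This is an illustration only: the events are represented by hypothetical string labels, and the Poisson sampler is a simple textbook method (Knuth's multiplication algorithm) rather than whatever sampler was used in [228].

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def overlay_pileup(hard_event, minimum_bias_pool, rng, mean_pileup=50.0):
    """Return the hard event followed by randomly chosen minimum bias events."""
    n_pileup = sample_poisson(mean_pileup, rng)
    pileup = [rng.choice(minimum_bias_pool) for _ in range(n_pileup)]
    return [hard_event] + pileup


rng = random.Random(2021)

# Check that the sampler reproduces the requested mean.
counts = [sample_poisson(50.0, rng) for _ in range(2000)]
print("average number of pile-up events:", sum(counts) / len(counts))

# Hypothetical event labels standing in for full event records.
pool = [f"minbias_{i}" for i in range(1000)]
event = overlay_pileup("signal_0", pool, rng)
print("events in overlaid record:", len(event))
```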

Summary and perspectives
In this report, we have summarised the work performed in the context of the Dark Shower Snowmass project: it is the first comprehensive effort to gather the large, pre-existing theoretical, phenomenological and experimental communities working in this field, following initial discussions in the LHC Long-lived Particles Working Group [234] and also some presentations in the LHC Dark Matter Working Group. This report also concretely describes pathways for a systematic exploration of strongly interacting theories. In this context, we mainly concentrated on QCD-like scenarios leading to jetty signatures at the LHC, but we also discussed signatures such as SUEPs and glueballs which are typically associated with non-QCD-like theories.
QCD-like scenarios, which are the main focus of this report, are inherently non-trivial to analyse due to their nonperturbative nature. In such theories, confinement in the IR leads to bound states whose masses and interactions are governed by the UV dynamics. While the SM QCD has been analysed in great detail in terms of UV versus IR parametrizations, little is known for arbitrary gauge groups and flavor contents. Nevertheless, due to the interesting new signatures the strongly interacting scenarios could produce at the LHC, their phenomenology is being actively explored.
In this context, we began this report (see Sect. 2) with a review of the existing efforts and phenomenological parametrizations of QCD-like scenarios. We qualitatively illustrated the phenomenological differences obtained for various mediator mechanisms, giving rise to exotic LHC signatures such as emerging or semi-visible jets. We also discussed some existing experimental results constraining these models and ongoing efforts to search for these signatures.
If the dark sector is instead non-QCD-like, other classes of spectacular signatures can be obtained in terms of SUEPs or glueballs. These were discussed in Sect. 3, in which original phenomenological SUEP studies were presented, along with recent preliminary simulation tools for these scenarios.
After this overview of existing efforts and of the signature landscape, the report also addressed in Sect. 4 possible pathways for consistent theory frameworks, especially concentrating on semi-visible jets. In that section, lattice calculations, chiral perturbation theory and an analysis of symmetry breaking due to SM-DS portals were combined to exemplify avenues in theoretical model building. Improvements to the pythia 8 Hidden Valley module, made in the context of this Dark Shower Snowmass project, were also presented along with their validation. Combining the theory developments with the new Hidden Valley module, we then illustrated their impact on the phenomenology of semi-visible jets.
In the final section of the report, Sect. 5, we discussed some proposed improvements to LHC search strategies. These include efforts using machine learning, trigger considerations and the definition of new event level variables.
Strongly-interacting dark sectors are an exciting class of scenarios in which a vibrant community of theorists, phenomenologists and experimentalists is invested. They could lead to spectacular signatures which have not yet been systematically explored at the LHC. In view of the large phenomenological interest of such theories, a more concentrated effort in theoretical work is needed, covering model building and classification of associated LHC signatures, a deeper understanding of hadronization physics, as well as studies of cross correlation with open problems of the SM such as the nature of dark matter. It is clear from this report that such work involves communication among experts in SM QCD, lattice, and collider physics as well as in dark matter. We hope that our report lays down the foundations for such a wider exchange, and that this may help devise better strategies that could ultimately lead to a breakthrough in finding signals of strongly-interacting theories.
the German Excellence Strategy (Project ID 39083149), and by grant 05H18UMCA1 of the German Federal Ministry for Education and Research (BMBF).

Data Availability Statement
This manuscript has associated data in a data repository. [Authors' comment: Part of the data and code for results in this article are available at https://github.com/dark-showers-snowmass21, while other parts are available upon request.]
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Funded by SCOAP3. SCOAP3 supports the goals of the International Year of Basic Sciences for Sustainable Development.