ALPs effective field theory and collider signatures

We study the leading effective interactions between the Standard Model fields and a generic singlet CP-odd (pseudo-) Goldstone boson. Two possible frameworks for electroweak symmetry breaking are considered: linear and non-linear. For the latter case, the basis of leading effective operators is determined and compared with that for the linear expansion. Associated phenomenological signals at colliders are explored for both scenarios, deriving new bounds and analyzing future prospects, including LHC and High Luminosity LHC sensitivities. Mono-Z, mono-W, W-photon plus missing energy and on-shell top final states are most promising signals expected in both frameworks. In addition, non-standard Higgs decays and mono-Higgs signatures are especially prominent and expected to be dominant in non-linear realisations.


Introduction
The Higgs discovery has set spin zero particles in the spotlight of searches for beyond the Standard Model (BSM) physics. This may have been the first incursion into new territory: scalar and pseudoscalar particles -elementary or not -as heralds of new physics.
Extra spin zero particles are in fact proposed by candidate solutions to major and pressing problems in particle physics.
For instance, an option to explain the nature of dark matter (DM) is a new scalar particle within a Z 2 invariant setup. A different and outstanding example is the strong CP problem of QCD, for which the paradigmatic solution relies on an anomalous global U (1) symmetry which is spontaneously broken; the associated (pseudo-) Nambu-Goldstone boson, the axion, is in addition an optimal candidate to explain the DM of the universe. The original formulation, the so-called PQWW axion [1][2][3], is now disfavoured by data, while other popular constructions that deal with the so-called invisible axion, such as the DFSZ [4,5] and the KSVZ [6,7] models, are still viable solutions to the strong CP problem. The magnitude of the couplings of axions to ordinary matter is inversely proportional to the scale of U (1) spontaneous symmetry breaking, which is much higher than the electroweak scale in the latter, "invisible", constructions. Lower axion scales are considered though in other implementations of the Peccei-Quinn solution to the strong CP problem [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22].
Many other extensions of the Standard Model of Particle Physics (SM) feature one or several spontaneously broken global U(1) symmetries, thus predicting the existence of massless Nambu-Goldstone excitations whose couplings need not abide by the same stringent constraints of the original QCD axion: axion-like particles (ALPs). ALPs, if they get a small mass due to non-perturbative effects or other explicit symmetry breaking mechanism, are also good DM candidates and/or may affect the thermal evolution of the universe. The impact of ALPs, at both high and low energies, depends on their nature and on the type and strength of their couplings. In practice, the relevant generic characteristic of Nambu-Goldstone bosons is that they only enjoy derivative couplings, because of the underlying shift symmetry.
The ultimate nature of the Higgs particle itself is still at stake. Is this scalar elementary or composite? Should we accept the uncomfortable fine-tuning associated to the electroweak hierarchy problem as a feature of Nature or is there some dynamic explanation for it? Many of the efforts made in this direction are based on the search for symmetries which would justify a low Higgs mass. Two major frameworks are being considered: either linear realisations of electroweak symmetry breaking (EWSB), typical of weakly coupled new physics, as for instance in many supersymmetric models, or non-linear ones such as those in the so-called "composite Higgs models" and other constructions involving new strongly interacting physics. The model-independent way of formulating the ultimate exploration of the Higgs nature in low-energy data is provided by the use of effective Lagrangians: a "linear" expansion [23,24] (often called SMEFT), in terms of towers of gauge invariant operators built out of SM fields and ordered by their mass dimension, is used when assuming linear realisations of EWSB, while "non-linear" expansions [25][26][27][28][29][30][31] -sometimes called "chiral" or HEFT -are the optimal instrument to treat regimes which are not necessarily weakly interacting. The non-linear formulation has the disadvantage of depending on a larger number of free parameters, while it has the advantage of being more general; in particular, it reduces to the SM Lagrangian in a particular limit. 1 The non-linear expansion does not presuppose that the Higgs particle at low energies belongs to an electroweak doublet, a crucial question to be explored in the years to come. This paper explores the physics of an extra singlet scalar which is a CP-odd (pseudo-) Nambu-Goldstone boson. We will formulate in all generality its leading CP-invariant effective couplings to SM fields, which must be purely derivative couplings when its mass is neglected. This first -theoretical -part is general by nature and holds for ALP scales larger than the electroweak one (in the EWSB non-linear case also larger than its implicit BSM scale). While the dominant ALP interactions in the linear -SMEFT -expansion have been formulated long ago [33], the analogous analysis for the non-linear regime is missing and will be developed here. We will first concentrate on determining a complete basis of CP-even bosonic operators containing one ALP insertion; nevertheless, the fermionic operators are also derived in this paper, building a complete and non-redundant chiral set. The relation and differences between the dominant operators in both expansions -linear and chiral -will be subsequently discussed. It is interesting to note that all results to be obtained below apply as well to a different case: the complete basis of CP-odd derivative couplings of an hypothetical CP-even scalar (see also Ref. [34] for a generic CP-even scalar).
Up to now, most phenomenological ALPs analyses concentrated on their couplings to photons, gluons and/or quarks, as they dominate at low energies and determine astrophysical and cosmological constraints for very light ALPs. Nevertheless, ALPs may well show up first at colliders [35][36][37] or in rare mesonic decays [38,39], and the SU (2) L × U (1) Y invariant formulation of their interactions developed here provides new beautiful channels involving the electroweak gauge bosons and the Higgs particle. In the second -phenomenological -part of this work, the foreseen impact of those couplings at colliders and in particular at LHC will be analyzed for the first time, identifying the new signals and performing a detailed analysis of experimental bounds and prospects. 2 Unlike for the theoretical results, this search for (pseudo-) Nambu-Goldstone bosons at colliders implicitly 1 The exception is the case in which the scalar manifold of the SM does not have a fixed point, as it would not admit then a linear representation for the Higgs; see Ref. [32]. 2 The set of Feynman rules stemming from the bosonic ALP effective Lagrangian can be found in Appendix B. assumes an overall ALP scale not far from the electroweak one, e.g. O(TeV), for observability.
The structure of the paper can easily be inferred from the Table of Contents.

The ALP linear Lagrangian
In linear realisations of EWSB with only SM fields at low energies, the leading-order (LO) effective Lagrangian is simply the SM one, where˜ = iσ 2 * and Y D , Y U and Y E are 3 × 3 matrices in flavour space which encode the Yukawa couplings for down quarks, up quarks and charged leptons, respectively. Consider now an additional particle, singlet under the SM charges, which is a (pseudo-) Nambu-Goldstone boson of a spontaneously broken symmetry at energies higher than the electroweak scale v (set by the W mass). Neglecting its mass, its couplings would be pure derivative ones because of the underlying shift symmetry. Denoting by f a the scale associated to the physics of this ALP particle a, insertions of the latter in effective operators will be weighted down by powers of a/ f a . Focusing on interactions involving only one ALP, the next-to-leading-order (NLO) effective linear ALP Lagrangian has been determined long ago [33].
In this paper we mostly focus on the bosonic operators involving a, determining a complete and non-redundant set. For linear EWSB realisations the most general linear bosonic Lagrangian, including only the NLO corrections involving a, is given by where now the leading-order Lagrangian is the SM one plus the ALP kinetic term, while the NLO bosonic corrections are given by with andX μν ≡ 1 2 μνρσ X ρσ . The action of the shift symmetry on the first three operators, a → a + α, with α constant, yields and thus the corresponding associated current is anomalous as δL = α f a ∂ μ K μ X . Even if this correction is a total derivative, in the case of AG the existence of instantonic configurations in the QCD Lagrangian implies that the action is modified because the integral of ∂ μ K μ G does not vanish (although a discrete version of the shift symmetry is preserved); it is nevertheless often added to the Lagrangian given its relevance for the case of the true QCD axion and the solution of the strong CP problem. 3 After electroweak symmetry breaking, O a induces a two-point function contribution, tantamount to a acting as an additional contribution to the longitudinal component of the electroweak gauge fields. An easy way of determining its impact on observables is to trade it for a fermionic vertex [33], either chirality conserving or chirality flipping, or a combination of them. For instance, the Higgs field redefinition → e ic a/ f a (10) applied to the bosonic Lagrangian Eq. (3), induces a correction stemming from the Higgs kinetic energy term (see Eq. (1)) which cancels exactly O a up to O(a/ f a ), while the Yukawa terms in that equation induce a new Yukawaaxion coupling for which O a can be entirely traded (see Appendix D for details and a general discussion of possible field redefinitions). The overall effect is thus the replacement in Eq. (4) where which exhibits a relative minus sign between the Yukawa-ALP type of interaction for up and down fermions. This coupling can then be written in a more compact way as O ψ a = i a f a ψ=Q, L ψ L Y ψ σ 3 ψ R + h.c., (13) where Q R ≡ {u R , d R } (L R ≡ {0, e R }) -with σ 3 acting on weak isospin space -and where the block matrices Y ψ and are defined by Alternatively, using the equations of motion of L LO , O a could be entirely traded by a flavour-blind and chiralityconserving fermionic operator, ∂ μ a f a ψ=Q, L ψ γ μ γ 5 σ 3 ψ + h.c., (15) where again terms with more than one axion insertion have been neglected. In this work we choose to use the chiralityflipping version of the fermionic couplings, though. In summary, the expression for δL bosonic a to be used below reads with O ψ a as defined in Eq. (13). For completion, it is worth mentioning that when the complete NLO Lagrangian is considered in the linear case, additional fermionic operators are present. In fact the most general NLO ALP Lagrangian is given by [33,40,41] δL total a = cW AW + cBAB + cGAG where X are 3 × 3 hermitian matrices in flavour space. The chirality-conserving operator in the last term of this equation could alternatively be traded using the equations of motion (EoM) by a chirality-flipping coupling: ∂ μ a f a ψ=Q L , Q R , In this equation, the products X ψ L Y ψ and Y ψ X ψ R are completely generic matrices and in consequence, in the complete linear basis, operators of the type aψ L ψ R are not Yukawa suppressed. Note as well that it would be redundant to consider simultaneously a bosonic coupling such as O a in Eq. (8) and the general fermionic couplings in Eq. (17) or (18), as the effects of the former are already included in the flavour-blind components of the X ψ matrices; see Eq. (15).
In this paper, we concentrate on the thorough exploration of observables induced by the purely bosonic ALP couplings as expressed in Eq. (16), for the case of linear EWSB realisations.
Coupling to photons Both AB and AW -Eqs. (5) and (6) contribute to the interaction of the ALP with two photons, where F μν denotes the electromagnetic field strength, and the dimensionful coupling g aγ γ is given by where c θ (s θ ) denotes the cosinus (sinus) of the Weinberg angle. Bounds on g aγ γ can be inferred as a function of the ALP mass m a from various astrophysical constraints and low-energy data, which rely only on an hypothetical ALP-photon coupling and not on fermion-ALP interactions, as discussed e.g. in Ref. [35]. They enforce the combination |c 2 θ cB + s 2 θ cW | to cancel to one part in 10 3 (10 8 ) for m a = 1 MeV (keV). Indeed, for m a 1 MeV the best present constraint is set by beam dump experiments, g aγ γ 10 −5 GeV −1 [35,43], that is, for m a ≤ 1 MeV.
For substantially lower masses astrophysical constraints may apply, e.g. for m a = 1 keV the combination of helioseismology, solar neutrino data observations [44] and horizontal branch stars data [45][46][47] results in g aγ γ 10 −10 GeV −1 , that is, These strong constraints on g aγ γ could suggest that each of the two coefficients involved, cB and cW , may be individually subject to bounds of the same order of magnitude. Nevertheless, often symmetry reasons force a given theory to produce couplings to photons much suppressed with respect to Z couplings. In any case, from the point of view of effective theory they are two independent degrees of freedom: the combination orthogonal to that in Eq. (20) should be probed and bounded independently. In practice, in most of the phenomenological analysis to be developed in this work the constraint cB = −t 2 θ cW (23) will be systematically enforced.
Coupling to gluons In turn, the effective ALP-gluon g agg coupling is analogously defined by where G μν denotes the QCD field strength. It receives contributions from the NLO effective operator AG in Eq. (7), where can be directly constrained at energies above the QCD scale QC D via axion-pion mixing effects, and also via mono-jet searches at hadron colliders.
Bounds on Br(K + → π + + nothing) [54] can be used to constrain the process K + → π + π 0 (π 0 → a), where the pion-axion mixing arises through the anomalous coupling of mesons and of the axion to gluons [40,55]. These bounds have been used to constrain f a in contexts where the coupling of the ALP to gluons is only present due to the anomaly, i.e. where L ⊃ α s 8π a f a GG (see, for example, Ref. [19]). They can be reinterpreted in terms of the generic ALP-gluon coupling, Eq. (24), yielding g agg 1.1 × 10 −5 GeV −1 (90% CL) for m a 60 MeV. (26) Slightly higher ALP masses have been considered at colliders, assuming only the coupling in Eq. (24). Limits of order g agg 10 −4 GeV −1 (95% CL) for m a 0.1 GeV, (27) were obtained [35] by recasting 8 TeV LHC analyses [48,49].
Coupling to fermions Interesting bounds on ALP-fermion interactions can be obtained from several set of experimental data. For instance, considering those stemming from the purely bosonic operator O a -see Eq. (13) -or, in other words, the flavour-diagonal couplings in the last operator in Eq. (17) as expressed in Eq. (18) their contribution to the effective Lagrangian in the fermion mass basis reads where m diag ψ is the fermion mass matrix resulting from diagonalizing the product vY ψ / √ 2. The severity of the constraints on g aψ depends on the ALP mass range considered. The least constrained is the high-mass region, tested through rare meson decays and in DM direct detection searches (the latter being very model-dependent) [50]. The former provide bounds on ALP-fermion couplings below 10 GeV and in particular Beam Dump experiments (CHARM) constraints read [38,56]: Lighter ALPs have been tested in axion searches in Xenon100 [52] through the axio-electric effect in liquid xenon (analogue of the photo-electric process with the absorption of an axion instead of a photon), bounding ALP couplings to electrons: Finally, the strongest bounds apply to very low ALP masses. They are inferred from high-precision photometry of the red giant branch of the colour-magnitude diagram for globular clusters [53]. Measurements of axionic recombination and de-excitation, Compton scattering and axion-bremsstrahlung set very strong bounds again on the coupling to electrons: The above set of fermionic bounds could suggest to infer new limits on the coefficient of the linear bosonic operator O a of the bosonic linear ALP basis, Eq. (8), if considered by itself, via the equivalence discussed in Eqs. (10)- (13). This bound would depend on the ALP mass, and would be conservatively summarised in except for ALPs with masses in the 1 keV-1 MeV range, where the bounds from rare meson decay and DM searches are much weaker. Nevertheless, more than one effective operator can contribute to the rare processes under discussion and, in consequence, strictly speaking a bound can only be set on the corresponding combination of operators, see further below, in the same spirit that the bounds on aγ γ decay do not nullify simultaneously the two couplings in the set {aW , aB}, but only a combination of them; see Eq. (23). For the time being, the value of c a will be thus left free for further exploration below. Axion-like particles are also appealing DM candidates and further bounds apply if such a hypothesis is considered. Indeed, heavy ALPs (in the GeV-TeV mass range) have largely been searched for at colliders as weakly interacting massive particles (WIMPs). However, the phenomenological analysis in this work will focus on a low-mass region with ALP masses below the MeV range; DM candidates in this range are known as weakly interacting slim particles (WISPs) and could be produced non-thermally through the misalignment mechanism [57][58][59][60][61]. ALP DM candidates capable of generating the correct relic abundance call for a large enough initial field value. Because of their (pseudo-) Nambu-Goldstone nature, these ALPs are the phase of a complex field and thus have field values limited to −π f a < a(x) < π f a , implying that standard ALP CDM (cold DM) producing the correct relic density would require large ALP scales [62]: f a 3.2 · 10 10 GeV(m 0 /eV) 1/4 (smaller scale values cannot explain the totality of the relic abundance). In what follows, ALPs will not be required to account for the DM of the universe.
Coupling to massive vector bosons In contrast to the present constraints discussed above, the couplings of ALPs to the heavy SM bosons have been largely disregarded although they appear at NLO of the linear expansion, that is, at the same order as the pure photonic, gluonic and fermionic ALP couplings.
The associated signals stemming from the linear δL bosonic a in Eq. (16) are illustrated in the column on the right hand side of the Feynman rules detailed in Appendix B; they include in particular interaction vertices of the ALP with electroweak gauge bosons such as aγ Z , a Z Z, aW + W − , aγ W + W − and a Z W + W − . Besides the collider signatures that will be presented in the phenomenological sections of this paper, rare decays provide an additional handle on the ALP couplings to massive vector bosons. Consider the ALP-W + W − interaction defined by which may induce flavour-changing rare meson decays via W exchange at one loop, and an ALP radiated from the W boson. AW contributes to such processes, with g aW W = 4cW / f a . Upon considering the action of AW by itself, the same coupling induces the subsequent ALP decay into two photons. NA48/2, NA62 and Beam Dump experiments have been analysed in this context in Ref. [39], which extends to higher ALP masses the bounds in Eqs. (21) and (22) of Ref. [35], indicating a constraint 4 f a /cW 4 − 8000 TeV, for m a < 500 MeV.
Other limits have been obtained from the bounds on rare meson decays into invisible products, B → K + a and K → π + a with a → inv.. This is nevertheless at the 4 We are indebted to Brian Shuve for stressing the impact of present rare-decay bounds for light axions. price of assuming, in addition to AW , the existence of some supplementary ALP decay channel into invisible sectors that furthermore is required to be largely dominant [39].
The bounds just discussed are precious and in particular the approach of having started considering just one operator at a time is a valid one. Nevertheless, with the discussed level of accuracy for cW when considered just by itself, it may be pertinent to take into account the possible competing action of other specific ALP-SM couplings in the EFT, for instance those where the ALP would not be attached to the W boson but to the intermediate fermion in the loop. These stem from the fermionic couplings -in particular the top quark coupling -induced by the bosonic operator O a in Eq. (16), or from other ALP-fermion interactions such as the generic ones in Eq. (17) for the linear case. Indeed, like the analysis that lead to Eq. (23), from the point of view of the effective field theory in the linear EWSB framework, only combinations of the couplings in the set can be strictly bound by such data, where c ψ i refers to the coefficients of the fermionic couplings in the complete NLO linear ALP Lagrangian Eq. (17) which are not tantamount to c a via EoM. In this paper we will explore the complementary information that the LHC can provide in various tree-level channels, e.g. mono-W , which are insensitive to the presence of the operator coefficient c a but share with the rare-decay analyses the dependence on the linear operator coefficient cW . This complementarity is also manifest as the LHC has access to a larger kinematic range. Hence the breakdown of the ALP Effective Theory, and possible discovery of new physics, may be possible at the LHC but be hidden in physics at B-factories. For these reasons, in the phenomenological sections we will obtain LHC bounds on operators involved in tree-level ALP-W couplings (among others) and without the prejudice from rare decays. The combined impact at LHC of cW and c a plus general ALP-fermion couplings, as well as the impact of non-linear operators on rare decays is a subject for future work.

The bosonic chiral ALP Lagrangian
This section explores the leading effective couplings between an ALP and the SM fields, in the general framework of a non-linear (often referred to as chiral or HEFT) realisation of EWSB. The complete set of LO and NLO bosonic CP-even couplings involving one ALP will be determined (again, they could also be read as the complete bosonic set of derivative CP-odd couplings involving a CP-even singlet scalar). It will be assumed that the characteristic scale f a associated to the Nambu-Goldstone boson origin of the ALP is at least of the same order of magnitude or larger than the cut-off of the BSM electroweak theory . The ALP scale and the electroweak BSM scale will nevertheless be treated here as independent.
The HEFT building blocks can be chosen to be the gauge field strengths G μν , W μν and B μν plus two SU (2) L covariant objects: where π a (x) denotes the longitudinal degrees of freedom of the gauge bosons and σ a the Pauli matrices. In this notation, the covariant derivative reads Under SU (2) L ,R global transformations (L, R, respectively), the objects defined above transform as The physical Higgs particle h is then customarily introduced as a SM isosinglet via generic polynomial functions F i (h) [73] expanded in powers of h/v, where a i , b i . . . are constant coefficients. Finally, the SM fermions are often grouped into doublets of SU (2) L and SU (2) R , Q L ,R ≡ (u L ,R , d L ,R ), L L ≡ (ν L , e L ) and L R ≡ (0, e R ). The notation chosen allows an easy identification of terms breaking the custodial symmetry SU (2) C to which the global group SU (2) L × SU (2) R gets broken after EWSB.

SU
(2) C is explicitly broken by the gauging of the hypercharge U (1) Y and by the heterogeneity of the fermion masses; insertions of the scalar chiral field T(x), which is not invariant under transformations of the full SU (2) R , account for breaking of the custodial symmetry in the effective operators.
The task now consists in the generalisation of the HEFT Lagrangian to include insertions of derivatives of a/ f a . This could be approached via the insertion in that Lagrangian of general polynomial functions of the SM singlet scalar a, F i (a/ f a ), in analogy with the treatment given to the scalar h in the HEFT Lagrangian. After all, the F i (h/v) polynomials are reminiscent of the deformed exponential Nambu-Goldstone nature of the Higgs particle in some non-linear EWSB realisations, such as "composite Higgs" models [77][78][79]. From this point of view, to restrict below to terms with a single a(x)/ f a insertion is consistent with the assumption f a ≥ . In summary, the effective Lagrangian can be written as where now the LO Lagrangian includes the usual HEFT LO terms plus two ALP-dependent terms, with where the dependence on x, as well as that on v of F(h/v), has been left implicit for brevity. The first line in Eq. (44) accounts for the h and gauge boson kinetic terms, and a general scalar potential V (h). The first term in the second line describes the W and Z masses and their interactions with h, as well as the kinetic energy of their longitudinal components; the second term in this line is a custodial-breaking term that we will disregard in what follows, being phenomenologically extremely suppressed (for this reason sometimes it is included instead among the NLO chiral terms even if it is a two-derivative coupling). The fermion kinetic energy and Yukawa-like terms written in the mass eigenstate basis come where Y Q,L are the 6×6 block-diagonal matrices containing the usual Yukawa couplings as defined in Eq. (14). This notation follows the assumption that the Yukawa-type fermion-h couplings are aligned with the fermion masses. Finally, the last line contains the usual QCD θ term associated to the strong CP problem. L LO a contains two terms which are two-derivative couplings, where A 2D (h) is a custodial-breaking two-derivative operator with mass dimension three, This operator appears then singled out at the LO in the chiral expansion, unlike the case of the linear expansion in which the only LO ALP term was the a kinetic energy; see Eq. (3) and  Table 1.
A discussion of scales The normalisation of the operators in Eqs. (44)- (47) and in the NLO chiral corrections to be discussed below follows the Naive Dimensional Analysis (NDA) master formula for the HEFT Lagrangian as discussed in Refs. [80][81][82][83]. With this convention the gauge boson kinetic terms appear canonically normalised. In addition, the strongly interacting regime would correspond to operator coefficients of ∼O(1). Furthermore, the mass parameter in front of several operators in Eqs. (44) and in Eq. (47) should be a generic scale f , which in specific models is that associated to a Nambu-Goldstone ancestry for the Higgs resonance (alike to f π for QCD pions), such that ≤ 4π f [80]. Instead, v -the electroweak scale -is shown as explicit mass parameter for bosons and fermions in Eqs. (44) and (47), with v < f : this inequality is the well-known fine-tuning of the chiral electroweak Lagrangian, necessary to recover the correct scale of the gauge boson masses. It reflects as well the fine-tuning problems of specific "composite Higgs" scenarios. For consistency v has been then chosen as weight in all mass-related terms in those equations; for instance a factor of f 2 /v 2 is thus implicitly embedded in the definition of the coefficient c 2D in Eq. (46).
The same fine-tuning is at the origin of the F i (h) functions being customarily written as generic polynomials in h/v instead of h/ f ; see Eq. (41). It can be considered that in this parametrisation factors of v/ f have been reabsorbed in the free parameters a i , b i , etc. in Eq. (41). Note as well that, in principle, a function F i (h) can be attached to any of the operators in Eqs. (44) and (46). However, those attachments can be redefined away in both Higgs and fermionic kinetic terms at the price of redefining F Q,L (h) [84] and F 2D (h). Moreover, F i (h) insertions in the gauge bosons kinetic terms can be avoided assuming that the transverse components of the gauge fields do not couple at tree level to the Higgs sector, as has been explicitly shown in Refs. [65,66] for composite Higgs models [77][78][79]. A similar assumption on the ALP sector prevents from writing terms of the type a X μνX μν at LO.

The NLO ALP operators
The complete list of HEFT CP-even bosonic operators at NLO is known [28,31,64] and will not be further discussed. We address here the NLO bosonic chiral interactions involving one insertion of a/ f a , encoded in δL bosonic a in Eq. (42). The additional inclusion of fermionic couplings and the construction of a complete and non-redundant CP-even basis, which will turn out to be composed of a total of 32bosonic and fermionic -operator structures (including the LO axionic operator A 2D and assuming one flavour), is deferred to Appendix A. The NLO Lagrangian δL bosonic a consists instead of 20 independent bosonic operator structures (disregarding in the counting the different coefficients inside the F i (h) functions), where μνW aμν a f a , μνG aμν a f a , The requirement that all ALP couplings respect a (continuous or discrete) shift symmetry prevents the insertion of F i (h) functions in the three first couplings in this list. The first block of six operators are those invariant under custodial symmetry, assuming as customary no sources of custodial symmetry breaking other than those present in the SM.
The "penalisation" of the operator coefficients by inverse powers of 4π is a most conservative choice of their possible value, which reflects the NDA normalisation of the chiral sector [80][81][82][83] in which O(1) operator coefficients indicate the strong regime. A particular case is that of vertices involving one Higgs leg, for which the overall amplitude will be proportional in practice to the product see Eq. (41). Given the f /v factor absorbed in the definition of a i , in the strong coupling limitã i is expected to be somewhat smaller than 1 for all i = 2D. Conversely,ã 2D as defined here is expected to be larger than 1 by a factor O( f /v) in that limit; see the discussion at the end of Sect. 3. Analogous reasoning applies to vertices with more than one Higgs leg.

Two-point functions
The last NLO operator in Eq. (49), A 17 (h), introduces a Z -a two-point function alike to that from the LO coupling A 2D (h), albeit with a higher momentum dependence. That is, both operators feed derivatives of the ALP field into the longitudinal components of the Z boson, in addition to the usual derivative of the SM would-be Nambu-Goldstone neutral field: The physical impact can be illustrated best via a field redefinition which trades completely this combination of twopoint functions by interaction vertices, alike to the procedure applied to the linear operator O a in Sect. 2, which translates also in contributions to the definition of the gauge fixing terms, the mass term for the gauge bosons and the Yukawa couplings (see also Ref. [34] for a similar discussion in the context of CP-odd effective operators within non-linearly realised EWSB). The net physical impact is: The purely bosonic couplings cannot be thus completely traded by fermionic ones in the generic case of non-linear EWSB. This is remarkable, as it implies that a Zh couplings could be then expected among the dominant signals of ALPs, at variance with linear realisations in which they are only expected at NNLO (as argued in Sect. 4 below). This comparison is illustrated in Table 1.
The fermionic couplings, stemming from A 2D (h) and A 17 (h) after the field redefinition discussed, will be denoted by A ψ 2D and A ψ 17 and defined by see Eq. (45) and Appendix D for details. These expressions are the non-linear equivalent of the linear interaction O ψ a in Eq. (13). Alternatively, the part of A 2D and A 17 that can be traded by fermionic couplings could be written as chiralityconserving transitions, e.g.
which are the chiral equivalent of Eq. (15). In this work, when analyzing the non-linear EWSB scenario we will use the formulation of chirality-flipping fermionic couplings in Eq. (53). 5

The bosonic chiral ALP basis
In summary, the resulting bosonic ALP Lagrangian up to NLO couplings can be written, after the redefinition in Eq. (52), as the sum of 23 terms, besides the kinetic term: where A 2D (h) and A 17 (h) are defined as the operators A 2D (h) and A 17 (h) without their h-independent terms, which have been traded instead by the fermionic A ψ i couplings as defined in Eq. (53). The rest of the operators have been defined in Eq. (49). All Feynman rules stemming from L chiral a can be found in Appendix B, up to four-leg interactions.
AB, AW , AG and A ψ 2D are identical to the operators found in the framework of the linear EWSB Lagrangian. In consequence, the bounds on ALP-photon and ALP-gluon vertices in Eqs. (21) and (22) apply. This would also hold, restricted to the indicated mass ranges, for the aW + W − coupling in Eq. (35), if AW was considered just by itself. Nevertheless, the caveats to that approach discussed in the linear case are even stronger here in the sense that the aW + W − couplings may receive contributions in the non-linear case from the set (see FR.4) Analogously, the ALP-fermion vertices in Eqs. (30)- (33) would constrain the magnitude of A 2D , if the latter is taken just by itself, to Again, in the non-linear EWSB setup many other couplings may contribute in addition to rare meson decay processes than in the linear case, see FR.17-FR.19, to wit In this ensemble, the subset {c B q i } of operator coefficients refers to the flavour-changing operators of the general ALP-fermion couplings B q i in the complete Lagrangian, see Eq. (94), which can contribute either at tree level or at one loop via W , Z or h exchange, and thus on the same footing than for instance cW or c 2 , c 6 , c 8 and c 17 . Even if the data analysis was restricted for simplicity to bosonic couplings (the focus of this work), a six-dimensional parameter space would still remain, which means that a large freedom remains for the possible value of one given coupling. In consequence, consistently with the complementarity perspective, in the second -phenomenological -part of this work we will explore the independent impact that the bosonic non-linear operator coefficients in Eq. (58) may have on LHC signals, which they impact via a different combination than in rare decays. Those couplings will be thus considered there first one at a time and occasionally in some combinations.

Linear vs. non-linear expansions
The results in the previous sections on bosonic ALP-SM interactions uncovered a plethora of effective couplings in the bosonic sector of the chiral expansion, in contrast with the mere four operator structures of the linear one shown in Eq. (17), when both Lagrangians are considered up to NLO. All ALP couplings are NLO ones in the linear case, while one of the chiral set (A 2D ) stands out at LO.
Three operators are exactly the same in both expansions. They are those with an "anomalous-type" structure of the form a X μνX μν , where X μν stands for a SM field strength: AB, AW and AG. The total number of independent interactions has to be equal in both expansions when all orders are considered, though. It is thus pertinent to identify which are the effective operators of the linear expansion that lead to the same interaction vertices than the chiral (up to) NLO couplings. This is accomplished in Appendix C, which identifies the linear siblings with mass dimension: d = 5, corresponding to AB, AW and AG and to the fermionic couplings induced by A 2D with no attached Higgs leg (these are identical in both expansions), as well as other fermionic vertices.
Furthermore, the siblings of the vertices induced by A 2D with one or more Higgs legs are linear effective operators with dimension d = 7 [85] or higher, depending on their Lorentz structure.
Common/distinctive phenomenological signals Interaction vertices predicted by both expansions include the wellknown ALP-photon and ALP-gluon couplings, and in addi-tion the yet mainly unexplored aγ Z , a Z Z, aW + W − , aγ W + W − and a Z W + W − signals.
Distinctive signals are those only present in the chiral EWSB Lagrangian at the order considered, which are: (i) extra ALP-gauge boson vertices aγ W + W − , a Z W + W − and a Z Z Z; (ii) ALP-Higgs interactions stemming from A 2D , which include aγ h, a Zh, aγ Zh, a Z Zh, aW + W − h, aγ hh and a Zhh interactions, among others. All these signals are thus putatively important pointers of non-linear realisations of EWSB.
A natural question about the bosonic ALP-Higgs interactions is how come those (Z μ ∂ μ a)h n couplings with n ≥ 1 appear at LO in the non-linear expansion while they are instead very suppressed in the linear one, as after all the latter is a limit of the former. The gist lies in the generality of the F i (h) functions, and more specifically in the difference between F C (h) and F 2D (h), see Eqs. (41), (44) and (47). Would those two functions be equal, as it happens in the linear expansion, all bosonic ALP vertices involving the Higgs would also be redefined away completely in the chiral expansion at LO and NLO. Furthermore, even if the difference between the a i , b i etc. coefficients for those two F i (h) functions was considered to be qualitatively a NLO effect, all (Z μ ∂ μ a)h n , n ≥ 1 couplings would still be phenomenologically considered NLO effects, which means in any case higher strength expected than in linear realisations of EWSB (where they start to appear only at NNLO).
The phenomenology of the ALP couplings to heavy SM bosons will be explored in Sects. 6 and 7 below.

Assumptions and validity of the EFT
The theoretical results in the previous sections focussed on a generic Nambu-Goldstone boson, singlet under the SM, identifying all bosonic derivative couplings at LO in the linear and chiral expansions (a complete set including fermionic ones was also derived and for the chiral case they can be found in Appendix A). They hold independently of whether the -unknown -underlying global symmetry is exact or slightly and explicitly broken, that is, of whether the ALP is indeed exactly massless or not, as far as its mass is negligible compared to the typical momenta considered. A few considerations are nevertheless in order before moving to the phenomenological analysis of ALPs signatures at colliders.
Validity of the EFT For the effective Lagrangian description to be valid, the relevant suppression scale, in this case f a , must be significantly larger than the typical energy scale of the process under study. In order to strictly ensure the validity of the EFT, one should require √ŝ < f a for each event ( √ŝ corresponding to the invariant mass of the event). However, √ŝ is not experimentally observable in processes with invisible particles in the final state. In this case, the compari-son to f a may be naively performed using either the missing transverse energy of a given event / E T or the transverse mass m T , defined as (in events characterised by the presence of a lepton and significant / E T ) where p T is the transverse momentum of the lepton and φ is the azimuthal angle between the lepton and the missing transverse momentum vector / E T (note that m T encompasses contributions from both the visible and the invisible parts of the final state). We use these two variables in the analysis below, depending on the process, and require that the maximum values allowed for those variables obey m max T < f a for mono-W analyses (see Sect. 6.3), as the ATLAS search we reinterpret uses m T as discriminating variable. m max T corresponds to the highest m T data bin in a given analysis, for each value of f a considered.
The effect of imposing the strict validity criterion √ŝ < f a can be assessed through the correlation between / E T , m T and √ŝ for each analyzed signal, obtained from Monte Carlo. For binned analyses, the signal event fraction for which √ŝ > m max T , / E max T in different bins may then be discarded. We will explicitly use this procedure for the mono-W and mono-Z analyses in Sects. 6.3 and 7.1, and discuss the impact of the strict validity criterion on the bounds/sensitivities on f a /c i obtained from the rest of analyses. 6 On a different note, we stress that as the chiral expansion has an implicit BSM electroweak scale ≤ 4π f , there is an underlying assumption that f a ≥ . This / f a hierarchy sustains the choice of restraining the analysis to vertices involving only one ALP. ALP stability at the LHC and its mass In the LHC phenomenological exploration to follow, it will be assumed that the ALP is stable on collider scales, thus escaping the detector as missing transverse energy / E T . This further restricts the range of values of m a , f a , appropriate for the concrete numerical analysis below, given the various interactions of a that could allow its decay -see Eqs. (FR.1)-(FR.7) and (FR.17)-(FR. 19) in Appendix B. The valid m a range should be specified for a correct interpretation of the collider results: because of the assumed stability, all phenomenological results to be obtained below hold for ALP masses m a ≤ 1 MeV, without any additional assumption as regards which channels may be open. The ratio between the ALP mass m a and f a is then 6 See also Ref. [86], where a similar method has been applied to DM searches with the added feature of marginalizing over the unknown contribution of new physics beyond the cut-off. safely small, m a / f a ≤ MeV/TeV, for characteristic f a scales of at least a few TeV.
For ALP masses above the MeV, the signals to be studied below may also be present even if the pattern is altered, accompanied by new ones which can be used to precisely test the couplings through which the ALP may decay within the detector (e.g. leptonic couplings). 7 This would require an extended dedicated study.
In this work, an ALP mass m a ∼ 1 MeV is used in the numerical simulations, light enough to avoid altogether a → + − and a → νν + − decays. The decay channels which then remain a priori available are: a → νννν As neutrinos are undetectable at the LHC, this decay does not have any impact on our phenomenological analysis. It would simply become part of the / E T contributions.
a → γ γ This decay is constrained by astrophysical observations, as detailed at the end of Sect. 2. The distance d covered in the laboratory frame by an ALP before decaying can be estimated as where τ , (a) and p a are, respectively, the proper lifetime, width and three-momentum of the a particle, and c denotes the speed of light. Restricting the width to (a → γ γ ) and using the coupling strength g aγ γ as defined in Eq. (19), it follows that which can be rewritten as For m a = 1 MeV, given the experimental constraint (see Eqs. (20)- (21)), it results The ALP momentum | p a | is typically of the order of the missing energy of the candidate signals, selected imposing a minimum / E T cut, which for instance using ATLAS and CMS data is O(100) GeV. Thus, within the allowed range for g aγ γ and / E T , the ALP always covers an enormous distance -many orders of magnitude larger than the LHC detectors size (∼ 10 m) -before decaying into two photons. For lighter ALPs, the situation is even safer given the inverse quartic dependence of d with m a . ALP masses above the MeV range and up to hundreds of MeV could be considered without risking two-photon ALP decay in the data analyzed 8 by raising the minimum / E T cut imposed on data, but this would open the e + e − leptonic decay channels.
a → γ νν Analogously, this process does not affect the stability of the ALP particle at the LHC. It could be mediated by the ALP-Z -γ interaction parametrised by g a Zγ , where Z μν denotes the Z -boson field strength. The decay width shows a very strong dependence on the mass of the ALP, due to a peculiar cancellation occurring in the phase-space integration. In the limit m a m Z (and neglecting the Z boson width for simplicity) we find For m a = 1 MeV, this corresponds to a distance covered by the ALP before decaying where on the last inequality the constraint on g a Zγ derived further below has been used (see Eq. (70)).

Phenomenological analysis I: new bounds
In this section we derive new constraints on the operator coefficients using LEP and LHC Run I and II data. Table 2 summarises the observables/processes which are sensitive to the various effective operator coefficients, to be considered in this and the next section.
Unless otherwise specified, we will consider the effect of one operator at a time. Note that the dependence of the signal cross section σ or partial width on an operator coefficient c i is (c i / f a ) 2 , hence the ratio c i / f a is the relevant combination of parameters throughout the analysis.
For the operator coefficients we will use the notation of the chiral expansion, as its couplings outnumber and include those of the linear expansion -see Sect. 3. Whenever pertinent, the applicability of a given bound or a sensitivity prospect to both expansions will be specified. Special attention will be paid overall to the comparison between the expectations based on the linear and non-linear effective Lagrangians.

ALP coupling to Z -photon
In the non-linear expansion, the effective a Zγ coupling takes the form: with the custodial-preserving limit recovered for c 7 = 0 and the linear limit at NLO recovered for c 1 = c 2 = c 7 = 0. This interaction can be constrained from various sets of experimental data: -The uncertainty on the Z boson width [42], (Z → BSM) 2 MeV at 95% CL, allows one to set a conservative bound on the process Z → aγ . The latter would contribute to the Z width as In consequence, we use for the first time the Z boson width to obtain a bound on this coupling, constraining the combination of coefficients in Eq. (68) within the limit (with basically no dependence on m a for m a 1 GeV) -LEP limits on Z → 3γ [36] constrain the product g aγ γ g a Zγ . However, given the bounds on g aγ γ reported at the end of Sect. 2, the inferred bound on g a Zγ is weaker than that in Eq. (70).
As illustrated in Fig. 1, the Z boson width is able to probe regions in the parameter space orthogonal to those tested by g aγ γ . In the linear EWSB setup, those two bounds constrain New constraints LEP data 20

Fig. 1
Left Constraints on the parameters cB / f a and cW / f a derived from the tree-level bounds on the combinations g aγ γ (y-axis) and g a Zγ (x-axis) defined in Eqs. (20) and (68). The hatched (solid) region is obtained with the benchmark mass m a 1 MeV (keV). The different colours show how the allowed region is shifted in the non-linear setup, depending on the parameter c 127 = g 4π (2c 1 + t θ (c 2 + 2c 7 )). The value c 127 = 0.2 is about maximal, as it is obtained fixing c 1 = c 2 = c 7 = 1, typical of the strongly interacting regime. The linear case corresponds to c 127 = 0. Right The rotated figure shows only the region allowed for m a = 1 MeV cB/ f a and cW / f a to take values within a limited area: imposing Eq. (23) leads to |cW / f a | < 0.42 TeV −1 . In the non-linear EWSB case, that region can be shifted depending on the value taken by the combination c 127 ≡ g 4π (2c 1 + t θ (c 2 + 2c 7 )), as shown in Fig. 1. Overall, the constraints on the quantities c i / f a are of order TeV −1 and thus correspond to a loose O(1) bound on the coefficients c i for f a = 1 TeV. 6.2 ALP coupling to Z -Higgs: non-standard Higgs decays As shown in Sect. 3.2 and Appendix D, the presence of the coupling a Zh is a characteristic feature of the non-linear effective Lagrangian, as in the linear expansion it would only be expected at NNLO. We propose here for the first time to use non-standard Higgs channels to bind couplings of the ALP to the Higgs particle. Consider a range of ALP masses such that it allows the Higgs particle to decay into Za. The presence of non-standard decay modes of the Higgs is constrained by ATLAS and CMS global fits to Higgs signal strengths. Current constraints on the Higgs non-standard branching fraction Br(h → BSM) from LHC 7 and 8 TeV data yield [87] Br (71) where the SM Higgs width is SM = (4.07 ± 0.16) MeV [88] and BSM denotes the non-standard Higgs partial width stemming in this case from the presence of the ALP, The interaction vertices contributing to h→a Z , h→a ff and h→a Zγ are shown in FR.7, FR.20-FR. 22 and FR.14 in Appendix B, respectively. The last two terms are three-body phase-space suppressed and yield negligible contributions to the Higgs total width; 9 they will be then discarded in what follows. Using then BSM h→a Z in Eq. (71) yields the present bound h→a Z receives contributions from the chiral LO operator A 2D , Eq. (47), and from several NLO ones in Eq. (49), with 9 h→a ff is further suppressed by factors of (m f /v) 2 1, while the interaction ah Zγ is linked to the a Zγ vertex (see FR.14 and FR.3 in Appendix B), whose strength is bounded from the Z width (see Sect. 6.1). and where the coefficientsã i for the couplings involving one Higgs leg have been defined in Eq. (50). The bound in Eq. (73) translates into the constraint where we use the fact that the inequality on the left is generically dominated by theã 2D contribution, as it enters weighted by a large factor. If the constraint in Eq. (57) is considered, the impact of A 2D on h → a Z decay is negligible for ALP masses below 3 GeV, and in consequence the bound in Eq. (73) would apply to the combination ofã 3 andã 10 . However, present LHC sensitivity does not allow one to constrain these operators. The above limits are expected to improve significantly at the high-luminosity phase of LHC (HL-LHC). For example, Ref. [89] estimates that a bound will be reached for 3000 fb −1 of data at √ s = 14 TeV (neglecting here theoretical uncertainties). This would roughly translate into a sensitivity 3 ab −1 h→a Z 0.45 MeV ( f a /ã 2D 6 TeV for the case in Eq. (76)).
An alternative approach to tackle h→a Z is to use the constraints from direct searches for invisible Higgs decays, since h → a Z yields an invisible Higgs decay for Z → νν. Current experimental searches by ATLAS [90,91] and CMS [92] constrain the branching ratio for Higgs decay into invisible states Br(h → inv) to [91] Br(h → inv) < 0.23 (95% CL).
In the future, given the improvement on the sensitivity to Br(h → inv) foreseen at HL-LHC with 3000 fb −1 of data at √ s = 14 TeV [94], direct searches of the invisible decays of the Higgs resonance may be sensitive to h→a Z . Indeed, in the ALP scenarios under discussion and in consequence, barring a positive signal in future data, Eq. (79) may translate into 3 ab −1 h→a Z 2.71 MeV, setting new limits on the operator coefficients participating in this decay. This expected sensitivity is, however, weaker than the present bound obtained from global fits to Higgs signal strengths in Eq. (75), and the latter will be used in the remainder of the paper.
6.3 Mono-W and mono-Z searches at √ s = 13 TeV We now study the production of a in association with a W and a Z boson, as illustrated in Fig. 2. Since the ALP escapes the LHC detectors as missing transverse energy / E T , this yields, respectively, "mono-W " [95] and "mono-Z " [96][97][98][99][100] signatures. Both channels are being currently searched for by the ATLAS and CMS experimental collaborations. In this section we use their studies from public Run II data to set limits on the presence of different ALP effective operators that contribute to these signals.

Analysis tools
All signals and backgrounds to be discussed below in this and the next section will be generated using MadGraph5_aMC @NLO [101]. For this section it is enough to consider a parton-level analysis as the final states considered involve only leptons in addition to the ALP.

Statistical tools
In order to set limits on c i / f a for each effective operator, a binned likelihood analysis will be performed. The likelihood function for a given lepton flavour in the final state = e, μ, is built as a product of bin Poisson probabilities where and b k and s i k are, respectively, the background prediction and the signal prediction for c i = 1 and f a = 1 TeV in a given bin k. The significance is estimated via the test statistic Q μ i , withμ i being the value of μ i which maximises L (μ i ). Alternatively, we may include the effect of systematic uncertainties on the background prediction (which for the mono-W searches can be obtained from Refs. [102,103] and for the mono-Z searches from Ref. [104]) by convoluting each bin Poisson probability with a Gaussian prior, 10 such that the likelihood function is given by 10 The Gaussian normalisation in Eq. (84) is consistent as long as σ i 1, which is the case in our present analysis.
with σ k being the background systematic uncertainty in each bin k. Our test statistic accounting for background systematic uncertainties Q S μ i is then defined as The value of μ i that can be excluded at 95% CL corresponds to Q μ i = 3.84 (Q S μ i = 3.84) if background systematic uncertainties are not (are) included.

Mono-W signatures: pp → a W ±
We are targeting in this paper bosonic couplings of the ALP particle, and here in particular ALP couplings to electroweak gauge bosons, as illustrated in Fig. 2. Let us first concentrate on the ALP production in association with a W boson, as illustrated in Fig. 2(left). It is possible to derive limits on the coefficient of each effective operator contributing to this process from LHC Run II data at √ s = 13 TeV, by reinterpreting the ATLAS search for W decaying to + / E T final states with 3.3 fb −1 of integrated luminosity [102] (with = e, μ). The backgrounds will be taken from Ref. [102], considering independently the electron and muon samples and selecting events with transverse momentum p T > 65 GeV (55 GeV) as well as / E T > 65 GeV (55 GeV) and transverse mass m T > 130 GeV (110 GeV) in events with electrons (muons).
The couplings that may contribute to this process are the custodial-invariant AW and A 2 operators, and the custodialbreaking ones A 6 and A 8 in Eq. (49), as illustrated in Fig. 2 and shown in the Feynman rules FR.5. Figure 3 depicts the m T spectrum of the SM background contributions, as well as the various signals corresponding to the c 2 , c 6 , c 8 , cW Wilson coefficients, for f a = 1 TeV and c i = 1 (with cB obeying Eq. (23)). The bins used in this figure are those for which there is experimental information on the background [102], corresponding to m T < m max T = 2.6 TeV for electrons and m T < m max T = 3 TeV for muons. As discussed in Sect. 5, the strict EFT validity condition √ŝ < f a can be imposed by computing the fraction of events in each bin for which √ŝ > m max T , and discarding it. In Fig. 4 (left) we show the correlation between m T and √ŝ for mono-W through a double-differential Monte Carlo distribution for the AW signal. We also show the normalised m T distribution (again, for the AW signal) before/after discarding the events for which  Fig. 3 Transverse mass m T distribution for a W ± (W ± → ± ν ) production in the e + / E T final state (left) and μ + / E T final state (right), generated from AW (green), A 2 (purple), A 6 (orange) and A 8 (yellow).
Also shown are the binned experimental data and dominant backgrounds from the 13 TeV (3.3 fb −1 ) ATLAS analysis [102] We note that although the c 6 and c 8 signatures in Fig. 3 exhibit a kinematical shape a priori much more favourable to be distinguished from background than those proportional to cW and c 2 , at the end the most prominent impact on this purely LHC analysis is that of cW (followed by that of c 6 ) due to suppression factors in the cross sections. 11 Mono-W signatures from the operators A 6 , A 8 and A 2 are buried in the backgrounds of present LHC data, and they will remain out of reach with future HL-LHC data, except for A 6 ; see Sect. 7.
The loop-level bound obtained in Eq. (35) would imply (if taken at face value) that AW is out of reach of foreseen LHC prospects, for light enough ALPs; however, as previously discussed, because more than one operator contributes to those rare process -see Eq. (58) -the data only constrain a combination of operator coefficients which differs from that 11 The impact of A 8 is suppressed with respect to that from A 6 well beyond what suggests the ∼(g/4π) 2 factor in the Feynman rule FR.5, as the squared matrix element of its contribution qq → W ± a vanishes with the quark mass as ∼m 2 q /m 2 W (∼ 2 × 10 −4 for the charm quark).
in LHC signals, see Eq. (56); it is thus pertinent to analyze the impact of AW on LHC independently.
The results obtained, for which the LHC sensitivity in f a /cW extends up to significant values, are listed in Table 3. They show an important impact of the systematic uncertainties on the background and also indicate that present LHC Run II limits on f a /cW from mono-W signals would a priori be sensitive to cW only in the region of strong coupling cW 1 (possible in non-linear EWSB constructions), for values of f a compatible with the validity of the EFT. These bounds have been computed in compliance with the strict validity criterion ( √ŝ < f a ) by discarding the fraction of events in each bin for which √ŝ > m max T (recall the discussion in Sect. 5). We note that here the effect of considering a strict validity criterion instead of the milder f a > m max T one is of the order of the few percent on the numbers in Table 3. The bound which suffers the most from applying the strict validity criterion is the present constraint from the W → eν final state, where applying the naive validity criterion would imply overestimating the bound in ∼20%. However, this is not a problem since it is the muon channel with yields a more constraining result.

Mono-Z signatures: pp → a Z
Consider now ALP production in association with a Z boson, in hadronic collisions, as illustrated in Fig. 2 (centre and   right). The recent CMS Z + / E T search [104] with √ s = 13 TeV and integrated luminosity 2.3 fb −1 will be used to estimate present sensitivities to various Wilson coefficients. Table 2 summarises the couplings which may a priori con-  [104] tribute to a mono-Z signal among those in the chiral basis, Eqs. (47) and (49). It will be argued next that only cW may be expected to be seriously tested by this signal.
The / E T distribution for signal and background will be used as kinematic discriminator, applying the same tools and procedure described at the beginning of Sect. 6.3. In order to optimise the search, the following preselection and selection cuts are applied: , and furthermore third-lepton and extra highp T jets vetoes are implemented. The cut / E T > 80 GeV ensures that a contamination from the gluon-fusion initiated signal leading to s-channel Higgs mediation can be safely neglected: for an on-shell Higgs the maximum / E T is ∼30 GeV. Furthermore, for a higher / E T cut the fraction of the cross section contributed by this channel may be estimated as the integral of the Breit-Wigner distribution of the Higgs resonance forŝ > m 2 GeV gives a suppression factor of 5 × 10 −6 . Given that the on-shell Higgs production via gluon-fusion is σ (gg → h) = 48.6 pb [105], the Higgs-mediated contribution is completely negligible. Similarly, contributions involving a quark in the t-channel are not relevant in the kinematic region considered. In summary, with present data the signal cross sections for pp → Za have a negligible dependence on the Wilson coefficients parameterizing the qqa and h Za vertices, i.e. c a in the linear case and c 2D , c 3 , c 10−14 , c 17 in the non-linear one (see Appendix B).
The remaining ALP-gauge boson interactions which may induce a mono-Z signal are the custodial-invariant operators AW , AB, A 1 and A 2 , and the custodial-breaking coupling A 7 ; see Fig. 2 (centre and right). AB will not be considered independently all through the rest of this work, given the constraint in Eq. (23). The contribution from A 7 does not need to be considered separately either, as c 7 enters exclusively through the combination c 2 + 2c 7 ; see the Feynman rules FR.3 and FR.2. The analysis focuses thus on cW , c 1 and c 2 .
The comparison of signals and background / E T distributions for = e, μ is shown in Fig. 5. The highest-energy bin considered is / E max T = 1.2 TeV. In analogy with the previous mono-W analysis, the EFT validity condition √ŝ < f a is implemented by discarding the fraction of events in each bin for which √ŝ > 2 / E max T (see Sect. 5). The correlation between √ŝ and / E T is shown in Fig. 4 (right), as well as the normalised / E T distributions before/after discarding the invalid event fraction in each bin.
The results obtained for AW are listed in Table 3: the present mono-Z search turns out to be significantly more powerful in constraining cW / f a than the ATLAS mono-W search previously analyzed. Furthermore, the impact of systematic errors is negligible in this case. An interesting fact is the different discriminating power of electrons and muons in mono-Z signals induced by ALP emission with respect to the coupling strength: while present muon data could a priori be sensitive to cW only in the region of strong coupling cW ≥ 1, a signal in electron data would be compatible as well with cW values in the perturbative regime, cW ≤ 1. It is relevant to point out that these results, obtained imposing √ŝ < m max T < f a , are equal up to the permille level to the ones which are obtained if the naive validity criterion (only m max T < f a ) is used instead. The contributions to mono-Z signals from A 1,2 are shown in Fig. 5 for illustration only, as the corresponding values for c 1,2 would lie outside the region of validity of the EFT in present data and also if assuming the 3000 fb −1 integrated luminosity foreseeable at HL-LHC; see the next section. The mono-Z analysis with present and projected data is thus only sensitive to the AW operator, which is common to the NLO cW +c 2 +c 6 +c 8

Phenomenological analysis II: √ s = 13 TeV LHC prospects
This section explores the sensitivity prospects for constraining the effective ALP couplings to SM bosons at the HL-LHC, as well as the analysis strategy sensitive to the linear/non-linear character of the underlying EWSB mechanism. Assuming thus proton-proton collisions at c.o.m. energy √ s = 13 TeV and successive integrated luminosities of 300 and 3000 fb −1 , the following channels will be analyzed: -Mono-W and mono-Z signatures, see Fig. 2, projecting the analysis in Sect. 6 onto future data. A qualitative discussion of their ratio as a probe of non-linear character will be added. -W aγ associated production; see  Table 2 summarises the set of operator coefficients that could contribute to these signals and be tested in LHC prospects, among those defined in the Lagrangians Eqs. Mono-W and W aγ associated production, together with a j j (γ ) production through vector-boson fusion (VBF) -also shown in Fig. 6 -are intimately related processes, as they probe the same limited set of effective operator coefficients 12 cW , cB, c 1 , c 2 , c 6 , c 7 and c 8 .
A special role is played by the Higgs-ALP couplings associated to c 2D , which, barring extreme fine-tunings, may be only expected among the leading signals if the underlying EWSB enjoys a non-linear character, and within a certain ALP mass range, as previously discussed. This coupling will be shown to be a priori testable through mono-Higgs searches at HL-LHC, which exhibit a sensitivity reach well beyond the bounds obtained in Sect. 6.2 from the limits on the nonstandard Higgs decay width.
Relevant information about the structure of the ALP couplings can be inferred both by analyzing the different signatures independently and by studying their interplay. As some effective operators contribute to several processes, a combined analysis may be necessary in order to access the individual Wilson coefficients. Furthermore, the study of (de)correlations between the various putative signals serves as a good probe of the degree of EWSB non-linearity.

Mono-W and mono-Z signatures
The result of extending the analysis in Sect. 6.3.1 to the projected sensitivity in (c i / f a ) 2 for LHC 13 TeV with 300 and 3000 fb −1 is summarised in Table 4 (for mono-Z ) and Table 5 (for mono-W ), considering electrons and/or muons in the final state. Table 4 Projected 95% CL f a /c i reach at LHC, with L = 300 fb −1 and L = 3000 fb −1 for μW = (cW / f a ) 2 from mono-Z production, as detailed in Sect. 6 They show that mono-Z searches will be stronger than mono-W ones in probing at LHC the effective operator AW . Both electron and muon channels will access the perturbative regime cW < 1. Mono-Z searches would reach ALP scales up to f a ∼ 20 TeV (for cW = 1) with 3000 fb −1 disregarding background systematics -see Table 4. Assuming instead future background systematics as (1/2 of) the present ones, the mono-Z reach is somewhat milder, up to f a ∼ 15 TeV (∼18 TeV). Table 5 shows that instead the limits on cW / f a from LHC mono-W searches are systematics dominated. 13 Future mono-W searches appear instead of special interest in order to uncover the A 6 coupling, which is a signal of non-linearity up to NLO. Table 5 shows that with 300 and 3000 fb −1 it is possible to either discover it or derive a consistent projected limit. The sensitivity to c 6 turns out to be mainly limited by statistical uncertainties, being less dependent than AW on SM background systematics. Nevertheless a significant reduction of the latter is shown to have a significant impact also on tackling A 6 , particularly with 3000 fb −1 : scales up to f a /c 6 ≤ 3.44 TeV (4.68 TeV) would be then attainable if systematic errors were reduced by 1/2 (completely) with respect to their present value (see Table 5), leading to c 6 being testable within the perturbative region.
Finally, mono-W and mono-Z signals may turn out to be especially prominent as phenomenological signals of the complete NLO ALP basis, in particular of ALP-fermion couplings in the chiral EWSB case. For instance, the a Zψψ couplings B

Strategy for a combined analysis
As is apparent from the discussion above, the interplay between mono-Z and mono-W signatures may be relevant as a way of disentangling the presence of non-linearity in the Higgs sector. Up to NLO in both expansions and barring extreme fine-tunings of operator coefficients, the cross sections for those two processes are: -Strongly correlated in the linear case, being both controlled by the coefficient cW (cB is not independent; see Eq. (23)). -Less correlated in the non-linear case, as operators other than AW and AB are expected to contribute to those mono-signals. For instance the purely chiral A 6 operator may contribute visibly to mono-W production within the projected HL-LHC prospects, as shown above.
A combined analysis of mono-Z and mono-W appears thus to be a valid method to shed light on the nature of the EWSB dynamics, once a positive detection occurs. Here we illustrate the (de)correlations of those signals in a purely qualitative way. The cross sections for pp → Za and pp → W ± a at a c.o.m. energy √ s = 13 TeV are computed using MadGraph5_aMC@NLO, and subject to no other constraint than Eqs. (21) and (70) and a kinematical cut / E T > 200 GeV. A random scan of Wilson coefficients c i ∈ [−1, 1] has been performed, along three scenarios: (i) the linear setup, which in practice reduces to the custodial-preserving AW and AB operators, see Eqs. (5) and (6); (ii) the non-linear custodial case, involving operators AW , AB, A 1 and A 2 ; (iii) the non-linear case including both custodial-preserving and non-custodially invariant couplings, here denominated nonlinear for short, which adds to the previous set A 6 and A 7 ; see Eq. (48).
The results for the cross sections are summarised in Fig. 7 (top) in the σ ( pp → W ± a), σ ( pp → Za) plane, for linear (orange), cyan (non-linear custodial) and dark blue (nonlinear), for f a = 1 TeV. The strong correlation characteristic of the EWSB linear scenario is clearly seen. In contrast, in the non-linear setup deviations from the sharp linear pattern  86), with the area of the histograms normalised to 1. In both cases, orange, cyan and dark blue correspond, respectively, to linear, non-linear custodial and non-linear non-custodial setups emerge as expected as they stem from the non-linear operators A 1,2,6,7 . Those deviations are necessarily small, though, as the contribution from any of the coefficients c 1,2,6,7 is suppressed by a factor g/(16π) -see the Feynman rules in Appendix B -compared to that of cW , cB (this conclusion may, however, be somewhat modified if a harder / E T cut is imposed on the signal). In any case, in the event of a mono-W and/or mono-Z excess in future data, the ratio may be used to discern possible physical explanations. This observable has the advantage of being in principle independent of the scale f a as well as of the ALP mass m a (provided that m a m Z , as is the case assumed in this analysis for ALP stability reasons). Figure 7 (bottom) shows, for the three sets of operators considered, the R Z W distributions obtained letting the coefficients of each operator considered assume random values in the interval [−1, 1]. Note that neither the slope in the upper plot in Fig. 7, nor the numerical values in the other plots in this figure, are meaningful per se, but rather strongly dependent on the specifics of the analysis (e.g. on the kinematical cuts applied, here / E T > 200 GeV). Therefore, the strategy to follow in a realistic experimental analysis would be to look for a coincidence/tension between the expected value of R Z W in the linear scenario and the measured one which, if detected, could indicate the presence of non-linearity in the Higgs sector.
We stress that the considerations in this subsection are aimed at discussing the expected relative strength of the mono-W and mono-Z observables in a purely qualitative way, having in mind future hadronic machines in general. Indeed, besides the strong dependence of the results on the kinematical cuts chosen, no consideration of backgrounds has been taken into account here. This is in contrast to the detailed phenomenological analysis at the beginning of the subsection, where it was shown that only the deviations stem-ming from A 6 have a chance of being visible within the foreseen HL-LHC prospects. 7.2 Associated production: pp → aW ± γ Consider next ALP production in association with both a W boson and a photon, as illustrated in Fig. 6(i) and (ii). Examining the interactions in the chiral effective Lagrangian Eq. (49), it is easy to see that those couplings exhibit a particularly interesting combined potential for disentangling the presence of different effective operators: as illustrated, respectively, in FR.4 and FR.11 of Appendix B and summarised in Table 2. Both processes are thus a priori sensitive 14 to A 6 , A 8 , and to a fixed combination of AW and A 2 , which therefore singles out a flat direction. In contrast, in the linear scenario only the Wilson coefficient cW contributes significantly to both interaction vertices. The first process in Eq. (87) leads to the striking mono-W signal being already searched by LHC collaborations and whose physics impact has been explored in Sects. 6.3.1 and 7.1. The second process leads to aW ± γ associated production, a search not being yet performed by the ATLAS and CMS collaborations. We will explore its prospects next, focusing on final states characterised by leptonic W decays. It is necessary to take into account, though, that the pp → aW ± γ channel may be induced also by a Z γ -mediated contributions, to which the set {AW , AB, A 1 , A 2 , A 7 } may contribute as illustrated in Fig. 6( (88) where the constraint in Eq. (23) has been applied. The contribution of A 7 is equivalent to that of A 1 and it is not necessary to consider it independently. In summary, the analysis is done 14 The sensitivity to c 2D (c a in the linear expansion) remains in practice negligible even for L = 3000 fb −1 for the same reasons explained in Footnote 12. 15 The a γ γ contribution to pp → aW ± γ is proportional to c 2 θ cB + s 2 θ cW and thus irrelevant; see Eq. (23).

Fig. 8
Missing transverse energy distributions for pp → aW ± γ (W ± → ± ν) at √ s = 13 TeV LHC normalised to unity, for signals generated one by one for operators A 1 (blue), A 2 (violet), A 6 (orange), A 8 (yellow), AW (green) and AB (red) on five distinct operators: {A 1 , A 2 , A 6 , A 8 } and the combination of {AW , AB} orthogonal to the aγ γ coupling. They are studied next, one at a time and keeping our analysis at parton level. 16 The main irreducible SM background is pp → W ± γ (with W ± → ± ν), a process which has been measured by ATLAS [106] and CMS [107] during the LHC √ s = 7 TeV Run. The reducible backgrounds are subdominant with respect to the direct W ± γ production and consist of (i) W ± +jets (with a jet misidentified as a photon), (ii) Z + − (with one of the leptons misidentified as a photon or unidentified and the Z decaying into neutrinos), (iii) γ +jets (with a lepton originating from a heavy quark decay) and (iv) tt (with a semileptonic decay of the top pair and a misidentification of a field as a photon). Their combined effect is to approximately increase the size of the W ± γ background by 15-25% depending on kinematics and the flavour of the lepton [106,107]. For the present analysis, we simply account for this by scaling up our dominant SM W ± γ background by 20%.
The event selection requirements for photons and leptons for both signal and background are p γ T > 20 GeV, p T > 20 GeV, |η γ | < 2.5 and |η | < 2.5. / E T will be employed as kinematic variable for distinguishing signal from background, as we find that this variable has significantly more signal discrimination power than the p T of the lepton, because it receives contributions directly from the ALP in the signal set. The / E T distributions (normalised to unity) for the various effective operators and the SM background are shown in Fig. 8. The harder momentum dependence of the effective couplings explored compared to the SM contribution are illustrated. In practice, we simulate events only up to / E T = 1 TeV, as we find that signal cross sections for / E T > 1 TeV are negligible. The significance σ i of a signal associated to one given operator A i is defined here as [108] where μ i was defined in Eq. (82), and μ i s i and b denote, respectively, the number of events in the signal and the background, alike to the definitions used in Eq. (81) with, in this case, where L is the integrated luminosity, σ i stands for the cross section induced by A i and σ SM for the SM one. The kinematical cuts are taken as follows: - A very slight improvement in sensitivity to c 6 is found for / E min T = 300 GeV, as illustrated in Table 6 where the optimal cuts in / E min T and the corresponding sensitivity reach are shown. Nevertheless, for comparison purposes it is more appropriate to use one single cut for all operators, and the value / E min T = 200 GeV indicated above will be used in the aW γ analysis for all operators. Figure 9 shows the 2 σ and 5 σ sensitivity to the Lagrangian terms cW AW − t 2 θ AB (right) and c 6 A 6 (left), depicted in the { f a , c i } plane and for 300 and 3000 fb −1 . The hatched area is excluded as it would correspond to f a ≤ 2 / E min T (corresponding to all signal events being outside the range of validity of the EFT). The 2 σ exclusion sensitivity reaches f a /cW 3.8 TeV (6.8 TeV) and f a /c 6 0.4 TeV (0.8 TeV) for an integrated luminosity of 300 fb −1 (3000 fb −1 ) of data, assuming the naive EFT validity criterion f a > 2 / E max T . 17 We also note that when f a drops below 2 TeV (twice the energy of the highest bin in the / E T distribution in Fig. 8) the reach in f a /c i is diminished: this can be seen from Figure 9 comparing the sensitivity curves with the grey reference lines which correspond to constant f a /c i . The rightmost parts of the sensitivity curves is "parallel" to the latter lines, signalling that the reach in f a /c i is constant in this region. For f a ≤ 2 TeV instead, the sensitivity lines drift upwards compared to the reference lines, meaning that in that region the analysis is sensitive only to smaller values of f a /c i than for the regions to the right. This effect is due to the fact that, as f a is diminished, the EFT validity gradually excludes the high-energy bins from the analysis, thus losing discrimination power (note also that considering the strict EFT validity criterion √ŝ < f a would amplify this effect). Alike to the conclusions in Sect. 7.1 based on mono-W and mono-Z searches, associated aW γ production at the LHC exhibits thus some (weaker but complementary) sensitivity to cW and to c 6 (for large values of the latter), and may potentially reach stronger constraints on cW than those obtained from LEP data, see Sect. 6.1, but weaker than the limits from rare decays -see the discussion around Eq. (35). It should be possible to further increase the reach of the analysis by using a sophisticated version of the transverse mass instead of / E T , the so-called m T 2 variable [109,110].

Decorrelating power
The vertices in Figs. 2 (left) and 6 contribute simultaneously to the production of a and aγ in association with W ± , as well as to VBF processes. Equations (88) and (87) show a dependence of these processes on certain combinations of coefficients and thus correlation effects are a priori expected. A combined analysis of a/aγ production in association with W ± and through VBF would enlarge the amount of kinematical information available, helping to disentangle their respective contributions. 18 We consider for illustration the simultaneous action of c 2 and cW on mono-W signals and on aW γ production. Note that it is precisely the contribution to the latter process of the a Zγ vertex in Eq. (88) which would allow one to separate the contributions of those two operator coefficients if data were sensitive to both, while from Eq. (87) alone the two 17 We warn the reader that, while the sensitivity to cW is expected not to appreciably change if the strict EFT validity criterion √ŝ < f a were required, the sensitivity to c 6 could be significantly modified. We leave a more precise assessment of this effect for future work. Nevertheless, our results indicate that, within the foreseen experimental prospects, no sensitivity is expected via mono-W and aW γ production to couplings other than cW and c 6 , that is, to {A 1 , A 2 , A 7 , A 8 } as these yield very suppressed contributions. The combination of mono-W data and aW γ production data will therefore allow one to disentangle the measurement/constraint of cW (a custodial-invariant signal common to linear and non-linear EWSB) from that of c 6 (only expected if the EWSB mechanism enjoys non-linear aspects and violates custodial symmetry).

Higgs signatures
Bosonic ALP-Higgs couplings are an interesting class of new signals which may be observable only within non-linear realisations of EWSB. Indeed, in the latter case a Zh n vertices with n ≥ 1 are expected at LO, while they do not appear in the linear expansion below NNLO, as discussed in Sects. 3.2 and 4. They could induce especially interesting ALP signals: non-standard Higgs decay (h → a Z) including invisible Higgs decay (h → aνν), associated ALP-Higgs production yielding an h + / E T "mono-Higgs" signature at the LHC, or even a hh + / E T di-Higgs signature, see Fig. 10. The possibility that a Zh couplings of heavy pseudoscalars with masses in the 0.5-2 TeV range may yield observable signals in pp → a → Zh (h → bb) was recently considered in the context of the linear expansion [85] (while the ALP signatures in Higgs and Z decays are presented in this paper for the first time), stemming from one loop corrections to the NLO linear Lagrangian and from d = 7 operators.
The set of operators in the Lagrangian Eq. (48) contributing a priori to those signals are {A 2D , A 3 , A 10 , A 11 , A 12 , Table 2. Nevertheless, only the first three will be phenomenologically relevant within the LHC prospects, as the contributions from the rest are comparatively much suppressed by extra powers of 1/(4π) and/or m 2 a /v 2 in the case of A 17 ; see Feynman rules FR.6 and FR.7. This section focusses thus on the prospects for detecting A 2D , A 3 and A 10 , both taken one by one and in a combined analysis. The vertices relevant to the mono-h signal and the nonstandard Higgs decays are showing that a 3 and a 10 enter in two different combinations into the processes considered: -the mono-h (and di-Higgs) signatures depend on the combinationã 3 s θ −ã 10 c θ via Z exchange, and also on the orthogonal oneã 3 c θ +ã 10 s θ via γ exchange -see Fig. 10; -in contrast, the non-standard Higgs decays depends only onã 3 s θ −ã 10 c θ . We are thus contemplating three coefficients and two distinct processes. For the range of ALP masses used in the present numerical simulations (m a ≤ 1 MeV), the fermionicinduced bound on c 2D in Eq. (57) would lead to disregard the impact of A 2D on LHC data if that coupling were considered by itself. Nevertheless, given that a different combination of couplings is at work in rare decays and in LHC signals, for consistency with the perspective of exploring complementary approaches, and given that for larger ALP masses the LHC signals would still be present in a refined analysis, the contributions of A 2D must be retained in the analysis to follow. With this strategy, the impact of A 3 and A 10 on the nonstandard Higgs decay width is subdominant with respect to that of A 2D , given the different v dependence; see Eq. (91). On the contrary, LHC data are instead quite sensitive to c 3 and c 10 , in addition to c 2D , given the stronger momentum dependence of A 3 and A 10 . This suggests that, in order to disentangle the contributions from A 3 and A 10 , a detailed study of the kinematic distributions of the mono-Higgs channel would be necessary, together with the combination of these results with those stemming from bounds on h → BSM from Higgs signal strength measurements. On the other side, a 3 and a 10 have a similar overall impact on the total mono-h cross section. For the sake of simplicity, we will then consider here only the impact of a 2D and a 3 , separately and combined, deferring the detailed study of A 10 to a future work.
A remark on the range of values of the operator coefficients is pertinent. Generally speaking, large values correspond to strongly interacting regimes, and NDA suggests c i ≤ 1, with the bound saturated in the strong regime. Nevertheless, as discussed in Sect. 3, a factor ( f /v) has been implicitly absorbed in the definition of the parameterã 2D = c 2D a 2D , where a 2D is the coefficient of the one-Higgs contribution in the polynomial F 2D (h). The ratio ξ ≡ v 2 / f 2 is not a parameter from the effective theory point of view, but it is currently bounded to be 0.2 in concrete models [105] such as composite Higgs scenarios. Numerically, this would translate into an enhancement of a factor f /v 2.3, which implies that the absolute value of the parameterã 2D can naturally exceed by at least 2-3 units the bare NDA constraintã 2D ≤ 1. In this section we will assume a maximum absolute valueã max = 3 for both a 2D andã 3 , along the same lines as the analysis presented in Sect. 7.2.

Mono-Higgs: pp → a h
The process pp → ah (h → 4 ) is considered next at 13 TeV LHC, and it follows the mono-Higgs analysis from the "Les Houches 2015" report [111], considering both 300 fb −1 and 3000 fb −1 integrated luminosity. Our signal sample is produced with MadGraph5_aMC@NLO [101], passed on to Pythia 8 [112] for showering and hadronisation and then to FastJet [113] for jet reconstruction. The reconstructed events are finally filtered imposing the selection cuts from Ref. [111], for a consistent comparison with SM backgrounds which are taken precisely from that reference.
The / E T spectrum can be used to disentangle the new interactions from the SM background. This applies in particular to A 3 , which induces a strong momentum dependence through both the a Zh and the aγ h contributions to the mono-h signal. This is illustrated in Fig. 11 for an integrated luminosity of 3000 fb −1 . As expected, the / E T spectrum produced by A 3 (orange line) is harder compared to that produced by A 2D (red line) while, at the same time, the total (no cuts) integrated cross section for the signal generated with A 3 is manifestly lower than the one induced by A 2D .
In order to quantify the potential for observing in future LHC data a mono-Higgs signal generated by either of the two operators A 2D and A 3 , the analysis is done in two different stages. 19 One operator at a time In a first stage, each of the two relevant operators, A 2D and A 3 , is considered individually, i.e. assuming that only one of the coefficients c 2D and c 3 has a non-zero value. With this choice, the procedure already described in Sect. 7.2, Eqs. (89) Fig. 11 / E T distributions for 4 + / E T signal and background for √ s = 13 TeV and 3000 fb −1 of integrated luminosity, after applying the selection cuts from [111]. SM / E T background distributions are obtained directly from [111], and the signal pp → ah (h → 4 ) / E T distribution is shown for A 2D (red) and A 3 (orange)   Figure 12 shows the σ = 2 and σ = 5 sensitivity regions obtained for the two coefficientsã 2D andã 3 individually (see Eq. (89)), and integrated luminosities of 300 fb −1 and 3000 fb −1 . As shown in Fig. 12, only a very restricted region of the parameter space for A 3 is accessible within 3000 fb −1 at the LHC, due to its very small cross section: it results in a 2 σ sensitivity to f a /ã 3 470 GeV, which is expected to further degrade if the strict EFT validity criterion √ŝ < f a would be considered.
In contrast, Fig. 12 illustrates that mono-Higgs signatures in the h → 4 final state at HL-LHC have the potential to explore some region of parameter space for A 2D within the range of EFT validity. The 2 σ exclusion sensitivity reaches f a /ã 2D 340 GeV (780 GeV) for an integrated luminosity of 300 fb −1 (3000 fb −1 ) of data. While considering the strict EFT validity criterion would somewhat degrade these limits, we also stress that considering other final states, e.g. h → γ γ , h → bb, would significantly increase the sensitivity of this search, and we leave such a study for the future.
These results can be contrasted with the bounds on f a /ã i , (i = 2D, 3) inferred from the current upper limit on Br(h → BSM) in Sect. 6.2, which is depicted as a green region in Fig. 12 (right). If only A 2D is considered, the area of parameter space which is to be probed by LHC with 3000 fb −1 is already ruled out by that limit. This is not the case when only A 3 is considered, since its contribution to h → BSM is very suppressed. Nevertheless, cancellations might exist amongst the contributions of those two operators to non-standard Higgs decays, in regions of the parameter space where a mono-Higgs signal could be expected at a testable level. This is the motivation for the second stage in the analysis: a combined study where both operators are considered simultaneously.

Combination of the two operators A 2D and A 3
In this case of simultaneous consideration, the shape of the / E T distribution after applying the analysis cuts can be estimated, for an arbitrary choice of f a ,ã 2D andã 3 , as where the index k runs over the distribution bins, and x k , y k , and z k represent the / E T prediction in the kth bin obtained with f a = 1 TeV and for the configurations (ã 2D = 1,ã 3 = 0), (ã 2D = 0,ã 3 = 1) and (ã 2D = 1,ã 3 = 1), respectively. With this estimate of the / E T distribution one can easily compute the maximal projected sensitivity to mono-Higgs signals, varying the lower cut in missing transverse energy, / E min T , in order to maximise the sensitivity σ at each The results are shown in the scatter plot in Fig. 13. The yellow (orange) points are those for which there exists a lower / E T cut within the EFT validity region, which allows one to observe a mono-Higgs signature with a significance of least 2 (5) σ at the 13 TeV LHC with 3000 fb −1 . The left and right panels distinguish the two cases in whichã 2D andã 3 have either opposite or same sign. In both cases, in the limit f a /ã 2D → ∞, the 2 σ and 5 σ sensitivity curves for f a /ã 3 converge towards values close to the optimal ones found in the A 3 -only analysis (the discrepancy is due to the different treatment of the / E min T cut). An analogous behaviour is also observed in the orthogonal direction.
More interesting is the region whereã 2D andã 3 are close in absolute value. In particular, it shows that the contributions to the mono-Higgs process stemming from the two operators produce destructive interference whenã 2D andã 3 have the same sign: forã 2D ã 3 (right panel) the signal is reduced compared to the case in which one of the two operators dominates, and the sensitivity is therefore lower in this region of the parameter space. On the other hand, forã 2D −ã 3 (left panel) constructive interference effects enhance mono-Higgs production, so that the LHC would be sensitive to larger values of f a /ã i than in the one-operator case.
As with the previous study, the results obtained for projected mono-Higgs searches in the (ã 2D ,ã 3 ) plane can easily be contrasted with the bound inferred in Sect. 6.2 from the current upper limits on Br(h → BSM). This is depicted as a grey-shaded region in Fig. 13 and seen to be more stringent for same-signã 2D andã 3 , as no cancellation can then take place in the dominant expression in Br(h → BSM); see Eq. (75).
As a result of the combination of the existing bound with the projected reach, it appears that mono-Higgs searches may be useful for probing a relevant region of the parameter space, namely that with 300 GeV | f a /ã 3 | 700 GeV, where the lower bound is a direct consequence of requiring the EFT validity. In this region, | f a /ã 2D | may be no smaller than 2-3 TeV, as lower values are already excluded by the h → BSM constraint that we derived from present data in Sect. 6.2. Overall, we find that although mono-Higgs searches at the LHC are sensitive to the presence of both operators A 2D and A 3 , they are not competitive in constraining f a /ã 2D with the Br(h → BSM) bound, neither with the fermionic-induced bound in Eq. (57) when that coupling is considered just by itself. On the other hand, they are more sensitive to the presence ofã 3 and therefore they may provide valuable, complementary, information in the study of the ALP's coupling to the Higgs.
In conclusion, if interpreted in terms of the presence of a light pseudo-Goldstone boson, and barring fine-tunings, the observation of a mono-Higgs signature at the LHC represents a smoking gun of non-linearity in the EWSB sector, as couplings such as a Z(γ )h are not to be found in the NLO Lagrangian of linear EWSB setups (see FR.1, FR.7). Within the effective Lagrangian in Eq. (42), the observation of this signal at foreseen LHC data can only be attributed to the presence of A 3 (or eventually A 10 ), asã 2D is out of reach in that data set given the range of values allowed by the current bounds.

A comment on di-Higgs production
The a Zhh interaction allows for di-Higgs final state, due to a quark-initiated hh + / E T production via Drell-Yan (see Fig. 10 (right)). This is in contrast to di-Higgs production in the SM, which is exclusively gluon-fusion initiated. Moreover, the presence of / E T in the final state could serve as an additional handle to suppress SM backgrounds to the di-Higgs process. This discussion highlights that a-h interactions could constitute a very promising avenue for non-linear ALP phenomenology at the LHC, which we intend to explore in the future.

Coupling to fermions
In this paper we have focussed on the relation of the ALP with the EWSB sector via bosonic operators, and explored the impact of couplings of the ALP to SM bosons. However, in Sects. 2 and 3 we noticed that bosonic operators would lead to ALP-fermion couplings via a field redefinition.
Although the bounds we obtained in Eqs. (33) and (57) when considering operators one at a time are very strong, it is worth exploring complementary searches at the LHC. The structure of these fermionic couplings is very specific, proportional to the Yukawa matrices; see the Feynman rules in Appendix B. One would then expect the ALP to couple more strongly to third generation quarks, provided the matrices X ψ in Eq. (18) are generic. We then consider the characteristics of the leading ALP production in association with a tt pair at LHC.
For ALPs stable on LHC scales, this final state is similar to searches for supersymmetric scenarios, where two stops are strongly produced and produce a signature of tt in association with two neutralinos (dark matter candidates). For example, via the LO coupling c 2D the production cross section of the final state tt+ALP, where the ALP is emitted as final state radiation -see FR.17, is given by In these searches, final states are selected by requiring a number of jets, b-jets with characteristics matching those of top decays. More importantly, a substantial cut on missing energy is required. For example, a recent study with 13 TeV data by ATLAS [114], the cut on missing energy for the channel of interest (TT) (topology of two tops) is 400 GeV. In our scenario, with single-production of a light pseudoscalar via strong production of two tops, the distribution of missing energy is not as hard as in scenarios where heavy stops are pair produced and inject a large boost into the neutralino. This is shown in Fig. 14, where we compare our results for f a = 1 TeV with the ATLAS data and Monte Carlo simulations for a supersymmetric scenario with 800 GeV stops and a light neutralino.
This type of analysis opens the way to further phenomenological explorations of the fermionic signals associated to ALP production. This is most relevant and promising in order to tackle the ALP-fermionic couplings identified in Appendix A, which are part of the complete NLO basis of operators -bosonic and fermionic -involving one ALP and established in this work. See also the phenomenological signals discussed just before Sect. 7.1.1.

Summary and outlook
In this paper we have developed a systematic approach to describe interactions of an axion or an axion-like particle (ALP) with special attention to the sector responsible for electroweak symmetry breaking (EWSB), obtaining the complete -bosonic and fermionic -NLO Lagrangian in the case that the latter is non-linearly realised. With this theoretical framework in place, we have then studied new collider phenomenology associated with ALPs, as well as explored the sensitivity of the LHC in the high-luminosity phase (HL-LHC). Both the approach and the phenomenological results in this paper are novel, and they will hopefully guide new searches at the LHC and the study of complementarity with other experiments at lower energies.

Theoretical developments
Neglecting ALP masses, we have developed a complete list of bosonic operators under two scenarios, with EWSB linearly and non-linearly realised, valid in all generality for any value of the axionic scale larger than the electroweak scale (and in the non-linear case also larger than its implicit electroweak BSM scale). In the linear case, in which the couplings involving an ALP first appear at d = 5, special attention has been paid to recalling the subtle effect of the operator ( † ← → D μ ) ∂ μ a f a , which induces a contribution to the two-point function involving longitudinal gauge bosons, and can be removed via a Higgs field redefinition. This redefinition generates new couplings of the ALP to fermions, with the distinctive feature of being proportional to the Yukawa couplings.
In the non-linear realisation, we have employed a systematic approach to classify the new operators order-by-order, and much care has been paid to define the expansion in both its non-linear and ALP sectors. A complete and nonredundant basis of operators involving an ALP has been determined, even though the impact analysis has focussed on the bosonic couplings. Several interesting features arise when considering ALPs coupled to a non-linear realisation of EWSB, in particular the existence -already at the leading order in the derivative expansion -of interaction vertices involving the Higgs and gauge bosons with the ALP. This is due to the fact that the two-point function stemming from the operator A 2D cannot be entirely traded by fermionic couplings (in contrast to the linear case above). Additionally, we find that the non-linear effects induce new Lorentz structures beyond those in the traditional (linear) ALPs couplings.
Furthermore, a detailed comparison of the differences and correspondences between the operators in the linear and nonlinear setups has been developed, as well as a prospective study on how to disentangle a priori both expansions if a signal is found.
For the most part, phenomenological studies on the ALP effective Lagrangians have focussed on couplings to photons, gluons and fermions. However, if the ALP couples to photons SM gauge invariance also implies the existence of similar couplings to the massive gauge bosons, irrespective of whether the mechanism behind EWSB gives rise to a linear or a non-linear expansion.
In this paper we have obtained new constraints on ALP couplings to SM particles, as well as provided a guide for future searches of ALPs and the sensitivity HL-LHC could reach, for ALP scales of O(TeV) or somewhat above. Special attention has been paid to the consistency of the kinematic regions used for each search with the assumption of validity of the ALP expansion in powers of 1/ f a .

Current constraints
We started by looking at new constraints on (linear) ALP couplings to hypercharge and left interactions. In particular, we looked for observables sensitive to the linear combination of SU (2) L × U (1) Y operators (cB, cW ) orthogonal to the coupling to photons, i.e. orthogonal to g aγ γ ∼ cW s 2 θ + cBc 2 θ . To account for the strong constraints on the value of g aγ γ we then imposed cB −t 2 θ cW in our analyses, effectively reducing the number of parameter by one. In Fig. 15 one can see that LEP constraints on the invisible width of the Z boson and LHC searches for final states with one massive boson and missing energy (mono-Z and mono-W channels) provide handles to probe the Wilson coefficient cW . We find that mono-Z limits impose at present a constraint f a /cW 4 TeV.
We also discussed the impact on bounds from rare decays to mesons and missing energy, and how they provide a complementary approach to accelerator searches. Besides the stringent constraints existing in the literature on f a /cW from the former searches -which strictly speaking only apply when all other operators are set to zero -a similar new bound on the strength of the linear operator O a has been obtained here.
In non-linear realisations of EWSB in particular, many other operators affect LHC physics. For ALP masses under 3 GeV, data on rare meson decays allowed one to strongly bound c 2D if considered by itself. Furthermore, of particular interest are operators which induce new type of couplings, specifically new couplings of the ALP to Higgs particles, e.g. ALP-Zh or ALP-Zhh, which are dominantly generated by the non-linear operators A 2D , A 3 and A 10 . In the LHC RunI the coupling ALP-Zh can be probed via non-standard Higgs decays; if the impact of the different operators contributing is considered one at a time, a bound on f a /ã 2D of the order of 3 TeV follows for ALP masses in the range 3-34 GeV.

Future sensitivity
We then moved on to examine the capability of the HL-LHC to search for ALPs. Apart from improvements on current channels (non-standard Higgs decays, mono-W and mono-Z ), we proposed and evaluated possible new channels at 13 TeV which could dramatically change our understanding of ALPs both in the linear and non-linear realisations.
Future improvements of mono-Z searches with 3 ab −1 of data could bring the collider sensitivity to the linear operator coefficient f a /cW to above 20 TeV. But the most striking sig- Fig. 15 Summary of the most significant constraints stemming from the studies on tree-level ALP couplings presented in this work, upon the assumption g aγ γ = 0 or equivalently cB = −t 2 θ cW . The upper bars down to mono − Z prospects included correspond to 95% CL existing constraints and expected reaches, inferred in Sects. 6  natures stemming from bosonic operators, i.e. mono-Higgs and associated W ± γ production plus missing energy, would access the non-linear operators mentioned before, A 3 and A 2D , and a new one, A 6 . We also propose the study of different channels, like mono-W in combination with aW γ production, to disentangle the presence of two different operators. On the other hand, the very sensitive mono-Z and mono-W signals may play a specially important role in probing fermion-ALP interactions; this will be tackled in a future study.
Besides these signals, we proposed to use the searches on stops in on-shell top final states to look for ALPs, whose couplings to quarks are derived from couplings to the bosonic sector and are proportional to the fermion mass.
This study motivates further work on ALP physics beyond the usual framework of couplings to photons and gluons, and more emphasis was placed on the effects in the sector responsible for electroweak symmetry. Additionally, we propose to perform dedicated experimental analyses in channels like mono-Higgs and new channels involving the ALP and two bosons in the final state, such as W ± γ and missing energy.
Although in this paper we presented a rather comprehensive analysis of the effective theory for ALPs as well as their phenomenology, there are a number of open issues that deserve further study. To name a few: the extension of the collider analysis to higher ALP mass regions (including signals from ALP decays), the study of vector-boson fusion channels, the analysis of ALP-fermion signals to probe the complete NLO basis of operators -bosonic and fermionic -established in this work, the combination of collider constraints with lower-energy experiments (particularly rare decays of mesons), and the evaluation of modifications to the history of the axion in the early universe due to the non-linear effects.
to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Funded by SCOAP 3 .

A Fermionic chiral ALP Lagrangian and complete basis
In what follows, a complete basis of operators -bosonic plus fermionic -which include an ALP insertion is determined, up to NLO for the chiral EWSB. Consider the following set of independent fermionic structures, assuming only one flavour family: where Y ψ (h) has been defined in Eq. (45),F 2D (h) is defined as F 2D (h) without its h-independent term and the prime on the F i and Y ψ functions denotes the first derivative with respect to h. h ψ i in Eq. (97) are the hypercharges given in the 2 × 2 matrix notation A consequence of Eqs. (95) and (96) is [31,68] which can be recast as and is valid order-by-order in the h expansion. Applying the EoMs above, the operators A 8 , A 11 , A 13 , A 17 in Eq. (49) can be removed as redundant, because tradable by flavour-blind structures of the type in Eq. (94). In summary, the complete basis of LO plus NLO operators of the EWSB chiral expansion which include an ALP insertion includes a total of 32 independent operators, considering only one fermion generation and disregarding the different coefficients inside the F i (h) functions; the extension to three generations is obvious.

B Feynman rules for the bosonic basis
This section provides a complete list of the Feynman rules for vertices involving an ALP and resulting from the NLO linear Lagrangian Eq. (16) and the chiral one Eq. (55), up to four legs. The coefficientsã i andb i have been defined in Eq. (50) in terms of the parameters in those Lagrangians; this is extended below for the operators A 15 and A 16 , which contain two functions F i (h) and F i (h), redefining c i a i a i → a i . The rules are computed: -choosing the momenta to flow inwards in the vertices -in unitary gauge -neglecting flavour effects, i.e. assuming hermitian and diagonal Yukawa matrices (Y ψ ≡ Y † ψ ) and V CKM ≡ (Greek indices will indicate flavour).
The Feynman rules for the linear case easily be obtained from those for the non-linear Lagrangian, in the limit c 1 , . . . , c 17 → 0, cB → cB, cW → cW ,

C Linear siblings
The interaction vertices described by the chiral operators in Sect. 3 can also be described in the context of linearly realised EWSB, through linear operators in which the Higgs resonance is embedded within the SM Higgs doublet. In this section, the connection between the two expansions is shown. Operators up to NNLO of the linear expansion have to be taken into account in order to encompass all the interaction vertices appearing in the chiral framework up to NLO. The chiral couplings involving an ALP discussed in this work can be grouped as those Connected to d = 9 operators in the linear expansion ∂ μ a f a , ∂ ν a f a (106) Connected to d = 11 operators in the linear expansion This shows that operators of the linear expansion up to d = 11 "collapse" into NLO or LO operators of the chiral one. Note that the leading corrections of the non-linear bosonic set encompass 1 (2 derivatives) + 20 (4 derivatives) = 21 couplings while the linear d = 5 level has only 4.

D Effects of fields redefinitions
The field redefinitions performed to remove the a-Z twopoint function stemming from the operator O a in the linear EFT, Eq. (8), and from its sibling A 2D in the chiral EFT, Eq. (47), can be generalised. The effects of generic redefinitions of the GB matrix U and of fermionic fields in both the linear and chiral cases will be discussed and compared next.

D.1 Chiral EFT
In the chiral EFT case, the most general redefinition of the GB matrix U and of fermionic fields can be schematically written Although the parameters x ψ L ,R are generically 3 × 3 hermitian matrices in flavour space, they will be taken to be flavour universal, x ψ L ,R ≡ x ψ L ,R . Moreover, without loss of generality, all the arbitrary x i parameters are taken to be real, and f = v will be assumed in order to simplify the notation.
where α i ≡ g 2 i /4π . The contributions in the last three lines, proportional to a X μνX μν , arise from the anomaly triangle and the sum runs over the three fermion generations. This is consistent with the result shown in Ref. [41].
The a-Z two-point function stemming from the operator A 2D can be completely removed by choosing x U = 2c 2D : this corresponds to the procedure described in Sect. 3.2. In addition, it remains the freedom to choose the six fermionic transformations so as to remove two fermionic terms among (ψ L γ μ ψ L )∂ μ a, (ψ L γ μ σ 3 ψ L )∂ μ a and ia(ψ L Uψ R ). For example, requiring that the result follows that μνW aμν a f a 3x QL + 2c 2D The parameter x QL is still free and can be set to zero: this corresponds to recasting the impact of the a-Z two-point function into a redefinition of the coupling cW plus the insertion of the fermionic term (∂ μ a)(ψγ μ γ 5 σ 3 ψ). This result is equivalent to that reported in Eq. (54).

D.2 Linear EFT
It is useful to reformulate the discussion of the previous paragraph for the linear EFT, in order to point out a few worthy differences. The most general field redefinition for this case is for ψ L = {Q L , L L }, ψ R = {u R , d R , e R }. As above, the fermion redefinitions generically act as 3×3 hermitian matrices in flavour space, while x ∈ R is chosen x ∈ R. The action of these redefinitions on the LO linear Lagrangian, L SM , Eq. (1), is where † ← → D μ ≡ † D μ − (D μ ) † , and the last three lines are identical to those for non-linear case, Eq. (110).
The parameter x can be conveniently chosen so as to remove the a-Z two-point function contained in the operator O a = ( † ← → D μ )∂ μ a: this is similar to what happened in the chiral case choosing conveniently the parameter x U . Moreover, it is also possible to choose in this linear case only one of the two axion-fermion operators (either the Yukawa-like or the vector-axial structure) by tuning the fermion field redefinitions, as described for the chiral case. For instance, focusing on the add vertex, it is possible to retain the structure ia(Q L d R ) choosing x QL = x d R ; alternatively, the coupling ∂ μ a(dγ μ γ 5 d) can be selected setting x Y D − x T QL Y D + Y D x d R ≡ 0. The major difference of the impact of the field redefinitions on the linear and chiral EFTs resides instead in the Higgs couplings: while in the linear case the operator O a is completely removed from the Lagrangian, including its couplings containing Higgs legs, this is not the case in the chiral case where only the pure a-Z two-point coupling is redefined away, as illustrated in Sect. 3.2, but in general not those involving the ALP, gauge bosons and Higgs legs. This follows from the fact that Higgs couplings and pure-gauge interactions are correlated in the linear case, while they are independent in the chiral one. The presence of couplings with the structure (Z μ ∂ μ a)h n , n ≥ 1, among the dominant deviations from the SM expectations, is a smoking gun of non-linearity, as such vertices appear in the linear EFT case only at NNLO (operators with d ≥ 7; see Sect. 4).