Flavourful Axion Phenomenology

We present a comprehensive discussion of the phenomenology of flavourful axions, including both standard Peccei-Quinn (PQ) axions, associated with the solution to the strong $CP$ problem, and non-standard axion-like particles (ALPs). We give the flavourful axion-fermion and axion-photon couplings and calculate the branching ratios of heavy meson ($K$, $D$, $B$) decays involving a flavourful axion. We also calculate the mixing between axions and heavy mesons $ K^0 $, $ D^0 $, $ B^0 $ and $ B_s^0 $, which affects the meson oscillation probability and mass difference. Mixing also contributes to meson decays into axions and axion decays into two photons, and may be relevant for ALPs. We discuss charged lepton flavour-violating decays involving final state axions of the form $\ell_1 \to \ell_2 a (\gamma) $, as well as $ \mu \to eee $ and $ \mu-e $ conversion. Finally we describe the phenomenology of a particular"A to Z"Pati-Salam model, in which PQ symmetry arises accidentally due to discrete flavour symmetry. Here all axion couplings are fixed by a fit to flavour data, leading to sharp predictions and correlations between flavour-dependent observables.


Introduction
One of the puzzles of the Standard Model (SM) is why QCD does not appear to break CP symmetry. The most popular resolution of this so-called "strong CP problem" is to postulate a Peccei-Quinn (PQ) symmetry, namely a QCD-anomalous global U (1) symmetry which is broken spontaneously, leading to a pseudo-Nambu-Goldstone boson (pNGB) called the QCD axion [1][2][3]. The two most common approaches to realising such a PQ symmetry is either to introduce heavy vector-like quarks (the KSVZ model) [4,5] or to extend the Higgs sector (the DFSZ model) [6,7]. The resulting QCD axion provides a candidate for dark matter [8][9][10] within the allowed window of the axion (or PQ symmetry-breaking) scale f a = 10 9−12 GeV [11].
It has also been realised that the PQ axion need not emerge from an exact global U (1) symmetry, but could result from some discrete symmetry or continuous gauge symmetry leading to an accidental global U (1) symmetry. Considering the observed accuracy of strong-CP invariance, it is enough to protect the PQ symmetry up to some higherdimensional operators [12][13][14]. In this regard, it is appealing to consider an approximate PQ symmetry guaranteed by discrete (gauge) symmetries [15][16][17][18][19][20][21]. Alternatively, attempts to link PQ symmetry protected by continuous gauge symmetries to the flavour problem were made in [22,23]. It is possible that PQ symmetry arises from flavour symmetries [24], linking the axion scale to the flavour symmetry-breaking scale, and various attempts have been made to incorporate such a flavourful PQ symmetry as a part of such continuous flavour symmetries [25][26][27][28][29][30][31][32][33]. It is also possible that PQ symmetry could arise accidentally from discrete flavour symmetries [34][35][36][37], as recently discussed [38] in the "A to Z" Pati-Salam model [39], where quarks and lepton are unified. This is difficult to achieve in a grand unified theory (GUT) based on SO(10) [40], which otherwise presents a stronger case for unification. 1 Recent efforts have been made [29,30,48] to unify the U (1) P Q symmetry with a Froggatt-Nielsen-like U (1) flavour symmetry [49]. The resultant axion is variously dubbed a "flaxion" or "axiflavon"; we shall refer simply to a "flavourful axion".
In this paper we focus on the phenomenology of flavourful axions, including both standard PQ axions, associated with the solution to the strong CP problem, and non-standard axion-like particles (ALPs) (see e.g. [50]). For a complementary analysis of ALP signatures and bounds at the LHC, see [51]. We present the flavourful axion-fermion and axion-photon couplings both for the standard axion and for ALPs, and show that they quite naturally are non-diagonal. We use these couplings to calculate the branching ratios for two-body decays of heavy mesons K, D, and B involving a flavourful axion. Moreover, we calculate the mixing between axions and neutral hadronic mesons K 0 , D 0 , B 0 and B 0 s and its consequences, which has not been discussed in the literature before. These can lead to new contributions to neutral meson mass splitting, meson decays into axions and axion decays into two photons which may be relevant for ALPs. We also discuss lepton decays involving final state axions, including two-body decays ℓ 1 → ℓ 2 a and radiative decays ℓ 1 → ℓ 2 aγ, as well as µ → eee and µ − e conversion. Finally we describe the phenomenology of the A to Z Pati-Salam model, which predicts a flavourful axion [38], and show how unification leads to correlations between different flavour dependent observables, as the down-type quark and charged lepton couplings are very similar. Notably, as the axion arises from the same flavon fields that dictate fermion Yukawa structures, no additional field content is necessary to solve the strong CP , and all axion couplings are fixed by a fit to quark and lepton masses and mixing.
The layout of the remainder of the paper is as follows. Section 2 describes the flavourful axion-fermion and axion-photon couplings both for the standard axion and for ALPs. In 1 These ideas should not be confused with alternatives to PQ symmetry, such as Nelson-Barr type resolutions to the strong CP problem [41][42][43][44], or GUT models where specific Yukawa structures have been proposed [45][46][47]. Section 3 we apply these couplings to calculate the branching ratios of heavy meson decays involving a flavourful axion. Section 4 discusses the mixing between axions and neutral mesons while Section 5 discusses lepton decays. Section 6 focusses on the phenomenology of the A to Z model, which predicts correlations between different flavour dependent observables, and Section 7 concludes. Appendix A gives more details about axion-meson mixing. Appendix B details the calculation the heavy meson branching ratios. Appendix C shows the derivation of the couplings in the A to Z Pati-Salam model and Appendix D tabulates the numerical fit to flavour data.
2 Axion couplings to matter

Lagrangian
Relevant to a discussion on axion-fermion interactions is the Lagrangian L = L kin + L m + L ∂ + L anomaly , (2.1) where L kin contains the kinetic terms, L m the fermion mass terms, L ∂ the axion derivative couplings to matter, and L anomaly the QCD and electromagnetic anomalies. In the physical (mass) basis below the electroweak symmetry-breaking scale, we have with the axion decay constant f a = v P Q /N DW defined in terms of the PQ-breaking scale v P Q and anomaly (or domain wall) number N DW . The axion-photon coupling is discussed in Section 2.3 below. The physical masses m f i are defined by m f i = (U † Lf M f U Rf ) ii , in terms of the mass matrix in the weak basis, M f , and unitary matrices U Lf , U Rf which transform left-and right-handed fields, respectively. The vector and axial couplings are given by 3) x f L , x f R are the fermion PQ charges in the left-right (LR) basis, 2 written here as (diagonal) matrices. As x f L , x f R are real, V f and A f (as well as chiral coupling matrices X L,R ) are Hermitian.
In this formulation, the implications of flavour structure are clear. If all generations of a fermion couple equally to the axion, the charge matrices x Lf,Rf are proportional to the and there is no flavour violation. In standard axion models, e.g. DFSZ, charges can be assigned such that x f L = −x f R and the axion couples only via A f ; this is generally not true in flavoured axion models. Meanwhile if x f L = x f R , the U (1) P Q transformation is not chiral (N DW = 0), the Goldstone field a doesn't couple to the QCD anomaly, the strong CP problem is not solved, and a is then interpreted as an ALP. 3 However, as long as x f L ,f R / ∝I 3 , we still get flavour-violating (vector and axial) interactions due to weak mixing encoded in U Lf,Rf .

Physical axion basis
The above Lagrangian describes an interacting axion, not necessarily in its mass eigenstate. The off-diagonal couplings to fermions are nevertheless V f and A f for the physical axion, as we will see. Unlike standard DFSZ models with PQ-charged Higgs doublets, our flavoured axion does not mix with the longitudinal component of the Z boson. We still need to identify the physical axion at low energy as the state orthogonal to π 0 and η mesons. One can then determine the canonical axion mass and couplings [52][53][54]. Let us briefly summarize how it works, following the prescription e.g. in [11]. The axion mass generated by the QCD anomaly coupling in Eq. 2.2 is conveniently calculated by rotating away the anomaly via chiral transformations of light quarks (q = u, d, s), This leads to a low-energy effective Lagrangian below the chiral symmetry-breaking scale, Using the relation ū L u R = d L d R = m 2 π f 2 π /(m u + m d ), the axion-pion mixing term vanishes. We identify the state a in Eq. 2.5 as the physical axion and extract its mass, There remains additional mixing with heavier mesons such as η ′ which provide further small corrections. A precise calculation performed in [55] gives us m a = 5.70(6)(4) 10 12 GeV f a µeV. (2.7) The transformation in Eq. 2.4 affects also the axion-quark couplings. For example for u, d and s quarks, the axion-quark Lagrangian in Eq. 2.2 is transformed into the physical basis, We see that the diagonal couplings are modified by an amount proportional to N DW , whereas the off-diagonal couplings are unchanged. Physically, this is a consequence of the QCD anomaly being flavour-conserving, and unable to mediate flavour-violating interactions that contribute to c sd .
The above discussion identifies the physical axion basis in the limit of no kinetic mixing between the axion and heavier mesons. Such mixing, induced by the effective Lagrangian in Eq. 2.8, needs to be further diagonalized away to obtain the physical axion basis. This will be discussed in detail in Section 4 and Appendix A. The kinetic mixing contribution is negligibly small for the standard QCD axion with m a ≪ m π and f a ≫ f π , but can be important for an ALP.

Decay constant and axion-photon coupling
In standard axion scenarios, the decay constant f a is defined by v P Q /N DW , where N DW is the QCD anomaly number. Provided the U (1) P Q symmetry is broken by the VEV of a single field φ with PQ charge x φ , we simply have v P Q = x φ v φ . 4 In more general models, where several fields φ contribute to symmetry breaking, we define v 2 We will encounter exactly this scenario when discussing the A to Z model presented in Section 6.
The axion-photon coupling aFF defined in Eq. 2.2 is given in terms of the electromagnetic anomaly number E, through the coefficient In unified models, such as the A to Z model with Pati-Salam unification presented in Section 6, the ratio of anomaly numbers is fixed to E/N DW = 8/3, giving c aγ ≈ 0.75.

Heavy meson decays
The flavour-changing vector couplings in L ∂ may lead to observable decays of heavy mesons into axions. A general study of such flavour-changing processes involving a (massless) Nambu-Goldstone boson was made in [56], which is applicable to our flavourful axion. For a two-body decay P → P ′ a of a heavy meson P = (q P q ′ ) into P ′ = (q P ′ q ′ ), the branching ratio is given by with V f as defined in Eq. 2.3. Its indices q P q P ′ relate to the constituent quarks, e.g. a K + → π + a decay proceeds bys →da with coupling strength V d sd ≡ V d 21 . For completeness, a rederivation of Eq. 3.1 is provided in Appendix B. It depends on a form factor f + (q 2 ) encapsulating hadronic physics, where q = p a = p P − p P ′ is the momentum transfer to the axion. The lightness of the axion means we can safely take the limit q 2 → 0. For kaon decays, f + (0) ≈ 1 to good approximation. For heavier mesons, we use results from lattice QCD [57], summarised in Table 1. Table 1. Form factors f + (0) extracted from [57] for K, D and B decays.
B and B s decays B physics has a rich phenomenology, and is recently of particular interest due to persistent anomalies in observed semileptonic B decays at the LHC, which may be evidence for charged lepton flavour violation (cLFV) [70]. Rare B decays of the type B → π(K)νν, while generally not as tightly constrained as those for kaons, may also provide insights into new physics. A dedicated search for decays like B → π(K)a with a light invisible particle a was made by CLEO, which collected 10 7 BB pairs throughout its lifetime. It provides the limits Br(B ± → π ± (K ± )a) < 4.9 × 10 −5 and Br(B 0 → π 0 (K 0 )a) < 5.3 × 10 −5 at 90% CL [71]. More recent and powerful experiments, namely BaBar and Belle, have not yet provided limits on this exact process. However we may estimate their experimental reach by the stated limits on the decays B → π(K)νν, which are typically O(10 −5 ) (see Table 2), an improvement of approximately one order of magnitude. The upgraded experiment Belle-II at SuperKEKB is expected to collected approximately N = 5 × 10 10 BB pairs, improving the limits on many rare decays [72]; assuming the sensitivity scales as √ N , we may expect an O(10 2 ) improvement in branching ratio limits.
It is worth noting that the decay B 0 → π 0 a, predicted by flavoured axion models, has not been analysed explicitly by experiments. However, some information may be gleaned from searches for the SM process B 0 → π 0 νν, which are a background to the axion signal. Generically, any bound on the SM decay will translate into a bound as strong (or stronger) on the two-body decay to an axion. Finally, we remark on the fact that also decays of the form B 0 s → K 0 a and B 0 s → η(η ′ )a are allowed, but no meaningful experimental information is available.

D and D s decays
Little is said in the literature about decays of charmed mesons of the form D ±,0 → π ±,0 a or D ± s → K ± a, or the corresponding decays involving a νν pair. The branching ratio for D → π(K)a may be easily calculated using the same formulas for K and B decays, given below. The trivial requirement that Br(D → π(K)a) < 1 allows us to place weak bounds on v P Q of O(100) TeV, but without an experimental probe, little more can be said. As we will show below, the predicted branching ratios are anyway expected to be rather small when compared to K and B decays, which have corresponding branching ratios approximately three and one order of magnitude greater. In conclusion, while further experimental probes of D decays are of course welcome, they are not expected to be more sensitive to flavoured axions than other decays. On the other hand, in flavoured axion scenarios only D decays can probe the up-type quark Yukawa matrix.

Bounds
Ultimately the experimental data can be used to constrain the ratio |V f q P q P ′ |/v P Q for a given decay. Collecting terms in Eq. 3.1, we define a branching ratio coefficientc P →P ′ , which depends only on hadronic physics, by The values ofc P →P ′ are tabulated in Table 2, along with experimental limits on the branching ratio and the corresponding bound on v P Q , where available. D, D s and B s decays have no experimental constraints, however we can compute the numerical coefficientsc, which are all O(10 −14 − 10 −13 ). These are also given in Table 2.

Decay
Branching ratio Table 2. Branching ratios (upper limits) and corresponding bounds (lower limits) on the PQbreaking scale v P Q from flavour-violating meson decays. Bold font marks the current best limit from searches for P → P ′ a, while parentheses mark the bound on the rare decay P → P ′ νν, which should be comparable. Asterisks ( * ) mark the expected reach of current or planned experiments.

Axion-meson mixing
In this section we discuss the mixing between axions and neutral hadronic mesons, and the impact on the meson oscillation probabilities. Such a mixing effect can also lead to new contributions to both meson decays into axions and axion decays into two photons. Although the mixing effect will turn out to be negligible for PQ axions which solve the strong CP problem, it may be relevant for non-standard axions such as ALPs. Readers who are not interested in ALPs may skip this section, since it will not lead to any competitive bounds on PQ axions.

Parametrisation of mixing
Axion-quark couplings in the mass-diagonal basis were discussed in Section 2.2. Relevant to meson mixing are the terms These derivative couplings translate into effective axion-meson couplings where f P is the meson decay constant for P = π 0 , η, η ′ , K 0 , K 0 , and where η P ≡ c P f P /v P Q . This is naturally generalised to include also mesons containing c and b quarks. For a QCD axion with m a ≪ m P and f a ≫ f P , there is almost no impact on the standard meson dynamics. However, the results are valid for generalised ALPs, where the effect may be detectable.

Meson mass splitting
Axions and ALPs with off-diagonal quark couplings will mediate mixing between a heavy neutral meson P 0 (P = K, D, B, or B s ) and its antiparticle P 0 in addition to that from weak interactions. An explicit calculation, showing how axion interactions yield an additional contribution to meson mass splittings, is given in Appendix A. We quote the result, namely that The total mass difference is then given by ∆m P = (∆m P ) SM +(∆m P ) axion . As an example, consider the effect of axion-kaon mixing on the K 0 L − K 0 S mass difference, experimentally measured to be (∆m K ) exp = (3.484 ± 0.006) × 10 −12 MeV [78]. The error is dominated by the theory uncertainty, which may be large [79]; near-future lattice calculations aim to reduce the error on ∆m K to O(20%) [80], with further improvements from next-generation machines. As a conservative estimate, we shall only demand the axion contribution to any ∆m P is not larger than the experimental central value. We then have |η K 0 | 8 × 10 −8 , which (assuming c K 0 ≈ 1) corresponds to the bound v P Q 2 × 10 6 GeV. Similar results for D, B and B s mixing are tabulated in Table 3. Belle-II is expected to improve the sensitivity of D 0 − D 0 mixing by about one order of magnitude with the full 50 ab −1 of data [81]. Table 3. Limits on v P Q from contributions to neutral meson mass differences. Measured values of ∆m P are given in the PDG [78]. Meson decay constants f P 0 are extracted from global averages given in [82].

Axion-pion mixing and ALPs
We have seen that axion-meson kinetic mixing can affect the oscillation probability (and thereby the mass difference) of neutral heavy mesons, arising from off-diagonal quark couplings of axions. In this subsection, we will see that even flavour-diagonal couplings can lead to interesting consequences. As shown in Eqs. 4.1 and 4.2, there arises in particular axion-pion kinetic mixing as a consequence of the physical π 0 containing a small admixture of the nominal axion and vice versa. This induces axion contributions to any process normally involving π 0 . Kinetic diagonalisation (as in Eq. 4.3) induces mass couplings of the form where η π 0 = c π 0 f π /v P Q =c π 0 f π /f a , withc π 0 ≡ c π 0 /N DW . This is subsequently diagonalised by a 2 × 2 rotation in terms of an angle θ π , where Starting from the canonical physical basis in Eq. 4.1, the physical basis accounting also for kinetic mixing is thus obtained by field transformations a → cos θ π a + sin θ π π 0 (4.7) To leading order in η π 0 , we have For a QCD axion with m a ≪ m 0 π and η π 0 ≪ 1, its contribution to the physical pion is vanishingly small. However, this mixing may be interesting for more general ALPs, where the mass and decay constant are not necessarily correlated.
The axion-meson mixing effect discussed above can modify decays of heavy mesons to lighter mesons plus an axion, as well as to the decay of an axion to two photons. The basic idea is very simple: in the standard hadronic decay of a heavy meson into two pions, one of the neutral pions in the final state can convert into an axion via the mixing effect discussed above, leading to a final state consisting of an axion. Similarly, the standard decay of a neutral pion into two photons can also mediate the decay on an axion into two photons.
Applying Eq. 4.8 to an ALP, still denoted by a, perhaps the most interesting processes induced by mixing are K + → π + a and a → γγ. Considering only the mixing-induced effect, we have Taking the ballpark of Br(K + → π + a) 10 −10 listed in Table 2 and Br(K + → π + π 0 ) = 20.67%, we find a mass-dependent bound which is applicable for m a 110 MeV. Similarly, one finds the axion decay to photons In the SM with massless valence quarks and N C = 3 colours, we have [83] Γ The standard form of the axion-photon coupling, 1 4 g aγ aFF , gives Γ(a → γγ) = 1 64π g 2 aγ m 3 a . We may then write the mixing-induced axion-photon coupling as Therefore the bound in Eq. 4.10 corresponds to Extensive studies of ALPs over a wide range of parameter space (summarised in e.g. Let us finally note that the E787 experiment searched for K + → π + a followed by a → γγ in the range of m a = 5 − 100 MeV [85]. Combining the two expressions in Eqs. 4.9 and 4.13, the E787 result gives (for m a = 10 − 96 MeV) the bound (g aγ ) mix 5 × 10 −5 GeV −1 , (4. 16) which is less stringent than Eq. 4.14.
5 Lepton decays Two-body lepton decays of the form ℓ 1 → ℓ 2 a follow analogously to meson decays, with the notable difference that both axial and vector couplings contribute, since the decaying particle has non-zero spin. We define a total coupling C e ℓ 1 ℓ 2 by As done for mesons in Eqs. 3.2-3.3, the branching ratio may once again be written in terms of a coefficientc ℓ 1 →ℓ 2 , by These are evaluated, with corresponding limits placed on v P Q , for the three possible lepton decays. The results are tabulated in Table 4. The most interesting of these is µ + → e + a, for which the SM background consists almost entirely of ordinary β decay, µ + → e + νν. The muon decay width Γ µ is given to good approximation by Γ µ ≃ Γ(µ + → e + νν) ≃ G 2 F m 5 µ /(192π 3 ). Assuming µ + → e + a decays are isotropic, i.e. the decay is purely vectorial (or axial), the experiment at TRIUMF provides the limit Br(µ + → e + a) < 2.6 × 10 −6 [86], corresponding to v P Q /|V e 21 | (or |A e 21 |) > 5.5 × 10 9 GeV. They searched specifically for decays with an angular acceptance cos θ > 0.975, where θ is the positron emission angle; in this region SM three-body decays are strongly suppressed. The TWIST experiment [87] has performed a broader search, accommodating non-zero anisotropy A as well as massive bosons, but are less sensitive for isotropic decays in the massless limit. The limits for isotropic (A = 0) and maximally anisotropic (A = ±1) decays are given in Table 4.
Let us sketch the angular dependence of µ → ea decays, which are not generally isotropic, as these would relate to TWIST; the formulas generalise immediately to τ decays. Consider µ + with a polarisation η = (0, η) decaying into a positron with helicity λ e = ±1 and momentum k e , as well as an axion. Neglecting m e and m a , where η · k e = −|k e | cos ϑ ηe . We can describe the degree of muon polarisation P µ as the projection of η onto the beam directionẑ, i.e. P µ ≃ cos ϑ ηz = η ·ẑ/|η|. For a more precise treatment one should consider the distribution of η in a muon ensemble, but as we shall assume all muons are highly polarised opposite to the beam direction, i.e. P µ ∼ −1, this is sufficient for our purposes. TWIST measures the positron emission angle θ = ϑ ηz − ϑ ηe ; for highly polarised muons, we have cos ϑ ηe ≃ P µ cos θ. Summing over λ e , the differential decay rate is given by where we define the anisotropy The limiting cases are A e 21 = V e 21 , giving A = −1 (corresponding to an SM-like V −A current interaction), or A e 21 = −V e 21 , giving A = 1 (a V + A interaction). The signal strength with respect to the SM background is maximised for A = 1, particularly in the region with cos θ ∼ 1. The A to Z model, discussed below, predicts exactly this scenario, although the high predicted PQ scale v P Q ∼ 10 12 GeV implies the signal is very small despite the enhancement.
Finally, the Mu3e experiment, primarily designed to look for µ → eee (discussed below), can also be used to test for µ → ea, and tentatively probe scales of v P Q 10 10 GeV [88] by the end of its run.

Decay
Branching ratio Experimentc ℓ 1 →ℓ 2 v P Q /GeV   Table 4. Branching ratios (upper limits) and corresponding bounds (lower limits) on v P Q from two-body cLFV decays. The assumed anisotropy A can be related to the formula in Eq. 5.6.
Additionally, we may examine decays with an associated photon, i.e. ℓ 1 → ℓ 2 aγ. These can be studied in experiments searching for ℓ 1 → ℓ 2 γ, which, if experimentally measured, are unequivocal signs of new physics; in the SM, Br(µ → eγ) ∼ 10 −54 , certainly unobservable. The differential decay rate for ℓ 1 → ℓ 2 aγ in the limit of m ℓ 2 = m a = 0 may be expressed by where f (x, y) is a function of x = 2E ℓ 2 /m ℓ 1 , y = 2E γ /m ℓ 1 , i.e. (twice) the fraction of invariant mass carried away by the lighter lepton and photon, respectively. Energy conservation requires x, y ≤ 1 and x + y ≥ 1. Moreover, the angle θ 2γ between ℓ 2 and the photon is fixed by kinematics to Alternatively one can write the decay rate in terms of x and c θ ≡ cos θ 2γ , i.e.
We may relate the branching ratios of decays with and without a radiated photon by The radiative decay possesses two divergences: an IR divergence due to soft photons (x ≃ 1) and a collinear divergence (θ 2γ ≃ 0). In practice, appropriate cuts are made on the minimum photon energy and angular acceptance well away from the IR-divergent region. Such cuts were discussed in the context of ℓ 1 → ℓ 2 γ decays [90], in particular as they related to LAMPF [91] and MEG [92] experiments. The region of interest for MEG is for x, y ≃ 1, or equivalently c θ ≃ π, where the SM background disappears. However, decays with an associated flavoured axion are also suppressed in this limit, i.e. the integral f vanishes for very soft axions. One might consider a broader region of phase space, provided the induced backgrounds 5 are under control. A comprehensive experimental study of such signals, e.g. for the MEG-II upgrade [93], would be welcome. An explicit limit on µ → ef γ, where f is a light scalar or pseudoscalar, is given by the Crystal Box experiment, which sets Br(µ → ef γ) < 1.1 × 10 −9 at 90% CL [94]. Unlike the TRIUMF experiment [86] discussed above, this limit does not assume isotropic decays. Using the same cuts 6 we find f ≃ 0.011, yielding the bound v P Q /GeV > 9.4 × 10 8 |C e 21 |. In Table 5 we summarise current and future experimental limits on branching ratios of ℓ 1 → ℓ 2 γ.
µ → eee and µ − e conversion We may also consider processes without an axion in the final state. Axion mediation will induce the decay µ → eee, although the presence of two axion vertices and additional suppression by 1/v P Q means these processes are again only interesting for ALPs. The current upper bound on the branching ratio is Br(µ + → e + e − e + ) < 1.0 × 10 −12 , set by SINDRUM [97]. The Mu3e experiment [98] currently under development is expected to start taking data in 2019, and will significantly improve the sensitivity by four orders of magnitude, i.e. Br(µ → eee) 1 × 10 −16 . To lowest order in m 2 e , the branching ratio for the axion-mediated decay is given by

(5.11)
Assuming O(1) couplings, we see that such decays are only reachable by experiment provided v P Q 10 6 GeV. As the axion (or ALP) also couples to quarks, one may consider µ − e conversion in nuclei, mediated by the axion. The relevant couplings are now C e 21 and the axion-nucleon coupling g aN = C aN m N /v P Q . The numerical factor C aN is model-dependent, given in terms of flavour-diagonal couplings of the up and down quarks. In standard cases these are essentially given by the quark PQ charges (see e.g. [53] for standard formulae), but in more general scenarios such as a flavoured axion, these can deviate significantly. 7 The axion-mediated µ − e conversion is a spin-dependent process which was discussed in [100]. The conversion-to-capture ratio in a nucleus (A, Z) is qualitatively given by where q 2 ≈ m 2 µ is the momentum-transfer and S (A,Z) N is the total nucleon spin of a nucleus (A, Z). Not accounted for here are nuclear spin and structure form factors, which were discussed in [100] and are O(1). The suppression by v 4 P Q suggests µ − e conversion is only realistically detectable in ALP scenarios. The current best limit comes from SINDRUM-II: R Au µe < 7 × 10 −13 [101]. Assuming again O(1) couplings and form factors, SINDRUM-II sets v P Q 10 6 GeV, comparable to the µ → 3e bound. The upcoming experiments Mu2e and COMET are both looking for µ − Al → e − Al, and both aim to probe R µe < 6 × 10 −17 at 90% CL [102,103], a factor 10 4 improvement over the SINDRUM result.

A to Z Pati-Salam Model
We present here a recently proposed QCD axion model [38], based on the rather successful A to Z model [39], which seeks to resolve the flavour puzzle by way of Pati-Salam unification coupled to an A 4 × Z 5 family symmetry. The family symmetry is completely broken by gauge singlet flavons φ, which are triplets under A 4 and couple to left-handed SM fields. However, information about the underlying symmetry remains in the particular vacuum structure of the flavons. The initial viability of the model, which predicts certain Yukawa structures based on the so-called CSD(4) vacuum alignment, was demonstrated in [39], and leptogenesis was considered in [104].
In [38], we updated and improved the numerical fit to flavour data, as well as demonstrating that, with small adjustments, the A to Z model can resolve the strong CP problem. The axion then emerges from the same flavons that are responsible for SM Yukawa couplings; in other words, no additional field content is necessary to realise a PQ axion. Moreover, as all Yukawa couplings are fixed by the fit to data, also the axion couplings are known exactly, with no additional free parameters. As the focus of this work is on axion couplings to matter, we limit our discussion primarily to the resultant Yukawa and mass matrices of the SM fermions. However in Appendix C we derive explicitly the axion-matter couplings from the Yukawa superpotential. In Appendix D we provide the best fit parameters for the A to Z model and corresponding axion couplings.

Mass matrices and parameters
The charged fermion Yukawa matrices are given at the GUT scale by where m i are real, with dimensions of mass and η, ξ are phases. Note that the scales of the various free parameters are constrained by the model itself. By rather simple assumptions about the flavon VEVs, discussed fully in [39], and assuming all dimensionless couplings in the renormalisable theory are O(1), we may infer generic properties of the parameters. Parameters a, b and c correspond closely to the three up-type quark Yukawa couplings, i.e. a ≪ b ≪ c ∼ 1. Meanwhile y 0 d , y 0 b and y 0 s are correlated with the down-type quark Yukawa couplings, i.e. y 0 d ≪ y 0 b ≪ y 0 s . B is an O(1) ratio of couplings, and ǫ i3 ≪ 1 are small perturbations of a flavon VEV. The O(1) factor x is a Clebsch-Gordan factor, introduced by additional Higgs multiplets in a variation of the Georgi-Jarlskog mechanism. In the neutrino sector, the principle of sequential dominance on which the model relies demands a normal ordering and strong mass hierarchy, with m a ≫ m b ≫ m c , predicting the lightest neutrino with a mass of < 1 meV. A fit of these parameters to data has been performed [38], with central results collected in Appendix D. The model is fitted to experimental results 8 by an MCMC analysis. Bayesian credible intervals are also provided, showing that despite a large number of free parameters, small tensions in the predictions for θ ℓ 23 and δ ℓ may be further probed by increased sensitivity in current and future neutrino experiments.
The PQ-breaking scale v P Q is determined primarily by the largest VEV among the flavons φ carrying PQ charge. The VEV of this flavon (named φ u 2 ) is proportional to the parameter b in Y u , which in turn is dominantly responsible for the charm quark Yukawa coupling; as the third generation largely does not couple to the PQ symmetry, this is the heaviest relevant fermion in the flavoured axion theory. The numerical fit gives |b| = 3.4 × 10 −3 . The details of how the flavons and parameters are related are given in Appendix C,

Predictions
Once the fermion mixing matrices are known from the fit, we can immediately determine the vector and axial coupling matrices V f and A f using Eqs. 2.3. Recalling that V f and A f are Hermitian, we have We may immediately compute the branching ratios for all aforementioned meson and lepton decays and neutral meson mass splittings. The only remaining parameter is the axion scale v P Q , which is only loosely constrained by naturalness arguments to be v P Q ∼ 10 12 GeV. In principle, any two measurements of either flavour violation (as discussed in this paper), the axion-photon coupling g aγ , or the axion-electron coupling g ae , would be sufficient to overconstrain v P Q in this model. Here, g aγ is fixed by v P Q and the domain wall number N DW = 6. In other words, although the charge assignments are very different, the A to Z model will resemble the original DFSZ model in experiments sensitive to g aγ , such as haloscopes and helioscopes. In Table 6 we give the model predictions for some of the most phenomenologically interesting experimental probes. We explicitly set v P Q = 10 12 GeV when computing the branching ratio.

Process
Branching ratio (v P Q = 10 12 GeV) Experimental sensitivity 8.3 × 10 −13 5 × 10 −9 (Mu3e future) Table 6. Predictions for axion-induced processes in the A to Z model. Branching ratios are computed assuming v P Q = 10 12 GeV, which should be true up to an O(1) factor.
In summary, we find that evidence for or against the A to Z model must come primarily from the (non-)observation of K + → π + a; the NA62 experiment is expected to be able to exclude most of the model's parameter space. A next-generation experiment could exclude the model definitively. Secondary sources of interest are decays of K 0 L and µ + ; detecting the A to Z model would require v P Q to be slightly lower than the natural prediction. However, two-body decays may be powerful channels for excluding other flavour models, sometimes placing stronger constraints than those from astrophysics, which typically give the strongest limits on v P Q .

Decay correlations
The prominent feature of unified models is correlations between Yukawa couplings of quarks and leptons. In this A to Z model, Y d ∼ Y e , up to diagonal Clebsch-Gordan factors. Notably, the (2,2) entries differ by a parameter x, which is determined by the fit and acts as a necessary Clebsch-Gordan factor to distinguish the strange quark and muon masses. Naturally, one expects x ∼ m µ /m s > 1; at the GUT scale, m µ /m s ∼ 4.5. Now consider the two decays K + → π + a and µ + → e + a, which are the most experimentally promising among flavoured axion decays. Their branching ratios are determined, respectively, by the couplings |V d 21 | 2 and |C e 21 | 2 = 2|V e 21 | 2 . With all other parameters held constant, the dependence on x of the ratio r = |V e 21 | 2 /|V d 21 | 2 is well approximated empirically by r ≈ 6.9 e −1.8 √ x .
We then find that the ratio of branching ratios R µ/K is given by For the model best fit point x = 5.88, R µ/K ≈ 0.38. Should both of these decays be measured experimentally, such a ratio, which is independent of the axion scale v P Q , is a valuable statistic for constraining the flavour sector of the model, giving immediate information about the high-scale parameters. For models where Y d ∼ Y e , typically x > 1; generically one expects R µ/K < 1. Similar ratios can be considered for other decays of K or B mesons and charged leptons. However, as this requires direct observation of both decays, which are suppressed in both sectors, these are realistically feasible only for more general ALPs.

Conclusion
In this paper we have reviewed and extended the phenomenology of flavourful axions, including both standard PQ axions, associated with the solution to the strong CP problem, and also for non-standard axion-like particles (ALPs) which do not care about the strong CP problem but which may generically arise from spontaneously broken symmetries and multiple scalar fields. We have presented the flavourful axion-fermion and axion-photon couplings both for the standard axion and for ALPs, and shown that they quite naturally are non-diagonal. Using these couplings, we have calculated the branching ratios for twobody decays of heavy mesons K, D, and B involving a flavourful axion. We have also calculated the mixing between axions and hadronic mesons K 0 , D 0 , B 0 and B 0 s and its consequences, which has not been discussed in the literature before. These can lead to new contributions to neutral meson mass splitting, meson decays into axions and axion decays into two photons which may be relevant for ALPs. We have also discussed charged lepton flavour-violating processes involving final state axions, of the form ℓ 1 → ℓ 2 a(γ), as well as µ → eee and µ − e conversion.
Correlations between observables may arise in specific flavourful axion models. To illustrate this, we have described the phenomenology of the A to Z Pati-Salam model, which predicts a flavourful QCD axion [38], and shown how unification leads to correlations between different flavour-dependent observables, as the down-type quark and charged lepton couplings are very similar. Within this model, since the axion arises from the same flavon fields that dictate fermion Yukawa structures, no additional field content is necessary to solve the strong CP problem, and all axion couplings are fixed by a fit to quark and lepton masses and mixing.
In conclusion, flavourful axions can appear naturally in realistic models and have a rich phenomenology beyond that of the standard KSVZ/DFSZ paradigms. In this paper we have attempted to provide the first comprehensive discussion of a number of relevant processes involving flavourful axions, including meson decays and mixing, as well as charged lepton flavour-violating processes. For a QCD axion, typically the bounds from such processes are very weak. However, K → πa is an ideal channel for looking at these types of decays, especially in specific models such as the A to Z Pati-Salam model, where exactly this type of flavour-violating coupling is the largest. By comparing multiple flavour-violating processes for both quarks and leptons, one may experimentally probe lepton and quark Yukawa structures which determine their masses and mass ratios. Although for QCD axions some of the flavour-violating processes we consider are not competitive, for flavourful ALPs many of them may be important, especially if the symmetry-breaking scale is 10 6 GeV or less.

A Axion-meson mixing
Kinetic mixing between the axion and neutral mesons (any of the pairs where P 0 , P 0 are strong eigenstates. The superscript 0 signifies we are not in a diagonal (physical) basis. We define the CP eigenstates P 1 (even) and P 2 (odd) by Inversely, In the case of the kaon, the states K 1,2 are close (but not exactly equal) to the physical eigenstates K S and K L , so defined by having definite lifetimes in weak decays. They are given in terms of a small parameter ε K ∼ 10 −3 characterising indirect CP violation, We will neglect such a contribution in this work. Rewriting L 0 kin in terms of P 1,2 , we have Note the wrong sign of the P 2 diagonal kinetic and mass terms; these can be made canonical by letting P 2 → iP 2 , which introduces a factor i in the kinetic mixing term. This can be absorbed in new couplings η 1,2 , defined by We also define a "total" coupling η 2 ≡ η 2 1 + η 2 2 = 2η P η * P = 2|η P | 2 . We diagonalise the kinetic Lagrangian by transformations The mixing is transferred to the mass matrix, giving where Φ = (a, P 1 , P 2 ) and The eigenvalues of M 2 Φ , corresponding to the physical squared masses, are given to good approximation for small η by Recalling that η 2 = 2|η P | 2 , we conclude that We have not taken into account a mass difference from SM physics, such as for kaons, where K S and K L differ by approximately 3 µeV.

B Heavy meson decay branching ratio
The Feynman rule for the vertex (∂ µ a)q 1 γ µ q 2 defined by the Lagrangian in Eq. 2.2 is where q = p a = p 1 − p 2 is the momentum transfer to the axion. For a two-body decay P → P ′ a of a heavy meson P = (q P q ′ ) into P ′ = (q P ′ q ′ ), the amplitude may be written It depends on a form factor f + (q 2 ) encapsulating hadronic physics. The lightness of the axion means we can safely take the limit q 2 → 0, wherein the form factor is defined by the relation The differential decay rate in the rest frame of P is with the momentum of decay products |p P ′ | = |p a | given by (B.6) Integrating over the solid angle Ω yields a factor 4π, arriving at

Superpotential
The effective Yukawa superpotential below the GUT scale, once messengers X have been integrated out, is given by with explicit couplings λ, which are naturally O(1) and assumed real by a CP symmetry at high scale. In the corresponding Lagrangian, the fermion part of the chiral superfields F , F c i are denoted f , f c i , respectively. 9 These are the familiar SM fermions as well as a set of right-handed neutrinos. The light Higgs scalar doublets keep the same notation as their corresponding superfield. 10 The fields Σ acquire high-scale VEVs which give dynamical masses to the X messengers in the renormalisable theory, expected to be O(M GUT ).

Goldstone field
The central actors in the flavoured axion model are the A 4 triplet flavons φ. Taking only the scalar part of superfields φ, we let To be precise: f, f c i are Weyl fermions, by definition transforming as left-handed fields. In other words, f c i are the left-handed components of a weak SU (2)L singlet. 10 This is rather imprecise but tolerable, as the Higgs sector is not relevant to the PQ mechanism, and fields are anyway replaced by their VEVs eventually.
where we have expanded around the flavon VEVs, noting that each ϕ consists of a scale v and direction x in A 4 space. The VEVs are aligned according to the CSD(4) prescription, i.e.
x ϕ u 1 = (0, 1, 1), The radial fields ρ ϕ are very heavy and phenomenologically uninteresting, so will be neglected henceforth. The phase fields a ϕ are not independent, but related by the single U (1) rephasing symmetry. We identify the Goldstone (or axion) field a by Component fields are given by

Lagrangian (SUSY basis)
The Yukawa Lagrangian may thus be written as Let us make the SM field components of the PS fields f, f c i explicit: Below the EWSB scale, Q and L further decompose into (u L , d L ) and (ν L , e L ), respectively. In addition, h u → v u , h d , h d 15 → v d , with some small mixing assumed between Higgs bi-doublets to give the MSSM 2HDM; we assume the effects of this mixing are negligible. The fields Σ acquire real VEVs, with magnitudes generically written v Σ , i.e.
The interplay between the singlet Σ d and adjoint Σ d 15 also provides Clebsch-Gordan factors which are different for quarks and leptons. To account for the split between down-type quarks and charged leptons, we reparametrise the couplings λ in the charged lepton sector, so λ 1d →λ 1d , λ 2d →λ 2d , and λ ud →λ ud .

Lagrangian (left-right basis)
It is also convenient to work in the left-right (LR) basis, in terms of Weyl fermions u L,R , d L,R , e L,R , and ν L,R . This amounts to nothing more than taking the Hermitian conjugate of the terms in Eq. C.6. With all above considerations taken into account, the Lagrangian becomes λ 1d →λ 1d , λ 2d →λ 2d , λ ud →λ ud + h.c..

(C.8)
This rather hefty expression can be put in a more conventional format by 1) expanding the A 4 triplet products like Q · ϕ , such that we may write the couplings as matrices, and 2) noting that each term must be PQ-invariant, allowing us to replace the flavon PQ charges with those of the SM fermions. 11 Moreover, all λ are real by an assumed CP symmetry at high scales.

Lagrangian (condensed linear basis)
with dimensionless parameters defined by (C.12)

Lagrangian (derivative basis)
We perform an axion-dependent rotation of the fermion fields to replace the linear couplings with derivative ones; the anomaly term is also induced. Extending the Lagrangian to include the fermion kinetic terms, f (f Li / ∂f Li +f Ri / ∂f Ri ), we let ijēLi e Rj + h.c. + anomaly. (C.14) We rotate to the mass basis by unitary transformations where by definition m f ≡ U † f M f V f , U Q ≡ U u , and V CKM ≡ U † u U d .

Derivative couplings
The axion-fermion derivative couplings become 16) where now f L , f R are vectors and x f L , x f R are diagonal 3 × 3 matrices. We define the coupling matrices X L ≡ U † f x f L U f and X R ≡ V † f x f R V f , and note that, since charges x f are real, X L = X † L and X R = X † R . In terms of Dirac spinors,

D Couplings in A to Z: numerical fit
The best fit parameters, as well as a Bayesian 95% credible interval, are given in Tables 7 (leptons) and 8 (quarks). The corresponding best fit input parameters are given in Table 9. We fit the model to data at the GUT scale. The running from low to high scale was performed, assuming the MSSM, in [105]. They parametrise threshold corrections by a series of dimensionless parameters η i . All but one (η b ) were set to zero, and choosinḡ η b = −0.24 to account for the small GUT-scale difference between b and τ masses.   Table 9. Best fit input parameter values.