1 Introduction

Nowadays, the renormalization procedure is mostly well-established and is no longer considered to just “sweep infinities under the rug”. However, this establishment is not complete. For example, there does not seem to be an agreed-upon recipe for the renormalization of mixing angles, and the literature suggests a myriad of renormalization schemes, [1,2,3,4,5,6,7,8,9,10,11] to name a few. Moreover, there appear to exist two different philosophies regarding the renormalization of mixing angles, sometimes even used simultaneously [9] or proposed as alternatives [10]. This is a rather unpleasant situation since particle mixing is present already in the quark sector of the Standard Model (SM) as well as in nearly all models whose scalar sectors are extended relative to the SM.

In slightly more detail, the two renormalization approaches differ in whether the mixing angles receive counterterms or not. The more common treatment is to introduce mixing angle counterterms, which are rather inevitably related to the field renormalization (e.g. [1]). In turn, this causes these mixing counterterms to be gauge-dependent – an unwanted feature – such that additional effort must be put in to separate the gauge-independent part (e.g. [7]). The less common approach is to trade the mixing matrix counterterms for the off-diagonal mass matrix counterterms such that the bare mixing matrix is already renormalized (e.g. [9]). It seems that the latter, although not as popular, does not introduce downsides such as unwanted gauge-dependence.

The fact that there are two rather different philosophies, one of them in general leading to gauge-dependent mixing angle counterterms, seems to be an expression of the fact that mixing angles are basis-dependent and, therefore, not physical quantities. For example, this has been rather explicitly noted in [12, 13] at tree-level when considering basis-independent methods for the Two Higgs Doublet Model (THDM). An analogous statement on the redundancy of the renormalization of mixing angles was also made in [10] in the context of the THDM. Seeing that mixing angles are basis-dependent is simple: for example, the flavour basis of the SM has no mixing matrices, but rotation to the quark mass-eigenstate basis produces the quark mixing matrix. Of course, many other bases where the quarks are not in their mass-eigenstates also contain some mixing matrix. The less simple point, which seems to cause a lot of confusion, is whether and how to renormalize these basis-dependent quantities.

In this work we do not intend to propose a particular renormalization scheme; instead, we want to establish a conceptually consistent philosophy for the renormalization of mixing angles such that renormalization schemes can later be constructed. All of our upcoming arguments are geared to highlight structures that are independent of the renormalization scheme. For example, in Sect. 3.2 we argue that the mixing matrix counterterms are naturally associated with gauge-dependent structures independently of a renormalization scheme, while a specific scheme may make use of this structure or not. Further, given the scheme-independent arguments, we conclude that schemes should in fact make use of these structures by not associating counterterms with mixing angles, as is done in, e.g., [14], where we propose a renormalization scheme for fermions. The absence of mixing angle counterterms seems to offer all of the required properties for mixing renormalization [5, 11, 15] and is a step towards basis-independence. Therefore, we consider this approach to be the consistent one and the one that should be used in practice over the more common approach with counterterms for mixing angles.

The paper is structured as follows: Sect. 2 introduces nearly all the needed notation and relations, Sect. 3 is then dedicated to providing arguments for having the mixing angle counterterms set to 0. In particular, Sect. 3.1 is based on basis-independence arguments, Sect. 3.2 discusses the gauge-dependence and Sect. 3.3 considers the degenerate mass limit. In Sect. 4 we give our conclusions.

2 Basis rotations and renormalization

In this section we set up the discussion of mixing, mass, and field renormalization by generalizing the discussion found in [10], while more specific arguments will be given in further sections.

For simplicity, let us consider a system of real scalar fields

$$\begin{aligned} \varvec{\phi }_0=\begin{pmatrix} \phi ^0_1 \\ \phi ^0_2 \\ \vdots \\ \phi ^0_n \end{pmatrix} \,, \end{aligned}$$
(1)

where the 0 (sub)superscripts indicate that the fields are bare. Now, one may relate the fields \(\varvec{\phi }_0\) in the initial basis to some other basis of the fields \(\varvec{h}_0\) via an orthogonal rotation matrix \(\varvec{R}_0\)

$$\begin{aligned} \varvec{\phi }_0=\varvec{R}_0 \varvec{h}_0. \end{aligned}$$
(2)

Considering the kinetic term in the Lagrangian in momentum space we may write this relation as

$$\begin{aligned} {\mathcal {K}}&=\varvec{\phi }^{T}_0\left( p^2-\varvec{M}^{2}_0 \right) \varvec{\phi }_0 \end{aligned}$$
(3a)
$$\begin{aligned}&= \varvec{h}^{T}_0\left( p^2-\varvec{R}^{T}_0\varvec{M}^{2}_0\varvec{R}_0 \right) \varvec{h}_0 \end{aligned}$$
(3b)
$$\begin{aligned}&= \varvec{h}^{T}_0\left( p^2-\varvec{{\widetilde{M}}}^2_0\right) \varvec{h}_0\,, \end{aligned}$$
(3c)

where the superscript T stands for transposition, \(p^2\) is the squared momentum, and \(\varvec{M}^2_0\) (\(\varvec{{\widetilde{M}}}^2_0\)) is the bare mass-squared matrix in the \(\varvec{\phi }_0\) (\(\varvec{h}_0\)) basis, which is in general not diagonal. We have used

$$\begin{aligned} \varvec{R}_0^T \varvec{R}_0=\varvec{1} \end{aligned}$$
(4)

in the momentum term and defined

$$\begin{aligned} \varvec{{\widetilde{M}}}^2_0=\varvec{R}^{T}_0\varvec{M}^{2}_0\varvec{R}_0. \end{aligned}$$
(5)

Apart from performing basis rotations, the fields may be renormalized

$$\begin{aligned} \varvec{\phi }_0=\varvec{Z} \varvec{\phi }=\left( \varvec{1}+\delta \varvec{Z}\right) \varvec{\phi }. \end{aligned}$$
(6)

Here \(\varvec{Z}\) is the field renormalization constant, \(\delta \varvec{Z}\) is the corresponding counterterm that can be considered to be of 1-loop order, and \(\varvec{\phi }\) stands for the vector of renormalized fields. Analogously, the fields \(\varvec{h}_0\) may also be renormalized

$$\begin{aligned} \varvec{h}_0=\varvec{{\widetilde{Z}}} \varvec{h}=\left( \varvec{1}+\delta \varvec{{\widetilde{Z}}}\right) \varvec{h}. \end{aligned}$$
(7)

The renormalization procedure also requires counterterms for the mass matrices

$$\begin{aligned} \begin{aligned} \varvec{M}^2_0&= \varvec{M}^2+\delta \varvec{M}^2 , \\ \varvec{{\widetilde{M}}}^2_0&= \varvec{{\widetilde{M}}}^2+\delta \varvec{{\widetilde{M}}}^2 , \end{aligned} \end{aligned}$$
(8)

where \(\varvec{M}^2\left( \varvec{{\widetilde{M}}^2}\right) \) is the renormalized mass matrix and \(\delta \varvec{M}^2\left( \delta \varvec{{\widetilde{M}}^2}\right) \) is the mass matrix counterterm in the \(\varvec{\phi }\left( \varvec{h}\right) \) basis. For the sake of the argument we also introduce mixing matrix counterterms

$$\begin{aligned} \varvec{R}_0=\varvec{R}+\delta \varvec{R} \end{aligned}$$
(9)

such that both the bare and the renormalized mixing matrices are orthogonal. The following property stems from orthogonality at 1-loop

$$\begin{aligned} \delta \left( \varvec{R}_0^T \varvec{R}_0\right) =0 \Rightarrow \delta \varvec{R}^T \varvec{R}=-\varvec{R}^T \delta \varvec{R} . \end{aligned}$$
(10)
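
As a minimal illustration (the two-flavour parametrization by a single angle \(\theta \) with counterterm \(\delta \theta \) is an assumption made here for concreteness), Eq. (10) states that \(\varvec{R}^T\delta \varvec{R}\) is anti-symmetric. For a \(2\times 2\) rotation this can be seen explicitly:

$$\begin{aligned} \varvec{R}=\begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix} , \qquad \delta \varvec{R}=\delta \theta \, \frac{\partial \varvec{R}}{\partial \theta } \qquad \Rightarrow \qquad \varvec{R}^T\delta \varvec{R}=\delta \theta \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} . \end{aligned}$$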

Now, we should be able to apply the renormalization procedure to the kinetic term, Eq. (3), in any basis. For example, taking Eqs. (3a) and (3c) we get

$$\begin{aligned} {\mathcal {K}}&=\varvec{\phi }^{T}\Big \{p^2-\varvec{M}^2 +\delta \varvec{Z}^T\left( p^2-\varvec{M}^2\right) \end{aligned}$$
(11a)
$$\begin{aligned}&\quad +\left( p^2-\varvec{M}^2\right) \delta \varvec{Z} -\delta \varvec{M}^2\Big \}\varvec{\phi }\, \end{aligned}$$
(11b)
$$\begin{aligned}&=\varvec{h}^{T}\Big \{p^2-\varvec{\widetilde{M}}^2 +\delta \varvec{{\widetilde{Z}}}^T\left( p^2-\varvec{{\widetilde{M}}}^2\right) \end{aligned}$$
(11c)
$$\begin{aligned}&\quad +\left( p^2-\varvec{{\widetilde{M}}}^2\right) \delta \varvec{{\widetilde{Z}}} -\delta \varvec{{\widetilde{M}}}^2\Big \}\varvec{h}\,, \end{aligned}$$
(11d)

where we dropped all the terms non-linear in the counterterms. Alternatively, taking Eq. (3b), where the mixing matrix \(\varvec{R}_0\) is present, leads to the following

$$\begin{aligned} {\mathcal {K}}= & {} \varvec{h}^{T}\Big \{p^2-\varvec{{\widetilde{M}}}^2 -\varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R} \nonumber \\{} & {} +\delta \varvec{{\widetilde{Z}}}^T\left( p^2-\varvec{{\widetilde{M}}}^2\right) +\left( p^2-\varvec{{\widetilde{M}}}^2\right) \delta \varvec{{\widetilde{Z}}} \nonumber \\ {}{} & {} -\delta \varvec{R}^T\varvec{R} \varvec{{\widetilde{M}}}^2 -\varvec{{\widetilde{M}}}^2\varvec{R}^T \delta \varvec{R} \Big \}\varvec{h}, \end{aligned}$$
(12)

where we have

$$\begin{aligned} \varvec{{\widetilde{M}}}^2=\varvec{R}^{T}\varvec{M}^{2}\varvec{R}. \end{aligned}$$
(13)
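
As a sketch of the intermediate step, the mixing-dependent terms of Eq. (12) can be traced to the expansion of the bare mass matrix of Eq. (5) to first order in the counterterms, using Eqs. (8), (9) and (13) together with the orthogonality of \(\varvec{R}\):

$$\begin{aligned} \varvec{{\widetilde{M}}}^2_0&=\left( \varvec{R}+\delta \varvec{R}\right) ^T\left( \varvec{M}^2+\delta \varvec{M}^2\right) \left( \varvec{R}+\delta \varvec{R}\right) \\&\approx \varvec{{\widetilde{M}}}^2+\varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R} +\delta \varvec{R}^T\varvec{M}^2\varvec{R}+\varvec{R}^T\varvec{M}^2\delta \varvec{R} \\&=\varvec{{\widetilde{M}}}^2+\varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R} +\delta \varvec{R}^T\varvec{R}\, \varvec{{\widetilde{M}}}^2+\varvec{{\widetilde{M}}}^2\varvec{R}^T\delta \varvec{R} . \end{aligned}$$

The last line follows from inserting \(\varvec{R}\varvec{R}^T=\varvec{1}\) next to \(\varvec{M}^2\) in the two terms containing \(\delta \varvec{R}\).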

Splitting the field counterterms into the symmetric and anti-symmetric parts

$$\begin{aligned} \delta \varvec{{\widetilde{Z}}}=\delta \varvec{{\widetilde{Z}}}^S+\delta \varvec{{\widetilde{Z}}}^A, \end{aligned}$$
(14)

with

$$\begin{aligned} \left( \delta \varvec{{\widetilde{Z}}}^S\right) ^T=\delta \varvec{{\widetilde{Z}}}^S, \qquad \qquad \left( \delta \varvec{{\widetilde{Z}}}^A\right) ^T=-\delta \varvec{{\widetilde{Z}}}^A, \end{aligned}$$
(15)

and by using Eq. (10) we may rewrite the kinetic term as

$$\begin{aligned} {\mathcal {K}}= & {} \varvec{h}^{T}\Big \{p^2-\varvec{{\widetilde{M}}}^2 -\varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R} \nonumber \\ {}{} & {} +\delta \varvec{{\widetilde{Z}}}^S\left( p^2-\varvec{{\widetilde{M}}}^2\right) +\left( p^2-\varvec{{\widetilde{M}}}^2\right) \delta \varvec{{\widetilde{Z}}}^S \nonumber \\ {}{} & {} -\left[ \varvec{{\widetilde{M}}}^2, \varvec{R}^T \delta \varvec{R}+ \delta \varvec{{\widetilde{Z}}}^A \right] \Big \}\varvec{h}, \end{aligned}$$
(16)

where \(\left[ \dots , \dots \right] \) is the commutator. The commutator term shows that the counterterms \(\delta \varvec{R}\) and \(\delta \widetilde{\varvec{Z}}^A\) cannot be determined separately and only the combination \(\varvec{R}^T \delta \varvec{R}+ \delta \varvec{{\widetilde{Z}}}^A\) can be fixed, i.e. the mixing matrix counterterms are degenerate with the anti-symmetric part of the field renormalization, which is a slightly more general version of the statement made in [10]. This degeneracy implies that the mixing may be renormalized through the (anti-symmetric part of the) field renormalization, which is what enables, for example, the scheme in [9]. However, we attempt to make the statement stronger – the mixing angle/matrix counterterms should always be included in the field renormalization. In the following sections we give arguments for why one should set \(\delta \varvec{R} = 0\) by comparing Eqs. (11b), (11d), and (12) in terms of basis-dependence and by discussing gauge-dependence and the degenerate mass limit.
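
To make the degeneracy concrete, consider as an illustrative assumption two flavours in a basis where \(\varvec{{\widetilde{M}}}^2=\textrm{diag}\left( m_1^2,\, m_2^2\right) \). Any anti-symmetric \(2\times 2\) matrix has a single independent entry, so one may write \(\varvec{R}^T\delta \varvec{R}=\delta \theta \, \varvec{\epsilon }\) and \(\delta \varvec{{\widetilde{Z}}}^A=a\, \varvec{\epsilon }\), where \(\varvec{\epsilon }\) is the anti-symmetric matrix with \(\epsilon _{12}=-1\), and \(\delta \theta \) and \(a\) are labels introduced only for this example. The commutator in Eq. (16) then reads

$$\begin{aligned} \left[ \varvec{{\widetilde{M}}}^2, \varvec{R}^T \delta \varvec{R}+ \delta \varvec{{\widetilde{Z}}}^A \right] =-\left( \delta \theta +a\right) \left( m_1^2-m_2^2\right) \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \end{aligned}$$

so only the sum \(\delta \theta +a\) enters the kinetic term, and no condition on the kinetic term alone can fix \(\delta \theta \) and \(a\) separately.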

3 Arguments for having \(\delta \varvec{R}=0\)

3.1 Basis independence

Basis-independent methods are often sought after since observables must be expressed in terms of basis-independent quantities, for example, see [12, 13, 16,17,18]. In a similar manner it is desirable for the renormalization procedure to also show some basis-independent features. For example, the form of the renormalized kinetic term in Eqs. (11b) and (11d) is the same although the bases are different – this is welcome. In contrast, the form of Eq. (12) is already different due to additional mixing/rotation matrix counterterms, even though all three equations (should) correspond to the same bare kinetic term.

It is rather simple to see that Eq. (12) can be brought to the form of Eq. (11d), by simply setting \(\varvec{R}_0 = \varvec{R} \Leftrightarrow \delta \varvec{R}=0\) or, equivalently, by redefining the anti-symmetric part of the field renormalization to include \(\varvec{R}^T\delta \varvec{R}\). Once \(\delta \varvec{R}\) no longer appears we may easily equate Eqs. (11d) and (12) and get

$$\begin{aligned} \delta \varvec{{\widetilde{M}}}^2 = \varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R} . \end{aligned}$$
(17)

Further, Eqs. (11b) and  (11d) correspond to the same bare kinetic term if

$$\begin{aligned} \varvec{{\widetilde{Z}}}= \varvec{R}^T \varvec{Z} \varvec{R} \end{aligned}$$
(18)

and

$$\begin{aligned} \varvec{\phi }= \varvec{R} \varvec{h}. \end{aligned}$$
(19)

In more detail, with \(\delta \varvec{R}\ne 0\) one is, or at least should be, free to perform a rotation by \(\varvec{R}^T\) on the renormalized fields \(\varvec{h}\) in Eq. (16)

$$\begin{aligned} {\mathcal {K}}= & {} \varvec{h}^{\prime \,T}\Big \{p^2-\varvec{M}^2 -\delta \varvec{M}^{2} \nonumber \\ {}{} & {} +\delta \varvec{Z}^S\left( p^2-\varvec{M}^2\right) +\left( p^2-\varvec{M}^2\right) \delta \varvec{Z}^S \nonumber \\ {}{} & {} -\left[ \varvec{M}^2, \delta \varvec{R} \varvec{R}^T + \varvec{R} \delta \varvec{{\widetilde{Z}}}^A \varvec{R}^T \right] \Big \}\varvec{h}^\prime , \end{aligned}$$
(20)

Here \(\varvec{h} ^\prime =\varvec{R} \varvec{h}\) (see footnote 1), and we have used Eqs. (13) and (18) for the symmetric part of the field renormalization. Evidently, all the terms except for the one with \(\delta \varvec{R}\) contain quantities in the basis of \(\varvec{\phi }\) even though the fields are labeled as \(\varvec{h}^\prime \). This means that one computes identical amplitudes in both the \(\varvec{\phi }\) and \(\varvec{h}^\prime \) bases, except that they are renormalized with different sets of counterterms. The presence of the \(\delta \varvec{R}\) counterterm is the source of inconsistency.

For one thing, because of the \(\delta \varvec{R}\) counterterm the basis rotations of the anti-symmetric part of field renormalization do not seem to follow the same law as the other counterterms. For the symmetric part we could use Eq. (18), while the anti-symmetric part gives

$$\begin{aligned} \delta \varvec{Z}^A {\mathop {=}\limits ^{!}} \delta \varvec{R} \varvec{R}^T + \varvec{R} \delta \varvec{{\widetilde{Z}}}^A \varvec{R}^T. \end{aligned}$$
(21)

To preserve the same law of basis transformations, Eq. (18), one must have \(\delta \varvec{R}=0\).
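
The step behind this statement can be sketched as follows. If Eq. (18) holds for the full field renormalization, its anti-symmetric part obeys \(\delta \varvec{{\widetilde{Z}}}^A=\varvec{R}^T\delta \varvec{Z}^A\varvec{R}\), since an orthogonal rotation maps anti-symmetric matrices to anti-symmetric ones. Inserting this into Eq. (21) gives

$$\begin{aligned} \delta \varvec{Z}^A=\delta \varvec{R} \varvec{R}^T+\varvec{R}\left( \varvec{R}^T\delta \varvec{Z}^A\varvec{R}\right) \varvec{R}^T=\delta \varvec{R} \varvec{R}^T+\delta \varvec{Z}^A \qquad \Rightarrow \qquad \delta \varvec{R} \varvec{R}^T=0 \qquad \Rightarrow \qquad \delta \varvec{R}=0, \end{aligned}$$

where the last implication uses the invertibility of \(\varvec{R}\).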

For another view of the inconsistency, one easily notices that the \(\delta \varvec{R}\) counterterm in the basis \(\varvec{h}^\prime \) does not have an associated renormalized parameter. This means that it is impossible to form the bare mixing matrix \(\varvec{R}_0\) in the \(\varvec{h}^\prime _0\) basis, i.e. the bare kinetic term no longer follows the form of Eq. (3) and instead becomes

$$\begin{aligned} {\mathcal {K}}^\prime= & {} \varvec{h}^{\prime \, T}_0\left\{ p^2 -\varvec{M}_0^2 \right. \nonumber \\{} & {} \left. -\left[ \varvec{M}^2, \delta \varvec{R}\varvec{R}^T +\varvec{R}\delta \varvec{{\widetilde{Z}}}^A\varvec{R}^T -\delta \varvec{Z}^A\right] \right\} \varvec{h}_0^\prime \nonumber \\ {}\ne & {} {\mathcal {K}}. \end{aligned}$$
(22)

Here we have used the inverse of \(\varvec{h}^\prime _0= \varvec{Z} \varvec{h}^\prime \). The only way to preserve the bare kinetic term and more generally the bare Lagrangian, which defines the theory, is for the commutator term to vanish. However, this gets us back to Eq. (21) and so, setting \(\delta \varvec{R}=0\) preserves not only the form of basis transformations, but also the form of the bare Lagrangian.
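
As a sketch of the step leading to Eq. (22), note that to first order the inverse relation reads \(\varvec{h}^\prime =\left( \varvec{1}-\delta \varvec{Z}\right) \varvec{h}^\prime _0\), with \(\delta \varvec{Z}=\delta \varvec{Z}^S+\delta \varvec{Z}^A\) split in analogy with Eq. (14). Inserting this into Eq. (20) adds the terms

$$\begin{aligned} -\delta \varvec{Z}^T\left( p^2-\varvec{M}^2\right) -\left( p^2-\varvec{M}^2\right) \delta \varvec{Z} =-\delta \varvec{Z}^S\left( p^2-\varvec{M}^2\right) -\left( p^2-\varvec{M}^2\right) \delta \varvec{Z}^S +\left[ \varvec{M}^2, \delta \varvec{Z}^A\right] . \end{aligned}$$

The symmetric pieces cancel the \(\delta \varvec{Z}^S\) terms of Eq. (20), the combination \(\varvec{M}^2+\delta \varvec{M}^2\) is the bare mass matrix \(\varvec{M}^2_0\), and the remaining commutator combines with the one already present in Eq. (20).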

The third and final view of the inconsistency may be seen by considering why in Eq. (22) we have \({\mathcal {K}}^\prime \ne {\mathcal {K}}\). We started with the bare kinetic term in Eq. (3a), rotated it by \(\varvec{R}_0\) to Eq. (3b), renormalized it to get Eq. (12), and tried to rotate back into the \(\varvec{\phi }\) basis by \(\varvec{R}^T\). However, instead of Eq. (11b) the rotation took us into Eq. (20) and \({\mathcal {K}}^\prime \) in Eq. (22)! In other words, we see that basis rotations and the renormalization procedure do not commute, i.e. there is a difference if one renormalizes the theory before or after basis rotations. This is a rather awkward feature since there is nothing special about basis rotations or renormalization and we should be working with the same theory in whichever basis we choose to renormalize the theory. In turn, we formulate a consistency condition, which we also imposed in [14], that basis rotations should commute with the renormalization procedure. This condition automatically requires the bare rotations to be identified with the renormalized ones, i.e. \(\varvec{R}_0=\varvec{R}\) and \(\delta \varvec{R}=0\).

The upshot is that having the bare rotation matrix set to the renormalized one, \(\varvec{R}_0=\varvec{R}\), allows one to freely change the basis at any point, be it for the bare fields as in Eq. (2) or the renormalized ones in Eq. (19), while keeping the same form of the Lagrangian. Alternatively, this may be rephrased as having a basis-invariant set of counterterms, i.e. upon basis rotations

$$\begin{aligned} \left\{ \varvec{Z}, \delta \varvec{M}^2, \delta \lambda \right\} \Rightarrow \left\{ \varvec{{\widetilde{Z}}}, \delta \varvec{{\widetilde{M}}}^2, \delta \widetilde{\lambda }\right\} \end{aligned}$$
(23)

but not

$$\begin{aligned} \left\{ \varvec{Z}, \delta \varvec{M}^2, \delta \lambda \right\} \Rightarrow \left\{ \varvec{{\widetilde{Z}}}, \delta \varvec{{\widetilde{M}}}^2, \delta \varvec{R}, \delta \widetilde{\lambda }\right\} , \end{aligned}$$
(24)

where \(\delta \lambda \) and \(\delta \widetilde{\lambda }\) stand for the counterterms of other parameters in the theory in the two respective bases.

There is also a formulation in slightly more philosophical terms. One of the main points of the renormalization procedure is that it takes some measurement (observable) as a reference point in order to make the theory predictive. The standard book-keeping devices for these measurements are the counterterms. Since the observables must be basis-independent, it also makes sense to have a basis-independent set of counterterms, which means \(\delta \varvec{R} = 0\). Of course, one may argue that quantities such as the Cabibbo–Kobayashi–Maskawa (CKM) matrix [19, 20] elements can be measured and, hence, should receive counterterms. However, the CKM matrix itself can in principle be expressed in terms of the initial (renormalized) mass matrices of the up- and down-type quarks. It is the renormalization of these mass matrices that provides a set of basis-independent counterterms and also ensures cancellations of UV divergences. Put another way, in a diagonal mass basis, measurement of the mixing angles and masses is the same as measuring the non-diagonal mass matrix in some initial basis. In turn, mixing matrices may still be used as they are a nice way of parameterizing the mixing, but it should not be forgotten that they are derived and basis-dependent quantities and, hence, should not have counterterms.

In the two following sections we show that setting \(\delta \varvec{R}\) to 0 is not only conceptually consistent, but also of practical importance.

3.2 Gauge dependence

Let us consider the case with \(\delta \varvec{R} \ne 0\) and see how it leads to difficulties. One of the requirements for the mixing renormalization is that it should be gauge-invariant [5, 11, 15]. However, this is a rather complicated task because of Eq. (16) and the degeneracy between \(\delta \varvec{{\widetilde{Z}}}^A\) and \(\varvec{R}^T\delta \varvec{R}\). A way to investigate gauge dependence is via the Nielsen Identities [21, 22], which allow one to take gauge derivatives of the self-energies.

For concreteness, let us proceed in the basis of the fields \(\varvec{h}\) and consider the 1-loop case, for which the derivative w.r.t. the gauge parameter \(\xi \) of the bare self-energy \(\varvec{\Pi }^0\left( p^2\right) \) is [22] (see footnote 2)

$$\begin{aligned} \partial _\xi \varvec{\Pi }^0\left( p^2\right)= & {} \varvec{\Lambda }^T\left( p^2\right) \left( p^2-\varvec{{\widetilde{M}}}^2\right) \nonumber \\ {}{} & {} +\left( p^2-\varvec{{\widetilde{M}}}^2\right) \varvec{\Lambda }\left( p^2\right) , \end{aligned}$$
(25)

where \(\varvec{\Lambda }\) is a correlation function involving BRST sources; it describes the gauge-dependence of \(\varvec{\Pi }^0\left( p^2\right) \) and is a matrix in flavour space. Just as for the field renormalization in Eq. (14), we may split \(\varvec{\Lambda }\) into its symmetric and anti-symmetric parts, so that the Nielsen Identity becomes

$$\begin{aligned} \partial _\xi \varvec{\Pi }^0\left( p^2\right)= & {} \varvec{\Lambda }^S\left( p^2\right) \left( p^2-\varvec{{\widetilde{M}}}^2\right) \nonumber \\ {}{} & {} +\left( p^2-\varvec{{\widetilde{M}}}^2\right) \varvec{\Lambda }^S\left( p^2\right) \nonumber \\ {}{} & {} -\left[ \varvec{{\widetilde{M}}}^2, \varvec{\Lambda }^A\right] . \end{aligned}$$
(26)
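
The step from Eq. (25) to Eq. (26) can be sketched as follows. Writing \(\varvec{\Lambda }=\varvec{\Lambda }^S+\varvec{\Lambda }^A\), so that \(\varvec{\Lambda }^T=\varvec{\Lambda }^S-\varvec{\Lambda }^A\), the anti-symmetric part contributes

$$\begin{aligned} -\varvec{\Lambda }^A\left( p^2-\varvec{{\widetilde{M}}}^2\right) +\left( p^2-\varvec{{\widetilde{M}}}^2\right) \varvec{\Lambda }^A =\varvec{\Lambda }^A\varvec{{\widetilde{M}}}^2-\varvec{{\widetilde{M}}}^2\varvec{\Lambda }^A =-\left[ \varvec{{\widetilde{M}}}^2, \varvec{\Lambda }^A\right] , \end{aligned}$$

where the \(p^2\) terms cancel because \(p^2\) is proportional to the unit matrix in flavour space.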

Let us also consider the self-energy \(\varvec{\Pi }\left( p^2\right) \) renormalized as in Eq. (16)

$$\begin{aligned} \varvec{\Pi }\left( p^2\right)= & {} \varvec{\Pi }^0\left( p^2\right) -\varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R} \nonumber \\ {}{} & {} +\delta \varvec{{\widetilde{Z}}}^S\left( p^2-\varvec{{\widetilde{M}}}^2\right) +\left( p^2-\varvec{{\widetilde{M}}}^2\right) \delta \varvec{{\widetilde{Z}}}^S \nonumber \\ {}{} & {} -\left[ \varvec{{\widetilde{M}}}^2, \varvec{R}^T \delta \varvec{R}+ \delta \varvec{{\widetilde{Z}}}^A \right] . \end{aligned}$$
(27)

Now, we may take the gauge derivative of the renormalized self-energy and arrive at

$$\begin{aligned} \partial _\xi \varvec{\Pi }\left( p^2\right)= & {} -\varvec{R}^{T}\partial _\xi \delta \varvec{M}^{2}\varvec{R} \nonumber \\ {}{} & {} +\left( \partial _\xi \delta \varvec{{\widetilde{Z}}}^S+\varvec{\Lambda }^S\right) \left( p^2-\varvec{{\widetilde{M}}}^2\right) \nonumber \\ {}{} & {} +\left( p^2-\varvec{{\widetilde{M}}}^2\right) \left( \partial _\xi \delta \varvec{{\widetilde{Z}}}^S+\varvec{\Lambda }^S\right) \nonumber \\ {}{} & {} -\left[ \varvec{{\widetilde{M}}}^2, \varvec{R}^T \partial _\xi \delta \varvec{R} + \partial _\xi \delta \varvec{{\widetilde{Z}}}^A+\varvec{\Lambda }^A \right] . \end{aligned}$$
(28)

Here we assumed \(\varvec{{\widetilde{M}}}^2\) and \(\varvec{R}\) to be gauge-independent. It is evident that the field counterterms as well as \(\delta \varvec{R}\) are naturally associated with gauge-dependent structures. In turn, it is rather hard to fix \(\delta \varvec{R}\) in a gauge-independent way since that immediately requires an additional renormalization condition to break the degeneracy between the field and mixing matrix counterterms. Once again, the easiest way around this is to simply set \(\delta \varvec{R}=0\).

In contrast, the mass counterterm \(\varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R}\) is not associated with any gauge-dependent structure and so it can be defined in a naturally gauge-independent way; only non-physical renormalization conditions can induce gauge-dependence in the mass counterterm.

3.3 Non-singular degenerate mass limit

If one keeps \(\delta \varvec{R}\ne 0\) and manages to renormalize it in a gauge-independent way, the counterterm will still be problematic. To see this, let us for simplicity explicitly choose a basis where the mass matrix is diagonal

$$\begin{aligned} \varvec{{\widetilde{M}}}^2 = \textrm{diag}\left( m_1^2,\, \dots ,\, m_n^2\right) \end{aligned}$$
(29)

and take Eq. (27)

$$\begin{aligned} \Pi _{ij}\left( p^2\right)= & {} \Pi _{ij}^0\left( p^2\right) -\left( \varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R}\right) _{ij} \nonumber \\ {}{} & {} +\delta \widetilde{Z}_{ij}^S\left( p^2-m^2_j\right) +\left( p^2-m^2_i\right) \delta \widetilde{Z}_{ij}^S \nonumber \\ {}{} & {} -\left( m^2_i-m^2_j\right) \left( \left( \varvec{R}^T \delta \varvec{R}\right) _{ij}+ \delta \widetilde{Z}_{ij}^A \right) . \end{aligned}$$
(30)

Here \(i,\, j\) are flavour indices, the non-bold notation (where appropriate) indicates matrix elements, and the counterterm \(\left( \varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R}\right) _{ij}\) is in general not diagonal even if \(\varvec{{\widetilde{M}}}^2\) is.
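
The last line of Eq. (30) follows from evaluating the commutator of Eq. (27) element by element. As a short check, for a generic matrix \(\varvec{X}\) (a placeholder used only here) and the diagonal mass matrix of Eq. (29), \({\widetilde{M}}^2_{ik}=m_i^2\) for \(i=k\) and zero otherwise, one finds

$$\begin{aligned} \left[ \varvec{{\widetilde{M}}}^2, \varvec{X}\right] _{ij} =\sum _k\left( {\widetilde{M}}^2_{ik}X_{kj}-X_{ik}{\widetilde{M}}^2_{kj}\right) =\left( m_i^2-m_j^2\right) X_{ij} , \end{aligned}$$

which vanishes identically for \(i=j\) and for degenerate masses.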

Further, the counterterms must cancel the UV divergences in the bare self-energy independently of the chosen scheme; hence, we only take the UV parts, although the arguments carry over to the finite parts without difficulty. In addition, the UV divergences in the bare self-energy must be accompanied by the same structures as the counterterms, since otherwise one could not use the counterterms to cancel the UV divergences, or at least could not do so for every momentum \(p^2\). As the structures with \(p^2-m^2_{i}\) and \(p^2-m^2_{j}\) multiply the symmetric part of the field renormalization, which is not related to \(\delta \varvec{R}\), we simply drop such terms (as indicated by \(\simeq \)) for simplicity. With these considerations, and taking only off-diagonal terms with \(i\ne j\), we have

$$\begin{aligned} \Pi _{ij}\left( p^2\right) \simeq \Pi _{ij}^0\left( p^2\right) -\left( \varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R}\right) _{ij} -\left( m^2_i-m^2_j\right) \left( \left( \varvec{R}^T \delta \varvec{R}\right) _{ij}+ \delta \widetilde{Z}_{ij}^A \right) . \end{aligned}$$
(31)

Here lies part of the problem: in the literature there are many schemes (e.g. [1, 25,26,27,28]) where the off-diagonal mass counterterm \(\left( \varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R}\right) _{ij}\) is set to 0. Another part is that in the degenerate mass limit, i.e. \(m_i\rightarrow m_j\), the bare self-energy in Eq. (31) does not vanish in general, but the mixing and field counterterms are multiplied by \({m_i^2-m_j^2}\), which does vanish in this limit. In such schemes the UV divergences in Eq. (31) must be canceled with the counterterms \(\delta \varvec{R}\) or \(\delta \varvec{{\widetilde{Z}}}^A\), but this is only possible if these counterterms are proportional to \(\left( m_i^2-m_j^2\right) ^{-1}\). In other words, the counterterms \(\delta \varvec{R}\) or \(\delta \varvec{{\widetilde{Z}}}^A\) must be singular in the degenerate mass limit for the cancellation to work out. In turn, these singularities can cause numerical problems, which are required to be absent for the mixing renormalization [11].
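
The singular behaviour can be made explicit with a schematic relation. Starting from the off-diagonal part of Eq. (30), setting \(\left( \varvec{R}^{T}\delta \varvec{M}^{2}\varvec{R}\right) _{ij}=0\) and leaving aside the structures that multiply the symmetric field renormalization, UV finiteness of \(\Pi _{ij}\) requires

$$\begin{aligned} \left. \left( \left( \varvec{R}^T \delta \varvec{R}\right) _{ij}+ \delta \widetilde{Z}_{ij}^A\right) \right| _{\textrm{UV}} =\frac{\left. \Pi _{ij}^0\right| _{\textrm{UV}}}{m_i^2-m_j^2} , \qquad i\ne j, \end{aligned}$$

where the subscript UV is notation introduced here to denote the UV-divergent part. The right-hand side diverges for \(m_i\rightarrow m_j\) unless the numerator itself vanishes at least as fast as \(m_i^2-m_j^2\).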

Alternatively, the non-diagonal mass counterterm can naturally cancel the non-vanishing terms without being singular. Also note that according to Sect. 3.2 (and with the diagonal mass matrix) the gauge-dependent parts vanish in the degenerate mass limit [11, 29] so that the mass counterterms can be defined in a gauge-independent way. Even when the renormalization is performed in a basis where the (renormalized) mass matrix is diagonal the corresponding counterterm has to be a matrix with possible non-trivial off-diagonal elements depending on the particular model – this avoids singularities in the degenerate mass limit. Out of \(\Pi _{ij}^0\) only terms which are gauge-independent and proportional to \({m_i^2-m_j^2}\) could possibly be included in \(\delta \varvec{R}\) such that it is non-singular and gauge-invariant. However, this is a step towards basis-dependence and it is best to keep \(\delta \varvec{R} = 0\) and to avoid inconsistencies altogether.

Finally, even without considering the degenerate mass limit, the non-diagonal mass counterterms are essential to ensure that all the relevant UV divergences cancel out without the need for a non-trivial \(\delta \varvec{R}\) as is explicitly done in [9, 14]. These explicit schemes with \(\delta \varvec{R} = 0 \), where the UV divergences are properly taken care of, reinforce the more general and scheme-independent arguments laid out in this paper.

4 Conclusions

In this paper we have considered the interplay between basis rotations of the fields and the renormalization procedure. In particular, we have found that adding counterterms to mixing angles is a step towards basis-dependence and introduces various problems. For one thing, counterterms of mixing angles are naturally associated with gauge-dependent structures, while at the same time a gauge-independent definition of them is likely to be singular in the degenerate mass limit. Neither of these two properties is welcome, since the former makes physical amplitudes gauge-dependent and the latter can cause numerical instabilities. More importantly, mixing angle counterterms obstruct the form of basis transformations such that the renormalization procedure does not commute with basis rotations; we see this as an inconsistency and a step towards basis-dependence. In contrast, stepping in the direction of basis-independence by setting mixing angle counterterms to 0 completely avoids these inconsistencies together with all the gauge-dependence and singular behaviour problems. We conclude that the basis-independent approach is far simpler in practice, is consistent, and should be adopted.