1 Introduction

QED radiative corrections to hadronic observables are generally rather small but they become phenomenologically relevant when the target precision is at the percent level. For example, the leptonic and semileptonic decay rates of light pseudoscalar mesons are measured with a very high accuracy and, on the theoretical side, have been calculated with the required non-perturbative accuracy by many lattice collaborations. Most of these calculations have been performed by simulations of lattice QCD without taking into account QED radiative corrections. A recent review [4] of the results obtained by the different lattice groups shows that leptonic and semileptonic decay rates of \(\pi \) and K mesons are presently known at the sub-percent level of accuracy. At the same time, QED radiative corrections to these quantities are estimated to be of the order of a few percent, by means of chiral perturbation theory [5]. These estimates have recently been confirmed in the case of the leptonic decay rates of \(\pi \) and K by a first-principle lattice calculation of the QED radiative corrections at \(O(\alpha )\) in Refs. [6, 7].

Other remarkable examples of observables for which QED radiative corrections are phenomenologically relevant are the so-called lepton flavour universality ratios. For example \(R(D^{(*)})\) is defined as the branching ratio for \(B \rightarrow D^{(*)} \ell {{\bar{\nu }}}_\ell \) with \(\ell =e,\mu \) divided by the branching ratio for \(B \rightarrow D^{(*)} \tau {{\bar{\nu }}}_\tau \). Most of the hadronic uncertainties cancel in these ratios, which are built in such a way that they are trivial in the Standard Model in the limit in which the two leptons have the same mass. Presently, a combined analysis [8] of the R(D) and \(R(D^*)\) ratios shows a deviation of the experimental measurements from the theoretical predictions of the order of 3 standard deviations. On the other hand, QED radiative corrections are different for the two leptons because of their different masses, and an improved theoretical treatment of these effects (see for example Refs. [9, 10] for a discussion of this point) could either enhance or reconcile the observed discrepancy between the experimental measurements and the theoretical expectations.

QED radiative corrections to hadronic observables can be computed from first principles by performing lattice simulations of QCD coupled to QED, treating the photon field on an equal footing with the gluon field. Since these corrections are expected to be at the percent level, in order to resolve them against the statistical noise, one needs to simulate at various values of the fine-structure constant and to interpolate to the physical value. This approach, pioneered in Refs. [11,12,13], is highly non-trivial from both the numerical and the theoretical point of view, because of the peculiarities of QED. Numerically, lattice calculations are unavoidably affected by statistical and systematic uncertainties, and it can be challenging to resolve QED radiative corrections from the leading QCD contributions within the errors of a simulation. Theoretically, a major issue arises because lattice calculations necessarily have to be done in a finite volume. QED is a long-range interaction and, consequently, finite-volume effects are the key issue in the presence of electromagnetic interactions.

In fact, as a consequence of Gauss’ law, it is impossible to have a net electric charge on a periodic torus. Because of this strong theoretical constraint, it is particularly challenging to calculate from first principles physical observables associated with electrically charged external states, such as the phenomenologically relevant quantities discussed above. Several approaches have been proposed over the years to cope with this problem, see Ref. [14] for a recent review. The most popular approaches to the problem of charged particles on the torus solve the Gauss’ law constraint by introducing non-local terms in the finite-volume action of the theory. The effects induced by the non-locality of the action are expected to disappear once the infinite-volume limit is properly taken and, as far as \(O(\alpha )\) QED radiative corrections are concerned, it is generally possible to show that this is indeed the case.

On the one hand, the non-local formulations of the theory are particularly appealing because of their formal simplicity. On the other hand, it has been shown in Ref. [18] that it is possible to probe electrically charged states on a finite volume by starting from a local formulation of the theory and, remarkably, in a fully gauge-invariant way. This is possible by using C-parity (or \(\mathrm {C}^*\)) boundary conditions for all the fields and by using a certain class of interpolating operators originally introduced by Dirac in a seminal work [19] on the canonical quantization of QED.

The formulation of Ref. [18] has also been studied numerically. The results for the meson masses extracted in a fully gauge-invariant way from lattice simulations of QCD+QED with \(\mathrm {C}^*\) boundary conditions obtained in Ref. [20] provide convincing numerical evidence that, besides being an attractive theoretical formulation, the proposal of Ref. [18] is also a valid numerical alternative for the calculation of QED radiative corrections on the lattice. This motivated the present work.

Fig. 1: Summary of salient features of openQ*D. Some features inherited from openQCD and NSPT are highlighted

In this paper we present the open-source package openQ*D, which can be used to simulate QCD+QED, QCD, and the pure SU(3) and U(1) gauge theories. The code allows the user to choose among a wide variety of temporal and spatial boundary conditions. In particular, it allows one to perform dynamical simulations of QCD+QED with \(\mathrm {C}^*\) boundary conditions, but also with periodic boundary conditions, along the spatial directions. Simulations of QCD with \(\mathrm {C}^*\) boundary conditions can be a valuable starting point for the application of the RM123 method [21], in which observables are calculated order-by-order in the electromagnetic coupling. A fully tested and stable release of openQ*D can be downloaded from [1].

The openQ*D package is based on the openQCD [2] package, from which it inherits the core features, most notably the implementation of the Dirac operator and of the solvers, and the possibility of simulating open and Schrödinger functional boundary conditions in the time direction. One of the inherited solvers implements the inexact deflation algorithm of Ref. [22]. An added value of the openQ*D package is the possibility of using several deflation subspaces in a single simulation. This is particularly important in the case of QCD+QED simulations because different deflation subspaces have to be generated for quarks having different electric charges.

Another important feature of the openQ*D package is the possibility of using Fourier Acceleration [23, 24] for the molecular dynamics evolution of the U(1) field. The implementation of the Fast Fourier Transform (FFT) used here is an adaptation of the corresponding module in the NSPT [3, 25] package.

The remainder of this paper is organised as follows. In Sect. 2 we give an overview of the theoretical background needed to understand the actions simulated by openQ*D, and we describe some peculiar aspects of the simulation algorithm. In particular, the specific implementation of \(\mathrm {C}^*\) boundary conditions and of the Fourier Acceleration for the U(1) field are discussed. In Sect. 3 we provide instructions on how to compile the code, construct a sample input file, and run the program that generates QCD+QED configurations. Section 4 is a collection of tests and performance studies. In particular, we present scalability tests, and studies of the performance of solvers for the Dirac equation for electrically charged fields. We also illustrate the outcome of some sample runs performed for testing purposes. In Fig. 1, we provide a schematic view of the openQ*D functionalities.

2 Theoretical background

An overview of the main algorithmic choices made in the code will be given in this section. The fundamental fields are the SU(3) link variable \(U_{\mu }(x)\) and the real photon field \(A_{\mu }(x)\). Since only the compact formulation of QED is implemented at present, all observables are written in terms of the U(1) link variable

$$\begin{aligned} z_{\mu }(x)&= \exp \{i A_{\mu }(x)\}, \end{aligned}$$
(2.1)

which implies that the real photon field can be restricted to \(-\pi \le A_{\mu }(x) \le \pi \) with no loss of generality. Various boundary conditions can be chosen for the gauge fields: periodic, open [26], Schrödinger Functional (SF) [27, 28] and open-SF boundary conditions [29] in the Euclidean time direction \(\mu =0\), periodic and \(\mathrm {C}^*\) boundary conditions [30,31,32,33] in the spatial directions. The implementation of \(\mathrm {C}^*\) boundary conditions is discussed in Sect. 2.1.
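The statement that the photon field can be restricted to the interval between \(-\pi \) and \(\pi \) can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `u1_link` and `wrap_to_pi` are hypothetical and are not part of openQ*D, and the sample values are made up.

```python
import cmath
import math


def u1_link(a):
    """Compact U(1) link variable z = exp(i A), as in Eq. (2.1)."""
    return cmath.exp(1j * a)


def wrap_to_pi(a):
    """Map a real angle to the equivalent value in [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


# Made-up field values; some lie outside the fundamental interval.
for a in [0.3, 4.0, -7.5, 12.0]:
    a_wrapped = wrap_to_pi(a)
    # The wrapped value lies in the fundamental interval ...
    assert -math.pi <= a_wrapped < math.pi
    # ... and gives the same link variable, hence the same observables.
    assert abs(u1_link(a) - u1_link(a_wrapped)) < 1e-12
```

Since every observable is a function of the link variable only, shifting the photon field by a multiple of \(2\pi \) has no effect, which is what the assertions verify.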

After integrating out the fermion fields in the usual way, the target distribution of QCD+QED if no \(\mathrm {C}^*\) boundary conditions are used is

$$\begin{aligned} \rho _\text {tar}(U,A)&\propto e^{-S_{\mathrm {g,}\mathrm {SU}(3)}(U)-S_{\mathrm {g,}\mathrm {U}(1)}(A)} \prod _{f} \det D_f, \end{aligned}$$
(2.2)

where the gauge actions \(S_{\mathrm {g,}\mathrm {SU}(3)}(U)\) and \(S_{\mathrm {g,}\mathrm {U}(1)}(A)\) are briefly discussed in Sect. 2.2, the product runs over the simulated fermion flavours indexed by f, and the Dirac operator \(D_f\) is introduced in Sect. 2.3. If \(\mathrm {C}^*\) boundary conditions are used, the determinant is replaced by a Pfaffian, i.e.

$$\begin{aligned} \rho _\text {tar}(U,A)&\propto e^{-S_{\mathrm {g,}\mathrm {SU}(3)}(U)-S_{\mathrm {g,}\mathrm {U}(1)}(A)} \prod _{f} \text {pf} \, (CTD_f), \end{aligned}$$
(2.3)

where C is the charge conjugation matrix and T is a field-independent matrix satisfying \(T^2=1\), whose detailed definition can be found in Sect. 2.1. While in the continuum limit the determinant and the Pfaffian are positive, this is not the case with Wilson fermions. The absolute value is considered in both cases, which amounts to replacing

$$\begin{aligned} \begin{aligned}&\det D_f \rightarrow \left| \det D_f \right| ,\\&\text {pf} \, (CTD_f) \rightarrow \left| \text {pf} \, (CTD_f) \right| = \left| \det D_f \right| ^{1/2}. \end{aligned} \end{aligned}$$
(2.4)

The sign should be separately calculated and included in the evaluation of observables as a reweighting factor [34, 35]. It is important to stress that this is a mild sign problem [18], which becomes irrelevant sufficiently close to the continuum limit, and which is also present in standard QCD simulations for the strange quark. The strategy presented here is in line with state-of-the-art QCD and QCD+QED simulations, in which the sign of the determinant is simply ignored. Future work will investigate the importance of the sign, especially at lighter quark masses.

After introducing the standard even–odd preconditioned operator \(\hat{D}\) [36], one rewrites the quark part of the distribution as

$$\begin{aligned} \prod _{f} \left| \det D_f \right| ^{2\alpha _f}&= \prod _{f} \det (D^{\dagger }_f D_f)^{\alpha _f} \nonumber \\&= e^{-S_{\mathrm {sdet}}(U,A)} \prod _{f} \det (\hat{D}^{\dagger }_f \hat{D}_f)^{\alpha _f}, \end{aligned}$$
(2.5)

where \(\alpha _f\) is either 1/2 or 1/4. The definitions of \(\hat{D}_f\) and \(S_{\mathrm {sdet}}\) can be found in Sect. 2.3. Instead of this target distribution, the openQ*D code simulates a slightly different distribution

$$\begin{aligned} \rho _\mathrm {sim}(U,A)&\propto e^{-S_{\mathrm {g,}\mathrm {SU}(3)}(U)-S_{\mathrm {g,}\mathrm {U}(1)}(A)} e^{-S_{\mathrm {sdet}}(U,A)} \prod _{f} \det R_f^{-1}. \end{aligned}$$
(2.6)

written in terms of a rational approximation \(R_f\) [37]

$$\begin{aligned} R_f&\simeq ( \hat{D}_f^\dag \hat{D}_f + {\mu }_f^2 )^{-\alpha _f}, \end{aligned}$$
(2.7)

where \({\mu }_f\) is a tunable parameter introduced to suppress configurations with exceptionally small eigenvalues of \(\hat{D}_f^\dag \hat{D}_f\) (twisted-mass reweighting [26, 38]). If \({\mu }_f\) is small enough and the rational approximation is accurate enough, the simulated distribution \(\rho _\mathrm {sim}(U,A)\) is very close to the target one \(\rho _\mathrm {tar}(U,A)\). The difference is corrected by means of reweighting factors \(W_f\)

$$\begin{aligned} \frac{\rho _\mathrm {tar}(U,A)}{\rho _\mathrm {sim}(U,A)}&\propto \prod _f W_f,&W_f&= \det \left[ ( \hat{D}_f^\dag \hat{D}_f)^{\alpha _f} R_f \right] , \end{aligned}$$
(2.8)

which have to be separately calculated and included in the expectation values of observables as follows

$$\begin{aligned} \langle O \rangle _\mathrm {tar}&= \frac{ \langle O \prod _f W_f \rangle }{ \langle \prod _f W_f \rangle }. \end{aligned}$$
(2.9)
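As a minimal sketch of how Eq. (2.9) is used in practice, the ratio of averages can be estimated from a list of measurements taken on configurations drawn from the simulated distribution. The function name `reweighted_average` and the numbers below are hypothetical and serve only as an illustration; a real analysis would also propagate the statistical errors of the ratio.

```python
def reweighted_average(observable, weights):
    """Estimate <O>_tar = <O W> / <W> from samples of rho_sim.

    observable[i] is the value of O on configuration i, and weights[i] is
    the product over flavours of the reweighting factors W_f on the same
    configuration. The common 1/N normalization cancels in the ratio.
    """
    if len(observable) != len(weights) or not observable:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(o * w for o, w in zip(observable, weights))
    denominator = sum(weights)
    return numerator / denominator


# Made-up measurements on three configurations.
obs = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 2.0]
print(reweighted_average(obs, w))
```

With unit weights the estimator reduces to the plain average, as expected when the simulated and target distributions coincide.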

A detailed discussion of the supported reweighting factors can be found in Appendix A. The rational function \(R_f\) can be decomposed into a product of positive factors \(R_{f,\ell }\) (frequency splitting [26]). More details on frequency splitting are provided in Sect. A.2. The determinant of the rational functions is finally represented by means of a pseudofermion quadratic action as in

$$\begin{aligned} \det R_f^{-1}&= \prod _\ell \det R_{f,\ell }^{-1} = \int [d\Phi ] \, e^{- \sum _\ell ( \Phi _{f,\ell }, R_{f,\ell } \Phi _{f,\ell } ) }. \end{aligned}$$
(2.10)

The distribution is generated by means of a Hybrid Monte Carlo (HMC) algorithm with Fourier acceleration for the U(1) field. The molecular dynamics (MD) Hamiltonian is given by

$$\begin{aligned} H = \frac{1}{2} ( \pi , \Delta ^{-1} \pi )_\text {U(1)} + \frac{1}{2} ( \Pi , \Pi )_\text {SU(3)} + S(U,A,\Phi ),\nonumber \\ \end{aligned}$$
(2.11)

where \(\Pi _\mu (x)\) and \(\pi _\mu (x)\) denote the momentum fields associated with the SU(3) and U(1) fields respectively, the operator \((-\Delta )\) is a discretization of the Laplace operator, and the action is given by

$$\begin{aligned} S(U,A,\Phi )&= S_{\mathrm {g,}\mathrm {SU}(3)}(U) + S_{\mathrm {g,}\mathrm {U}(1)}(A) + S_{\mathrm {sdet}}(U,A) \nonumber \\&\quad + \sum _{f,\ell } ( \Phi _{f,\ell }, R_{f,\ell } \Phi _{f,\ell } ). \end{aligned}$$
(2.12)

Details on the implementation of the Fourier acceleration are presented in Appendix B. The HMC consists of three steps.

  1. The momentum and pseudofermion fields are randomly generated with probability distribution given by \(e^{-H}\);

  2. The gauge fields are evolved with a discretized version of the MD equations, i.e.

    $$\begin{aligned} \partial _t A_\mu (x)&= \Delta ^{-1} \pi _\mu (x), \nonumber \\ \partial _t U_\mu (x)&= \Pi _\mu (x) U_\mu (x), \nonumber \\ \partial _t \pi _\mu (x)&= - \partial _{A_\mu (x)} S(U,A,\Phi ), \nonumber \\ \partial _t \Pi _\mu (x)&= - \partial _{U_\mu (x)} S(U,A,\Phi ), \end{aligned}$$
    (2.13)

    where \(\partial _{U_\mu (x)}\) is the left Lie derivative with respect to \(U_\mu (x)\) while \(\partial _{A_\mu (x)}\) is the elementary derivative with respect to \(A_\mu (x)\). In practice multiple time-scale [39] symplectic integrators are used to solve the MD equations: leapfrog, 2nd and 4th order Omelyan–Mryglod–Folk integrators [40] are available (LF, OMF2, OMF4).

  3. The evolved gauge configuration is accepted or rejected by a standard Metropolis test, i.e. it is accepted with probability \(\min \{1, e^{-\Delta H}\}\), where \(\Delta H\) is the change of H along the MD trajectory.
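The three steps above can be illustrated on a toy model. The sketch below is not the openQ*D algorithm: it replaces the gauge fields by a single real variable with the hypothetical action \(S(x)=x^2/2\), uses a unit kinetic term (no Fourier acceleration), and employs a single-scale leapfrog integrator. All function names are made up for the illustration.

```python
import math
import random


def action(x):
    """Toy action S(x) = x^2 / 2, a stand-in for the full action S."""
    return 0.5 * x * x


def force(x):
    """Minus the derivative of the toy action."""
    return -x


def leapfrog(x, p, step, n_steps):
    """Integrate dx/dt = p, dp/dt = -dS/dx with the leapfrog scheme."""
    p += 0.5 * step * force(x)          # initial half step for the momentum
    for i in range(n_steps):
        x += step * p                   # full step for the coordinate
        # Full momentum step, except a half step after the last update.
        scale = 1.0 if i < n_steps - 1 else 0.5
        p += scale * step * force(x)
    return x, p


def hmc_step(x, rng, step=0.1, n_steps=10):
    """One update: momentum refresh, MD evolution, Metropolis test."""
    p = rng.gauss(0.0, 1.0)                           # step 1
    h_old = 0.5 * p * p + action(x)
    x_new, p_new = leapfrog(x, p, step, n_steps)      # step 2
    h_new = 0.5 * p_new * p_new + action(x_new)
    # Step 3: accept with probability min(1, exp(-(h_new - h_old))).
    accept = rng.random() < math.exp(min(0.0, h_old - h_new))
    return (x_new, True) if accept else (x, False)


rng = random.Random(1234)
x, accepted, total = 0.0, 0, 2000
samples = []
for _ in range(total):
    x, ok = hmc_step(x, rng)
    accepted += ok
    samples.append(x)
print("acceptance rate:", accepted / total)
print("<x^2> estimate:", sum(s * s for s in samples) / total)
```

The leapfrog scheme is time-reversible and area-preserving, which is what makes the Metropolis test in the last step exact; higher-order integrators such as OMF2 and OMF4 share these properties while reducing the energy violation at fixed step size.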

Fig. 2: Global geometry of extended lattice. The top diagram represents a section of the extended lattice along a (1, k) plane where \(k=2,3\) is a direction with \(\mathrm {C}^*\) boundary conditions. All fields are periodic along the extended direction 1. \(\mathrm {C}^*\) boundary conditions in the direction \(k=2,3\) are replaced by shifted boundary conditions in the extended lattice. Shifted boundary conditions are imposed by properly defining the nearest neighbours of boundary sites. Empty circles in the red (resp. green, blue) rectangle have to be identified with the corresponding solid circles in the red (resp. green, blue) rectangle. The bottom diagram represents a section of the extended lattice along a (1, k) plane where \(k=2,3\) is a periodic direction. In both diagrams, the black circles represent the sites of the physical lattice, and the grey circles represent the sites of the mirror lattice

2.1 \(\mathrm {C}^*\) boundary conditions

In addition to the variety of boundary conditions in the temporal direction inherited from openQCD-1.6, the openQ*D code allows for periodic or \(\mathrm {C}^*\) boundary conditions to be chosen in the spatial directions. If the gauge fields satisfy periodic boundary conditions in all spatial directions k, the fermion fields \(\psi _f(x)\) and \({\bar{\psi }}_f(x)\) satisfy general phase-periodic boundary conditions (f is the flavour index), i.e.

$$\begin{aligned}&U_\mu (x + L_k {\hat{e}}_k) = U_\mu (x),\nonumber \\&A_\mu (x + L_k {\hat{e}}_k) = A_\mu (x),\end{aligned}$$
(2.14)
$$\begin{aligned}&\psi _f(x + L_k {\hat{e}}_k) = e^{i\theta _{f,k}} \psi _f(x), \nonumber \\&{\bar{\psi }}_f(x + L_k {\hat{e}}_k) = e^{-i\theta _{f,k}} {\bar{\psi }}_f(x). \end{aligned}$$
(2.15)

Phase-periodic boundary conditions are incompatible with \(\mathrm {C}^*\) boundary conditions. If the gauge fields satisfy \(\mathrm {C}^*\) boundary conditions in at least one direction, say k, then \(\theta _{f,j}=0\) for all f and j, and

$$\begin{aligned}&U_\mu (x + L_k {\hat{e}}_k) = U_\mu ^*(x),\nonumber \\&A_\mu (x + L_k {\hat{e}}_k) = -A_\mu (x),\end{aligned}$$
(2.16)
$$\begin{aligned}&\psi _f(x + L_k {\hat{e}}_k) = C^{-1} {\bar{\psi }}_f^T(x),\nonumber \\&{\bar{\psi }}_f(x + L_k {\hat{e}}_k) = -\psi _f^T(x) C. \end{aligned}$$
(2.17)

The charge-conjugation matrix C satisfies

$$\begin{aligned} C^T&= -C,&C^\dag&= C^{-1},&C^{-1} \gamma _\mu C&= - \gamma _\mu ^T. \end{aligned}$$
(2.18)

\(\mathrm {C}^*\) boundary conditions are implemented by means of an orbifold construction. Assume that \(k=1\) is a direction with \(\mathrm {C}^*\) boundary conditions. In order to simulate a physical lattice with size \(V = L_0 \times {L}_1 \times {L}_2 \times {L}_3\), the openQ*D code allocates a lattice with size \(V_{\mathrm {C}^*} = L_0 \times (2 {L}_1) \times {L}_2 \times {L}_3\), which we will refer to as the extended lattice. Points in the physical lattice are assumed to have coordinates which satisfy \(0 \le x_\mu < {L}_\mu \). The extended lattice can be interpreted as a double-covering of the physical lattice, with coordinates satisfying \(0 \le x_\mu < {L}_\mu \) for \(\mu \ne 1\) and \(0 \le x_1< 2{L}_1\). Points outside the physical lattice constitute the mirror lattice. On the extended lattice, points x and \(x + L_k {\hat{e}}_k\) do not coincide, so Eqs. (2.16) and (2.17) have to be interpreted as constraints which define the admissible gauge and fermion fields. These are referred to as the orbifold constraints. While the admissible gauge fields in the mirror lattice are completely determined by the value of the gauge field in the physical lattice via (2.16), the orbifold constraint has a different meaning for fermion fields, providing a relation between \(\psi \) in the physical lattice and \({\bar{\psi }}\) in the mirror lattice, and vice versa. Given that the fermion fields \(\psi \) and \({\bar{\psi }}\) are independent Grassmann variables on the physical lattice, one can equivalently choose the value of \(\psi \) at each point of the extended lattice as a complete set of independent variables. The integration over the Grassmann variables yields the Pfaffian of the operator CTD [18], where T is the translation operator defined by

$$\begin{aligned} T\psi (x) = \psi (x+L_1 {\hat{e}}_1). \end{aligned}$$
(2.19)

One easily proves that

$$\begin{aligned} \left| \text {pf} \, ( CTD ) \right| = \left| \det D \right| ^{1/2}, \end{aligned}$$
(2.20)

which justifies the need for \(\alpha _f=1/4\) in Eq. (2.5). Since the square of the charge-conjugation operation is the identity, all fields must obey periodic boundary conditions along the extended direction \(k=1\), i.e.

$$\begin{aligned} U_\mu (x + 2L_1 {\hat{e}}_1)&= U_\mu (x),&A_\mu (x + 2L_1 {\hat{e}}_1)&= A_\mu (x), \end{aligned}$$
(2.21)
$$\begin{aligned} \psi _f(x + 2L_1 {\hat{e}}_1)&= \psi _f(x),&{\bar{\psi }}_f(x + 2L_1 {\hat{e}}_1)&= {\bar{\psi }}_f(x) . \end{aligned}$$
(2.22)
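The relation in Eq. (2.20) between the Pfaffian and the determinant can be obtained in a few lines. The sketch below assumes, as shown in Ref. [18] and not rederived here, that the operator CTD is antisymmetric, so that its Pfaffian is well defined and squares to its determinant. It then uses only the unitarity of C from Eq. (2.18) and the property \(T^2=1\).

```latex
\begin{aligned}
\left[ \text{pf}\,(CTD) \right]^2 &= \det (CTD) = \det C \, \det T \, \det D, \\
\left| \det C \right| &= 1 \quad \text{since } C^\dag = C^{-1}, \\
\det T &= \pm 1 \quad \text{since } T^2 = 1, \\
\Rightarrow \quad \left| \text{pf}\,(CTD) \right| &= \left| \det D \right|^{1/2}.
\end{aligned}
```

Combined with \(\left| \det D \right| = \det (D^\dag D)^{1/2}\), this gives \(\left| \text{pf}\,(CTD) \right| = \det (D^\dag D)^{1/4}\), which corresponds to the exponent 1/4 quoted above.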

\(\mathrm {C}^*\) boundary conditions in directions \(k = 2, 3\) are implemented by modifying the global topology of the extended lattice (see Fig. 2). In fact in these directions, \(\mathrm {C}^*\) boundary conditions in the physical lattice imply shifted boundary conditions in the extended lattice, i.e.

$$\begin{aligned} U_\mu (x + L_k {\hat{e}}_k)&= U_\mu (x + L_1 {\hat{e}}_1), \nonumber \\ A_\mu (x + L_k {\hat{e}}_k)&= A_\mu (x + L_1 {\hat{e}}_1), \end{aligned}$$
(2.23)
$$\begin{aligned} \psi _f(x + L_k {\hat{e}}_k)&= \psi _f(x + L_1 {\hat{e}}_1), \nonumber \\ {\bar{\psi }}_f(x + L_k {\hat{e}}_k)&= {\bar{\psi }}_f(x + L_1 {\hat{e}}_1). \end{aligned}$$
(2.24)

When the determinant of the Dirac operator is stochastically estimated by means of a pseudofermion action as in Eq. (2.12), the pseudofermion field \(\Phi _{f,\ell }\) is natively defined on the extended lattice, i.e. \(\Phi _{f,\ell }(x)\) are truly independent variables for each x in the extended lattice. Moreover it satisfies the same boundary conditions as \(\psi _f\) in Eqs. (2.22) and (2.24).

It is worth noticing that \(\mathrm {C}^*\) boundary conditions can be implemented in different ways. For instance, the implementation proposed in Appendix D of Ref. [18] does not double the lattice, but the number of pseudofermion fields. Roughly speaking, one needs to represent quarks and antiquarks by means of independent pseudofermion fields which are mixed by the boundary conditions. The openQ*D implementation simply maps each pair of pseudofermion fields onto the geometry of the extended lattice. The cost of the application of the Dirac operator implemented as in openQ*D and as in [18] is identical. Therefore, as far as the application and inversion of the Dirac operator are concerned, the orbifold construction does not introduce any overhead with respect to more standard implementations of \(\mathrm {C}^*\) boundary conditions. On the other hand, the gauge field is evolved twice. In principle one could evolve the gauge field only on the physical lattice and then copy its value to the mirror lattice. This strategy will be considered in the future. However, simulations close to the physical point are dominated by the inversion of the Dirac operator and the overhead due to the evolution of the gauge field is expected to be negligible. Evidence of this fact has been presented in [41]. The orbifold construction has been chosen essentially because it requires only minimal modifications of the openQCD code. In fact, the functions that impose the orbifold constraint on gauge and momentum fields are trivial, shifted boundary conditions (by half a lattice) are implemented by a simple redefinition of the map of nearest-neighbour MPI processes, and finally the gauge action and forces need to be multiplied by a factor 1/2. On the other hand, the Dirac operators and the solvers are completely untouched by the orbifold construction.
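The orbifold constraint on the gauge field can be illustrated in a one-dimensional toy setting. The sketch below is not taken from the openQ*D sources: the function names are hypothetical, the field values are made up, and only the U(1) field along a single extended direction is considered.

```python
def extend_u1_field(a_phys):
    """Build the field on the extended lattice from the physical one.

    a_phys[x] holds A(x) for 0 <= x < L. The mirror sites L <= x < 2L are
    filled using the orbifold constraint A(x + L) = -A(x) of Eq. (2.16).
    """
    return list(a_phys) + [-a for a in a_phys]


def translate(field, shift):
    """Periodic translation on the extended lattice: f(x) -> f(x + shift)."""
    n = len(field)
    return [field[(x + shift) % n] for x in range(n)]


a_phys = [0.1, -0.4, 0.25, 0.0]     # made-up values on L = 4 sites
L = len(a_phys)
a_ext = extend_u1_field(a_phys)

# A translation by L acts as charge conjugation on the extended field ...
assert translate(a_ext, L) == [-a for a in a_ext]
# ... and a translation by 2L is the identity, as in Eq. (2.21).
assert translate(a_ext, 2 * L) == a_ext

# A quadratic sum over the extended lattice counts every physical site
# twice, which is why actions and forces carry a factor 1/2.
s_phys = sum(a * a for a in a_phys)
s_ext = sum(a * a for a in a_ext)
assert abs(0.5 * s_ext - s_phys) < 1e-12
```

The last assertion mirrors, in this toy setting, the factor 1/2 that multiplies the gauge action and forces in the orbifold construction.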

2.2 Gauge actions

The SU(3) and compact U(1) gauge actions that can be simulated with openQ*D are

$$\begin{aligned} S_{\mathrm {g,}\mathrm {SU}(3)}= & {} \frac{\omega _{\mathrm {C}^*}}{g_0^2} \sum _{k=0}^1 c^\mathrm {SU(3)}_k \sum _{{\mathcal {C}} \in {\mathcal {S}}_k} \mathrm {tr}\,[ 1 - U({\mathcal {C}}) ], \end{aligned}$$
(2.25)
$$\begin{aligned} S_{\mathrm {g,}\mathrm {U}(1)}= & {} \frac{\omega _{\mathrm {C}^*}}{2 q_\mathrm {el}^2 e_0^2} \sum _{k=0}^1 c^\mathrm {U(1)}_k \sum _{{\mathcal {C}} \in {\mathcal {S}}_k} [ 1 - z({\mathcal {C}}) ], \end{aligned}$$
(2.26)

where \(U({\mathcal {C}})\) and \(z({\mathcal {C}})\) denote the SU(3) and U(1) parallel transports along a path \({\mathcal {C}}\) on the lattice. \({\mathcal {S}}_0\) and \({\mathcal {S}}_1\) are the sets of all oriented plaquettes and all oriented \(1 \times 2\) planar loops respectively, and the overall weight \(\omega _{\mathrm {C}^*}\) is 1 if no \(\mathrm {C}^*\) boundary conditions are used. With \(\mathrm {C}^*\) boundary conditions, \(\omega _{\mathrm {C}^*}=1/2\) corrects for the double counting introduced by summing over all plaquette and double-plaquette loops in the extended lattice instead of the physical lattice (cf. Sect. 2.1). The coefficients \(c_{0,1}\) satisfy the relation \(c_0 + 8c_1 = 1\). For SU(3), the Wilson action is obtained by choosing \(c_0 = 1\), the tree-level improved Symanzik (or Lüscher–Weisz) action is obtained by choosing \(c_0 = \tfrac{5}{3}\), and the Iwasaki action is obtained by choosing \(c_0 = 3.648\). The parameters \(g_0\) and \(e_0\) are the bare SU(3) and U(1) gauge couplings respectively, which are related to the \(\beta \) parameter and the bare fine-structure constant \(\alpha _0\) by

$$\begin{aligned} \beta = \frac{6}{g_0^2}, \quad \alpha _0 = \frac{e_0^2}{4\pi }. \end{aligned}$$
(2.27)
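The relations among the action parameters can be made concrete with a short numerical sketch. The helper names below are hypothetical and do not correspond to openQ*D input parameters; the code only evaluates the normalization condition \(c_0 + 8c_1 = 1\) and Eq. (2.27).

```python
import math


def c1_from_c0(c0):
    """Rectangle coefficient fixed by the normalization c0 + 8 c1 = 1."""
    return (1.0 - c0) / 8.0


def beta_from_g0(g0):
    """Inverse SU(3) coupling beta = 6 / g0^2, Eq. (2.27)."""
    return 6.0 / g0 ** 2


def alpha0_from_e0(e0):
    """Bare fine-structure constant alpha0 = e0^2 / (4 pi), Eq. (2.27)."""
    return e0 ** 2 / (4.0 * math.pi)


# The three SU(3) actions mentioned in the text.
for name, c0 in [("Wilson", 1.0),
                 ("Luscher-Weisz", 5.0 / 3.0),
                 ("Iwasaki", 3.648)]:
    print(name, "c0 =", c0, "c1 =", c1_from_c0(c0))
```

For the Wilson action the rectangle coefficient vanishes, for the Lüscher–Weisz action it equals \(-1/12\), and for the Iwasaki action it equals \(-0.331\).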

In the compact formulation of QED, all electric charges must be integer multiples of some elementary charge \(q_\mathrm {el}\) which is defined in units of the charge of the positron. As discussed in Ref. [18], \(q_\mathrm {el}\) appears as an overall factor in the gauge action and essentially sets the normalization of the U(1) gauge field in the continuum limit. Even though in infinite volume \(q_\mathrm {el}=1/3\) would be an appropriate choice in order to simulate quarks, in finite volume with \(\mathrm {C}^*\) boundary conditions one needs to choose \(q_\mathrm {el}=1/6\) in order to construct gauge-invariant interpolating operators for charged hadrons [18, 20]. Note that, since a compact formulation of QED is used, no gauge fixing is added to the action; furthermore, the user is free to simulate (QCD+)QED without \(\mathrm {C}^*\) boundary conditions.

The actions in Eqs. (2.25) and (2.26) assume periodic boundary conditions in time. In the more general case, the actions are modified at the time boundary in order to allow for O(a) improvement. The general form of the gauge actions can be found in [42].

2.3 Dirac operator

The Dirac operator implemented in openQ*D is given by a sum of terms

$$\begin{aligned} D = m_0 + D_{\mathrm {w}} + \delta D_{\mathrm {sw}} + \delta D_{\mathrm {b}}, \end{aligned}$$
(2.28)

where \(D_{\mathrm {w}}\) is the (unimproved) Wilson–Dirac operator, \(\delta D_{\mathrm {sw}}\) is the Sheikholeslami–Wohlert (SW) term, and \(\delta D_{\mathrm {b}}\) is the time boundary O(a)-improvement term. For simplicity, periodic boundary conditions in the time direction will be assumed, which means \(\delta D_{\mathrm {b}}=0\). The definition of \(\delta D_{\mathrm {b}}\) for other boundary conditions can be found in [43]. The Wilson–Dirac operator of Eq. (2.28) can be written as

$$\begin{aligned} D_{\mathrm {w}} = \sum _{\mu =0}^3 \frac{1}{2} \left\{ \gamma _\mu ( \nabla _\mu + \nabla ^*_\mu ) - \nabla ^*_\mu \nabla _\mu \right\} , \end{aligned}$$
(2.29)

where the covariant derivatives are defined as

$$\begin{aligned} \nabla _\mu \psi (x)&= U(x,\mu ) z(x,\mu )^{{\hat{q}}} \psi (x+{\hat{\mu }}) - \psi (x), \end{aligned}$$
(2.30)
$$\begin{aligned} \nabla ^*_\mu \psi (x)&= \psi (x) - U(x-{\hat{\mu }},\mu )^\dag z(x-{\hat{\mu }},\mu )^{-{\hat{q}}} \psi (x-{\hat{\mu }}). \end{aligned}$$
(2.31)

The SW term is given by

$$\begin{aligned} \delta D_{\mathrm {sw}} = c_{\mathrm {sw}}^\mathrm {SU(3)}\sum _{\mu ,\nu =0}^3 \tfrac{i}{4} \sigma _{\mu \nu } \hat{F}_{\mu \nu } + q \, c_{\mathrm {sw}}^\mathrm {U(1)}\sum _{\mu ,\nu =0}^3 \tfrac{i}{4} \sigma _{\mu \nu } \hat{A}_{\mu \nu }.\nonumber \\ \end{aligned}$$
(2.32)

The SU(3) field tensor \(\hat{F}_{\mu \nu }(x)\) and the U(1) field tensor \({\hat{A}}_{\mu \nu }(x)\) are constructed in terms of the clover plaquette. The explicit expression of the SU(3) field tensor used in openQ*D can be found in Ref. [44], while the U(1) field tensor is given here,

$$\begin{aligned} \hat{A}_{\mu \nu }(x)&= \tfrac{i}{4 q_\text {el}} \mathrm{{Im}}\left\{ z_{\mu \nu }(x)+ z_{\mu \nu }(x-{\hat{\mu }})+ z_{\mu \nu }(x-{\hat{\nu }})\right. \nonumber \\&\quad +\left. z_{\mu \nu }(x-{\hat{\mu }}-{\hat{\nu }}) \right\} , \end{aligned}$$
(2.33)
$$\begin{aligned} z_{\mu \nu }(x)&= z(x,\mu ) z(x+{\hat{\mu }},\nu ) z(x+{\hat{\nu }},\mu )^\dag z(x,\nu )^\dag . \end{aligned}$$
(2.34)

The normalization is chosen in such a way that \( -i e_0 \hat{A}_{\mu \nu }(x) \) is the canonically-normalized field tensor in the naive continuum limit. Notice that the field tensors are anti-hermitian.

In the presence of electromagnetism, the Dirac operator depends on the electric charge of the quark field. Let q be the physical electric charge in units of e (i.e. \(q=2/3\) for the up quark, and \(q=-1/3\) for the down quark). In the compact formulation of QED, all electric charges must be integer multiples of an elementary charge \(q_\mathrm {el}\), which appears as a parameter in the U(1) gauge action (2.26). The integer parameter

$$\begin{aligned} {\hat{q}} = \frac{q}{q_\mathrm {el}} \in {\mathbb {Z}} \end{aligned}$$
(2.35)

is the one appearing in the hopping term in Eqs. (2.30) and (2.31). On the other hand, notice that the SW term (2.32) is written in terms of the physical charge q. This normalization corresponds to a definition of \(c_{\mathrm {sw}}^\mathrm {U(1)}\) which is equal to 1 at tree level. The definition of the even–odd preconditioned Dirac operator \(\hat{D}\) is standard [36]

$$\begin{aligned} {\hat{D}} = D_\text {ee} - D_\text {eo} D_\text {oo}^{-1} D_\text {oe}, \qquad D = \begin{pmatrix} D_\text {ee} &{} D_\text {eo} \\ D_\text {oe} &{} D_\text {oo} \end{pmatrix}, \end{aligned}$$
(2.36)

and so is the definition of the small-determinant action \(S_{\mathrm {sdet}}\) appearing in Eq. (2.5)

$$\begin{aligned} S_{\mathrm {sdet}}= -\sum _{f} \alpha _f \,\mathrm {tr}\,\log ({{\mathbf {1}}+ D_{f,\text {oo}}}). \end{aligned}$$
(2.37)
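The integer charges of Eq. (2.35) can be computed exactly with rational arithmetic. The following sketch uses a hypothetical helper `integer_charge`, which is not an openQ*D function, and takes the value \(q_\mathrm {el}=1/6\) quoted in Sect. 2.2 for \(\mathrm {C}^*\) boundary conditions.

```python
from fractions import Fraction


def integer_charge(q, q_el):
    """Return q_hat = q / q_el of Eq. (2.35).

    Raises ValueError if the physical charge q is not an integer multiple
    of the elementary charge q_el.
    """
    q_hat = Fraction(q) / Fraction(q_el)
    if q_hat.denominator != 1:
        raise ValueError(f"charge {q} is not an integer multiple of {q_el}")
    return int(q_hat)


q_el = Fraction(1, 6)   # elementary charge used with C* boundary conditions
print("up quark:", integer_charge(Fraction(2, 3), q_el))
print("down quark:", integer_charge(Fraction(-1, 3), q_el))
```

With \(q_\mathrm {el}=1/6\) the up and down quarks carry \({\hat{q}}=4\) and \({\hat{q}}=-2\) respectively, while the choice \(q_\mathrm {el}=1/3\) would give 2 and \(-1\).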

3 Simulating QCD+QED with openQ*D

3.1 Structure of the openQ*D program package

The openQ*D code includes several main programs, roughly divided into three categories: programs to generate configurations, programs to measure observables, and utility programs. The following programs (in the main directory) can be used to generate gauge configurations for various theories:

  • iso1: SU(3)\(\times \)U(1) gauge theory with dynamical fermions;

  • qcd1: SU(3) gauge theory with dynamical fermions;

  • ym1: SU(3) pure gauge theory;

  • mxw1: U(1) pure gauge theory.

The following programs (in the main directory) can be used to calculate simple observables:

  • ms1: reweighting factors (see Sect. 3.2 and Appendix A);

  • ms2: spectral range of \((\hat{D}^\dag \hat{D})^{1/2}\) (\(\hat{D}\) is the even–odd preconditioned Dirac operator);

  • ms3: SU(3) Wilson-flow observables;

  • ms4: quark propagators;

  • ms5: U(1) Wilson-flow observables;

  • ms6: neutral pseudoscalar–pseudoscalar and axial–pseudoscalar correlators.

Finally, the following utility programs are also included:

  • minmax/minmax: it generates the rational approximations needed for the RHMC algorithm;

  • devel/nompi/read*: they can be used to read the binary *.dat files generated by the other programs.

3.2 User guide for the dynamical QCD+QED simulation program iso1

3.2.1 Compiling and running the main program

A complete guide to the usage of all programs listed in Sect. 3.1 can be found in the headers of the source-code files, and in the README files in the corresponding directories. Often the user will be referred to other sources of documentation (e.g. README files in some of the modules subdirectories, or the headers of other source-code files, and some of the PDF files in the doc directory). This section is intended to be neither a replacement nor a duplicate of these sources of documentation, but rather an overview of the main steps that are needed to use the iso1 program to generate QCD+QED configurations.

  1.

    Download the code and check the dependencies. The code is publicly available on GitLab at https://gitlab.com/rcstar/openQxD. The simulation and measurement programs, i.e. all programs in the main directory, require MPI libraries compliant with the MPI 1.2 (or later) standard. The minmax program requires the GMP (https://gmplib.org) and GNU MPFR (http://www.mpfr.org) libraries. Notice that the minmax program can be run on a personal computer and does not need MPI; therefore, the GMP and GNU MPFR libraries do not need to be installed on production machines.

  2.

    Set the environment variables. The Makefile in the main directory assumes that the C compiler can be called by using $(GCC), the MPI header file is found at $(MPI_INCLUDE)/mpi.h, the MPI compiled library is found in the $(MPI_HOME)/lib/ directory, and the mpicc command is available. The needed environment variables can be defined in the appropriate shell initialization files, e.g.

    figure b
  3.

    Choose the intrinsics acceleration options. Some pieces of code exist in several versions: plain C, inline-assembly with SSE instructions, and inline-assembly with AVX instructions. The default Makefile uses the C version of the code. In order to use the inline-assembly version, one needs to modify the CFLAGS variable defined in lines 122–124 of main/Makefile. For instance, on some x86-64 machines one can use

    figure c

    which activates AVX and FMA3 instructions and assumes that prefetch instructions fetch 64 bytes at a time. For a full description of available options, refer to the README file in the root directory.

  4.

    Choose the lattice geometry. The lattice geometry is chosen at compile time by modifying the macros defined in the first part of the include/global.h file. A full description of these macros can be found in the main/README.global file. For instance the following choice

    figure d

    corresponds to an \(8^4\) local lattice, replicated on an \(8^2 \times 4^2\) MPI process grid (the code will need to be run with 1024 MPI processes), which yields a \(64^2 \times 32^2\) global lattice. As explained in Sect. 2.1, this choice of simulation parameters corresponds to a \(64^2 \times 32^2\) physical global lattice if no \(\mathrm {C}^*\) boundary conditions are used, or to a \(64 \times 32^3\) physical global lattice if \(\mathrm {C}^*\) boundary conditions are used in at least one spatial direction. In our implementation, NPROCn has to be a multiple of 2 if \(\mathrm {C}^*\) boundary conditions are used in the direction \(\texttt {n}=1,2,3\).

  5.

    Compile the iso1 program and prepare for running. At this point, the code is ready to be compiled. Assuming that the root directory of the code is $HOME/openQxD, this is done by executing the following commands in a bash shell.

    figure e

    One can set up the directories and files to run the code by executing the following commands in a bash shell.

    figure f
  6.

    Edit the input file. The input file input/pedro01.in must contain all adjustable parameters of the simulation (except the few that have been set at compile time). A rough guide on how to construct an input file for the iso1 program is found in Sect. 3.2.2. Alternatively, a sample input file can be copied and pasted from Appendix C.

  7.

    Start the simulation. Edit the runtest.sh script as follows:

    figure g

    The runtest.sh script contains the command that invokes the iso1 program. It can be launched via a standard mpirun command, or incorporated in a script for a job scheduler. Recall that the number of needed MPI processes has been decided at compile time, and it is equal to 1024 in this case. The iso1 program takes a number of command-line options: the input file is specified with the -i option, the -noloc option specifies that the configuration files must be saved by a single MPI process, and the -rmold option specifies that only the most recent configuration must be kept and all previous ones must be deleted. The program will start the simulation from a randomly generated configuration. More details about the command-line options can be found in the main/README.iso1 file.

  8.

    Interrupt the simulation. Assuming that no error is produced, the simulation code will end naturally when all the configurations requested in the input file are generated. If the simulation needs to be interrupted earlier, one can just execute the following commands in a bash shell.

    figure h

    The simulation code will stop gracefully right after the next configuration is saved.

  9.

    Resume the simulation. Assuming that the last generated configuration was pedro01n42, edit the input file and set the nth variable in the [MD trajectories] section to 0 (see below for a description of the input file), and edit the runtest.sh script as follows:

    figure i

    Once this is executed, the simulation will continue from where it was interrupted.
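The geometry arithmetic of step 4 can be checked with a short script. The following is a minimal sketch in Python and is not part of openQ*D: the helper name `lattice_geometry` is hypothetical, and the ordering of the directions (time first, then the three spatial directions) as well as the assignment of the \(8^2 \times 4^2\) process grid to the four directions are assumptions made for illustration. The halving of direction 1 with \(\mathrm {C}^*\) boundary conditions, and the requirement of an even number of processes in \(\mathrm {C}^*\) directions, follow the description given in step 4.

```python
from math import prod


def lattice_geometry(local, grid, cstar_dirs=()):
    """Return (number of MPI processes, global lattice, physical lattice).

    local      -- local lattice extents, one per direction (time first; assumed)
    grid       -- number of MPI processes per direction
    cstar_dirs -- spatial directions (subset of 1, 2, 3) with C* boundary
                  conditions; if non-empty, the simulated lattice is twice
                  as long as the physical one along direction 1
    """
    if len(local) != 4 or len(grid) != 4:
        raise ValueError("expected four directions")
    for n in cstar_dirs:
        if n not in (1, 2, 3):
            raise ValueError("C* directions must be spatial (1, 2 or 3)")
        if grid[n] % 2 != 0:
            raise ValueError("process grid must be even in C* directions")

    nproc = prod(grid)
    global_lattice = tuple(l * n for l, n in zip(local, grid))

    physical = list(global_lattice)
    if cstar_dirs:
        physical[1] //= 2  # orbifold construction doubles direction 1
    return nproc, global_lattice, tuple(physical)


# Example of step 4: 8^4 local lattice on an 8 x 8 x 4 x 4 process grid.
print(lattice_geometry((8, 8, 8, 8), (8, 8, 4, 4)))
print(lattice_geometry((8, 8, 8, 8), (8, 8, 4, 4), cstar_dirs=(1, 2, 3)))
```

Both calls report 1024 MPI processes and a \(64^2 \times 32^2\) simulated lattice; the second one additionally reports the \(64 \times 32^3\) physical lattice quoted above.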

3.2.2 Constructing the input file for iso1

Most of the parameters needed to generate configurations are passed to the iso1 program by means of a human-readable input file, in this case pedro01.in in the test/input directory. For a full description of the various parameters, the reader is referred to the main/README.iso1 and doc/parms.pdf files (and references therein). A rough guide to the various sections that compose the input file is provided here, with no ambition of completeness.

  1.

    Run name and output directories.

    figure j

    The program iso1 will produce several output files:

    • ./log/pedro01.log, human-readable file, with general information about the simulation;

    • ./dat/pedro01.dat, binary file, with the history of simple diagnostic observables;

    • ./dat/pedro01.ms3.dat and ./dat/pedro01.ms5.dat, binary files, with the history of SU(3) and U(1) Wilson flow observables;

    • ./dat/pedro01.par, binary file, with all simulation parameters;

    • ./dat/pedro01.rng, binary file, with the state of the random number generator at the time of the most recent saved configuration;

    • ./cnfg/pedro01n*, binary files, with the gauge configuration.

    For every file in the log and dat directories, a backup file identified by a tilde at the end of its name is created and updated every time a configuration is saved.

  2.

    Schedule management.

    figure k

    The program iso1 will print one entry in the log file every 5 MD trajectories, measure and print Wilson flow observables every 10 MD trajectories, and save a configuration every 50 MD trajectories. The first 100 trajectories are treated as thermalization (no observables are measured); a total of 800 MD trajectories will be generated and 15 configurations will be saved.

  3.

    Ranlux [45] initialization.

    figure l
  4.

    Boundary conditions.

    figure m

    In this case periodic boundary conditions are chosen in time, and \(\mathrm {C}^*\) boundary conditions in all 3 spatial directions. The implementation of \(\mathrm {C}^*\) boundary conditions in openQ*D is described in Sect. 2.1. If SF or open-SF boundary conditions are chosen in time, the number of parameters in this section increases, as one needs to specify the value of the fields on the SF boundaries. For a full description of these parameters, refer to doc/parms.pdf.

  5.

    Gauge actions.

    figure n

    If different boundary conditions in time are chosen, the number of parameters in these sections increases, as one needs to specify the O(a)-improvement boundary coefficients. Refer to doc/gauge_action.pdf and doc/parms.pdf for a detailed explanation of all these parameters.

  6.

    Quark flavours. In the terminology of the openQ*D code, a quark flavour is identified by all adjustable parameters that define the Dirac operator. For instance, in a simulation in the isospin symmetric limit, the up and down quarks count as a single quark flavour. In the following example, two quark flavours are requested, and the parameters of the corresponding Dirac operators are initialized.

    figure o

    If different boundary conditions in time are chosen, the number of parameters in these sections increases, as one needs to specify the O(a)-improvement boundary coefficients. Also, if no \(\mathrm {C}^*\) boundary conditions are used, one can choose phase-periodic boundary conditions for fermions in space. Refer to doc/dirac.pdf, doc/parms.pdf for a detailed explanation of all these parameters.

  7.

    Rational approximation. With \(\mathrm {C}^*\) boundary conditions, the Pfaffian of the even–odd preconditioned Dirac operator \({\hat{D}}\) is needed, whose absolute value can be generated by a pseudofermion effective action of the type \(\psi ^\dag (\hat{D}^\dag \hat{D})^{-1/4} \psi \). The fractional power of \(\hat{D}^\dag \hat{D}\) is replaced by a rational approximation, which must be generated by means of the minmax program [46, 47]. We sketch here how to use this program, see minmax/README for more details.

    First, one needs to modify the GCC and MPLIBPATH variables in minmax/Makefile. The Makefile assumes that the C compiler can be called by using $(GCC), the GMP and MPFR header files are found in the $(MPLIBPATH)/include/ directory, and the compiled libraries are found in the $(MPLIBPATH)/lib/ directory.

    figure p

    The minmax program is compiled and executed with the following commands in a bash shell.

    figure q

    A rational approximation for \((\hat{D}^\dag \hat{D})^\alpha \) is requested, with \(\alpha =-1/4\) (-p and -q options), assuming that the eigenvalues of \((\hat{D}^\dag \hat{D})^{1/2}\) are in the interval \([1.98 \times 10^{-3} , 7.62]\) (-ra and -rb options), with a target relative precision of \(6 \times 10^{-5}\) (-goal option). The spectral range of \((\hat{D}^\dag \hat{D})^{1/2}\) must be guessed at first, but after some configurations have been generated it can be calculated with the program main/ms2. The minmax program creates a directory with a very long name, in this case

    $$\texttt {p-1q4mu0.00000000e+00ra1.98000000e-03rb7.62000000e+00}$$

    which contains several files named n*.in. The integer in the file name corresponds to the order of the generated rational approximation. Only the highest-order rational approximation, n10.in in this case, meets the requested precision. The full content of n10.in must be pasted into the input file in a section of the following type,

    figure r

    Notice that more than one rational approximation can be used in the same input file (e.g. one may want to use different rational approximations for the up, down and strange quarks). Each rational approximation is identified by the integer in the section title.

  8.

    MD Hamiltonian and integrator.

    figure s

    The MD Hamiltonian is given by the canonical kinetic term of the SU(3) gauge field, the kinetic term of the U(1) gauge field, and a sum of terms which do not depend on the MD momenta and are referred to as actions. The kinetic term of the U(1) gauge field can be chosen to be of two types: the canonical one (facc=0), or the Fourier-accelerated one (facc=1). Refer to doc/fourier.pdf and Sect. 2 for details on Fourier acceleration. The MD equations are solved by means of an approximate symplectic multilevel integrator, built in terms of standard elementary integrators. For each level, one needs to specify how many times the elementary integrator needs to be applied and which forces need to be integrated. Refer to doc/parms.pdf and module/update/README.mdint for details on the integrator.

    The actions and forces are uniquely identified by an ID. Obviously there is a one-to-one correspondence between actions and forces. Corresponding actions and forces must share the same ID. The gauge actions and forces must be included, i.e.

    figure t

    In this example, two pseudofermion actions are used (notice that this number matches the number of pseudofermion fields requested in the [HMC parameters] section), one for the up quark and one for the down quark.

    figure u

    Notice that openQ*D allows for frequency splitting (not used in this example): the poles and zeroes of the rational approximations can be separated in different pseudofermion actions. This is convenient because one may want to integrate different poles and zeroes in different levels of the integrator, and also one may want to use different solvers for different poles. For details on the pseudofermion actions and forces, and on the frequency splitting, one should refer to doc/rhmc.pdf and Sect. 2.

  9.

    Solvers. Two multi-shift CG solvers are used in this example, with different residues for the actions and the forces.

    figure v

    For details on the usage of other solvers, one should refer to doc/parms.pdf. The deflated solver (DFL_SAP_GCR) requires setting parameters for the generation and update of the deflation subspaces, also described in doc/parms.pdf. See also Sect. 4.4.

  10.

    Wilson flow parameters. The iso1 program measures on the fly a number of simple observables (actions, SU(3) topological charge, electromagnetic fluxes) at positive flow time.

    figure w
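The bookkeeping of the schedule described in item 2 can be reproduced with a few lines of Python. This is only a sketch: the function and argument names are hypothetical and are not copied from the input file, and it is an assumption that the first configuration is saved at the end of thermalization. With that assumption, the count of 15 saved configurations quoted in item 2 is recovered.

```python
def saved_configurations(n_therm, n_total, every_cnfg):
    """Trajectory numbers at which a configuration is saved.

    Assumed rule (not taken from the openQ*D sources): a configuration is
    saved at the end of thermalization and then every `every_cnfg`
    trajectories up to and including trajectory `n_total`.
    """
    if (n_total - n_therm) % every_cnfg != 0:
        raise ValueError("post-thermalization length must be a multiple "
                         "of the saving interval")
    return list(range(n_therm, n_total + 1, every_cnfg))


# Values quoted in the text: 100 thermalization trajectories, 800 in total,
# one configuration saved every 50 trajectories.
saved = saved_configurations(n_therm=100, n_total=800, every_cnfg=50)
print(len(saved))           # 15
print(saved[0], saved[-1])  # 100 800
```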
Fig. 3 Results for strong (left) and weak (right) scaling of the application of the Dirac operator and SAP preconditioner as explained in the text. The speedup factors for the Dirac operator are multiplied by a factor 10 for better visibility. The dashed lines indicate perfect scaling behaviour.

4 Performance and testing

4.1 Code performance on parallel machines

For future reference and comparison, benchmark measurements have been performed for the timing of the application of the double precision Wilson–Dirac operator and the SAP (Schwarz alternating procedure) preconditioner. The HPC cluster at CERN has been used, which features 72 nodes, each of them with two 8-core Intel® Xeon processors (E5-2630 v3, Haswell) running at about 2.4 GHz base frequency (3.6 GHz max.). Nodes are connected with Mellanox® Infiniband FDR (56 Gb/s).

The timings are obtained with the time2 programs located in the subdirectories devel/dirac and devel/sap. All measured times have been normalised to the smallest partition (one node or 16 cores). The results of these scaling tests are shown in Fig. 3. A QCD+QED setup with open boundary conditions in time and \(\mathrm {C}^*\) boundary conditions in one spatial direction has been used.

The weak scaling test has been performed with a local lattice size of \(8 \times 16 \times 8 \times 8\), giving an extended lattice with total volume . Because of the \(\mathrm {C}^*\) boundary conditions this corresponds to a physical lattice with volume , cf. Sect. 2.1. While parameters similar to the Quark flavours example (point 6) in Sect. 3.2 have been used for the Dirac operator, the SAP preconditioner specifically employs a block size of \(4^4\) with five SAP cycles (ncy 5) and five iterations (nmr 5) of the even–odd preconditioned Minimal Residue (MinRes) block solver. The setup is similar for the strong scaling study but with a constant total volume of \(V_{\mathrm {C}^*} =2 \cdot 64\times 32^3\) and varying local lattice sizes. In the case of the double precision Wilson–Dirac operator, a much larger lattice volume with \(V_{\mathrm {C}^*} =2\cdot 64^4\) total lattice points was probed as well. As can be seen in the left panel of Fig. 3, the larger lattice performs even better than the smaller one.

In summary, the overall scaling studied here is close to optimal, and small deviations may partly result from residual congestion of the underlying network. Similar studies have to be done on other machines, but the overall behaviour is expected to be similar to that of the original openQCD code. Indeed, as already stressed, the openQ*D solvers are identical to the openQCD ones. The Dirac operator is almost identical in the two codes, with the only difference that openQ*D uses the precalculated U(3) gauge field \(U z^{{\hat{q}}}\) instead of the SU(3) gauge field U. At fixed gauge background, the number of operations per lattice site performed by the Dirac operator is identical in the two codes, and so is the number of operations per lattice site per cycle performed by the solvers.
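The normalisation of the timings to the smallest partition, as used for the strong scaling test, amounts to a simple ratio. The sketch below illustrates it in Python under stated assumptions: the function name `scaling_table` is hypothetical, the partitions are assumed to be listed with the smallest one first, and the timings are placeholders in arbitrary units, not measured data from this work.

```python
def scaling_table(cores, times):
    """Normalise strong-scaling timings to the smallest partition.

    cores -- number of cores per partition, smallest partition first
    times -- corresponding wall-clock times at fixed total problem size

    Returns a list of (cores, speedup, ideal_speedup, efficiency) tuples,
    with speedup = T(smallest partition) / T(cores).
    """
    ref_cores, ref_time = cores[0], times[0]
    rows = []
    for n, t in zip(cores, times):
        speedup = ref_time / t
        ideal = n / ref_cores  # perfect scaling
        rows.append((n, speedup, ideal, speedup / ideal))
    return rows


# Placeholder timings in arbitrary units (NOT measured data).
cores = [16, 32, 64, 128]
times = [8.0, 4.0, 2.0, 1.25]
for row in scaling_table(cores, times):
    print(row)
```

An efficiency of 1 corresponds to the dashed perfect-scaling lines of Fig. 3; values below 1 quantify the deviation from it.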

4.2 Low-level tests

The openQ*D code has been tested by means of an extensive battery of check programs, which can be found in the subdirectories of devel (footnote 4). These programs have been taken over from openQCD-1.6 and NSPT-1.4, and extended in order to test the specific features of the openQ*D code. Roughly speaking, the check programs in each devel subdirectory test features of the corresponding module subdirectory. Many check programs also test interactions between different modules. These programs are meant to be used by developers only and contain very limited documentation. Providing a description of the check programs is outside the scope of this paper; a short description can be found in the INDEX files in each devel subdirectory. However, it is worth pointing out a few facts. All check programs have been run with all possible combinations of boundary conditions in the spatial and temporal directions. Whenever possible, all check programs have been run in a pure QCD setup (i.e. only the SU(3) gauge field is allocated), a pure QED setup (i.e. only the U(1) gauge field is allocated), and a QCD+QED setup (i.e. both gauge fields are allocated). All check programs have been run with various geometric configurations, i.e. lattice size and processor grid. Besides a plethora of minor details, specific check programs have been written to test:

  • the implementation of \(\mathrm {C}^*\) boundary conditions for both gauge fields and for the Dirac operator;

  • general properties of the Dirac operator with generic electric charge (e.g. gauge covariance, translational covariance, \(\gamma _5\)-hermiticity, comparison to the analytic expression in the case of zero gauge field);

  • the rational approximation of generic powers, and the associated reweighting factors;

  • the forces for the U(1) field, the QED action, the U(1) Wilson flow, the U(1) observables (e.g. clover field tensor, electromagnetic fluxes);

  • the MD with the U(1) field, with and without Fourier acceleration.

Fig. 4 Violations of MD Hamiltonian conservation \(\Delta H\) as a function of the MD integration step-size \(\Delta \tau \), for all available integrators (LF, OMF2, OMF4), with and without Fourier Acceleration (FA). The lines represent the fit functions provided in the legend.

Table 1 Details of test runs employing \(\mathrm {C}^*\) boundary conditions in 3 spatial directions, and periodic boundary conditions in the temporal direction. Note that due to the \(\mathrm {C}^*\) boundary conditions, the global (simulated) lattice \(V_{\mathrm {C}^*}\) is two times larger than the physical lattice because of the orbifold construction. \(N_\mathrm {f}=3\) simulations of QCD\(+\)QED (Q*D1) use the tree-level improved Symanzik gauge action (LW) for the \(\mathrm {SU}(3)\) gauge field with \(c_{\mathrm {sw}}^\mathrm {SU(3)}\) taken from [49], and the Wilson plaquette action (W) for the electromagnetic field with \(c_{\mathrm {sw}}^\mathrm {U(1)}=1\). Furthermore, the electromagnetic coupling is set to \(\alpha _\mathrm {0}=0.05 \approx 7 \alpha _\mathrm {0}^\text {phys}\) with \(q_\mathrm {el}=1/6\), i.e., the doublet \((d\, s)_{-1/3}\) and \((u)_{+2/3}\) have been simulated. The \(N_\mathrm {f}=2\) pure QCD simulation (QCD1) uses the plaquette action with non-perturbative \(c_{\mathrm {sw}}^\mathrm {SU(3)}\) of Ref. [50], and the lattice spacing was determined in Ref. [51]. All runs have degenerate quarks with hopping parameter \(\kappa \). Values for the neutral pseudoscalar mass \(m_{\mathrm {PS}}\) are given, as well as the flow time \(t_0/a^2\) from which we naively derive the approximate lattice spacing of Q*D1 using results of Ref. [52]. The number of simulated Molecular Dynamics Units (MDUs) after thermalization is reported in the last column. In both cases a MD trajectory length \(\tau =2\) has been used.

4.3 Conservation of the Hamiltonian with Fourier acceleration

The use of Fourier Acceleration in QCD+QED simulations modifies the MD Hamiltonian and, consequently, the MD equations. In order to test the consistency between the two, one can look at the violation \(\Delta H\) of Hamiltonian conservation as a function of the MD integration step-size \(\Delta \tau \). The violation should vanish as a positive power of the integration step-size in the \(\Delta \tau \rightarrow 0\) limit. The power depends on the chosen integrator. When the total trajectory length is kept constant, the leap-frog integrator (LF) and 2nd order Omelyan–Mryglod–Folk (OMF2) integrators yield \(\Delta H \sim (\Delta \tau )^2\), while the 4th order Omelyan–Mryglod–Folk (OMF4) integrator yields \(\Delta H \sim (\Delta \tau )^4\).
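The quoted scaling can be illustrated on a toy model. The sketch below is written in Python and is unrelated to the openQ*D integrators: it applies a plain leap-frog scheme to a one-dimensional harmonic oscillator with \(H = p^2/2 + q^2/2\) (an assumption chosen purely for illustration) and compares \(\Delta H\) for two step-sizes at fixed trajectory length. Halving the step-size should reduce \(\Delta H\) by a factor close to 4, consistent with \(\Delta H \sim (\Delta \tau )^2\).

```python
def leapfrog_delta_h(n_steps, tau=1.0, q0=1.0, p0=0.0):
    """Integrate H = p^2/2 + q^2/2 with leap-frog; return H(end) - H(start).

    The trajectory length tau is kept fixed, so the step-size is tau / n_steps.
    """
    dt = tau / n_steps
    q, p = q0, p0
    h_start = 0.5 * (p * p + q * q)
    for _ in range(n_steps):
        p -= 0.5 * dt * q  # half-step momentum update (force = -q)
        q += dt * p        # full-step position update
        p -= 0.5 * dt * q  # half-step momentum update
    return 0.5 * (p * p + q * q) - h_start


coarse = leapfrog_delta_h(10)  # step-size 0.1
fine = leapfrog_delta_h(20)    # step-size 0.05
ratio = coarse / fine
print(coarse, fine, ratio)     # the ratio is close to 4
```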

Figure 4 shows the violation \(\Delta H\) as a function of \(\Delta \tau \) for all integrators, with and without Fourier Acceleration. A two-parameter function \(\Delta H = a \, \Delta \tau ^b\) has been fitted to the data points. In all cases the obtained exponent is reasonably close to the expected one. This test has been performed on a single thermalized configuration taken from the Q*D1 ensemble (Table 1). As expected, there is a clear hierarchy among the three integrators. More interestingly, Fourier Acceleration has the effect of significantly reducing \(\Delta H\). While no definite conclusion can be drawn from a single-configuration experiment in this regard, the same phenomenon has been observed in the generation of ensembles with the same parameters as the Q1 and Q2 runs described in [48], Table 2, with and without Fourier Acceleration: when Fourier Acceleration is turned on, larger values of \(\Delta \tau \) can typically be chosen while keeping the acceptance rate the same. Obviously this does not mean that it is always convenient to use Fourier Acceleration. In order to understand whether this is the case, one should take into account the computational overhead and the variation in autocorrelations. Fourier Acceleration is known to significantly reduce autocorrelations in the case of the free scalar theory, and also in the case of non-compact pure U(1) theory [11], which is a theory of free photons. However, in the experiments with the Q1 and Q2 ensembles discussed above, no significant difference could be detected in the autocorrelation times after thermalization. This may indicate that autocorrelations are unaffected by Fourier Acceleration in the interacting case. Substantiating this statement certainly requires a much more detailed study.
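A two-parameter fit of the form \(\Delta H = a \, \Delta \tau ^b\) can be sketched as a straight-line fit in log-log space. The Python snippet below is an illustration under stated assumptions: the fitting procedure actually used for Fig. 4 is not specified in the text and may differ (e.g. a weighted non-linear fit), the function name `fit_power_law` is hypothetical, and the data points are synthetic placeholders constructed to follow an exact power law.

```python
import math


def fit_power_law(step_sizes, violations):
    """Unweighted least-squares fit of |dH| = a * dtau**b in log-log space.

    Returns the pair (a, b).
    """
    xs = [math.log(x) for x in step_sizes]
    ys = [math.log(abs(y)) for y in violations]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope = exponent
    a = math.exp(y_mean - b * x_mean)   # intercept = log(prefactor)
    return a, b


# Synthetic placeholder data following dH = 3 * dtau^4 exactly (NOT real data).
dtaus = [0.05, 0.1, 0.2, 0.4]
dhs = [3.0 * dt ** 4 for dt in dtaus]
a, b = fit_power_law(dtaus, dhs)
print(a, b)  # recovers a close to 3 and b close to 4
```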

4.4 Performance of locally deflated solver in QCD+QED

Fig. 5 Mass of the \({\bar{Q}}' \gamma _5 Q\) valence pseudoscalar neutral meson as a function of q and \(am_0=1/(2\kappa )-4\). QCD+qQED setup: SU(3) configurations are taken from the QCD1 ensemble (Table 1) and pure U(1) configurations are generated with \(\alpha _\mathrm {0}= 0.05\) and \(q_\text {el}=1/6\). The dashed curves are fits to the expected (leading order) quark mass dependence, \([m_{\mathrm {PS}}(q)]^2 = B(q) \{m_0-m_\mathrm {cr}(q)\}\), and are shown only to guide the eye. The gray dashed line indicates the mass of the unitary point of the QCD simulation. In all cases, 50 gauge configurations separated by 26 MDUs have been used.

The use of efficient solvers is a key factor in enabling simulations at quark masses close to the physical point. The openQ*D code inherits all the solvers of the openQCD-1.6 package: Conjugate Gradient (CG), Multi-Shift Conjugate Gradient (MSCG), the Generalized Conjugate Residual algorithm with the Schwarz alternating procedure as preconditioner (SAP+GCR), and a deflated version of it (DFL+SAP+GCR). The deflated solver implements the idea of inexact deflation introduced in [22, 53] and an improvement involving inaccurate projection in the deflation preconditioner proposed in [54].

As the Dirac operator is passed as an argument to these solvers, their implementation is blind to the coupling to the U(1) field and to \(\mathrm {C}^*\) boundary conditions. The efficiency of these solvers may be affected in principle by the coupling to the U(1) field, i.e. may depend on the electric charge of the Dirac operator. However this turns out not to be the case. The goal of this section is to describe two tests in support of this statement. These tests have been run on Altamira HPC at IFCA-CSIC, which consists of 158 computing nodes, each of them with two Intel® Xeon processors (E5-2670) at 2.6 GHz. Nodes are connected with Mellanox® Infiniband FDR (56 Gb/s).

An electroquenched (QCD+qQED) setup has been considered for both tests, with SU(3) configurations from the QCD1 ensemble (Table 1) and pure U(1) configurations generated with \(\alpha _\mathrm {0}= 0.05\) and \(q_\mathrm {el}=1/6\). Two degenerate valence quarks Q and \(Q'\) have been considered, with electric charge q and bare mass \(m_0\). The mass \(m_{\mathrm {PS}}\) of the \({\bar{Q}}' \gamma _5 Q\) valence pseudoscalar neutral meson has been calculated as a function of q and \(m_0\) and is shown in Fig. 5. Notice that the critical bare mass depends very heavily on the electric charge, as expected. For this reason it makes sense to compare the solver performance for different electric charges keeping fixed the value of \(m_{\mathrm {PS}}\) (rather than the bare mass).
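The leading-order relation \([m_{\mathrm {PS}}(q)]^2 = B(q) \{m_0-m_\mathrm {cr}(q)\}\) quoted in the caption of Fig. 5 is linear in \(m_0\), so the critical bare mass can be estimated from a straight-line fit of \(m_{\mathrm {PS}}^2\) versus \(m_0\). The Python sketch below illustrates this together with the conversion \(am_0=1/(2\kappa )-4\). The function names are hypothetical, the fit is unweighted, and the numbers are synthetic placeholders rather than results of this work.

```python
import math


def bare_mass(kappa):
    """Bare quark mass in lattice units, a*m0 = 1/(2*kappa) - 4."""
    return 1.0 / (2.0 * kappa) - 4.0


def fit_critical_mass(m0_values, mps_values):
    """Unweighted linear fit of m_PS^2 = B * (m0 - m_cr); returns (B, m_cr)."""
    xs = list(m0_values)
    ys = [m * m for m in mps_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, -intercept / slope


# Synthetic placeholder data generated with B = 2.0 and m_cr = -0.35.
m0s = [-0.34, -0.33, -0.32]
mps = [math.sqrt(2.0 * (m0 + 0.35)) for m0 in m0s]
B, m_cr = fit_critical_mass(m0s, mps)
print(B, m_cr)            # close to 2.0 and -0.35
print(bare_mass(0.125))   # 0.0
```

Repeating such a fit for each electric charge gives \(m_\mathrm {cr}(q)\), which is what allows solver performance to be compared at fixed \(m_{\mathrm {PS}}\) rather than at fixed bare mass.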

In the first test, the time needed to invert the even–odd preconditioned Dirac operator (with a representative QCD+qQED configuration) on 15 random sources has been measured, using the CG, SAP+GCR, and DFL+SAP+GCR solvers. The shortest time has been plotted in Fig. 6 for electric charges \(q=0,-1/3,2/3\) and a range of values of \(m_{\mathrm {PS}}\). It is evident that the performance of all solvers is insensitive to the electric charge.

Fig. 6 Comparison of performance of various solvers and various electric charges as a function of the mass \(m_{\mathrm {PS}}\) of the valence neutral pion. In all cases, the inverse of the even–odd preconditioned Dirac operator has been calculated on random sources. One representative QCD+qQED configuration has been used (SU(3) configuration from the QCD1 ensemble, Table 1, and pure U(1) configuration generated with \(\alpha _\mathrm {0}= 0.05\) and \(q_\mathrm {el}=1/6\)). The same residue of \(10^{-10}\) has been chosen for the three solvers. The solver performance is insensitive to the electric charge.

Fig. 7 Comparison of performance of various solvers and various electric charges as a function of the twisted mass \(\mu \). In all cases, the inverse of \((\hat{D}^\dag \hat{D}+ \mu ^2)\) has been calculated on random sources. The mass of the valence neutral pion (calculated at \(\mu =0\)) has been chosen to be \(m_{\mathrm {PS}}\simeq 329\,\mathrm {MeV}\). One representative QCD+qQED configuration has been used (SU(3) configuration from the QCD1 ensemble, Table 1, and pure U(1) configuration generated with \(\alpha _\mathrm {0}= 0.05\) and \(q_\mathrm {el}=1/6\)). The same residue of \(10^{-8}\) has been chosen for the three solvers. The solver performance is insensitive to the electric charge. As expected, the deflated solver loses efficiency at large values of \(\mu \) and eventually fails to converge for the three highest values.

One important caveat needs to be pointed out for the DFL+SAP+GCR solver. Before applying this solver, one needs to generate the deflation subspace, which is constructed from approximate eigenvectors of the Dirac operator. The code allows one to choose different parameters for the Dirac operator used in the solver and the one used to generate the deflation subspace. This is very useful in practice, since having a slightly heavier bare mass or even a twisted mass for the generation of the deflation subspace generally speeds up the calculation without affecting the performance of the solver. On the other hand, it is crucial to generate the deflation subspace with the same electric charge as that of the Dirac operator that needs to be inverted. If this is not done, the DFL+SAP+GCR solver loses efficiency dramatically. For this reason, in contrast to openQCD-1.6, the openQ*D code can handle several deflation subspaces simultaneously. These deflation subspaces can be generated with different parameters and will all be updated during the MD evolution. The user can specify in the input file which deflation subspace should be used for each DFL+SAP+GCR solver independently. In practice, in a realistic QCD+QED simulation, one would need to generate only two deflation subspaces, one for up-type quarks and one for down-type quarks. It has also been checked that the time needed to generate the deflation subspace is insensitive to the electric charge as long as \(m_{\mathrm {PS}}\) is kept fixed.

In the second test, a single value of \(m_{\mathrm {PS}}\simeq 354\,\mathrm {MeV}\) has been chosen, and the time needed to invert \((\hat{D}^\dag \hat{D}+\mu ^2)\) has been measured for various values of the twisted mass \(\mu \), using the CG and DFL+SAP+GCR solvers. One representative QCD+qQED configuration and 48 random sources have been used. The shortest time has been plotted in Fig. 7 for electric charges \(q=0,-1/3,2/3\) and a range of values of \(\mu \). The inversion of \((\hat{D}^\dag \hat{D}+ \mu ^2)\) is relevant to calculate the rational approximation of non-integer powers of \(\hat{D}^\dag \hat{D}\) (see Sect. 2). Also in this case, the performance of the two solvers is seen to be insensitive to the electric charge as long as \(m_{\mathrm {PS}}\) is kept fixed.

Fig. 8 Selected observables for simulation Q*D1 including thermalisation part. Left–right/top–bottom: HMC energy violations \(\Delta H\), average plaquette for SU(3) and U(1) gauge fields, energy density E(t) for SU(3), energy density for U(1), topological charge Q(t), lowest eigenvalue \({\hat{\lambda }}_\mathrm{min}\) in the spectrum of \(|\gamma _5 \hat{D}|\), and reweighting factors \(W_q\) for two different numerical accuracies, \(\delta =O(10^{-11})\) (left) and \(\delta =O(10^{-9})\) (right).

4.5 Key observables for HMC simulations of QCD\(+\)QED

Besides the electroquenched tests in the previous section, a further set of tests has been performed using dynamical QCD+QED simulations with Wilson fermions and \(\mathrm {C}^*\) boundary conditions. The dynamical degrees of freedom of the U(1) gauge field are included in the simulation labeled Q*D1 in Table 1. Q*D1 takes over the parameters from the H200 ensemble of the \(N_\mathrm {f}=2+1\) CLS [55] effort, except that the lattice extent is halved in each of the space-time directions. As the dynamical U(1) degrees of freedom contribute to the renormalization of the bare parameters, the estimates for the lattice spacing and pion mass cannot be taken over from the CLS ensembles (footnote 5), but rather need to be determined independently. Such an endeavour is beyond the scope of this paper. However, an estimate for \(t_0/a^2\) is given in Table 1 for future reference. The reference flow time \(t_0\) is implicitly given by \([t_0^2\langle E(t_0)\rangle ]=0.3\) using the Wilson flow and clover discretisation of the SU(3) field strength tensor in the definition of the energy density E(t) [56]. A rough estimate of a is given after naively matching \(t_0/a^2\) to the data provided in Table III of Ref. [52].
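The implicit condition \([t_0^2\langle E(t_0)\rangle ]=0.3\) can be solved numerically once \(\langle E(t)\rangle \) is known on a grid of flow times. The following Python sketch uses linear interpolation between the two grid points that bracket the target value. It is an illustration only: the function name `find_t0` is hypothetical, the interpolation scheme is an assumption (not necessarily the one used for Table 1), and the input values are synthetic placeholders.

```python
def find_t0(flow_times, energies, target=0.3):
    """Flow time at which t^2 * <E(t)> first reaches `target`.

    flow_times -- increasing flow times t (in units of a^2)
    energies   -- ensemble-averaged energy density <E(t)> at those times
    Linear interpolation is used between the two bracketing grid points.
    """
    values = [t * t * e for t, e in zip(flow_times, energies)]
    for i in range(1, len(values)):
        lo, hi = values[i - 1], values[i]
        if lo < target <= hi:
            frac = (target - lo) / (hi - lo)
            return flow_times[i - 1] + frac * (flow_times[i] - flow_times[i - 1])
    raise ValueError("target value is not bracketed by the data")


# Synthetic placeholder data with t^2 * E(t) = 0.1 * t (NOT real data).
ts = [2.0, 2.5, 3.5, 4.0]
es = [0.1 / t for t in ts]
print(find_t0(ts, es))  # close to 3.0
```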

Although openQ*D allows for twisted-mass reweighting, that option is not required for Q*D1 (\({\mu }=0.0\)). All three bare sea quark masses, \(am_{0,i}=1/(2\kappa _i)-4\), are taken to be degenerate. As demonstrated in the previous section and shown in Fig. 5, this necessarily leads to a large difference in the neutral pseudoscalar masses due to the differences in quark charges. One thus ends up with a degenerate pair of down-type quarks (\(q=-1/3\)), and a single but significantly heavier up-type quark (\(q=2/3\)). Hence, the simulations are essentially probing a somewhat unphysical version of the \(N_\mathrm {f}=2+1\) theory, but are sufficient to probe standard observables and performance of the code.

In Fig. 8 a summary of selected observables is given for simulation Q*D1. The run was stable and did not show any particular issue during the course of the simulation. Most of the observables presented in the following include the thermalisation part. Starting from a random configuration, the HMC energy violations, measured every trajectory (\(\tau =0.7\) MDU), drop after a few iterations and then fluctuate stably in the range \([-\,0.5,+\,0.5]\). The simulation employs the OMF4 integrator without Fourier acceleration, and the spectral ranges of the individual quark flavours have been properly set. Next, the average plaquettes for the SU(3) and U(1) gauge fields are presented. The former is shifted by a constant amount for better comparison. The SU(3) plaquette has much larger statistical fluctuations and requires longer thermalisation times than the U(1) plaquette, even without Fourier acceleration. The next two plots show the two available definitions of the (renormalized) energy density E(t) at a flow time \(t=3.2\) for the SU(3) and U(1) part, respectively. The topological charge Q (measured at the same flow time) fluctuates well after rapid changes during the thermalisation phase of the run. The smallest eigenvalues of \(|\gamma _5 \hat{D}_\mathrm {u}|\) and \(|\gamma _5 \hat{D}_\mathrm {d/s}|\) follow, confirming that the lower ends of the spectral ranges of the rational approximations have been chosen correctly. No exceptionally small values are present, which is not surprising considering the heavy pseudoscalar mass simulated here.
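The size of the energy violations \(\Delta H\) translates directly into the Metropolis acceptance probability \(\min (1,e^{-\Delta H})\) of the HMC. In addition, for an area-preserving and reversible integrator the identity \(\langle e^{-\Delta H}\rangle =1\) holds in equilibrium, which provides a simple consistency check on a thermalised run. The Python fragment below is a minimal sketch of these two diagnostics. The function name `hmc_diagnostics` and the input array are hypothetical, and the way the \(\Delta H\) history is extracted from the simulation log is not addressed.

```python
import numpy as np

def hmc_diagnostics(dH):
    """Basic HMC diagnostics from a history of energy violations.

    dH : one value of Delta H per trajectory, with the thermalisation
         part already discarded (hypothetical input for this sketch).
    """
    dH = np.asarray(dH, dtype=float)
    # Metropolis acceptance probability of each trajectory
    p_acc = np.minimum(1.0, np.exp(-dH))
    return {
        "mean_dH": dH.mean(),
        # Expected acceptance rate, averaged over trajectories
        "acceptance": p_acc.mean(),
        # Should be compatible with 1 within errors in equilibrium
        "exp_minus_dH": np.exp(-dH).mean(),
    }

# Synthetic example: energy violations drawn uniformly in [-0.5, 0.5]
rng = np.random.default_rng(0)
summary = hmc_diagnostics(rng.uniform(-0.5, 0.5, size=1000))
```

The uniform distribution in the usage line only serves to exercise the function; it is not meant to model the \(\Delta H\) distribution of Q*D1.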

The Q*D1 run has been produced with a rational approximation with relative precision \(\delta = O(10^{-11})\). A second run has been performed with the same parameters as Q*D1 except for the rational approximation, which has been chosen with relative precision \(\delta = O(10^{-9})\). The logarithms of the reweighting factors for both runs are shown in the last two panels of Fig. 8. As expected, the reweighting factor for the run with the more precise rational approximation is closer to 1 (and its logarithm is closer to 0).
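Assuming that the reweighting factors enter expectation values in the usual way, \(\langle O\rangle =\langle O\,W\rangle /\langle W\rangle \), the stored logarithms can be used directly. The Python fragment below is a minimal sketch under that assumption. The function name `reweighted_mean` and the input arrays are hypothetical, and the product over the per-flavour factors \(W_q\) is assumed to have been taken beforehand by summing their logarithms.

```python
import numpy as np

def reweighted_mean(obs, log_w):
    """Reweighted average sum(W * O) / sum(W) from log-weights.

    obs   : observable measured on each configuration
    log_w : logarithm of the total reweighting factor on each
            configuration (hypothetical inputs for this sketch)
    """
    obs = np.asarray(obs, dtype=float)
    log_w = np.asarray(log_w, dtype=float)
    # Subtracting the maximum avoids overflow; the common factor
    # cancels between numerator and denominator.
    w = np.exp(log_w - log_w.max())
    return np.sum(w * obs) / np.sum(w)

# With log W = 0 on every configuration the plain average is recovered
plain = reweighted_mean([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

When the logarithms are as close to zero as in the run with \(\delta = O(10^{-11})\), the reweighted and plain averages are expected to be nearly identical.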

5 Summary and outlook

We presented openQ*D [1], the first open-source package that allows one to perform full lattice simulations of QCD+QED, QCD or QED. The code implements the proposal of Ref. [18] and allows the user to choose \(\mathrm {C}^*\) boundary conditions along the spatial directions; periodic boundary conditions can also be simulated efficiently. Moreover, the chosen theory can be simulated with periodic, Schrödinger Functional or open boundary conditions along the time direction.

The new code is based on the openQCD [2] package, from which it inherits the highly optimized implementation of the Dirac operator, of the solvers, and of the HMC and RHMC algorithms. The openQ*D package extends the algorithmic functionalities of the openQCD code by allowing the use of multiple deflation subspaces in a single simulation, by implementing rational approximations of generic powers of the Dirac operator (with and without twisted-mass preconditioning), and by implementing Fourier acceleration for the evolution of the U(1) field.

We presented the main functionalities of the code and discussed the theoretical motivations behind the algorithmic choices and their specific implementations. We also presented a guide on how to run a full QCD+QED simulation with openQ*D and discussed the results of some tests. These include low-level tests that assess the correctness of the implementation of the different algorithms, as well as benchmarks that measure the performance of the code.

In future releases we plan to add a number of features. Concerning the configuration generation, we will include the possibility to use a gauge-fixed non-compact formulation of QED. We will also provide programs to calculate a number of observables, in particular charged-meson two-point functions with QED dressing factors along the lines of Ref. [18], and quark gradient flow observables [57]. Finally, we will consider incorporating some of the algorithmic developments discussed in Ref. [58], in particular stabilized Wilson fermions.

Given its good performance and high scalability on modern supercomputing cluster architectures, openQ*D can profitably be used to generate QCD+QED gauge configurations in a realistic setup, with \(\mathrm {C}^*\) boundary conditions or with the other supported boundary conditions, with the aim of computing QED radiative corrections to phenomenologically relevant observables.