BasisGen: automatic generation of operator bases

BasisGen is a Python package for the automatic generation of bases of operators in effective field theories. It accepts any semisimple symmetry group and fields in any of its finite-dimensional irreducible representations. It takes into account integration by parts redundancy and, optionally, the use of equations of motion. The implementation is based on well-known methods to generate and decompose representations using roots and weights, which allow for fast calculations, even with large numbers of fields and high-dimensional operators. BasisGen can also be used to do some representation-theoretic operations, such as finding the weight system of an irreducible representation from its highest weight or decomposing a tensor product of representations.


Introduction
Effective field theory is a widely used framework for parameterizing the physics of systems whose degrees of freedom and symmetries are known. An effective Lagrangian is a linear combination of all local operators that can be constructed with the fields in the theory, with the restriction that they are invariant under the action of the symmetry group. Usually, there is some other constraint that reduces the number of possibilities to a finite one, such as imposing a maximum canonical dimension. In this context, it is often convenient to obtain a complete set of independent operators, which is called a basis. BasisGen automates this task.
The input data needed for this calculation are the symmetry group G of the theory and the representation of G corresponding to each field. Once they are specified, one can obtain, for every monomial in the fields, the number of independent ways of forming an invariant under the action of G out of it. It must also be taken into account that total derivative terms can be added to the Lagrangian without changing the physics (except for effects of surface terms in the action). This means that some operators with derivatives can be rewritten in terms of others. Moreover, at each order in the effective Lagrangian, the addition of an operator proportional to the equations of motion does not change the S matrix up to higher order effects [1][2][3][4][5]. It follows that the equations of motion can be used, for example, to obtain a basis in which all the operators proportional to the functional derivative of the kinetic term have been removed [6][7][8][9][10]. For the Standard Model Effective Field Theory (SMEFT) (see ref. [11] for a review), several bases and (incomplete) sets of independent operators have been computed taking all these facts into account [12][13][14][15]. Computer tools can be used to translate from one basis to another [16][17][18][19].
In the last few years, many developments have been made in automating the generation of operator bases. Hilbert series methods provide an elegant way to compute invariants [20][21][22][23][24]. They can be directly implemented in a computer system with symbolic capabilities, as done for the SMEFT case in the auxiliary Mathematica notebook of ref. [23]. One possible drawback of this approach, when used in computer code, is its performance, as the symbolic nature of the calculations might introduce an overhead. The program DEFT [19], written in Python, uses a different approach to check and generate bases of operators for theories with a symmetry group given by a product of unitary groups.
BasisGen uses yet another approach, which is valid for any semisimple symmetry group and avoids the need for symbolic calculations. The algorithms that it uses to deal with representations of semisimple Lie algebras are the classical ones, based on weight vectors. They are reviewed, for example, in ref. [25], and implemented in several computer packages with different purposes [26][27][28][29][30][31]. To remove integration by parts redundancy, an adaptation of the method in ref. [24] is used. BasisGen is ∼ 150 times faster than the implementation in the auxiliary notebook of ref. [23]. For example, BasisGen takes 3 seconds to compute the 84 dimension-6 operators of the 1-generation SMEFT (on a laptop with a 2.6 GHz Intel Core i5 processor), while the notebook of ref. [23] takes 7 minutes. According to ref. [19], DEFT also takes minutes for the same calculation.
For computations with effective field theories, BasisGen assumes 4-dimensional Lorentz invariance. In addition, an internal symmetry group must be specified. This is, in general, the product of the global symmetry group and the gauge group. Derivatives are assumed to be gauge-covariant derivatives, so that the derivative of any field has the same representation under the internal symmetry group as the field itself. The gauge field strengths to be included in a calculation should be provided by the user. The fields must belong to linear irreducible representations of both the Lorentz group and the internal symmetry group. Finally, it is required that a power counting based on canonical dimensions can be used.
In this context, BasisGen generates bases of invariant operators. Sets of all possible covariant operators, with their corresponding irreducible representations (irreps), can also be computed. The basic representation-theoretic functionalities needed for these calculations are: obtaining weight systems of irreps and decomposing their tensor products. An interface for their direct use is provided.
BasisGen can be installed using pip by doing: pip install basisgen. It requires Python version 3.5 or higher. Its code can be downloaded from the GitHub repository https://github.com/jccriado/basisgen, where some examples of usage can be found. A simple script using BasisGen is presented in listing 1. It defines an effective theory with internal symmetry group SU(2) × U(1) for a complex scalar SU(2)-doublet field with charge 1/2. It computes a basis of operators of dimension 8 or less. The output is presented in listing 2. Each line gives the number of independent invariant operators that can be constructed with each field content.
The rest of this article is divided into two sections (apart from the conclusions). They describe BasisGen's implementation (section 2) and interface (section 3).
In the Dynkin basis, which we use in what follows, all weights are tuples of integers. Thus, the operations done here involve only addition and multiplication of integer numbers. Each irrep of a simple algebra is uniquely characterized by its highest weight $\Lambda$, which is a tuple $(a_1 \ldots a_n)$ of non-negative integers. Every such tuple is the highest weight of one irrep. The complete weight system of an irrep may be obtained from its highest weight by the following procedure:
3. For each positive component $\lambda_i > 0$, select the $i$th row $\alpha$ of the Cartan matrix. Append to $W_{\text{new}}$ all weights of the form $\lambda - k\alpha$, with $0 < k \leq \lambda_i$.
This produces the set $W$ of all weights. The multiplicity $n_\lambda$ of each weight $\lambda$ can then be obtained recursively using the Freudenthal formula:
$$ n_\lambda = \frac{2 \sum_{\alpha > 0} \sum_{k > 0} n_{\lambda + k\alpha} \, (\lambda + k\alpha, \alpha)}{(\Lambda + \delta, \Lambda + \delta) - (\lambda + \delta, \lambda + \delta)}, $$
where $\delta = (1 1 \ldots 1)$, $(\cdot, \cdot)$ is the inner product in weight space, and the summation for $\alpha$ runs over all positive roots.
The algorithm for the decomposition of a reducible representation as a direct sum of irreps is straightforward: from the collection of weights of the representation in question, find the highest and remove from the collection all the weights in the corresponding irrep. Repeat until the collection is empty. Then, the successive highest weights that were found in the process are the highest weights of the irreps in the decomposition.
A direct application of this functionality is to decompose the tensor product of irreps. Let $W_1$ and $W_2$ be the weight systems of two representations $R_1$ and $R_2$. The weight system $W$ of $R_1 \otimes R_2$ is the collection of all $\lambda_1 + \lambda_2$ for $(\lambda_1, \lambda_2) \in W_1 \times W_2$. Once $W$ is constructed, it can be decomposed using the general decomposition algorithm.
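The procedures above can be sketched in a few lines of Python. The following is a minimal illustration for su(3) only, not BasisGen's actual code: the Cartan matrix, positive roots and metric are hard-coded, the iteration around the surviving step of the weight-generation procedure is an assumption, and all function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hard-coded su(3) data in the Dynkin basis (illustrative; not BasisGen's API).
CARTAN = [(2, -1), (-1, 2)]                  # rows are the simple roots
POSITIVE_ROOTS = [(2, -1), (-1, 2), (1, 1)]
METRIC = [[Fraction(2, 3), Fraction(1, 3)],  # inverse Cartan matrix: inner
          [Fraction(1, 3), Fraction(2, 3)]]  # product for roots of norm^2 = 2
DELTA = (1, 1)


def inner(u, v):
    """Inner product (u, v) of two weights given in the Dynkin basis."""
    return sum(u[i] * METRIC[i][j] * v[j] for i in range(2) for j in range(2))


def shift(weight, root, k=1):
    """Return weight + k * root."""
    return tuple(w + k * r for w, r in zip(weight, root))


def weight_set(highest):
    """Distinct weights, found by repeatedly subtracting Cartan-matrix rows."""
    found, new = set(), {tuple(highest)}
    while new:
        found |= new
        old, new = new, set()
        for lam in old:
            for i, component in enumerate(lam):
                for k in range(1, component + 1):  # empty if component <= 0
                    new.add(shift(lam, CARTAN[i], -k))
        new -= found
    return found


def weight_system(highest):
    """Weights with multiplicities, computed with the Freudenthal formula."""
    highest = tuple(highest)
    top = shift(highest, DELTA)
    mult = Counter({highest: 1})
    # Decreasing (w, delta) guarantees every w + k*alpha is handled before w.
    for lam in sorted(weight_set(highest), key=lambda w: -inner(w, DELTA)):
        if lam == highest:
            continue
        total, low = 0, shift(lam, DELTA)
        for alpha in POSITIVE_ROOTS:
            k = 1
            while shift(lam, alpha, k) in mult:
                mu = shift(lam, alpha, k)
                total += mult[mu] * inner(mu, alpha)
                k += 1
        mult[lam] = int(2 * total / (inner(top, top) - inner(low, low)))
    return mult


def decompose(weights):
    """Highest weights (with multiplicity) of the irreps in a weight multiset."""
    remaining, irreps = Counter(weights), Counter()
    while remaining:
        highest = max(remaining, key=lambda w: inner(w, DELTA))
        irreps[highest] += 1
        remaining.subtract(weight_system(highest))
        remaining = +remaining  # drop weights whose count reached zero
    return irreps


def tensor_product(highest_1, highest_2):
    """Decompose the tensor product of two irreps given by highest weights."""
    combined = Counter()
    pairs = product(weight_system(highest_1).items(),
                    weight_system(highest_2).items())
    for (w1, m1), (w2, m2) in pairs:
        combined[shift(w1, w2)] += m1 * m2
    return decompose(combined)
```

Under these assumptions, `weight_system((1, 1))` gives the eight weights of the octet, with the zero weight appearing twice, and `tensor_product((1, 0), (0, 1))` returns the highest weights (1, 1) and (0, 0), that is, an octet plus a singlet. Exact rational arithmetic (`Fraction`) is used only for the inner products; the weights themselves remain integer tuples.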
In some cases, the symmetric or anti-symmetric tensor power of some representation is needed. If $W = \{\lambda_i\}_{i \in \{1, \ldots, n\}}$ is the weight system of some representation $R$, the weight system of the symmetric tensor power $\mathrm{Sym}^k(R)$ is the collection of weights computed as $\lambda_{i_1} + \cdots + \lambda_{i_k}$ for every $k$-tuple $(\lambda_{i_1}, \ldots, \lambda_{i_k})$ where $i_1 \leq \cdots \leq i_k$. The weight system of the anti-symmetric power $\Lambda^k(R)$ is constructed in a similar way, but using all $k$-tuples $(\lambda_{i_1}, \ldots, \lambda_{i_k})$ with $i_1 < \cdots < i_k$ instead.
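The index conditions above map directly onto two standard-library iterators. The sketch below is an illustration with hypothetical function names, not BasisGen's code. It takes the weight system as a list in which a weight of multiplicity n appears n times, so that the ordering conditions apply to list positions.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement


def add_weights(weights):
    """Component-wise sum of a tuple of weights."""
    return tuple(sum(components) for components in zip(*weights))


def symmetric_power(weights, k):
    """Weight multiset of Sym^k(R): index tuples with i_1 <= ... <= i_k."""
    return Counter(
        add_weights(choice)
        for choice in combinations_with_replacement(weights, k)
    )


def antisymmetric_power(weights, k):
    """Weight multiset of Lambda^k(R): index tuples with i_1 < ... < i_k."""
    return Counter(add_weights(choice) for choice in combinations(weights, k))


# Examples: the SU(2) doublet and the SU(3) triplet in the Dynkin basis.
su2_doublet = [(1,), (-1,)]
su3_triplet = [(1, 0), (-1, 1), (0, -1)]
print(symmetric_power(su2_doublet, 2))      # weights of the SU(2) triplet
print(antisymmetric_power(su3_triplet, 2))  # weights of the SU(3) antitriplet
```

The resulting multisets can then be passed to the general decomposition algorithm.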

Constructing invariants in effective theories
BasisGen can do calculations for 4-dimensional Lorentz-invariant effective field theories whose internal symmetry group is of the form $G \times U(1)^n$, where $G$ is semisimple. An effective theory is specified when the following data are provided:
• The semisimple Lie algebra $\mathfrak{g}$ of $G$.
• A collection of fields $\phi_1, \ldots, \phi_m$. Each $\phi_i$ must be equipped with:
- [...] of charges under the U(1) factors.
- The statistics $S_i$: either boson or fermion.
- A positive real number $d_i$, specifying the canonical dimension of the field.
It is assumed that a power counting based on canonical dimensions of the fields, with derivatives having dimension 1, can be applied. This is used to reduce the number of possible operators to a finite one.
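To illustrate how this power counting makes the problem finite, the following sketch enumerates all combinations of field exponents and numbers of derivatives whose total canonical dimension does not exceed a given maximum. It is a minimal example under the stated assumptions (each derivative counts as dimension 1), with a hypothetical function name, and is not taken from BasisGen.

```python
from itertools import product


def field_contents(dimensions, max_dimension):
    """Yield (exponents, n_derivatives) with total dimension <= max_dimension.

    dimensions[i] is the canonical dimension d_i of the i-th field.
    """
    # Largest exponent each field can have on its own.
    ranges = [range(int(max_dimension // d) + 1) for d in dimensions]
    for exponents in product(*ranges):
        if not any(exponents):
            continue  # skip the empty operator
        base = sum(e * d for e, d in zip(exponents, dimensions))
        if base > max_dimension:
            continue
        # Every derivative adds one unit of dimension.
        for n_derivatives in range(int(max_dimension - base) + 1):
            yield exponents, n_derivatives


# Two scalar fields of dimension 1, operators of dimension at most 4.
contents = list(field_contents([1, 1], 4))
print(len(contents))
```

Most of these candidates admit no invariant at all; the enumeration only bounds the search space that the representation-theoretic step then filters.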
The main functionality of BasisGen is to compute the number of independent invariant operators, constructed with the fields $\phi_i$ and their (covariant) derivatives, and having dimension less than or equal to some fixed $d_{\max}$. To do this, first, all the possible operator field contents are found. The field content for some operator is identified by a tuple $C = (e_1, \ldots, e_m)$, representing the exponents of each field in the operator: $O \sim (\phi_1)^{e_1} \cdots (\phi_m)^{e_m}$. For each $C$, the following (possibly reducible) representation is computed:
$$ \mathrm{Rep}(C) = T_1^{e_1}(R_1) \otimes \cdots \otimes T_m^{e_m}(R_m), $$
where $R_i$ denotes the representation of the field $\phi_i$ under the Lorentz and internal symmetry groups, and $T_i^k(V)$ is the symmetric power $\mathrm{Sym}^k(V)$ if the statistics $S_i$ are bosonic, and the anti-symmetric power $\Lambda^k(V)$ if they are fermionic. Once $\mathrm{Rep}(C)$ is obtained, it is decomposed into a direct sum of irreps. The number of independent invariant combinations of the fields in $C$ is then easily obtained as the number of singlet irreps in the decomposition.
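A toy version of this counting can be written for Lorentz scalars with internal symmetry SU(2) × U(1), ignoring derivatives. The sketch below is an assumption-laden illustration, not BasisGen's implementation: SU(2) weights are single integers in the Dynkin basis, an irrep with highest weight a has weights a, a − 2, ..., −a, and all names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, combinations_with_replacement, product


def su2_irrep(a):
    """Weights of the SU(2) irrep with highest weight a (Dynkin basis)."""
    return [a - 2 * j for j in range(a + 1)]


def power(weights, k, statistics):
    """Sym^k for bosons, Lambda^k for fermions, as a weight multiset."""
    pick = combinations_with_replacement if statistics == "boson" else combinations
    return Counter(sum(choice) for choice in pick(weights, k))


def tensor(rep_1, rep_2):
    """Weight multiset of the tensor product of two representations."""
    out = Counter()
    for (w1, m1), (w2, m2) in product(rep_1.items(), rep_2.items()):
        out[w1 + w2] += m1 * m2
    return out


def decompose(weights):
    """Repeatedly strip the irrep of the highest remaining weight."""
    remaining, irreps = Counter(weights), Counter()
    while remaining:
        highest = max(remaining)
        irreps[highest] += 1
        remaining.subtract(su2_irrep(highest))
        remaining = +remaining  # drop weights whose count reached zero
    return irreps


def count_invariants(fields, content):
    """Number of singlets in Rep(C) for fields = [(weights, charge, statistics)]."""
    if sum(e * charge for (_, charge, _), e in zip(fields, content)) != 0:
        return 0  # total U(1) charge must vanish
    rep = Counter({0: 1})  # start from the trivial representation
    for (weights, _, statistics), e in zip(fields, content):
        rep = tensor(rep, power(weights, e, statistics))
    return decompose(rep)[0]


# A complex scalar doublet with charge 1/2 and its conjugate.
doublet = su2_irrep(1)
fields = [(doublet, Fraction(1, 2), "boson"), (doublet, Fraction(-1, 2), "boson")]
print(count_invariants(fields, (2, 2)))  # field content (phi)^2 (phic)^2
```

In the full calculation the same steps are applied to the product of the Lorentz and internal algebras, so that a singlet is invariant under both.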
To take into account (covariant) derivatives, the same procedure is used, but now including the fields $D_\mu \phi_i$, $\{D_\mu, D_\nu\} \phi_i$, etc. Anti-symmetric combinations of derivatives are automatically discarded, as they are equivalent to field strength tensors. Optionally, the equations of motion of the fields can be applied. This means that, for each $D_{\mu_1} \ldots D_{\mu_m} \phi_i$, only the totally symmetric representation is retained (see ref. [22]). Integration by parts redundancy can be eliminated by removing from the results all irreps obtained from the decomposition of $D_\mu O$, $\{D_\mu, D_\nu\} O$, etc.

Basic objects
The basic objects for the usage of BasisGen are presented here. All of them can be imported with:
    from basisgen import (
        algebra, irrep, Field, EFT, boson, fermion, scalar,
        L_spinor, R_spinor, vector, L_tensor, R_tensor
    )
Functions
algebra Creates a (semi)simple Lie algebra from one string argument. The returned object is of the class SimpleAlgebra or SemisimpleAlgebra from the module algebra.

Examples of arguments:
irrep Creates an irreducible representation from 2 string arguments: the first represents the algebra and the second the highest weight. The returned object is of the class representations.Irrep.
The weight system of a representations.Irrep object can be shown by calling its weights_view method. Irreps with the same algebra can be multiplied to get the decomposition of their tensor product. Any two irreps can be added to give an irrep of the direct sum of their algebras.
Examples, showing the weights of the octet irrep of SU(3) (which has highest weight (11)) and the decomposition of the product of a triplet (10) and an antitriplet (01) as an octet plus a singlet:

Field
Constructor arguments (name, description, default value):
statistics Either boson or fermion. Default: boson.
dimension Canonical dimension of the field. Default: 1.
number_of_flavors Number of different copies of the same field. Default: 1.

EFT
Constructor arguments:
internal_algebra The semisimple Lie algebra of the internal symmetry group.
fields A list of Field objects representing the field content of the theory.
Methods:
invariants Returns a basis of operators, encapsulated in an EFT.Invariants object. These can be directly printed (they implement __str__). They have a method count to calculate the total number of operators in the basis, and a method show_by_classes, which returns a simplified string representation of the basis, given a dictionary whose keys are the fields and whose values are strings representing classes of fields.
covariants Returns a collection of all operators with all possible irreps, in the form of an EFT.Covariants instance. Its only purpose is to hold the information until it is printed (it implements __str__).
Both receive the same arguments: max_dimension, the maximum dimension of the operators computed; use_eom (default: True), a boolean to specify whether the equations of motion should be used; ignore_lower_dimension (default: False), a boolean to specify whether operators with dimension less than max_dimension should be excluded from the results; and verbose (default: False), a boolean enabling/disabling messages about the progress of the calculations.

Other
The following irreps of the Lorentz group have been defined, for ease of use: scalar, L_spinor, R_spinor, vector, L_tensor, R_tensor. L_spinor and R_spinor correspond to left and right Weyl spinors, respectively. L_tensor and R_tensor correspond to the left and right parts of an antisymmetric tensor with two indices. The statistics of a field can be specified by using the variables boson and fermion, which are set to the values BOSON and FERMION of the enum class Statistics from the module statistics.
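For orientation, these irreps can be labeled by pairs of Dynkin labels (a, b) of the complexified Lorentz algebra su(2) ⊕ su(2), with dimension (a + 1)(b + 1). The table below uses the standard convention in which the first label refers to the left-handed factor; this assignment is an assumption made for illustration and is not read from BasisGen's source.

```python
# Assumed Dynkin labels (a, b) under su(2) + su(2); the first label is
# taken to be the left-handed one (a convention, not read from BasisGen).
LORENTZ_IRREPS = {
    "scalar": (0, 0),
    "L_spinor": (1, 0),
    "R_spinor": (0, 1),
    "vector": (1, 1),
    "L_tensor": (2, 0),
    "R_tensor": (0, 2),
}


def dimension(label):
    """Dimension of the su(2) + su(2) irrep with Dynkin labels (a, b)."""
    a, b = label
    return (a + 1) * (b + 1)


for name, label in LORENTZ_IRREPS.items():
    print(name, label, dimension(label))
```

The two tensor irreps have dimension 3 each, which adds up to the 6 independent components of an antisymmetric tensor with two indices.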

The smeft module
The smeft module contains the definitions of all the Standard Model fields:
• The Higgs doublet phi and its conjugate phic.
• The left and right parts GL and GR of the SU(3) field strength.
• The left and right parts WL and WR of the SU(2) field strength.
• The left and right parts BL and BR of the U(1) field strength.
• The quark doublet Q and its conjugate Qc.
• The lepton doublet L and its conjugate Lc.
• The up-type quark singlet u and its conjugate uc.
• The down-type quark singlet d and its conjugate dc.
• The electron singlet e and its conjugate ec.
The bosons are objects of the Field class. The fermions are functions that take the number of generations and return a Field. Similarly, the function smeft takes the number of fermion flavors and returns an EFT object representing the SMEFT. The algebra su(3) ⊕ su(2) is named sm_internal_algebra. A dictionary named sm_field_classes is included, to simplify the presentation of the results by passing it as an argument to the method show_by_classes of an EFT.Invariants object. Listing 3 contains an example script for the computation of bases of arbitrary dimension (passed as an argument to the script) for the 1-generation SMEFT. It gives 84 operators for dimension 6 (in about 3 seconds on a personal computer with a 2.6 GHz Intel Core i5 processor) and 993 operators for dimension 8 (in around 40 seconds on the same computer).
    (1)))
    print("Number of invariants: {}".format(invariants.count()))

Conclusions
BasisGen computes bases of operators for effective field theories in a general setting: the internal symmetry group can be any product of a semisimple group and an arbitrary number of U(1) factors. 4-dimensional Lorentz invariance is assumed to provide support for concrete applications, although adaptations to other spacetime dimensions can be easily made, due to the generality of the core functionalities.
The decision of using the equations of motion is left to the user, as it may be convenient to work with redundant bases in some cases (see ref. [5]). It is also possible not only to compute invariants but to generate all covariant operators, classified by their irreps. This can be useful, for example, to find the representation of fields that couple linearly to an already known theory, which are often the most relevant ones for phenomenology [32][33][34][35][36]. An interface for doing basic operations with representations of semisimple groups is also provided.
BasisGen's speed for large numbers of fields and high-dimensional operators makes it possible to calculate bases for the SMEFT or for other effective theories for physics beyond the Standard Model, in times ranging from seconds (for the dimension-8 operators in the SMEFT) to minutes (for higher-dimensional operators or larger numbers of fields) on personal computers.