1 Introduction

A central problem in alloy theory is the ab-initio treatment of partially disordered systems, i.e. systems that exhibit different degrees of short-range order (SRO). In general, the problem has been tackled with a good measure of success by introducing simplifications either in the description of the electronic structure, e.g. by means of a tight-binding Hamiltonian, or in the state of disorder of the alloy, e.g. by assuming complete randomness.[1] In particular, the single site coherent potential approximation (CPA)[2] and the special quasirandom structures (SQS) method[3] are two approaches that fully incorporate ab-initio density functional theory (DFT) into the description of the electronic structure of random alloys. However, by construction, both methods explicitly exclude SRO effects.

An approach that aims at incorporating SRO into the ab-initio treatment of disordered alloys is the Connolly-Williams (CW) method,[4] which is also referred to in the literature as the Cluster Expansion (CE) method.[5] We state at the outset, however, that one of the objectives of this work is to establish a clear distinction between the CW method and a properly implemented CE method. Nevertheless, the methodology has been widely used in conjunction with DFT calculations in order to predict and model physical and thermodynamic properties in a variety of disordered systems.

One of the central difficulties of incorporating SRO in ab-initio theories of alloys is the fact that the description of the state of order (or disorder) of a binary or multicomponent system requires, in principle, an infinite number of configurational variables. Thus, having a robust theory capable of describing partial order in alloys is essential to develop useful approximations to the problem. One such formal framework, which provides a full description of the state of order, follows straightforwardly from the construction of a complete and orthonormal basis in configurational space.[5] The existence of such a complete basis set provides a solid conceptual framework to describe functions of configurations in general and, in particular, to describe the energy of formation of alloys in terms of pair and multisite correlation functions and their corresponding Effective Cluster Interactions (ECIs).[5]

The use of complete sets of correlation functions was initially introduced to describe the probabilities of finite clusters appearing in the Cluster Variation Method (CVM) free energy functional.[6,7] Such studies were aimed at the calculation of prototype alloy phase diagrams based on phenomenological Ising-like models for the description of the energy, and on the CVM for the treatment of the configurational entropy.[7-10] These early phenomenological studies of alloy phase diagrams led Connolly and Williams to propose the use of DFT total energy calculations in order to obtain Ising-like ECIs. In particular, these authors used the energy of formation of three fcc ordered compounds—A3B, AB and AB3 in the L10 and L12 structures—in order to obtain the ECIs for nearest neighbor pair, triangle and tetrahedron. Thus, in this seminal contribution, Connolly and Williams established an important link between DFT total energy calculations and phenomenological alloy models.[4]

An alternative set of basis functions was subsequently introduced with the objective of describing alloys configurations with fixed concentrations in the canonical ensemble.[11,12] These basis sets are, in fact, members of an infinite family of closely related basis sets, which include the one introduced in Ref 5, and that, under a specific definition of the scalar product, are complete and orthogonal in the full configurational space.[13,14] Thus, as we point out in section 2.1, a CE cannot be carried out in the canonical ensemble since completeness of the basis functions, which is central to the theory, holds only when the scalar product is defined in the space with 2N configurations; i.e. in the grand canonical ensemble.

An obvious consequence of the completeness of the different basis sets is that they are all related by a linear transformation and, therefore, a CE expansion using a given basis should be strictly equivalent to a CE using a different one. However, the fact that we can work with an infinite number of basis sets provides some distinct advantages in actual implementations of the CE. One such advantage is that we can account in a relatively straightforward manner for the concentration dependence of the equilibrium volume of the alloy, which gives rise to concentration dependent ECIs and to a much faster convergence of the expansion relative to the CW method. In particular, an approach that has been referred to as the Variable Basis Cluster Expansion (VBCE)[14] describes the energies of an alloy with fixed concentration in terms of basis functions defined in such a manner that their expectations values (or averages over the crystal) vanish in the random state with the same concentration. This distinguishes the VBCE from the usual CW method since, in the latter, the expectation values of the basis functions vanish in the disordered state only at the equiatomic concentration (50/50). Furthermore, the ECIs obtained using the CW method are constant, while in the VBCE the ECIs are concentration dependent. We will return to this topic in subsequent sections since one of the objectives of this work is to compare the relative convergence of the truncated VBCE and the CW method.

The CE of the energy of alloys effectively separates the configurational degrees of freedom from the structural degrees of freedom. The configurational degrees of freedom are, of course, described in terms of the complete configurational basis, while the structural degrees of freedom (such as the alloy volume, unit cell parameters and unit cell internal relaxations) are described by the ECIs. As an example, consider the cases in which the volume is the only structural relaxation present (or allowed). In such a case the ECIs will be functions of the equilibrium volume that, in turn, depend on concentration and, to a lesser degree, on the state of configurational order of the system. It follows that the ECIs will be functions of concentration and of the degree of order, a dependence that is obviously absent in implementations of the CW method that, by construction, result in Ising-like descriptions of the energy with constant ECIs. The issue is compounded in applications of the CW method that use the energies of totally relaxed structures and ascribe constant ECIs to the system. Thus, the CW method is a phenomenological fit to the energies of a set of ordered compounds with an Ising-like model and, as such, might be considered a zeroth order approximation to the energy of disordered systems. In contrast to the CW method, correct implementation of the CE requires that the ECIs be obtained (or inferred) using procedures that recognize that they are projections of the energy of the alloy defined on a fixed lattice framework, i.e. without relaxations, and that such projections are done using a specific definition of the scalar product in configurational space.[5,13,14]

As mentioned, the obvious advantage of the VBCE over the commonly use CW method is that the VBCE naturally captures the dominant concentration dependence of the equilibrium volume through the concentration dependence of the ECIs.[14] In its simplest form, the VBCE can be implemented by fitting a set of ECIs, at each concentration, to the energies of sets of compounds at their equilibrium volume, all with the same concentration. This approach obviously neglects the dependence of the equilibrium volume on the remaining configurational variables but it represents, nonetheless, a significant improvement over the standard CW method. In that sense, we could think of the VBCE as a first order approximation to the energy of disordered alloys.

A small variation to the VBCE described above would be to fit the ECIs to the energies of a set of compounds, all of the same concentration, calculated at the volume of the random alloy. The volume of the random alloy could be determined, for example, by doing a cluster expansion of the equilibrium volumes of the same set of compounds prior to doing the cluster expansion of the energy. This approach, which also neglects the dependence of the equilibrium volume on other configurational variables, should result in ECIs sufficiently accurate to describe the energy of the alloy at high temperatures.

In order to go beyond the VBCE and incorporate the full configurational dependence of the volume of the alloy, and hence of the ECIs, it is necessary to do the CE in conjunction with some approximation to the free energy of the system, such as Monte Carlo or the Cluster Variation Method, and determine the ECIs self-consistently. At a given concentration and temperature, one would: (1) start with an input value of the equilibrium volume (at high temperatures that of the random alloy would be a good guess), (2) proceed to do the CE of the energy (in any basis) to obtain the ECIs in the neighborhood of the initial guess volume, and (3) calculate the free energy and obtain the equilibrium volume for a given external pressure corresponding to the input ECIs. The process can be repeated until self-consistency is achieved. Thus, in general, the ECIs will depend on concentration and on all configurational variables and, through them, on temperature. These dependences are, in fact, dictated by the very nature of the configurational energy that can be shown to be a homogeneous function of degree one in the correlation functions, with the ECIs being the Euler derivatives of the energy with respect to the correlation functions.[14]

The organization of the paper is as follows. We begin with a general review of the CE in section 2. In section 2.1 we introduce the basis sets and develop the theory to show that proper implementation of the CE requires that the scalar product be carried out in the full configurational space, i.e. in the grand canonical ensemble and not in the canonical ensemble. The space group symmetry of the undecorated lattice and of the ECIs is addressed in section 2.2. The main purpose of this section is to draw a clear distinction between the CW and the CE methods since, in the latter, the ECIs are to be determined as a function of volume from the energies of unrelaxed structures. In section 2.3 we show that the configurational energy is a homogeneous function of degree one in the correlation functions and that the ECIs are the corresponding Euler derivatives. Section 2.4 dwells briefly on some practical aspects related to the inversion of the CE in order to obtain the ECIs and, in section 3, we implement the different versions of the expansions for the Mo-Ta system with the goal of assessing their relative convergence and predictive capabilities. Summary comments are presented in section 4.

2 The Cluster Expansion Method

For completeness we briefly review the formal aspects of the cluster expansion for the case of binary alloys. Extension to multicomponent systems is conceptually straightforward and it has been treated in the CE literature (see for example Ref 5). We begin this section by defining the basis functions in configurational space. Following the basic definitions, we explicitly consider the symmetry of the undecorated lattice (i.e. the lattice without distinction between different atoms) and show that in the thermodynamic limit (N  ) the energy is a homogeneous function of degree one in a set of extensive variables related to the correlations functions (or expectation values of the basis functions). As mentioned in the Introduction, it follows from Euler’s homogeneous function theorem that the ECIs are the Euler derivatives of the energy with respect to the correlation functions. Thus, this simple and basic property of the energy of alloys shows that the ECIs will not only depend on concentration but will also depend on the full state of order in the system.

2.1 Basis Functions

The configurational space of a binary system, with 2N configurations where \(N\) is the number of lattice sites, is spanned by a vector of the form \(\vec{\sigma } = \left\{ {\sigma_{1} ,\sigma_{2} , \ldots ,\sigma_{N} } \right\}\) where \(\sigma_{p}\) equals 1 if lattice site \(p\) is occupied by an \(A\) atom and 1 if it is occupied by a \(B\) atom. We note that in some applications, such as the CVM, \(N\) is finite and equal to the number of sites in the largest cluster considered while for other applications, such as the CE of the energy of a binary system, \(N \to \infty .\)

The basis functions are central to the theory and they are defined as:[13,14]

$$\Phi _{0}^{x} \left( {\vec{\sigma }} \right) = 1$$
(1)
$$\Phi _{\alpha }^{x} \left( {\vec{\sigma }} \right) = \mathop \prod \limits_{p \in \alpha } \frac{{\left( {\sigma_{p} - x} \right)}}{{\sqrt {1 - x^{2} } }}$$
(2)

where the constant function \(\Phi _{0}^{x}\) corresponds to the “empty” cluster (\(\alpha = 0\)) and the remaining functions \(\left\{ {\Phi _{\alpha }^{x} } \right\}\) are defined over all clusters \(\alpha\) in the crystal. The total number of clusters in the lattice, including the empty cluster, is \(2^{N}\) and they can be labeled by a set \(\left\{ {\eta , p, \nu } \right\}\), where \(\eta\) stands for the type of cluster (e.g., nn pair, a given triangle, etc.), \(p\) refers to the location of the cluster in the lattice (e.g., the location of its center of mass), and \(\nu\) labels the orientation (or star) of the cluster. The particular form of the basis functions is chosen so that their expectation values vanish for the random alloy when the atomic concentration is such that \(c_{A} - c_{B}\) equals \(x\), and the denominator in Eq 2 follows from the condition that the norm of the basis functions be equal to 1 under the following definition of the scalar product:[13,14]

$$< f,g >_{x} = \mathop \sum \limits_{{\vec{\sigma }}} x_{A}^{{N_{A} }} x_{B}^{{N_{B} }} f\left( {\vec{\sigma }} \right)g\left( {\vec{\sigma }} \right),$$
(3)

where the sum is carried out over all \(2^{N}\) configurations,\(f \left( {\vec{\sigma }} \right)\) and \(g\left( {\vec{\sigma }} \right)\) are two general functions of configuration, \(N_{A}\) and \(N_{B}\) are, respectively, the number of \(A\) and \(B\) atoms in configuration \(\vec{\sigma }\), and \(x_{A} = 1 - x_{B} = \left( {1 + x} \right)/2\). We note that \(x_{A} \left( {x_{B} } \right)\), which labels the basis set, corresponds to the concentration of \(A\left( B \right)\) atoms at which the expectation values of the basis functions vanish in the random state. For the particular value of \(x = 0\) (i.e. \(x_{A} = x_{B}\) ), the basis functions defined by Eq 2 are the basis functions introduced in Ref 5.

It follows from the definition of the scalar product that the functions defined by Eq 1 and 2 form a complete and orthogonal basis in configurational space. Orthogonality is expressed by \(<\Phi _{\alpha }^{x} ,\Phi _{\beta }^{x} >_{x} = \delta_{\alpha , \beta }\) and completeness by:[14]

$$x_{A}^{{N_{A} }} x_{B}^{{N_{B} }} \mathop \sum \limits_{\alpha }\Phi _{\alpha }^{x} \left( {\vec{\sigma }} \right)\Phi _{\alpha }^{x} \left( {\overrightarrow {{\sigma^{,} }} } \right) = \delta_{{\vec{\sigma },\vec{\sigma }^{{\prime }} }} .$$
(4)

The completeness of the basis functions imply that we can write any function of configuration, e.g. the configurational energy \(E\left( {\vec{\sigma }} \right)\), as:

$$E\left( {\vec{\sigma }} \right) = \mathop \sum \limits_{\alpha } j_{\alpha }^{x} \Phi _{\alpha }^{x} \left( {\vec{\sigma } } \right)$$
(5)

where the coefficients \(j_{\alpha }^{x}\) are projections of the function \(E\left( {\vec{\sigma }} \right)\) onto the basis functions:

$$j_{\alpha }^{x} = < E,\;\Phi _{\alpha }^{x} >_{x} = \mathop \sum \limits_{{\vec{\sigma }}} x_{A}^{{N_{A} }} x_{B}^{{N_{B} }} E(\vec{\sigma }) \Phi _{\alpha }^{x} \left( {\vec{\sigma }} \right).$$
(6)

The scalar product in Eq 6 can also be written as:

$$j_{\alpha }^{x} = < E,\Phi _{\alpha }^{x} >_{x} = \mathop \sum \limits_{{N_{A} = 0}}^{N} \left[ {G\left( {x_{A} ,c_{A} ,N} \right)} \right]^{N} (E,\Phi _{\alpha }^{x} )_{{c_{A} }} ,$$
(7)

with the function \(G\left( {x_{A} , c_{A} , N } \right)\) given by:

$$G\left( {x_{A} ,c_{A} ,N} \right) = x_{A}^{{c_{A} }} x_{B}^{{c_{B} }} \left[ {\frac{N!}{{N_{A} ! N_{B} !}}} \right]^{{\frac{1}{N}}}$$
(8)

and where \((E,\Phi _{\alpha })\) stands for the scalar product in the canonical ensemble defined as:

$$\left( {E,\Phi _{\alpha }^{x} } \right)_{{c_{A} }}= \frac{1}{W_N(c_A)} \sum_{\vec \sigma | c_A} E(\vec \sigma) \Phi_\alpha^x(\vec \sigma)$$
(9)

In Eq 9, the sum is restricted to the \(W_{N} \left( {c_{A} } \right) = N!/\left( {N_{A} ! N_{B} !} \right)\) configurations in the canonical ensemble with concentration \(c_{A}\).

It has been incorrectly stated that the scalar products in the grand canonical and in the canonical ensembles are equal when \(N \to \infty\).[13,14] However, although in the thermodynamic limit \(G_{N} \left( {x_{A} ,c_{A} ,N } \right) \to \delta_{{c_{A} ,x_{A} }}\), it is straightforward to show that to the power \(N\) it goes to zero as \(1/\sqrt {N }\). Therefore, the scalar product of Eq 7 involves contributions from all concentrations in the space of \(2^{N}\) configurations. Thus, explicitly:

$$< E,\Phi _{\alpha }^{x} >_{x} \ne \left( {E,\Phi _{\alpha }^{x} } \right)_{{c_{A} }}$$
(10)

We note that the CE can be shown to be a transformation that maps the \(2^{N}\) configurations \(\sigma\) into the \(2^{N}\) basis functions \(\left\{ {\Phi _{\alpha }^{x} } \right\}\).[14] For the basis with \(x = 0\), the CE expansion is in fact the well-known Hadamard transformation. Thus, carrying out the CE in the canonical ensemble is incorrect since, among other things, a set of \(2^{N}\) basis functions \(\left\{ {\Phi _{\alpha }^{x} } \right\}\) cannot be mapped into the \(N!/\left( {N_{A} ! N_{B} !} \right)\) configurations \(\sigma\) in the canonical ensemble. In fact, if restricted to the canonical ensemble, the basis set \(\left\{ {\Phi _{\alpha }^{x} } \right\}\) is not complete, which is the important property of the basis rather than the convenience of its orthogonality.

The sum in Eq 5 for the configurational energy runs over all 2N clusters in the lattice and, although quite general and applicable to both crystalline or non-crystalline lattices, it is not particularly useful unless we bring the full symmetry of the undecorated lattice into the picture. We will address this point in the next subsection where we also underscore the fact that the expansion coefficients, given by Eq 6, should be calculated for configurations defined on a fixed (i.e. unrelaxed) undecorated lattice.

Most implementations of the expansions to date use constant ECIs to described fully relaxed structures with different concentrations and, as such, they do not conform to the CE as defined by Eq 5 and 6. Such applications are, in fact, fits of DFT total energy calculations to a phenomenological Ising-like model as first proposed by Connolly and Williams.[4] Nevertheless, as it has been extensively documented in the CE literature, such phenomenological Ising-like models work generally well in the sense that they can accurately describe the energies of formation of some prescribed set of ordered compounds. However, as we show in the next section, proper implementation of the CE method reveals that the configurational energy of a periodic system is a homogeneous function of degree one in the correlation functions, with the ECIs being the corresponding Euler derivatives. Thus, at least from the formal and conceptual point of view, an Ising-like model with constant ECIs is clearly an oversimplification of the energy of real alloys.

2.2 Cluster Expansion for Periodic Systems

The CE is almost invariably used to describe functions of configuration for disordered or partially ordered crystalline systems, i.e. alloys with varying degrees of short-range order. If the configurations are defined on a fixed lattice framework, the expansion coefficients \(j_{\alpha }^{x}\), given by the projections of Eq 6, will be invariant to the space group symmetry operations of the undecorated lattice.[14] The symmetries of the coefficients \(j_{\alpha }^{x}\) are critically important since they allow us to reduce the expansion of Eq 5 to a tractable form and to identify the dependence of the configurational energy on a full (or complete) set of extensive variables. Namely, using the symmetry of the expansion coefficients, the CE of Eq 5 can be written as:

$$E\left( {\vec{\sigma }} \right) = J_{0}^{x} + \mathop \sum \limits_{\eta > 0} \omega_{\eta } J_{\eta }^{x} z_{\eta }^{x} \left( {\vec{\sigma }} \right)$$
(11)

where the first term in the right hand side corresponds to the empty cluster and the sum over \(\eta\) is now carried out only over all different type of clusters. The coefficients \(\omega_{\eta }\) are the number of clusters of type \(\eta\) per lattice point, and the \(J_{\eta }^{x}\) are the ECIs that are trivially related to the expansion coefficients of Eq 6 as follows:

$$J_{0}^{x} = j_{0}^{x}$$
(12)
$$J_{\alpha }^{x} = N j_{\alpha }^{x}$$
(13)

In Eq 11, the correlation functions \(z_{\eta }^{x} \left( {\vec{\sigma }} \right)\) are averages over the crystal of the basis functions and are given by:

$$z_{\eta }^{x} \left( {\vec{\sigma }} \right) = \frac{1}{{N \omega_{\eta } }} \mathop \sum \limits_{p} \mathop \sum \limits_{\nu }\Phi _{{\left\{ {\eta ,p,\nu } \right\}}}^{x} \left( {\vec{\sigma }} \right)$$
(14)

where the sums are carried out over all distinct locations \(p\) and orientations \(\nu\) of clusters of type \(\eta\).

The procedure allowing the transition from Eq 5 to the actually useful expression for the energy, Eq 11, is only possible because the expansion coefficients \(j_{\alpha }^{x}\) are invariant under the symmetry operations of the undecorated lattice. This fact, in turn, implies that the ECIs are different for alloys of different volumes or for alloys that are defined on different undecorated lattices due to relaxations. That is, the CE effectively separates the configurational degrees of freedom in the energy, which are accounted for by the \(z_{\eta }^{x} \left( {\vec{\sigma }} \right)\), from the structural degrees of freedom that must be incorporated into the \(J_{\eta }^{x}\).

2.3 Effective Cluster Interactions and Euler Derivatives

As shown by Eq 14, the correlation functions are specific versions of a set of extensive variables \(X_{\eta } \left( {\vec{\sigma }} \right) = N z_{\eta }^{x} \left( {\vec{\sigma }} \right)\). Furthermore, the CE of Eq 11 shows that the total configurational energy, \(E_{N} = N E\), is an extensive and homogenous function of degree one in the extensive variables \(\vec{X} = \left\{ {X_{\eta } } \right\}\), with \(X_{0} = N\), and where, for easy of notation, we do not make explicit the dependence of the extensive variables on the basis label \(x\). Thus, from Euler’s homogeneous function theorem, the total configurational energy is given by:

$$E_{N} \left( {\vec{\sigma }} \right) = \mathop \sum \limits_{\eta \ge 0} \left( {\frac{{\partial E_{N} }}{{\partial X_{\eta } }}} \right)_{{\vec{X}}} X_{\eta } \left( {\vec{\sigma }} \right).$$
(15)

which, in view of the fact that the \(\left\{ {X_{\eta } } \right\}\) form a complete set of configurational variables, allow us to identify the ECIs with the Euler derivatives as follows:

$$J_{0}^{x} = \left( {\frac{{\partial E_{N} }}{N}} \right)_{{\vec{X}}}$$
(16)
$$\omega_{\eta } J_{\eta }^{x} = \left( {\frac{{\partial E_{N} }}{{\partial X_{\eta } }}} \right)_{{\vec{X}}} = \left( {\frac{\partial E}{{\partial z_{\eta }^{x} }}} \right)_{{\vec{z}^{x} }}$$
(17)

Note that the Euler derivatives in Eq 15 are to be evaluated in the state of configurational order specified by the correlation functions. Thus, such derivatives, and consequently the ECIs in the CE, will naturally depend on the full set of correlation functions, which once again clearly underscores the limitations of using an Ising-like model to represent the configurational energy of alloys.

2.4 Inversion of the Cluster Expansion

A common application of the CE consists of fitting the ECIs for a predetermined set of \(m\) clusters to the energies of a set of \(n\) ordered structures. Irrespective of which basis functions are used, the problem can be cast in the form of a linear relation between a vector \(\overrightarrow{E}\) that collects the energies of the \(n\) compounds, a matrix \(\varvec{Z}^{x}\) of dimension \(n \times m\) that characterizes the configurations of the \(n\) structures in the basis of choice, and a vector \(\overrightarrow{V}\) of length \(m\) that contains all relevant information of the expansion coefficients for the selected clusters. Thus:

$$\vec{E} = \varvec{Z}^{x} \vec{V}.$$
(18)

The rows of the matrix \(\varvec{Z}^{x}\) will be denoted by \(\vec{Z}_{i}^{x}\) and they are given, for each of the \(n\) structures, by the \(m\) correlations functions of the selected clusters. The components of the vector \(\vec{V}\) are then the CE coefficients \(\omega_{\eta } \,J_{\eta }^{x}\).

A number of different approaches have been implemented in order to optimize the fitting of the ECIs.[15-29] Although in most cases these implementations were aimed at fitting constant Ising-like ECIs to the energies of a set of training structures, they have provide significant insight into the fitting procedures. In general, the different implementations of the inversion of Eq 18 can be formulated as a regularized least square problem of the form:

$$\vec{V} = \arg \mathop {\hbox{min} }\limits_{{\vec{V}}} \left[ {\mathop \sum \limits_{i = 1 }^{n} w_{i} \left( {E_{i} - \vec{Z}_{i}^{x} \cdot \vec{V} } \right)^{2} + R\left( {\vec{\lambda },\vec{V}} \right)} \right]$$
(19)

where the \(w_{i}\) assign, if so desired, different weights to the fit for each structure and where \(R\left( {\vec{\lambda }, \overrightarrow {V } } \right)\) plays the role of a Tikhonov regularization function.[30] The use of some form of Tikhonov regularization is necessary when the least square minimization procedure is ill-posed, as it is the case when the \(n \times m\) matrix \(\varvec{Z}^{x}\) is such that the dimension of its null-space is different from zero.[14]

The form of the regularization function in Eq 19 is chosen in order to impose, for example, a particular decay of pair ECIs with distance,[28] or to endow the ECIs with specific characteristics that are inferred from physical intuition. A particularly elegant approach was proposed by Mueller and Ceder[29] who used Bayesian statistics to construct a prior probability, in the usual Gaussian approximation, based on assumptions about the general properties of the ECIs, and determined the ECIs by maximizing the posterior Bayesian probability.

In order to implement the VBCE we use a form of the regularization function that ensures a smooth dependence of the ECIs with concentration, without excessively biasing the least square fit of the energy. This is achieved by imposing a penalty in the least square fitting to big differences in the derivatives of the ECIs with respect to concentration between two adjacent points in concentration. Thus, the function \(R\left( {\lambda , \vec{V}} \right)\) is of the form:

$$R\left( {\lambda ,\vec{V}} \right) = \lambda \mathop \sum \limits_{\eta = 1}^{m} \mathop \sum \limits_{{\left\{ {c_{i} } \right\}}} \left[ {\left( {\frac{{\Delta V_{\eta } }}{{\Delta z_{1}^{0} }}} \right)_{i} - \left( {\frac{{\Delta V_{\eta } }}{{\Delta z_{1}^{0} }}} \right)_{i + 1} } \right]^{2}$$
(20)

where \(\lambda\) is a regularization parameter and \(\left( {\Delta V_{\eta } /\Delta z_{1}^{0} } \right)_{i}\) stands for the derivative of \(V_{\eta }\) the with respect to the point correlation function \(z_{1}^{0}\) at concentration \(c_{i}\).

3 Application to Mo-Ta

In this section we implement different versions of the CE using the Mo-Ta system as an example. The Mo-Ta system is chosen, primarily, because it has been the subject of several previous studies,[28,31,32] which provides a good base line to compare the CW method with different implementations of the CE. Total energy calculations of an initial set of 653 unrelaxed structures were carried out using the WIEN2K[33] version of the Full Potential Linearized Augmented Plane Wave (FPLAPW) method, with the exchange-correlation functional treated in the generalized gradient approximation of Perdew-Burke-Ernzerhof.[34] The energy cut-off for the plane wave basis was set at \(R K_{max} = 10\) and the muffin-tin radii were set at \(R_{Mo} = 2.3\) au and \(R_{Ta} = 2.5\) au. The energies of formation of the unrelaxed structures were fitted to a four parameter Murnaghan equation of state.[35]

We note that the number of structures used in this study is sufficiently large so that we can choose a wide range of ECIs without running into an ill-posed inversion problem. Thus, in general, the inversion of the CE can be done without using Tikhonov-like regularization schemes, which invariably bias the ECIs in ways that cannot be fully justified with the available information. However, we do build some basic assumptions regarding the ECIs that can be justified, at least partially, a posteriori. In particular, we assume that: i) the ECIs become less important as their number of points increases and their compactness decreases; ii) if a given multisite cluster is included in the expansion so are all of its sub-clusters; and iii) the concentration dependence of the ECIs is smooth. For compactness we use a simple measure given by the normalized sum of the lengths of all pairs in the cluster. The inclusion of all sub-clusters of a given cluster included in the expansion follows, of course, from the first assumption above and it is in agreement with the cluster selection rules proposed in Ref 23. As mentioned in section 2.4, when implementing the VBCE we impose a condition of smoothness with concentration on the ECIs by using the regularization function given by Eq 20.

Before proceeding with the implementation of different versions of the CE we reemphasize that the energies were calculated for unrelaxed structures and, therefore, all results in this section apply to the Mo-Ta alloy in its high temperature bcc phase and do not incorporate the effects of possible atomic displacements from the equilibrium lattice sites.

We begin our analysis by revisiting the energy expansion carried out by Blum and Zunger for the Mo-Ta system.[28] These authors used an elaborate procedure to fit the energies of 56 totally relaxed ordered compounds with constant ECIs. A very good fit to the energies, with cross validation scores of less that 4 meV/atom, were obtained with the first eight pairs and with multisite interactions consisting of four triangles and a single four-points cluster (see Table 1).[28] Although the optimum set of multisite clusters used in the fitting does not conform to the compactness and sub-cluster inclusion rules mentioned above we use, for the purpose of comparison, the same choice of structures and clusters as Blum and Zunger. As seen in Fig. 1, a very good fit to the 56 structures is obtained with a CW expansion, which results in a root mean square (RMS) error of 2.3 meV/atom, a leave-one-out cross validation score of 3.9 meV/atom, and a maximum fitting error of 6.4 meV/atom. Furthermore, and despite the fact that we are using the energies of unrelaxed structures, we obtain values of the ECIs, shown in Fig. 2, that are close to those reported in Ref 28.

Table 1 Effective Interactions and cluster coordinates in the bcc lattice used in Ref 28 for the expansion of energy in the Mo-Ta system
Fig. 1
figure 1

Energies of formation for the 56 Mo-Ta structures used in Ref 28. Open circles are the energies of the unrelaxed structures calculated from first principles and the filled circles are obtained from a Connolly-Williams expansion using the first 8 pairs plus the clusters defined in Table 1

Fig. 2
figure 2

Effective cluster interactions for the first 8 pairs and the clusters defined in Table 1 obtained using the Connolly-Williams method for the 56 unrelaxed structures used in Ref 28

The situation is less satisfactory if we use the ECIs thus obtained to calculate the energies of all the structures in our pool of 653 compounds. In such a case we find that the maximum error between the DFT energies and those predicted by the CW method exceeds 30 meV/atom, which is about one order of magnitude larger than the maximum error one could infer from the results of the expansion. Although not particularly surprising, this result points to the fact that the convergence of the CW method is generally slower than it is usually assumed.

To address the issue of convergence we implemented both the CW method and the VBCE for the full set of 653 structures. In each case, the optimum set of clusters for the expansion is obtained by systematically increasing the number of ECIs until no further improvement in the fitting is achieved. For the CW method, a stable fitting can be achieved with 60 pairs, 15 triangles, 20 four-points, 6 five-points and 2 six-points clusters, which result in a RMS error of 1.81 meV/atom, a cross validation score of 2.21 meV/atom, and a maximum error of 6.55 meV/atom. A comparison between the energies calculated from first principles and those obtained with the CW method reveal a good agreement, as seen in Fig. 3. On the other hand, the poor convergence of the ECIs for the CW method is illustrated in Fig. 4 for the pair interactions, which show a persistent oscillation but no clear evidence that higher order pairs become less important.

Fig. 3
figure 3

Comparison between the energy of formation of 653 unrelaxed structures obtained from first-principles (open circles) and from the Connolly-Williams method (filled circles) using 60 pairs, 15 triangles, 20 four-points, 6 five points and 2 six points clusters

Fig. 4
figure 4

Effective cluster interactions \(\left( {\omega_{\eta } J_{\eta } } \right)\) for 59 pairs obtained from the Connolly- Williams method applied to 653 unrelaxed structures of Mo-Ta. The nearest-neighbor pair interaction, \(4 J_{2,1} = 111.66\) meV, is not included in the figure

Using the VBCE we can achieve an improved fitting of the same set of 653 structures with only 10 pairs and 4 triangles and a regularization parameter \(\lambda = 0.5\) (see Eq 20). The values of the mean square error, leave-one-out cross validation score, and maximum error are 0.90, 1.02 and 3.57 meV, respectively. Thus, by including the concentration dependence of the ECIs, which itself derives from the concentration dependence of the equilibrium volume of the alloy, the VBCE converges significantly faster than the CW method. The results of the VBCE are compared to the first principles calculations in Fig. 5, and the ECIs for all clusters included in the expansion are shown in Fig. 6, 7, 8 and 9. Of note is the strong concentration dependence of the ECIs in the VBCE representation of the energy of formation.

Fig. 5
figure 5

Comparison between the energy of formation of 653 unrelaxed structures of Mo-Ta obtained from first-principles (open circles) and from the Variable Basis Cluster Expansion (filled circles) using 10 pairs and 4 triangles

Fig. 6
figure 6

Effective cluster interactions for the empty cluster (open circles) and the nearest- neighbor pair (full circles) in the VBCE using 653 unrelaxed structures of Mo-Ta. The ECI for the empty cluster corresponds to the energy of the random alloy

Fig. 7
figure 7

Effective cluster interactions for pairs in the VBCE using 653 unrelaxed structures of Mo-Ta. Shown are the ECIs (\(\omega_{\eta } J_{\eta }^{x}\)) for second (open circles), third (filled circles), fourth (open squares) and fifth (filled squares) pairs

Fig. 8
figure 8

Effective cluster interactions for pairs in the VBCE using 653 unrelaxed structures of Mo-Ta. Shown are the ECIs (\(\omega_{\eta } J_{\eta }^{x}\)) for sixth (open circles), seventh (filled circles), eight (open squares), ninth (filled squares), and tenth (filled triangles) pairs

Fig. 9
figure 9

Effective cluster interactions for triangles in the VBCE using 653 unrelaxed structures of Mo-Ta. Shown are the ECIs (\(\omega_{\eta } J_{\eta }^{x}\)) for first (open circles), second (filled circles), third (open squares), and fourth (filled squares) most compact triangles in the bcc lattice

We next address the issue of the predictive power of these two expansions. To that effect, we use the ECIs obtained from the CW method and the ECIs obtained from the VBCE to determine the energies of all ordered compounds with up to 12 atoms per unit cell, a total of 10,850 structures. We then identify those structures for which the difference in the energies predicted by the two expansions is larger than the maximum fitting error of each expansion (6.55 meV/atom for the CW method and 3.57 meV/atom for the VBCE). Obviously, these structures are such that either the CW method or the VBCE fail to predict their energies within the expected accuracies of the expansions. We find a relatively small number of structures (117 out of 10,850) for which the energies predicted by the two expansions differ by more than 6.55 meV/atom. In order to assess the predictive power of each expansion we calculated the energies of these structures from first-principles and compared them with the energies predicted by the expansions. The result of the comparison is shown in Fig. 10. From the figure we see that, for about half of the structures, the difference between the energy predicted by the CW method and the one obtained from first principles exceeds the expected maximum error of 6.55 meV (gray dashed line in the figure). On the other hand, most the energies predicted by the VBCE and the ones calculated from first principles are within the expected maximum error of 3.57 meV/atom (black dashed line).

Fig. 10
figure 10

Difference between the energies calculated from first-principles and those predicted by the Connolly-Williams method (gray) and the Variable Basis Cluster Expansion (black) for structures not included in the expansions. The horizontal lines correspond to the maximum fitting errors of the 653 structures used in the implementation of the CW method (gray) and the VBCE (black)

Our final example is the determination of the ECIs for the Mo-Ta system as a function of volume. As mentioned in the introduction, the dependence of the equilibrium volume on the degree of order in the system implies that the ECIs, which are in fact the Euler derivatives of the configurational energies, will be functions of all the correlation functions. Thus, at finite temperatures, the ECIs need to be determined self-consistently by some procedure that involves the minimization of the configurational free energy. Here we will limit the determination of the ECIs to the random state (T >> 0) and to the configurations of the ordered compounds used in the expansion. The volume of the random alloy is obtained, in turn, by doing a CE of the equilibrium volume of the alloy using the equilibrium volumes of the ordered compounds at zero pressure.

We carry the CE expansion with the same 10 pairs and 4 triangles used in the previous VBCE, but choose the basis functions for \(x = 0\); i.e. we use the basis of Ref 5 which is in fact the basis used in most applications of the CW method. The difference with the usual CW method is that the set of ECIs at each volume \(v\) is now determined from the energies of the 653 compounds calculated at the same volume \(v\). The ECIs resulting from the fitting are shown in Fig. 11, 12, 13, 14, 15 and 16. In these figures, the solid lines are the ECIs for the random alloy while the points correspond to the ECIs, or Euler derivatives, evaluated at the equilibrium volume of the ordered compounds used in the expansion. The different values of the ECIs at the same concentration are due to the dependence of the ECIs on the state of order of the system. We see from the figures that, for the Mo-Ta system, the dependence of the ECIs on the state of order at a fixed concentration is relatively weak. However, one might expect a stronger dependence for alloys for which the atomic size differences of the constituent elements is larger than in the Mo-Ta system.

Fig. 11
figure 11

Effective Cluster Interaction for the empty cluster in the basis of Ref 5 (x = 0). The line corresponds to the ECI, or Euler derivative with respect to N, for the random state. The open circles are the values of ECI, or Euler derivatives, in the state of order of the compounds used in the fitting

Fig. 12
figure 12

Effective Cluster Interaction for the point cluster in the basis of Ref 5 (x = 0). The line corresponds to the ECI, or Euler derivative with respect to point correlation function for the random state. The open circles are the values of ECI, or Euler derivatives, in the state of order of the compounds used in the fitting

Fig. 13
figure 13

Effective Cluster Interaction for the nearest-neighbor pair in the basis of Ref 5 (x = 0). The line corresponds to the ECI, or Euler derivative with respect to pair correlation function for the random state. The symbols are the ECIs, or Euler derivatives, evaluated in the state of order of the compounds used in the fitting

Fig. 14
figure 14

Effective Cluster Interactions for the second (open circle), third (full circle), fourth (open square), and fifth (closed square) pairs in the basis of Ref 5 (x = 0). The line corresponds to the ECI, or Euler derivative with respect to pair correlation functions for the random state. The open circles are the ECIs, or Euler derivatives, evaluated in the state of order of the compounds used in the fitting

Fig. 15
figure 15

Effective Cluster Interactions for the sixth (open circle), seventh (full circle), eight (open square), ninth (closed square), and tenth (open triangles) pairs in the basis of Ref 5 (x = 0). The line corresponds to the ECI, or Euler derivative with respect to pair correlation functions for the random state. The symbols are the values of ECI, or Euler derivative, in the state of order of the compounds used in the fitting

Fig. 16
figure 16

Effective Cluster Interactions for the first (open circle), second (full circle), third (open square), and fourth (closed square) most compact triangles in the basis of Ref 5 (x = 0). The line corresponds to the ECI, or Euler derivative with respect to correlation functions for the random state. The symbols are the values of ECI, or Euler derivative, in the state of order of the compounds used in the fitting

With the ECIs thus obtained, the CE reproduces the energies of the compounds at their equilibrium volumes with a RMS error of 2.75 meV/atom and a maximum error of 8.04 meV/atom. While the RMS error for the fit is excellent, the maximum fitting error of 8.04 meV/atom is higher than that obtained by the VBCE (3.57 meV/atom) or the CW method (6.4 meV/atom). A possible reason for the higher value of the maximum error is the fact that, for a large number of structures used in the fitting, the energies are calculated using the Murnagham equation of state for volumes that are of the order of 20% higher (or smaller) than their equilibrium values and, thus, outside the reliable extrapolation range of the equation of state. Although this issue could be addressed by calculating the energy of the compounds for a wider range of volumes, such calculations will require reducing the muffin-tin radii used in the FPLAPW method, resulting in a significant computational overhead for the large number of structures used here. A more practical solution to the problem is to do a constant volume implementation of the VBCE in which the energies of all the compounds at a given concentration are calculated for the same volume, which could be taken equal to the volume of the random alloy at that concentration. Thus, all the energies needed for the implementation of the constant volume VBCE are at volumes very close to the equilibrium volumes of the compounds, which can be reliably calculated using, for example, the Murnagham equation of state. As mentioned, implementation of the VBCE requires the use of a regularization function, such as that of Eq 20, in order to impose smoothness of the ECIs with concentration. Such regularization step is, however, a minor technical issue. The ECIs obtained from the constant volume VBCE, using the same conditions as the VBCE discussed previously (i.e. 10 pairs, 4 triangles and a regularization parameter \(\lambda = 0.5\)), are shown in Fig. 17, 18, 19 and 20 for the random state (full lines) and for the state of order corresponding to the ordered compounds used in the fitting (symbols). We see that the results for the ECIs are very close to those obtained with the standard VBCE, except that we now capture the dependence of the ECIs on the correlation functions. Furthermore, the constant volume VBCE reproduces the energy of the 653 compounds at their equilibrium volumes with a RMS error of 0.90 meV/atom, a cross validation score of 1.02 meV/atom, and a maximum error of 3.56 meV/atom.

Fig. 17
figure 17

Euler derivatives (or ECIs) for the empty cluster (open circles) and the nearest-neighbor pair (full circles) in the constant volume VBCE of the Mo-Ta system evaluated at the state of order of the structures used in the expansion. The solid lines correspond to the Euler derivatives for the random state

Fig. 18
figure 18

Euler derivatives (or ECIs) for pairs in the constant volume VBCE of the Mo-Ta system. Shown are the ECIs (\(\omega_{\eta } J_{\eta }^{x}\)) for second (open circles), third (filled circles), fourth (open squares) and fifth (filled squares) pairs evaluated at the state of order of the structures used in the expansion. The solid lines correspond to the Euler derivatives for the random state

Fig. 19
figure 19

Euler derivatives (or ECIs) for pairs in the constant volume VBCE of the Mo-Ta system. Shown are the ECIs (\(\omega_{\eta } J_{\eta }^{x}\)) for sixth (open circles), seventh (filled circles), eight (open squares), ninth (filled squares), and tenth (open triangles) pairs evaluated at the state of order of the structures used in the expansion. The solid lines correspond to the Euler derivatives for the random state

Fig. 20
figure 20

Euler derivatives (or ECIs) for triangles in the constant volume VBCE of the Mo-Ta system. Shown are the ECIs (\(\omega_{\eta } J_{\eta }^{x}\)) for first (open circles), second (filled circles), third (open squares), and fourth (filled squares) most compact triangles in the bcc lattice evaluated at the state of order of the structures used in the expansion. The solid lines correspond to the Euler derivatives for the random state

4 Conclusions

In this paper we reviewed the mathematical foundation of the cluster expansion method in order to emphasize some basic aspects of the theory. A valuable element of the theory is the fact that we can construct an infinite number of basis sets that are complete and orthogonal in configurational space under a specific definition of the scalar product. A direct consequence of the definition of the scalar product is that the completeness of the basis sets holds only in the full configurational space, which for a lattice of \(N\) points, consists of \(2^{N}\) configurations. Thus, the CE cannot be carried out in the canonical ensemble but must done in the grand canonical ensemble.

Another aspect that emerges from the theory is that the CE requires the use of unrelaxed structures in order to determine the ECIs. Such requirement simply reflects the fact that the CE separates the configurational degrees of freedom, which are described by the correlation functions, from all other structural degrees of freedom, including volume, which are incorporated into the ECIs. As we discussed in section 2.2, the use of fully relaxed structures, or even structures of different volumes, breaks the symmetry of the ECIs, which invalidates the representation of the energy in terms of the crystal averages of the basis functions. Thus, an important distinction exists between a properly implemented CE and the commonly used CW method that approximates the energy by an Ising-like model with constant ECIs.

The representation of the energy in terms of the crystal averages of the basis functions, or correlation functions, was used in section 2.3 to show that the configurational energy is a homogeneous function of degree one in the correlation functions. Thus, the ECIs are the corresponding Euler derivatives of the energy, which implies that they will depend on all configurational variables.

These observations lead us to establish a hierarchy of approximations. At the lowest level is the CW method that approximates the energy by an Ising-like model. At the next level is the VBCE that incorporates only the concentration dependence of the ECIs. Finally, the correct implementation of CE requires the determination of the ECIs as a function of the full set of correlation functions, which needs to be done self-consistently with the calculation of the configurational free energy of the system

By way of example, we implemented the different levels of approximations to the Mo-Ta system. For this system, we found that the CW method, or Ising-like model, is generally a good approximation to describe a given set of structures but has little predictive power and, in general, does not converge or it does so very slowly. On the other hand, both the VBCE and the CE that captures the full dependence of the ECIs on the correlation functions, result in a much-improved predictive power and a significantly faster convergence than the CW method. The improved convergence of the VBCE over the CW method is firmly rooted on the fact the energy is a homogeneous function of degree one in the configurational variables \(\{ z_{\eta }^{x} \}\), with the ECIs being the corresponding Euler derivatives. Consequently, the ECIs will depend on all such configurational variables and, in particular, on the concentration of the alloy. Thus, the VBCE, which explicitly includes the concentration dependence of the ECIs, results in a significantly improved description of the energy relative to that achieved with the CW method that uses constant ECIs.