1 Introduction

The increasing use of all-electron four-component methodology for relativistic effects in molecular structure calculations brings with it a need for high-quality basis sets. Several other authors have generated basis sets at the SCF level for part or all of the periodic table, either with the Dirac Hamiltonian or with the Douglas-Kroll-Hess Hamiltonian [1–19]. While some of these basis sets include correlating functions, a number of them lack the functions needed for valence and outer core correlation and polarization of the SCF sets. In a series of papers [20–27], Dyall has focused on providing consistent basis sets of double-zeta (dz), triple-zeta (tz), and quadruple-zeta (qz) quality for the heavy elements, including correlating functions for the valence and outer core orbitals in the style of the correlation consistent basis sets [28–31]. So far these basis sets have covered the 4s, 5s, 6s, and 7s blocks; the 4p, 5p, and 6p blocks; the 4d and 5d transition series; and the actinides. This paper continues the series by presenting basis sets for the lanthanides. Work is in progress on the 3d and 6d transition series and the 7p block.

The lanthanides present challenges similar to those of the actinides, but also have some unique features as far as basis sets are concerned. As for the actinides, the 5d orbital is occupied for several members of the series (La, Ce, Gd, and Lu), and is low-lying for many of the early members of the series. It is in fact doubly occupied in some excited states early in the series. It must therefore be included in the basis set optimization. The 6p orbital is not occupied, but is needed for polarization of the 5d and 6s orbitals. However, unlike for the actinides, this orbital must be doubly occupied in the exponent optimization to prevent it from becoming too tight and thus failing to serve its purpose as a polarization orbital. Correlation of the 5s and 5p shells is critical, as it is for the actinides. Correlation of the 4f is obviously critical as well, but since the 4f is the first shell of its symmetry, it does not have the orthogonality constraint that the 5f has in the actinides, and so is much more compact relative to the other shells. This influences the extent to which the ranges of the correlating functions can overlap. Because of the compactness of the 4f, correlation of the 4d is also important.

2 SCF basis sets

The methods used for the SCF basis set optimization have been described previously [20, 22, 32, 33]. The basis sets were optimized in Dirac-Hartree-Fock calculations using the Dirac-Coulomb Hamiltonian with the standard Gaussian nuclear charge distribution [34] for the most abundant (or stable) isotope. As for previous basis sets in this series, ℓ-optimization was employed. In the SCF optimizations, the exponents are varied only within a given angular space, i.e. for a given ℓ value, with all other exponents fixed. This is because the exponents in each angular space are to a large degree independent of those in the other angular spaces. The angular spaces are cycled through the optimization process until there is no significant change in the total energy and the gradient with respect to the logarithms of the exponents.
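The cycling over angular spaces described above is a form of block coordinate descent. The following is a minimal sketch of that idea only: `toy_energy` is a hypothetical stand-in for the Dirac-Hartree-Fock energy, the simple gradient-descent inner step is an assumption, and the convergence test checks the energy change alone rather than both the energy and the gradient.

```python
def toy_energy(log_exps):
    """Hypothetical model energy in the logarithms of the exponents.

    Each angular space has its own quadratic bowl, plus a weak coupling
    term so that the spaces are only approximately independent.
    """
    targets = {"s": [0.0, 1.0], "p": [0.5, 1.5]}
    energy = 0.0
    for ell, values in log_exps.items():
        energy += sum((v - t) ** 2 for v, t in zip(values, targets[ell]))
    energy += 0.05 * (log_exps["s"][0] - log_exps["p"][0]) ** 2
    return energy


def optimize_block(log_exps, ell, energy_fn, step=0.2, h=1e-5, sweeps=50):
    """Vary only the exponents of one angular space; all others stay fixed."""
    for _ in range(sweeps):
        grad = []
        for i in range(len(log_exps[ell])):
            plus = {k: list(v) for k, v in log_exps.items()}
            minus = {k: list(v) for k, v in log_exps.items()}
            plus[ell][i] += h
            minus[ell][i] -= h
            # Central-difference gradient with respect to log(exponent)
            grad.append((energy_fn(plus) - energy_fn(minus)) / (2 * h))
        log_exps[ell] = [x - step * g for x, g in zip(log_exps[ell], grad)]


def l_optimize(log_exps, energy_fn, tol=1e-10, max_cycles=100):
    """Cycle through the angular spaces until the energy stops changing."""
    previous = energy_fn(log_exps)
    for cycle in range(1, max_cycles + 1):
        for ell in log_exps:
            optimize_block(log_exps, ell, energy_fn)
        current = energy_fn(log_exps)
        if abs(previous - current) < tol:
            return current, cycle
        previous = current
    return previous, max_cycles


start = {"s": [1.0, 2.0], "p": [-1.0, 0.0]}  # hypothetical starting guess
initial_energy = toy_energy(start)
final_energy, cycles = l_optimize(start, toy_energy)
```

Because the coupling between the blocks is weak in this toy model, only a few cycles are needed, which mirrors the argument that the angular spaces are largely independent.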

The electron configuration used for the basis set optimization was as follows. For the elements in which the 5d is occupied (La, Ce, Gd, Lu), the ground configuration was used for all angular symmetries. For the remaining elements, the ground 4fn 6s2 configuration was used for all symmetries except d, where the 4fn−1 5d1 6s2 configuration was used. For the optimization of the 6p function, the configuration in which 6s2 was replaced with 6p2 in the ground configuration was used.

The size of the basis sets was determined for consistency with basis sets across the row, taking into account the basis set sizes of the 6s, 6p and 5d blocks, and for the quality of the representation of the outermost maxima. As a consequence of this consistency criterion, some revisions were deemed necessary for the 5d block: these revisions are described elsewhere [35].

The size of the dz basis set optimized for Ba [27] is 24s16p10d, and the revised 5d basis set size is 24s17p12d7f [35]. The s set size was therefore chosen to be 24s. For the p set, the choices are 16p or 17p. Optimization of both 16p and 17p sets across the lanthanide series revealed that the 16p sets were too tight by the end of the block, so the larger 17p sets were selected. Because the 5d is more diffuse for the lanthanides than for the 5d transition series, the d set size chosen for the lanthanides was 13d. These choices yielded a 24s17p13d7f dz basis set. Two p functions were then added for the 6p, and the outermost 9 functions were reoptimized. These functions describe the outermost maxima of the 3p through 6p orbitals. The final SCF basis set size including the 6p functions is 24s19p13d7f.

For triple-zeta, the size of the basis set for Ba is 30s21p13d, the revised 5d set size is 30s21p15d10f, and the 6p set size is 30s26p16d10f. The s and p set sizes were therefore taken to be 30s and 21p. As for the dz basis sets, an extra d function was added for the lanthanides relative to the 5d series because the 5d is more diffuse in the lanthanides. These choices yielded a basis set size of 30s21p16d10f. Three p functions were added for the 6p, and the outermost 12 exponents were reoptimized. The final SCF basis set size including the 6p functions is 30s24p16d10f.

For quadruple-zeta, the basis set size for Ba is 35s26p17d, and the 5d basis set size is 34s26p19d12f, with 34s31p20d12f for the 6p series. After optimizing both a 34s and a 35s set, the 35s set was chosen for the lanthanides. Addition of an extra d function to make a 20d set was found to yield 6 functions for the 5d. For reasonable balance, 5 functions are required, which was obtained with a 19d set. For the p set, 4 functions were added to the 26p set for the 6p, and the outermost 14 functions were reoptimized. The final SCF basis set size including the 6p functions is 35s30p19d12f.
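The size bookkeeping in the three paragraphs above can be checked with a few lines of code. This is only an illustrative sketch; the helper names `parse_size`, `format_size`, and `add_functions` are hypothetical and not part of any basis set tool.

```python
import re


def parse_size(label):
    """Turn a label such as '24s17p13d7f' into {'s': 24, 'p': 17, ...}."""
    return {ell: int(count) for count, ell in re.findall(r"(\d+)([a-z])", label)}


def format_size(size):
    """Inverse of parse_size, keeping the order of the angular momenta."""
    return "".join(f"{count}{ell}" for ell, count in size.items())


def add_functions(label, **extra):
    """Add the given number of primitives per angular momentum to a label."""
    size = parse_size(label)
    for ell, count in extra.items():
        size[ell] = size.get(ell, 0) + count
    return format_size(size)


# Sizes before and after adding the 6p functions, as quoted in the text
dz_final = add_functions("24s17p13d7f", p=2)
tz_final = add_functions("30s21p16d10f", p=3)
qz_final = add_functions("35s26p19d12f", p=4)
```

The three results reproduce the final SCF sizes stated in the text: 24s19p13d7f, 30s24p16d10f, and 35s30p19d12f.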

Total energies for the ground configuration are given in Table 1. The basis sets used in these calculations include the 6p set and the f functions added for 5d and 6s correlation in the double-zeta and triple-zeta basis sets, as described below. The energies for the quadruple-zeta basis sets fall below the numerical energy, due to the phenomenon of “prolapse” in relativistic basis set calculations [3].

Table 1 Total SCF configuration average energies in E_h for uncontracted basis set and numerical calculations on the ground configuration

3 Correlating and dipole polarizing functions

Correlating functions for the 4d, 4f, 5s, 5p, 5d, and 6s shells were optimized in MR-SDCI calculations, using the RAMCI program [33] modified for basis set optimization. All exponents are fully optimized. Exponents for the double-zeta basis sets are given in Table 2. Exponents for the triple-zeta basis sets are given in Tables 3, 4, 5, 6, 7. The f functions reported for 5d6s correlation are the outermost SCF f function and the even-tempered extension. Exponents for the quadruple-zeta basis sets are given in Tables 8, 9, 10, 11. The procedure for the various shells is described below.

Table 2 Exponents of correlating and dipole polarizing functions for the double-zeta basis sets
Table 3 Exponents of 4f correlating 2g1h functions for the triple-zeta basis sets
Table 4 Exponents of 5s5p correlating 2d1f functions for the triple-zeta basis sets
Table 5 Exponents of 5d6s correlating 2f1g functions for the triple-zeta basis sets
Table 6 Exponents of 4d correlating 2f1g functions for the triple-zeta basis sets
Table 7 Exponents of 4f dipole polarization functions for the triple-zeta basis sets
Table 8 Exponents of 4f correlating 3g2h1i functions for the quadruple-zeta basis sets
Table 9 Exponents of 5d6s correlating 3f2g1h functions for the quadruple-zeta basis sets
Table 10 Exponents of 4d correlating 3f2g1h functions for the quadruple-zeta basis sets
Table 11 Exponents of 4f dipole polarization functions for the quadruple-zeta basis sets

3.1 4f correlation

In the basis set optimizations for other blocks, including the 5f, it has been assumed that it is sufficient to add a single function one unit higher in angular momentum to correlate the main shell for the dz basis set, then increase the maximum correlating angular momentum by one and add one function for each correlating angular momentum, going up the basis set series to tz and qz. Owing to the nodeless nature of the 4f shell, this assumption was tested in a series of calculations. These tests assessed the effect of adding functions in a single symmetry (g, h, i, k) in CI calculations involving double excitations from the 4f shell into shells of that symmetry. The tests showed that the 1g and 2g1h sets are fairly well balanced. The first i function has about the same energy gain as the fourth g function and the third h function, making 4g3h1i fairly well balanced. The next well-balanced set is 5g4h3i2k (k was the highest angular momentum in the study). Balance between the symmetries is not the only consideration, however: it is also important to be able to extrapolate the basis sets smoothly within symmetries. It was considered that the additional g and h functions in the 4g3h1i set might unduly perturb the series used for extrapolation. Therefore, a 3g2h1i set was chosen for the qz basis set, and the scheme used for other basis sets is deemed appropriate for the 4f shell.

For the 4f shell the functions were optimized on the ground fns2 or fn−1d1s2 configuration. The configuration space included all double excitations from the 4f shell into the correlating space, which consisted of 1g for the double-zeta basis sets, 2g1h for triple-zeta, and 3g2h1i for quadruple-zeta. Only configuration state functions (CSF) coupling to the ground state J value were included in the optimization, mainly due to the large number of such CSF. Due to the number of CSF generated for elements in the middle of the block with the quadruple-zeta basis sets, it was necessary to perform separate CI calculations, splitting the correlating set into a 3g set and a 2h1i set. Since the angular spaces are fairly well decoupled, this is not a serious approximation.

3.2 5d and 6s correlation

For the 5d and 6s shells, correlating functions were optimized on the fn−1d1s2 configuration. The configuration space included all double excitations from these two shells into the correlating space, which consisted of 1f for the double-zeta basis sets, 2f1g for triple-zeta, and 3f2g1h for quadruple-zeta. For the double-zeta and triple-zeta sets, the energy was averaged over all possible coupled angular momenta for the CSF. For the quadruple-zeta basis sets, the 4f shell was coupled to the highest possible angular momentum to reduce the number of configurations to a minimum, and the average taken over all possible couplings to this angular momentum. Since the f shell is frozen in these optimizations, this approximation has little impact on the results.

Because the f functions have some overlap with the SCF functions, it is only necessary to add to the basis set those functions that are not already represented. For the double-zeta basis sets, the f function was added to the SCF f set. For the quadruple-zeta basis sets, the SCF set covers the range of the correlating f functions, so the optimized functions are not needed. The choice for the triple-zeta sets was more involved, and is discussed next.

As for the 5f block, two minima were found for the outer d correlating 2f1g functions in the triple-zeta basis sets: one in which the f functions were tighter and one in which the f functions were more diffuse. The g functions in both sets were about the same. The tighter set exists for all elements except Pm; the looser set exists from Pm through Lu. It is likely that the orthogonalization to the 4f influences the optimization. The exponents in both sets are only partly covered by the outer SCF f functions, but an even-tempered extension of one function serves to cover both the correlating f space and a reasonable representation of the dipole polarizing f space, so the even-tempered extension has been used to supplement the SCF f set instead.
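An even-tempered extension continues a set of exponents as a geometric series. The sketch below assumes that the ratio is taken from the two most diffuse functions of the existing set; the source does not state how the ratio was chosen, and the exponent values are hypothetical.

```python
def even_tempered_extension(exponents, n_extra=1):
    """Extend a set of Gaussian exponents toward the diffuse end.

    Assumption: the geometric ratio is that of the two smallest exponents.
    Returns only the newly generated exponents, most compact first.
    """
    ordered = sorted(exponents)
    ratio = ordered[1] / ordered[0]
    new_exponents = []
    smallest = ordered[0]
    for _ in range(n_extra):
        smallest = smallest / ratio
        new_exponents.append(smallest)
    return new_exponents


# Hypothetical SCF f exponents, not taken from the actual basis sets
scf_f = [2.5, 1.0, 0.4]
extension = even_tempered_extension(scf_f, n_extra=1)
```

With these illustrative values the ratio is 2.5, so the single added function has an exponent of about 0.16.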

3.3 5s and 5p correlation

Correlating functions for the 5s and 5p shells were optimized in MR-SDCI calculations on the ground configuration, for the state with the maximum J value. Single and double excitations out of the 5s and 5p shells into the correlating space were coupled to J = 0, to ensure that only configurations representing 5s and 5p correlation were included. The correlating space consisted of 1d for the double-zeta basis sets, 2d1f for triple-zeta, and 3d2f1g for quadruple-zeta. These optimizations are not strictly necessary for the dz and tz basis sets, because the range of correlating functions is covered by the SCF occupied set. However, they are a useful guide to the functions that need to be uncontracted. The d and f functions are not included in the final basis set.

For the qz basis sets, it is necessary to at least examine the g function to see if it is covered by any of the other correlating sets. It turns out that the g function has a fair overlap with the outer g for 4f correlation, but the gradient with Z is quite different. A compromise must therefore be made in the selection to avoid linear dependence. Some of the options are to use the outer g from the 4f correlating set for the 5s5p correlation, and to substitute the outer g from the 4f correlating set with the g from the 5s5p correlating set, with or without reoptimization of the other g functions. All of these options result in a loss of correlation energy of a few millihartrees, so none of them is to be preferred. The final choice was to use the outer g from the 4f correlating set for the 5s5p correlation.

3.4 4d correlation

Correlating functions for the 4d shell were optimized on the ground configuration for the state with the maximum J value, with single and double excitations out of the 4d shell into the correlating space coupled to J = 0. The correlating space consisted of 1f for the double-zeta basis sets, 2f1g for triple-zeta, and 3f2g1h for quadruple-zeta. Again, the f functions are not strictly necessary and are not included in the basis set, but serve as a guide to the contraction pattern. The 4d correlating set is needed for La, but for the other elements, 4d correlation is adequately covered by the SCF f functions and the 4f correlating functions.

3.5 Dipole polarization

Functions for dipole polarization of the 4f shell were determined as follows. For the double-zeta basis sets, a single g function was determined for the 4f dipole polarization by maximizing the polarizability calculated by second-order perturbation theory. The basis states for the perturbation theory consisted of the eigenfunctions of the Dirac Hamiltonian for the configurations generated by a single f → g excitation from the ground configuration. For the triple-zeta and quadruple-zeta basis sets, the exponents of the dipole polarization functions were determined by multiplying the outermost correlating exponent for each angular momentum by a factor obtained from the double-zeta results. This factor is the exponent of the double-zeta dipole polarizing g function divided by the exponent of the double-zeta correlating g function. In this way 1g1h and 1g1h1i sets for 4f dipole polarization were generated for the triple-zeta and quadruple-zeta basis sets.
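The scaling rule for the triple-zeta and quadruple-zeta dipole polarization functions is a single multiplication per angular momentum. A minimal sketch follows; the function name and all exponent values are hypothetical and serve only to illustrate the arithmetic.

```python
def dipole_polarization_exponents(dz_dipole_g, dz_correlating_g, outer_correlating):
    """Scale the outermost correlating exponent of each angular momentum.

    The factor is the double-zeta dipole polarizing g exponent divided by
    the double-zeta correlating g exponent, as described in the text.
    """
    factor = dz_dipole_g / dz_correlating_g
    return {ell: factor * zeta for ell, zeta in outer_correlating.items()}


# Hypothetical exponents, not taken from Tables 2-11
tz_outer = {"g": 1.2, "h": 2.0}
tz_dipole = dipole_polarization_exponents(
    dz_dipole_g=0.5, dz_correlating_g=2.0, outer_correlating=tz_outer
)
```

Here the factor is 0.25, giving a 1g1h dipole set with exponents of about 0.3 and 0.5; passing a dictionary with g, h, and i entries would give the 1g1h1i set in the same way.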

For the 5d dipole polarization, a single f function was determined for the dz basis sets in the same manner, using the eigenfunctions of the Dirac Hamiltonian for the configurations generated by a single d → f excitation from the fn−1d1s2 configuration for the perturbation basis. The functions so generated for the dipole polarization were sufficiently similar to the correlating functions that it was not considered necessary to generate functions for the triple-zeta and quadruple-zeta basis sets.

4 Contraction patterns

Contraction coefficients for the occupied spinors, including the 5d and 6p, were taken from SCF calculations on a weighted average of the valence configurations, as follows:

  • For La, Ce, Gd, and Lu, where the 5d is occupied in the ground state, the 4fn−1 5d1 6s2 and 4fn−1 5d1 6p2 configurations were used with a 9:1 weight ratio (90% s, 10% p). Note that for La the 4f is empty: this element is treated as a transition metal.

  • For the rest (Pr–Eu, Tb–Yb), the 4fn 6s2, 4fn−1 5d1 6s2, and 4fn−1 6p1 6s2 configurations were used with a weight ratio of 6:3:1.

These ratios were chosen so that the primary configuration is not greatly perturbed by the inclusion of orbitals that are empty in the ground state, but still give a reasonable representation of these orbitals. Thus, a weight of 10% was chosen for 6p configurations, and a 2:1 ratio was chosen for the 4fn 6s2 and 4fn−1 5d1 6s2 configurations. These choices are somewhat arbitrary. The higher weight of the 4fn−1 5d1 6s2 configuration reflects the greater importance of the 5d than the 6p.
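One way to picture the effect of these weight ratios is through the average shell occupations they imply. The sketch below is an illustration under that assumption, not a description of how the weighted-average SCF calculations were implemented; Pr (n = 3) is used as the example.

```python
from fractions import Fraction


def average_occupations(configurations, ratio):
    """Weighted-average occupation of each shell over several configurations."""
    total = sum(ratio)
    weights = [Fraction(r, total) for r in ratio]
    shells = sorted({shell for conf in configurations for shell in conf})
    return {
        shell: sum(w * conf.get(shell, 0) for w, conf in zip(weights, configurations))
        for shell in shells
    }


# Pr, n = 3: 4f^n 6s^2, 4f^(n-1) 5d^1 6s^2, 4f^(n-1) 6p^1 6s^2 in a 6:3:1 ratio
pr_configurations = [
    {"4f": 3, "6s": 2},
    {"4f": 2, "5d": 1, "6s": 2},
    {"4f": 2, "6p": 1, "6s": 2},
]
pr_average = average_occupations(pr_configurations, ratio=(6, 3, 1))
```

For this example the averages are 4f 13/5, 5d 3/10, 6p 1/10, and 6s 2, which shows that the primary configuration dominates while the 5d and 6p still receive a small occupation.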

The f functions that were added for 5d and 6s correlation have been included in the SCF set, and therefore in the contraction calculations. References to the SCF set include these functions.

The total energies for the weighted average used for generating the basis sets are given in Table 12.

Table 12 Configuration average total SCF energies in E_h for uncontracted basis set calculations on the weighted average of the configurations used for the contractions

To determine which primitive functions should be uncontracted, a sequence of MR-SDCI calculations was performed on several of the elements across the row, in which different primitive functions were included in the correlating space. For each basis set size, the appropriate number of primitive functions was used in the MRCI calculations. For example, for 5s5p correlation in the triple-zeta basis sets, the correlating set was 2s2p2d1f. Excitations into the 4f shell were not considered in these calculations. The large and small component coefficients of these correlating functions were determined by diagonalizing the Fock matrix in the space that consists of the DHF occupied spinors used as contracted functions and the additional primitive functions that are to be used for correlation. The virtual functions from this diagonalization were then orthogonalized to the original DHF occupied functions, to ensure strict orthogonality. (This procedure also allows the elimination of any linearly dependent functions, but no linear dependence was observed in this work.)

The contraction pattern is described first for the outer valence shells, then the additional functions required for the inner valence are described, and finally the functions for correlation of the 4d. To any of these contractions, the relevant polarization functions listed in Tables 2, 7 and 11 can be added. In the descriptions, functions are counted by increasing exponent size, from the smallest. In cases where linear dependence might be a problem, alternative prescriptions are given.

4.1 Double-zeta basis sets

  • Outer valence 5d/6s/6p To the SCF functions, add the second s primitive, the second p primitive, the first d primitive, and the first f primitive (the 5d/6s correlating f function for La).

  • Inner valence 4f/5s/5p To the outer valence set, add the fourth s primitive, the third p primitive, the third d primitive, the second and third f primitives, and the correlating g function for the 4f shell.

  • Core 4d To the inner valence set, add the fifth s primitive, the fifth p primitive, the sixth d primitive, and the fifth f primitive.
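The counting convention used in these prescriptions (primitives numbered from the smallest exponent) can be made concrete with a short sketch. The exponent lists below are hypothetical and far shorter than the real sets; the selection shown is the double-zeta outer valence prescription.

```python
def select_primitives(exponents, positions):
    """Pick primitives by 1-based position, counting from the smallest exponent."""
    ordered = sorted(exponents)
    return [ordered[position - 1] for position in positions]


# Hypothetical exponents per angular momentum, in arbitrary order
scf_exponents = {
    "s": [12.0, 0.05, 0.9, 0.2],
    "p": [3.0, 0.08, 0.3],
    "d": [0.15, 1.1],
    "f": [2.4, 0.6],
}

# Double-zeta outer valence: second s, second p, first d, first f primitive
dz_outer_valence = {"s": [2], "p": [2], "d": [1], "f": [1]}

uncontracted = {
    ell: select_primitives(scf_exponents[ell], positions)
    for ell, positions in dz_outer_valence.items()
}
```

Sorting first makes the selection independent of the order in which the exponents happen to be stored.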

4.2 Triple-zeta basis sets

  • Outer valence 5d/6s/6p To the SCF functions, add the first and third s and p primitives, and the first and second d primitives. For La add the correlating 2f1g functions for the 5d/6s shells; for Ce–Lu add the first and third f primitives and the 5d/6s correlating g function.

  • Inner valence 4f/5s/5p To the outer valence set, add the fourth and sixth s primitives and the fifth and sixth p primitives. For La add the fourth and fifth d primitives, and the 5s5p correlating f function; for Ce–Lu add the third through fifth d and f primitives, and the 4f correlating 2g1h set.

  • Core 4d To the inner valence set, add the seventh and ninth s and p primitives, and the seventh and eighth d primitives. For La, add the correlating 2f1g functions for the 4d shell; for Ce–Lu add the sixth and seventh f primitives from the SCF set. The g function for 4d correlation is covered by the 4f correlating set.

4.3 Quadruple-zeta basis sets

  • Outer valence To the SCF functions, add the first, second, and fourth s and p primitives, and the first through third d primitives. For La add the correlating 3f2g1h for the 5d and 6s shells; for Ce–Lu add the first through third f primitives and the correlating 2g1h functions for the 5d and 6s shells.

  • Inner valence To the outer valence set add the fifth, sixth, and eighth s primitives and the sixth, seventh and eighth p primitives. For La add the third, fourth, and fifth d primitives and the correlating 2f1g set for the 5s and 5p shells; for Ce–Lu add the third through sixth d and f primitives, and the 4f correlating 3g2h1i set.

    If linear dependence problems are encountered in the d space, do not include the second d primitive.

  • Core 4d To the inner valence set, add the tenth through 12th s and p primitives, and the seventh through ninth d primitives. For La add the 4d correlating 3f2g1h set; for Ce–Lu add the seventh and eighth f primitives. The 4d correlating 2g1h set is well enough represented by the 4f correlating functions to be omitted.

5 Application

To exemplify the performance of the basis sets developed here in molecular calculations, we have investigated the YbF molecule. This system has received a fair amount of theoretical attention because of its potential for the observation of parity-violating interactions [36, 37], in connection with the determination of the electric dipole moment of the electron (see, for instance, [38–42] and references therein).

YbF is an open-shell molecule with a 2Σ+ ground state. Experimental [43, 44] and theoretical [45, 46] investigations indicate that the unpaired electron is located in a σ orbital with dominant contributions from the 6s orbital of Yb, corresponding to a Yb(f 14 sσ)F configuration.

5.1 Computational details

The calculations were performed with the Dirac program suite [47], using the Dirac-Coulomb Hamiltonian. The valence double-zeta, triple-zeta, and quadruple-zeta sets from the present work were used for Yb, including the correlating functions and the functions for dipole polarization of the 4f and the 5d. The matching augmented correlation-consistent basis sets of Dunning [28] (aug-cc-pVnZ, n = 2, 3, 4) were used for F. All basis sets were kept uncontracted, with the small component basis generated by restricted kinetic balance. Furthermore, all two-electron integrals over small component (S) basis sets (the so-called (SS|SS)-type integrals) were replaced by a simple correction [48].

In addition to the potential energy curves for individual basis sets, we have extrapolated the points on the potential energy curves for the triple- and quadruple-zeta sets to obtain a curve for the complete basis set limit (E_∞), using the relation [49]

$$ E_\infty ({\mathbf{R}}) = \frac{4^3 E_4({\mathbf{R}}) - 3^3 E_3({\mathbf{R}})}{4^3 - 3^3} $$
(1)

where the subscripts denote the cardinal numbers for the basis sets and E_n(R) is the potential energy for a given geometry and electronic structure method for a basis of cardinal number n (= 2, 3, 4, ∞). It is possible to extrapolate the potential energy rather than the individual energies because the expression used to fit the energy as a function of the cardinal number n is linear in the parameter of the fit, A(R):

$$ E_n ({\mathbf{R}}) = E_\infty ({\mathbf{R}}) + A({\mathbf{R}})/n^3 $$
(2)

Any combination of energies merely produces the same combination of the fit parameters, which can then be treated as a single parameter and the combination of energies can be extrapolated.
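The two-point extrapolation of Eq. (1) and the linearity argument can be checked numerically. The sketch below uses hypothetical energies generated from the model of Eq. (2); the function names are illustrative only.

```python
def cbs_extrapolate(e_tz, e_qz):
    """Eq. (1): two-point n^-3 extrapolation from cardinal numbers 3 and 4."""
    return (4**3 * e_qz - 3**3 * e_tz) / (4**3 - 3**3)


def model_energy(e_inf, a, n):
    """Eq. (2): E_n = E_inf + A / n^3."""
    return e_inf + a / n**3


# Hypothetical limits and fit parameters for a molecule and a reference
e_inf_mol, a_mol = -10.0, 0.8
e_inf_ref, a_ref = -9.5, 0.5

tz_mol, qz_mol = model_energy(e_inf_mol, a_mol, 3), model_energy(e_inf_mol, a_mol, 4)
tz_ref, qz_ref = model_energy(e_inf_ref, a_ref, 3), model_energy(e_inf_ref, a_ref, 4)

# Extrapolating each energy separately ...
limit_mol = cbs_extrapolate(tz_mol, qz_mol)
limit_ref = cbs_extrapolate(tz_ref, qz_ref)

# ... gives the same result as extrapolating the energy difference directly
limit_difference = cbs_extrapolate(tz_mol - tz_ref, qz_mol - qz_ref)
```

Because Eq. (1) is linear in the energies, `limit_difference` agrees with `limit_mol - limit_ref` to within rounding, which is why relative energies along the curve can be extrapolated point by point.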

The E_n(R) energies were obtained at the SCF, MP2, CCSD, and CCSD(T) levels of theory for the ground state [50], using as reference wave function that of the Yb(sσ1/2)F configuration. For the active occupied space, orbitals arising from the combination of the Yb 4f, 5s, 5p, 6s and the F 2s and 2p atomic orbitals were explicitly considered in the calculations. Due to computational constraints, we have truncated the active virtual space so that 117, 230 and 296 orbitals were used in the double-zeta, triple-zeta and quadruple-zeta calculations, respectively.

Spectroscopic constants (r_e, D_e, ω_e and ω_e x_e) were determined from a fifth degree polynomial fit in the vicinity of the potential energy minima, which corresponded to bond lengths between 1.96 and 2.10 Å, spaced by 0.01 Å between 1.98 and 2.06 Å, and by 0.02 Å for the outer regions. In the calculation of D_e the asymptotic dissociation limit is calculated from the energies of the isolated neutral atoms, fluorine in the 2P3/2 state and Yb in the 1S state.
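The dissociation energy defined in this way is a simple energy difference between the atomic asymptote and the molecular minimum. A minimal sketch follows, with hypothetical energies in hartree; the conversion factor of about 2625.4996 kJ mol−1 per hartree is the only physical input, and the function name is illustrative.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # approximate conversion factor


def dissociation_energy(e_yb, e_f, e_molecule_min):
    """D_e in kJ/mol from isolated-atom and molecular minimum energies (hartree)."""
    return (e_yb + e_f - e_molecule_min) * HARTREE_TO_KJ_PER_MOL


# Hypothetical energies chosen only to illustrate the arithmetic
d_e = dissociation_energy(e_yb=-100.0, e_f=-50.0, e_molecule_min=-150.2)
```

With these illustrative numbers the molecule lies 0.2 hartree below the separated atoms, corresponding to roughly 525 kJ mol−1.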

5.2 Results and discussion

The results of our calculations for the 174Yb19F isotopomer are presented in Table 13, along with the experimental results [43, 44, 51–53] and other theoretical results [39, 54–56]. The overall qualitative trend clearly shows that improvement of the basis set and correlation treatment leads to progressively better agreement with experiment for the properties under consideration, as expected from basis sets constructed in such a systematic fashion.

Table 13 Spectroscopic properties for the ground state (2Σ+) of 174YbF, obtained with double-, triple- and quadruple-zeta basis sets, and extrapolated to the complete basis set

For the main spectroscopic constants (r_e, ω_e, and ω_e x_e) the extrapolated CCSD results agree remarkably well with experiment: within 0.001 Å for the bond length, 1 cm−1 for the harmonic frequency, and 0.1 cm−1 for the anharmonicity. The agreement is better than that of the CCSD(T) results, for which the results differ from experiment by about 0.01 Å, 20 cm−1, and 0.3 cm−1 respectively. The agreement of the CCSD results must be considered fortuitous, however, because it is clear from the perturbative triples results that the calculations are not converged. The current calculations do not include correlation of the 4d shell, and it is possible that 4d–4f correlation is significant. Truncation of the virtual space might also contribute to the discrepancy. In addition, the size of the triples correction suggests that a better treatment of the higher-order excitations is necessary to approach the experimental results for the right reasons.

Nevertheless, the current results are comparable to or better than previous theoretical predictions. The all-electron results of Nayak and Chaudhuri [39] do not include any higher angular momentum functions and therefore are not very accurate. The all-electron results of Su et al. [56] do not include higher angular momentum functions on the Yb atom but use the aug-cc-pVQZ basis set for F, which leads to a very unbalanced, and consequently inaccurate, calculation. The remaining calculations all include g functions on Yb, but no higher, and use either triple-zeta or quadruple-zeta basis sets on F. Although these calculations are also somewhat unbalanced, the bulk of the correlation of the Yb 4f is included with the inclusion of g functions. The best of these calculations appears to be the CISD + Q calculations of Cao et al. [54], which include the correlation of the 4d shell. However, given the size of the triples correction in our results, it is likely that these results also represent a fortuitous cancellation of errors. Our results are the only results to include angular momentum beyond g on the Yb atom, and since both the triple-zeta and the quadruple-zeta basis sets include higher angular momentum functions, it is difficult to compare directly with the other results.

The current double-zeta results should be treated with caution. Both the Yb basis and the F basis are too small for accurate calculations, so that a large part of the correlation is missing. This is true of double-zeta calculations in general, which are usually only suitable for qualitative work. However, for YbF there is another reason. Dolg et al. [45] found that SCF calculations pointed to an f13 s2 ground state for YbF, whereas in MRCI calculations the f14 s1 configuration was preferentially stabilized and became the ground state. The inadequate correlating space in the present double-zeta calculations could therefore lead to a poor description of the ground state. This possibility is supported by the large triples corrections for the predicted properties, not only for the double-zeta basis, but also for the larger basis sets.

The existence of a perturbing state is verified by calculations done with the triple-zeta basis set with two different virtual spaces. The first included 198 virtuals, and the second included 230 virtuals, which corresponds to an additional s, d, f, g, and h shell. The potential energy curves relative to the minimum for the CCSD and CCSD(T) calculations are shown in Fig. 1, along with the T1 diagnostic, as a function of r. The CCSD curve for the smaller virtual space shows an abrupt variation at around 1.9 Å that indicates the presence of another state, which is manifested by the large values of the T1 diagnostic in the vicinity of this point. The T1 diagnostic for the larger virtual space at these geometries is in the normal range for a single reference. It is clear from the figures, however, that the same abrupt change in the CCSD potential observed for the smaller virtual space occurs for the larger virtual space, but at a shorter bond length (about 1.78 Å).

Fig. 1 CCSD and CCSD(T) potential energy curves (relative to the minima) and T1 diagnostic in the triple-zeta basis set, using different numbers of active virtuals (198 and 230)

As this perturbation occurs relatively far from the CCSD minimum and without significantly changing the shape of the potential, there should only be a small effect on the calculated spectroscopic constants. This is not the case for CCSD(T) due to the sharp rise in energy in the repulsive region, which explains why the CCSD(T) constants are further from experiment than the CCSD constants. This observation also helps to explain the poor accuracy of other calculations, notably those of Su et al. [56], who observe a shoulder between 1.7 and 2.0 Å in their pseudopotential calculations (which no doubt influences the fitting of the curve and consequently the quality of the spectroscopic constants).

These results confirm the idea that there is more than one state to consider, and that the position of these two states is strongly dependent on the degree of correlation included. The results of single-point calculations with several virtual spaces, given in Table 14, show that the bulk of the extra correlation, and the biggest change in the T1 diagnostic, comes from the addition of a d and a g shell. Moreover, it is clear that the perturbative treatment for the triple excitations is not reliable in the regions of large T1 values, and a more thorough treatment of higher excitations and of the reference space is mandated.

Table 14 Contributions to the electron correlation energy for the coupled cluster methods

For the dissociation energies, the extrapolated CCSD(T) values are the closest to experiment, but are still 20–30 kJ mol−1 too low. Treatment of higher-order excitations, and a better treatment of the perturbing state, are likely to increase the binding and thus bring the results into better agreement with experiment. The results of Cao et al. are closer to experiment than ours, which might be due to the correlation of the 4d shell in their calculations. However, improvements in the basis set and the correlation treatment are likely to move their results to a value that is too high. The fact that the BSSE in their results is about 14 kJ mol−1 indicates that neither the 1-particle nor the N-particle space is saturated. Although the current calculations do not account for BSSE, the effect of BSSE on our results should be smaller than on the results of Cao et al. or of Heiberg et al. Our quadruple-zeta basis set is larger than any of the others used for Yb and includes higher angular momentum functions, which will reduce the size of the BSSE relative to calculations that lack these higher angular momentum functions. The PP values of Su and coworkers are too high and are yet another indication of the poor quality of their calculations.

The experimental numbers are, however, obtained by indirect methods: the results of Ref. [52] are derived from molecular beam chemiluminescence studies, whereas those of Ref. [53] are based on thermochemical data and a ligand-field model. They are therefore rather dependent upon assumptions and approximations in the underlying models. This, together with the variations in the theoretical values, leaves some uncertainty about the magnitude of the error in the current calculations.

6 Conclusion

Basis sets of double-zeta, triple-zeta, and quadruple-zeta quality have been optimized for the lanthanide elements La–Lu, including functions for correlation of the 6s and 5d, the 4f, the 5s and 5p, and the 4d shells. The full tables of basis sets including spin-free relativistic SCF [57] and Dirac-Fock SCF coefficients are available in ASCII format from the Dirac web site, http://dirac.chem.sdu.dk. The spin-free relativistic SCF coefficients include the Foldy–Wouthuysen transformed large component coefficients that can be used in the scalar one-electron NESC approximation [58].

These basis sets are shown to perform well in the determination of the ground-state spectroscopic constants of YbF, and extrapolations to the basis set limit using the triple- and quadruple-zeta basis sets yield results close to experiment for r_e, ω_e and ω_e x_e. For D_e, the results are outside the experimental limits by some tens of kJ mol−1, but should be improved by the inclusion of more outer core orbitals and a better treatment of higher excitations. The double-zeta basis sets provide qualitatively correct results, but cannot be used for high accuracy calculations, especially in difficult cases such as YbF where electron correlation effects are extremely important for correctly describing the ground-state wave function.

These results also show the importance of using proper, flexible correlating basis sets if single-reference perturbative approaches such as CCSD(T) are to work properly in cases where dynamical correlation can strongly influence the nature of the wave function. In light of our results, we therefore suggest using at least the triple-zeta basis set, and including all correlating functions in the virtual space, in order to obtain a qualitatively and quantitatively correct picture.

7 Internet archive

This paper includes an internet archive in ASCII format. The archive contains the Dirac-Fock SCF coefficients and the spin-free relativistic SCF coefficients, including the Foldy–Wouthuysen transformed large component coefficients, and the correlating and polarizing functions. Prescriptions are given in the archive for the construction of various basis sets.