1 Introduction

Spectroscopy in the X-ray region has become firmly established as a key technique for the study of the electronic and geometrical structure of chemical and biological systems. Furthermore, the development of X-ray free-electron lasers that can deliver short femtosecond pulses of X-rays has opened up the possibility of resolving ultrafast chemical processes at an atomic level. Recent examples of experimental work include studies on liquids [1, 2], the composition of an active site in a metalloprotein [3], the nature of bonding in metal-containing complexes [4] and the real-time monitoring of bond breaking of a carbon monoxide molecule adsorbed on a metal surface [5]. Computational simulations of X-ray spectroscopy often play an important role in the interpretation of experimental data. Within the context of quantum chemical calculations, core-electron binding energies (CEBEs) are most commonly computed using a \(\varDelta \)self-consistent field (\(\varDelta \)SCF) approach [6,7,8,9], although an unrestricted generalised transition state method has also been proposed for the calculation of CEBEs [10, 11]. X-ray absorption spectra can be computed using the transition potential method [12], time-dependent density functional theory (TDDFT) [13, 14], the Bethe–Salpeter equation [15], coupled cluster theory [16, 17], the algebraic diagrammatic construction (CVS-ADC) scheme [18] and multi-reference methods [19]. TDDFT and EOM-CCSD methods have also been used to study X-ray emission spectroscopy through the use of a reference determinant with a core–hole [20,21,22,23]. More recently, resonant inelastic X-ray scattering spectra have been simulated based upon multi-reference wavefunction methods [24] and also using Kohn–Sham density functional theory with a core-excited reference determinant [25].

Common to all these approaches for simulating core-electron spectroscopies is the choice of basis set used in the calculation. In many applications, the simulation of core-electron spectroscopy can be computationally demanding, for example, in the study of large systems such as metalloprotein active sites or the study of liquids where it is necessary to incorporate averaging over conformations. Consequently, it is important to understand which are the most efficient basis sets for these calculations, i.e. which basis sets provide a good approximation to the complete basis set limit with the fewest basis functions. There are a number of well-established families of Gaussian basis sets, including the split-valence basis sets of Pople [26,27,28,29,30,31,32] and the correlation-consistent basis sets of Dunning [33,34,35,36,37,38]. More recently, Jensen has introduced the polarisation-consistent basis sets [39,40,41]. A common feature of all of these basis sets is that they are designed for the calculation of properties that primarily depend on the nature of the valence electrons. Consequently, their performance for the calculation of properties that depend on the core electrons is less well understood. In order to describe core orbitals accurately, it is often necessary to add tight basis functions, and correlation-consistent basis sets are available with functions that describe core-electron correlation (cc-pCVXZ). The calculation of nuclear magnetic resonance (NMR) shielding constants represents an example of a molecular property that requires a good representation of the orbitals near the nuclei, and basis sets have been designed for the calculation of NMR properties. For example, the individual gauge for localised orbitals (IGLO) basis sets [42, 43] are often used for the calculation of magnetic properties; however, they have only been defined for hydrogen and the first- and second-row p-block elements.
A family of segmented contracted basis sets, denoted pcSseg-n, optimised for the calculation of nuclear magnetic shielding constants has also been reported [44].

Similar to calculations of NMR spectroscopy, accurate calculations of spectroscopy in the X-ray region also rely on a correct description of the core orbitals. Several studies have considered the basis set dependence of CEBE calculations using a \(\varDelta \)SCF approach in conjunction with density functional theory (DFT) or Møller–Plesset perturbation theory [9, 45,46,47,48,49,50]. The core-valence correlated triple-zeta basis set (cc-pCVTZ) has been reported to be accurate and efficient compared with the cc-pV5Z basis set. It was also found that exponent scaled basis sets did not perform well [45]. In another study using a \(\varDelta \)SCF approach, it was found that CEBEs calculated with large basis sets could be reproduced to within 0.2 eV by optimising the exponents and contraction coefficients of relatively small basis sets for the core–hole state [47]. The use of Slater-type basis functions has also been explored for the calculation of CEBEs, and the results indicate that a polarised triple-zeta basis set of Slater-type orbitals is adequate [48]. In more recent work, the performance of a range of basis sets and exchange–correlation functionals was investigated for the calculation of CEBEs of first-row hydrides and glycine [50]. The inclusion of polarisation and diffuse functions on the heavy atoms was found to have a significant effect for medium-sized basis sets, such as 6-311G. Several density functionals were found to perform well with large basis sets that have considerable flexibility in the core region. For example, the B3LYP5 and TPSSh functionals had a mean unsigned error of less than 0.2 eV with a fully uncontracted triple-zeta quality basis set augmented with diffuse and polarisation functions. The majority of studies of CEBEs have only considered excitations from the 1s orbitals of first-row elements. Segala and Chong assessed DFT-based calculations of CEBEs for a set of 40 sulphur- and phosphorus-containing molecules [51].
An additional complication for second-row elements is that relativistic effects become significant and an empirical correction to account for these effects was used. A wide variety of exchange–correlation functionals were considered, and the most accurate functional was found to be VS8 [52], with an average unsigned error of 0.43 eV. A few studies have considered core-excitation energies in addition to CEBEs. In calculations on the formaldehyde molecule, it was concluded that diffuse basis functions were important for simulating X-ray absorption spectra but less important for CEBEs and X-ray emission spectra [49]. In a study of core-excitation energies and CEBEs, it was found that using the 6-311G(d,p) basis set with uncontracted basis functions gave results that were comparable with the much larger cc-pCVQZ basis set [9]. There has been much less attention to the basis set dependence of X-ray emission energies, although one study has found that X-ray emission energies of transition metal complexes computed using TDDFT were highly dependent on the basis set used [23].

In this paper, we explore the basis set dependence of DFT-based calculations of a range of core-electron spectroscopies, including CEBEs computed using a \(\varDelta \)SCF approach, core-excitation energies computed using \(\varDelta \)SCF and TDDFT, and X-ray emission energies computed using TDDFT. We consider a set of 34 molecules that includes excitations from first- and second-row s- and p-block elements and a wide range of basis sets, with the goal of identifying the most efficient and reliable basis sets for calculations of core-electron spectroscopies.

2 Computational details

Table 1 Molecules used in the study

CEBEs were computed using a \(\varDelta \)SCF approach in conjunction with DFT using the B97-1 exchange–correlation functional [53]. The focus of this study is the variation of the computed core-electron properties with basis set, and the B97-1 functional is assumed to be representative of DFT exchange–correlation functionals. This is explored further in this work where the basis set dependence of different exchange–correlation functionals is compared. The core-ionised states were optimised using an overlap criterion [54] to maintain the core–hole during the SCF process. In a similar manner, \(\varDelta \)SCF core-excitation energies were computed for the lowest excitation energy arising from an excitation from the relevant core orbital to the lowest unoccupied molecular orbital (LUMO). We note that simulation of X-ray absorption spectra will typically involve excitation to higher-lying orbitals that may be diffuse (Rydberg) in nature. The performance of the basis sets for these states is not assessed directly, and as noted later, the addition of diffuse basis functions would be needed to describe these states adequately. Core-excitation energies were also computed for the core to LUMO transitions with TDDFT. TDDFT can be applied to compute core-excitation energies through limiting the single excitation subspace to include only excitations from the relevant core orbital(s) [13]. It is well known that standard exchange–correlation functionals underestimate core-excitation energies when computed using TDDFT, and several groups have developed functionals specifically designed for core-excitation energies [55,56,57,58,59]. However, in this study we use the B97-1 functional throughout since we are not primarily concerned with a direct comparison with experiment.
X-ray emission energies were computed for the highest occupied molecular orbital (HOMO) \(\rightarrow \) core transition by applying TDDFT to a Kohn–Sham determinant with a core hole in a procedure described in more detail elsewhere [21]. All calculations use an unrestricted Kohn–Sham formalism except the TDDFT calculations of core-excitation energies. A range of molecules (shown in Table 1) including first- and second-row s- and p-block elements was considered, with the structure of the molecules optimised at the B97-1/6-311G(d,p) level of theory. This set of molecules was chosen to include core excitations from the range of elements in the first and second rows of the periodic table.
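The restriction of the TDDFT single excitation subspace to core excitations can be illustrated with a short sketch. It only enumerates the occupied-virtual index pairs that would be retained; the function name, the zero-based orbital indexing and the orbital counts are assumptions made for illustration and do not reproduce the implementation used in this work.

```python
from itertools import product


def core_excitation_subspace(n_occupied, n_virtual, core_orbitals):
    """Return the (occupied, virtual) index pairs kept in the truncated subspace.

    Orbitals are indexed from zero, with occupied orbitals first. Only
    excitations that start from one of the chosen core orbitals are retained.
    """
    core = sorted(set(core_orbitals))
    if any(i < 0 or i >= n_occupied for i in core):
        raise ValueError("core orbital index lies outside the occupied space")
    virtuals = range(n_occupied, n_occupied + n_virtual)
    return list(product(core, virtuals))


# Hypothetical example: 9 occupied and 20 virtual orbitals, with orbital 0
# taken to be the 1s core orbital of interest.
pairs = core_excitation_subspace(9, 20, [0])
print(len(pairs), "of", 9 * 20, "single excitations retained")
```

Truncating the subspace in this way means that only the core-excited roots are solved for, rather than the very large number of valence excited states that lie below them in energy.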

The basis sets considered in this study are shown in Table 2 and include small, medium and large basis sets from three widely used families of basis sets: split-valence, correlation-consistent and polarisation-consistent basis sets. Correlation-consistent basis sets with additional core-valence correlation functions are also considered. For these basis sets, the standard correlation-consistent basis set is used for hydrogen. In addition, the IGLO-II and IGLO-III basis sets are included, since these have been optimised for the calculation of NMR properties and have been observed previously to perform well for the calculation of core-electron excitations [60, 61], as are the basis sets of Ahlrichs [62] and a triple-zeta quality atomic natural orbital (ANO) basis set [63, 64]. Here we only include widely used basis sets that are available for the majority of elements and we do not consider specially designed or modified basis sets, with the exception that the 6-31G(d,p) basis set with uncontracted core basis functions (denoted u6-31G(d,p)) is included. The errors in the computed energies are assessed relative to the largest segmented polarisation-consistent (pcSseg-4) basis set. This is a very large basis set and is considered to give a good approximation to the complete basis set limit. However, the error relative to the largest correlation-consistent basis set (cc-pV5Z) is also considered. In order to compare the performance of the different basis sets with respect to their size, a quantity N is introduced as a measure of the size of the basis set, where N is the number of contracted basis functions for a first- and second-row p-block atom. In determining N, a pure representation of the d and higher angular momentum basis functions is assumed. All calculations were performed with the Q-Chem software package [65].
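As an illustration of how a count of contracted basis functions with pure angular functions can be obtained, the sketch below sums 2l+1 components over the contracted shells of a single atom. The function name and the dictionary layout are assumptions, and the way the first- and second-row counts are combined into the single value N used in this work is not reproduced here.

```python
# Number of pure (spherical) components for each angular momentum label.
PURE_COMPONENTS = {"s": 1, "p": 3, "d": 5, "f": 7, "g": 9, "h": 11}


def count_basis_functions(contraction):
    """Count contracted basis functions for one atom.

    `contraction` maps a shell label to the number of contracted shells of
    that type, e.g. {"s": 3, "p": 2, "d": 1} for a [3s2p1d] contraction.
    """
    return sum(n * PURE_COMPONENTS[label] for label, n in contraction.items())


# 6-31G(d,p) on a first-row p-block atom is contracted to [3s2p1d].
print(count_basis_functions({"s": 3, "p": 2, "d": 1}))  # 14

# cc-pVTZ on a first-row p-block atom is contracted to [4s3p2d1f].
print(count_basis_functions({"s": 4, "p": 3, "d": 2, "f": 1}))  # 30
```

With a Cartesian representation the d shell would contribute six functions rather than five, which is why the representation has to be fixed when comparing basis set sizes.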

Table 2 Basis sets used in this study

3 Results and discussion

3.1 \(\varDelta \)SCF core-electron binding energies and core-excitation energies

Table 3 Error in calculated core-electron binding energies (in eV)

Table 3 gives the mean absolute deviations (MADs) with the associated standard deviation and maximum absolute errors between the computed CEBEs and the values computed with the pcSseg-4 basis set. These values are denoted \(\varDelta ^\mathrm{MAD}\)(pcSseg-4). Values are given for s-block and p-block elements, as well as a combined value. This is necessary since the IGLO-II and IGLO-III basis sets are only available for the p-block elements. The small basis sets of double zeta and lower quality show a large deviation from the large basis set for both s- and p-block elements, and a typical error of 4.5 to over 8 eV is found. For the large majority of molecules, the smaller basis sets give a CEBE that is too large. This is a result of the smaller basis set providing a relatively poor description of the core-ionised state. The average change in energy between the pcSseg-4 and 6-31G(d,p) basis sets is 1.8 eV and 8.9 eV for the ground state and core-excited state, respectively. For the heavier nuclei, this effect can be significant; for example, for H\(_2\)S the ground-state energy is lowered by 1.3 eV with the larger basis set, while for the core-ionised state the energy is lowered by 15.9 eV. If CEBEs were computed for the 1s electrons of heavier nuclei, such as transition metal elements, the size of this disparity would increase further. One explanation for this is that the smaller basis sets, such as 6-31G(d,p), do not have the flexibility to describe the core-ionised state which has a different effective nuclear charge. Adding core-correlation functions to cc-pVDZ with the cc-pCVDZ basis set leads to a large improvement in the calculated CEBEs. The error with respect to the large basis set is reduced by over 4 eV. The overall error of cc-pCVDZ is 2.25 eV which is a little larger than for the pcSseg-1 basis set. 
It is surprising that the 6-31G(d,p) basis set with uncontracted core basis functions does not lead to a large improvement in performance, since this is contrary to the findings of earlier work [9]. As will be discussed in more detail later, this error is associated with the CEBEs of the second-row nuclei and these systems were not included in previous studies. Overall none of these small basis sets provides an adequate description of the complete set of molecules, with some large maximum errors observed.
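The error statistics reported in the tables can be sketched as follows. This is a minimal illustration that assumes the standard deviation is taken over the absolute deviations using the population formula, which is not stated explicitly in the text; the molecule labels and energies in the example are placeholders rather than values from this study.

```python
from statistics import mean, pstdev


def basis_set_error(values, reference):
    """Return (MAD, standard deviation, maximum absolute error) in eV.

    `values` and `reference` map a molecule label to an energy computed with
    the basis set under test and with the reference basis set, respectively.
    """
    if values.keys() != reference.keys():
        raise ValueError("both data sets must cover the same molecules")
    abs_errors = [abs(values[key] - reference[key]) for key in reference]
    return mean(abs_errors), pstdev(abs_errors), max(abs_errors)


# Placeholder energies in eV, for illustration only.
small_basis = {"mol_a": 101.0, "mol_b": 198.0}
large_basis = {"mol_a": 100.0, "mol_b": 200.0}
mad, std, max_error = basis_set_error(small_basis, large_basis)
print(f"MAD = {mad:.2f} eV, SD = {std:.2f} eV, max = {max_error:.2f} eV")
```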

Table 4 Error in core-electron binding energies for row 1 and row 2 nuclei (in eV)

The larger basis sets Ahlrichs VTZ, pcS-1 and pcSseg-1 lead to a significant reduction in the error associated with the basis set. Both pcS-1 and pcSseg-1 perform well with \(\varDelta ^\mathrm{MAD}\)(pcSseg-4) of 2.28 and 2.14 eV, respectively. Although these two basis sets perform particularly well for the s-block elements, for the p-block elements the error remains over 2 eV. For the p-block elements, the IGLO-II basis set has a much lower error (<0.9 eV) than basis sets of comparable size. It is difficult to understand in detail the reason for the excellent performance of the IGLO-II basis set since it may be a consequence of a balance between several different factors. One notable difference between the IGLO-II basis set and 6-31G(d,p) is that the exponents of the Gaussian functions of the inner 1s orbital are larger in IGLO-II. If H\(_2\)S is taken as an example, the CEBE is computed to be 2485.0 eV with 6-31G(d,p) and 2468.9 eV with IGLO-II compared with a value of 2470.3 eV for pcSseg-4, consistent with the better performance of IGLO-II. However, if the exponents of the Gaussian functions of the core basis function of 6-31G(d,p) are scaled by 1.1 for sulphur, the CEBE becomes 2473.4 eV. This demonstrates that relatively modest changes to the basis function of the core orbital can have a large effect on the computed CEBE. The large split-valence basis set is also relatively accurate with an overall error of 1.42 eV, with good performance for both s- and p-block elements.
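The H\(_2\)S values quoted in the text can be combined in a short worked example. Since a \(\varDelta \)SCF CEBE is the energy of the core-ionised state minus that of the ground state, the change in the CEBE on enlarging the basis set is the difference between the energy lowering of the two states. The sketch assumes that the lowering of 1.3 eV and 15.9 eV quoted earlier refers to the 6-31G(d,p) and pcSseg-4 pair of basis sets; the function name is illustrative.

```python
def cebe_shift(ground_state_lowering, core_ionised_lowering):
    """Change in a Delta-SCF CEBE (eV) when both states are lowered in energy.

    CEBE = E(core-ionised) - E(ground state), so lowering the core-ionised
    state reduces the CEBE and lowering the ground state increases it.
    """
    return ground_state_lowering - core_ionised_lowering


# H2S values quoted in the text (eV).
shift = cebe_shift(ground_state_lowering=1.3, core_ionised_lowering=15.9)
print(f"Predicted change in CEBE: {shift:+.1f} eV")  # -14.6 eV

reference = 2470.3  # pcSseg-4
computed = {
    "6-31G(d,p)": 2485.0,
    "IGLO-II": 2468.9,
    "6-31G(d,p), core exponents scaled by 1.1": 2473.4,
}
for basis, cebe in computed.items():
    print(f"{basis}: signed error {cebe - reference:+.1f} eV")
```

The predicted change of -14.6 eV agrees with the difference of 14.7 eV between the 6-31G(d,p) and pcSseg-4 CEBEs to within the rounding of the quoted values, which shows that almost all of the basis set error originates from the core-ionised state.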

Table 5 Variation in basis set error for different exchange–correlation functionals (in eV)

The large basis sets pcS-2, pcSseg-2, pcS-3 and pcSseg-3 all have overall MADs of 1 eV or less with a well-balanced performance between s- and p-block elements. For the pcSseg-2 basis set, the errors for the large majority of the molecules lie in the range of 0–1.5 eV with a few molecules showing significantly larger errors with a largest error of 4.3 eV for H\(_2\)CS. The inclusion of diffuse basis functions in aug-pcSseg-2 leads to a decrease in the accuracy of the calculated CEBEs compared with pcSseg-4. However, assessment of this basis set relative to a very large basis set that includes diffuse functions might be more appropriate. The pcSseg-3 basis set has an overall error of 0.35 eV with an error of 0.15 eV for the p-block elements which represents a very good level of accuracy. The IGLO-III basis set also performs well, but it is surprising that IGLO-III shows no improvement over IGLO-II. The results also show that the basis set errors for the pcSseg-n basis sets are smaller than for their counterpart pcS-n basis sets, with the exception of pcS-0 and pcSseg-0. This and the good performance of the IGLO basis sets demonstrate that basis sets designed for the prediction of NMR also perform well in the calculation of CEBEs. One feature of the data is the poor performance of the standard correlation-consistent basis sets. Similar behaviour is observed for the ANO basis set. In general, the correlation-consistent basis sets perform poorly, particularly for the s-block elements. For these basis sets, large errors remain for the cc-pVTZ and cc-pVQZ basis sets and this is significantly reduced for the cc-pV5Z basis set. This trend is also observed in the MADs evaluated relative to the cc-pV5Z values, with an error of over 4 eV for the cc-pVQZ basis set relative to the cc-pV5Z basis set. In comparison, the pcSseg-2 and pcSseg-3 basis sets have overall \(\varDelta ^\mathrm{MAD}\) of 1.19 and 0.61 eV relative to the cc-pV5Z basis set. 
There is a substantial improvement in performance of the correlation-consistent basis sets with core-valence correlation functions. The cc-pCVTZ basis set has an error of <1 eV for s- and p-block elements. This basis set has been observed to be accurate in previous studies [45].

Table 3 shows that the maximum error observed generally occurs for a CEBE for second-row nuclei. To explore this further, Table 4 shows the errors in the CEBEs for a selection of the basis sets studied partitioned into nuclei from the first and second rows of the periodic table. For the first-row nuclei, several basis sets have \(\varDelta ^\mathrm{MAD}\)(pcSseg-4) of less than 1 eV. In particular, IGLO-II, IGLO-III, cc-pCVTZ and pcSseg-3 are very accurate with errors of less than 0.1 eV. For these elements, the standard correlation-consistent basis sets cc-pVTZ and cc-pVQZ also perform well and uncontracting the core basis function in 6-31G(d,p) also leads to a significant improvement. Much larger errors are observed for CEBEs for the second-row nuclei. For these systems, the standard correlation-consistent basis sets perform poorly and there is a large improvement with the inclusion of core-correlation basis functions. The polarisation-consistent basis sets show the best performance for these elements, and pcSseg-2 is noteworthy in that it has a similar error for first- and second-row nuclei.

Fig. 1 Variation in the basis set error for core-electron binding energies with the size of the basis set

All of the results presented are for the B97-1 exchange–correlation functional and it is of interest whether the behaviour observed for B97-1 is representative of other functionals. Table 5 shows the variation in computed CEBEs for B97-1 and two other functionals, PBE0 [66] and M06 [67], in addition to Hartree–Fock (HF) theory. The computed CEBE for the pcSseg-4 basis set has been corrected for relativistic effects. Here, we have applied corrections of +0.08, +0.17, +0.31, +6.50 and +8.28 eV for carbon, nitrogen, oxygen, phosphorus and sulphur, respectively, consistent with previous studies [68]. For all five of the molecules, the experimental value lies within the range of values predicted by the different methods with the pcSseg-4 basis set, demonstrating that the large basis set leads to values consistent with experiment. The variation in the computed CEBEs for the different basis sets is remarkably similar for the different methods. This indicates that the general observations for the B97-1 functional can be interpreted more broadly to other functionals.
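The relativistic corrections are simple additive shifts that depend only on the element carrying the core hole, and their application can be sketched as below. The correction values are those quoted in the text; the function name and the example CEBE are hypothetical.

```python
# Additive relativistic corrections (eV) for 1s CEBEs, as quoted in the text.
RELATIVISTIC_CORRECTION_EV = {
    "C": 0.08,
    "N": 0.17,
    "O": 0.31,
    "P": 6.50,
    "S": 8.28,
}


def corrected_cebe(nonrelativistic_cebe_ev, element):
    """Add the element-specific relativistic correction to a computed CEBE."""
    try:
        correction = RELATIVISTIC_CORRECTION_EV[element]
    except KeyError:
        raise ValueError(f"no correction tabulated for element {element!r}")
    return nonrelativistic_cebe_ev + correction


# Hypothetical non-relativistic sulphur 1s CEBE, for illustration only.
print(f"{corrected_cebe(2470.0, 'S'):.2f} eV")
```

The corrections for phosphorus and sulphur are more than an order of magnitude larger than those for the first-row elements, so they cannot be neglected when comparing second-row CEBEs with experiment.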

Table 6 Error in calculated \(\varDelta \)SCF core \(\rightarrow \) LUMO excitation energies (in eV)

In order to identify efficient basis sets that give a good approximation to the complete basis set limit with the fewest basis functions, \(\varDelta ^\mathrm{MAD}\)(pcSseg-4) is plotted against a measure of the size of the basis set (N) for the basis sets with \(\varDelta ^\mathrm{MAD}\)(pcSseg-4) < 3 eV (Fig. 1). N is the number of contracted basis functions for a first- and second-row p-block atom and provides a rough measure of the size of the basis sets. The graph shows that, in general, the error arising from the basis set decreases as the size of the basis increases. However, it does highlight basis sets that show good or poor performance in relation to their size. For the correlation-consistent basis sets without core-valence correlation functions, the large cc-pV5Z basis set is required to achieve a reasonable level of accuracy with respect to the basis set limit. However, the use of such a large basis set is not practical for the majority of calculations.

Table 7 Error in core \(\rightarrow \) LUMO excitation energies for row 1 and row 2 nuclei (in eV)

The IGLO basis sets show the best performance of the smaller basis sets, and IGLO-II is a particularly cost-effective basis set. Unfortunately, these basis sets are not available for the s-block elements. The pcS-2 and pcSseg-2 are the smallest basis sets that perform well for both s- and p-block elements (an error of about 1 eV or less), and the large split-valence basis set also does quite well in this regard. To achieve higher levels of accuracy, it is necessary to use the cc-pCVTZ or pcSseg-3 basis sets, with an overall error of less than 0.5 eV for the pcSseg-3 basis set. To put this in context in terms of the computational cost, a single point energy calculation for PF\(_3\) with the pcSseg-2 and pcSseg-3 basis sets takes about 1% and 6%, respectively, of the time for the pcSseg-4 basis set when run on a single processor.

Extending the analysis to consider core-excitation energies, Tables 6 and 7 along with Fig. 2 show the corresponding analysis for the \(\varDelta \)SCF calculated core \(\rightarrow \) LUMO excitation energies. The trends evident for the calculated CEBEs are largely observed for these excitation energies, and there is a strong correlation with the performance of the basis sets for CEBEs. Basis sets of double zeta or lower quality have large average errors of over 4 eV. The series of polarisation-consistent basis sets (pcS-n and pcSseg-n) perform well. There is a reduction in the size of the errors as n increases. The pcSseg-1 basis set has a low overall error of 1.87 eV, and cc-pCVTZ is the smallest basis set to achieve an error of less than 1 eV for both s- and p-block elements. There is a very small error for the pcSseg-3 basis set. Focusing on the p-block elements, the IGLO basis sets perform well and there is a large reduction in the error for the IGLO-III basis set compared with IGLO-II. The large split-valence basis set performs less well compared with the CEBE calculations. Again for these core \(\rightarrow \) LUMO excitation energies, there is a large error for the correlation-consistent basis sets without core-correlation basis functions and it is necessary to use the very large cc-pV5Z basis set to reduce the error to a reasonable level. Similar to the CEBEs, the correlation-consistent basis sets with core-valence correlation functions lead to a significantly improved performance and the cc-pCVDZ and cc-pCVTZ basis sets have overall errors of 1.42 eV and 0.82 eV, respectively. Simulation of X-ray absorption spectra would require calculation of excitations to higher-lying orbitals. Although the small molecules considered here will have LUMOs of both valence and Rydberg character, it is likely that additional diffuse basis functions would be necessary to adequately describe excitation to orbitals of higher energy.
Augmented polarisation-consistent basis sets are available, and diffuse basis functions are also available for the IGLO basis sets [71].

Fig. 2 Variation in the basis set error for core \(\rightarrow \) LUMO excitation energies with the size of the basis set

Figure 3 shows the radial behaviour of the 1s orbital at the beryllium and carbon nuclei in BeF\(_2\) and CH\(_4\), respectively. For both orbitals, the large pcSseg-4 basis set results in a core orbital that has a greater value and is more steeply peaked at the nuclei compared with the relatively low quality 6-31G(d,p) basis set. This can be largely attributed to the much larger value of the highest exponent basis functions present in the larger basis set. For BeF\(_2\) the orbital for the pcSseg-2 basis set closely follows that of the pcSseg-4 basis set apart from in the region very close to the nucleus where it has a lower value. The 1s orbital for the cc-pVTZ basis set is close to the pcSseg-2 basis set at the nucleus but is close to the 6-31G(d,p) basis set further from the nucleus. This is indicative of the poor performance of the correlation-consistent basis sets for the s-block elements. For CH\(_4\) the orbitals arising from the pcSseg-2 and cc-pVTZ basis sets show a similar behaviour. However, for this molecule the orbital for the IGLO-III basis set is significantly closer to the large basis set and may underlie the strong performance of this basis set for the p-block elements. The description of the cusp at the nuclei is a well-known deficiency of Gaussian basis sets. Basis sets of Slater-type orbitals will describe the cusp behaviour more accurately, and some preliminary studies using basis sets of Slater-type orbitals have been reported in the literature [48].
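The difference between the two types of function at the nucleus can be seen from a small numerical sketch that estimates the radial slope at r = 0 by finite differences. A Slater-type function exp(-ζr) has a slope of -ζ at the origin, whereas a Gaussian exp(-αr²) has zero slope, so a cusp can only be approximated by adding Gaussians with very large exponents. The exponent used below is an arbitrary illustrative value and is not taken from any of the basis sets studied.

```python
import math


def radial_slope_at_origin(radial_function, step=1e-6):
    """Forward-difference estimate of d f / d r at r = 0."""
    return (radial_function(step) - radial_function(0.0)) / step


EXPONENT = 6.0  # arbitrary illustrative exponent


def slater(r):
    """Unnormalised 1s Slater-type function."""
    return math.exp(-EXPONENT * r)


def gaussian(r):
    """Unnormalised s-type Gaussian function."""
    return math.exp(-EXPONENT * r * r)


print("Slater slope at nucleus:  ", radial_slope_at_origin(slater))
print("Gaussian slope at nucleus:", radial_slope_at_origin(gaussian))
```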

Fig. 3 Radial behaviour of the core 1s orbitals for BeF\(_2\) and CH\(_4\) with different basis sets

3.2 TDDFT core-excitation and emission energies

Table 8 Error in calculated TDDFT core-excitation energies (in eV)

Table 8 shows \(\varDelta ^\mathrm{MAD}\)(pcSseg-4) for the core \(\rightarrow \) LUMO transition energies computed using TDDFT. These calculations are performed for a subset of the basis sets. The variation in the computed excitation energies between the different basis sets is considerably smaller for the TDDFT calculations compared to the \(\varDelta \)SCF calculations. As a consequence, very large basis sets are not necessary for TDDFT calculations of core-excitations. With the exception of the Ahlrichs VTZ basis set, all of the basis sets shown have an overall error of less than 1 eV, and this includes the 6-31G(d,p) and correlation-consistent basis sets. The pcSseg-1 and pcSseg-2 basis sets reproduce the values for the larger basis set very well, and for the p-block elements, the IGLO basis sets also perform well. As discussed earlier, the large error associated with the basis set in \(\varDelta \)SCF calculations arises predominantly from the calculation of the core-ionised or core-excited state. The TDDFT calculations are based upon the ground-state molecular orbitals which are described relatively well by the small basis sets with the result that the basis set is a less crucial factor.

In contrast to TDDFT calculations of core-excitation energies, the TDDFT calculations of the X-ray emission energies (shown in Table 9) do show a larger dependence on the basis set, particularly for the second-row elements. This sensitivity to the basis set has been noted in previous work [23]. The approach to simulating the emission energies uses the core-ionised Kohn–Sham determinant in the TDDFT calculation. Consequently, by extension of the arguments made above, a large dependence on the basis set is to be expected since the smaller basis sets do not describe the core-ionised state well. The size of the basis set error for the smaller 6-31G(d,p) and cc-pVDZ basis sets is over 15 eV, which is considerably larger than for the \(\varDelta \)SCF calculations of CEBEs and core-excitation energies. Another factor that appears to be significant is that in TDDFT calculations of X-ray emission energies the transition is to an unoccupied core orbital, which also seems to lead to greater sensitivity to the basis set. For the emission energies, the pcSseg-2 basis set has an error of 1.74 eV compared with 0.10 eV for the core-excitation energies. The pcSseg-3 basis set has an error of 0.50 eV, which shows that large basis sets are required to eliminate error associated with the basis set in these calculations.

Table 9 Error in calculated TDDFT emission energies (in eV)

For simulations of X-ray absorption spectra, it is also important to accurately determine the intensities of the transitions. Table 10 shows the computed oscillator strengths for the core \(\rightarrow \) LUMO transitions of a selection of molecules for a range of basis sets. The oscillator strengths have been computed within the \(\varDelta \)SCF and TDDFT approaches. The oscillator strengths computed from the \(\varDelta \)SCF calculations show little variation between the different basis sets, and even the cc-pVDZ basis set gives oscillator strengths in close agreement with the much larger basis sets. For the TDDFT calculations, some variation in the computed oscillator strengths is observed. For HCN, which has the most intense transition, the smaller basis sets and cc-pCVTZ predict intensities that are a little larger than pcSseg-4.
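For reference, the standard dipole (length gauge) expression for the oscillator strength in atomic units is f = (2/3) ΔE |μ|², where ΔE is the excitation energy and μ is the transition dipole moment. The sketch below evaluates this expression; it is assumed, rather than stated in the text, that this is the form used for the values in Table 10, and the excitation energy and transition dipole in the example are hypothetical.

```python
HARTREE_TO_EV = 27.211386  # conversion factor from hartree to eV


def oscillator_strength(excitation_energy_hartree, transition_dipole):
    """Dipole oscillator strength f = (2/3) * dE * |mu|^2 in atomic units.

    `transition_dipole` is a sequence of the x, y and z components of the
    transition dipole moment in atomic units.
    """
    dipole_squared = sum(component * component for component in transition_dipole)
    return (2.0 / 3.0) * excitation_energy_hartree * dipole_squared


# Hypothetical core excitation at 290 eV with a small transition dipole.
energy = 290.0 / HARTREE_TO_EV
strength = oscillator_strength(energy, (0.0, 0.0, 0.05))
print(f"f = {strength:.4f}")
```

Because the excitation energy enters only linearly, a basis set error of a few eV in a core-excitation energy of several hundred eV changes f by about one per cent, so any larger variation in the intensity must come from the transition dipole moment.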

Table 10 Basis set dependence of the computed oscillator strengths for core \(\rightarrow \) LUMO transitions

4 Conclusion

The basis set dependence of DFT-based calculations of spectroscopy in the X-ray region has been assessed. \(\varDelta \)SCF calculations of CEBEs and core \(\rightarrow \) LUMO excitation energies, and TDDFT calculations of core-excitation and emission energies have been studied for a range of molecules including excitations from the core orbitals of elements from the first and second rows of the periodic table. A range of widely used basis sets have been used with the aim of identifying relatively small basis sets that perform well for these calculations. For \(\varDelta \)SCF calculations at the K-edge of first-row elements, relatively small basis sets can accurately reproduce the core-electron binding energies and core-excitation energies of much larger basis sets. The IGLO-II, IGLO-III and cc-pCVTZ basis sets perform well, with IGLO-II being a particularly cost-effective basis set. However, application of the IGLO basis sets is limited to p-block elements, and for s-block elements, the cc-pCVTZ basis set or a large split-valence basis set (of at least 6-311G(d,p) quality) is recommended. Calculations for second-row elements are more challenging, and the pcSseg-2 basis set has the best performance of the smaller basis sets, with pcSseg-3 required to reduce the error to significantly less than 1 eV. The standard correlation-consistent basis sets have large errors. The core-valence correlation versions of the basis sets (cc-pCVXZ) are much more accurate, although still less accurate than pcSseg-n basis sets of comparable size. For these calculations, the basis set error is predominantly associated with the calculation of the core-excited or core-ionised state. This suggests that many of the smaller basis sets lack the flexibility to adjust for the change in effective nuclear charge associated with removing a core electron.
Furthermore, smaller basis sets provide a poor description of the radial behaviour of the 1s orbital, and there is some correlation between the quality of the description of the core orbital and the size of the basis set error. The TDDFT calculations of the core-excitation energies show less sensitivity to the basis set used, and relatively small basis sets reproduce the excitation energies of the larger basis sets well. For these calculations, the pcSseg-1 and pcSseg-2 basis sets perform well, although the versions augmented with diffuse basis functions may be more appropriate for some applications. In contrast, TDDFT calculations of X-ray emission energies show a high dependence on the basis set used. This can be rationalised by the fact that TDDFT calculations of excitation energies are based upon the ground-state Kohn–Sham determinant, while TDDFT calculations of emission energies use the core-ionised determinant. For these calculations, the IGLO-II, IGLO-III and pcSseg-2 basis sets provide a good level of accuracy for the K-edge of both row 1 and row 2 elements.