1 Introduction

There is a distinction between the concepts of “numbers” and “data.” This is seemingly trivial, but a reading of the current literature on atomic spectroscopy indicates that not everybody understands it. Many properties of physical and mathematical objects can be expressed as exact numbers. A typical example is the ratio of the circumference of a circle to its diameter. It is the number \(\pi \), which can be calculated with any required precision, so we can say that it is known exactly. However, if one tries to measure the circumference or diameter of any circle with available measuring tools, the result is never exact. Repeated measurements with the same instrument or with different instruments will give a set of slightly differing values. The result of such a measurement can be calculated as a weighted mean of the individual results with weights equal to the inverse squares of the measurement uncertainties (Footnote 1). The uncertainty of this result can be reliably estimated by studying the distribution of the individual measurements around the weighted mean. This is a typical example of what we call “data.” Uncertainty is its intrinsic property that cannot be omitted. Without uncertainties, the results of a measurement or a calculation become unusable collections of senseless numbers, as they lack any indication of their dependability. For example, if we omit the uncertainty when giving our result for the measured diameter of some cylinder, users of this result can never be sure that this cylinder will fit in a hole of given dimensions.

As indicated by the above example, uncertainty is a statistical property, and its determination largely relies on comparisons of repeated measurements or calculations. For measurements, there exists a very detailed guideline for the determination and expression of uncertainties [2]. This guideline is not exact, as there are many difficult circumstances preventing an exact knowledge of all sources of measurement errors (Footnote 2). Nevertheless, there are good recipes for most circumstances.

For measurements of wavelengths of atomic spectral lines, and for estimation of the associated uncertainties and their propagation to other indirectly determined atomic properties, such as energy levels, there exists a methodology described in my earlier review [3]. Since that time, some new developments in statistical theory have appeared, which now allow a more robust estimation of uncertainties. In particular, Dr. Andrew L. Rukhin of the National Institute of Standards and Technology, USA (NIST) has developed a procedure to estimate statistical uncertainties in heterogeneous measurements [4,5,6]. This procedure was successfully utilized in recent atomic spectroscopy works [7, 8]. One of the purposes of this article is to explain this new statistical approach and promote its further use in atomic spectroscopy.

Estimation of uncertainties in theoretical calculations is a more difficult problem. Here, if one repeats the same calculation using the same computer code and input parameters, the results are normally exactly the same, which by no means implies that they are infinitely accurate. Calculating the same atomic properties, e.g., transition probabilities, with a different code or with different input parameters usually gives different results, often differing by orders of magnitude. Comparisons of results obtained with different computational approaches can provide the means for a robust estimation of uncertainties [3, 9]. The methods suggested in the above papers have been extended by several authors (see, e.g., [10, 11]).

Back in 2016, a talk on this subject at the 12\(^{\text {th}}\) International Colloquium on Atomic Spectra and Oscillator Strengths (ASOS12) [12] reported on the current status of the literature on atomic transition probabilities. In particular, it was mentioned that less than 2 % of published theoretical papers contained estimates of uncertainties of the reported theoretical values. There has been significant progress in this regard: during the last two years, the percentage of such theoretical works containing uncertainty estimates grew to about 10 %. There is still a long way to go, and many researchers still make inadequate estimates. The second purpose of the present article is to provide an update on the procedures for estimating uncertainties of theoretical transition probabilities, in order to disseminate the newly developed methodologies.

2 Uncertainties in wavelength measurements

A common problem in all wavelength measurements is the estimation of measurement uncertainties. As with all measurements, researchers must rely on the current recommendations of the International Bureau of Weights and Measures (BIPM) [2]. The NIST Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results [1], despite narrowly targeting NIST researchers in the title, are of general use as well. From these recommendations, one can see that uncertainty should be considered as a purely statistical concept (Footnote 3). One makes a series of measurements, looks at the statistical distribution of the results, adds estimated contributions of possible systematic effects, calculates the weighted mean of the results, and somehow derives an uncertainty of this weighted mean. There is a long-standing problem with this procedure: there is no strict statistical recipe that fits all scenarios.

Let us look at one simple example. The Newtonian gravitational constant, G, has been precisely measured by several independent teams, and a weighted mean of these measurements is adopted by the Committee on Data of the International Science Council (CODATA2018) [13]. The currently recommended value is 6.67430(15) \(\times 10^{-11}\) m\(^3\) kg\(^{-1}\) s\(^{-2}\). Let us take it as one measurement and consider it together with the result of a hypothetical, very imprecise undergraduate physics experiment. In this experiment, a student measured the free-fall acceleration at the Earth’s surface, say, with a reversible pendulum. Then the Earth’s radius and mass, obtained from a web search, were plugged into the Newtonian gravitational law to derive a value of G equal to 6.690(3) \(\times 10^{-11}\) m\(^3\) kg\(^{-1}\) s\(^{-2}\). Now, let us treat these two results as a series of repeated measurements and find the standard statistical weighted mean value with instrumental weights (inverse squares of measurement uncertainties) from the formulae below:

$$\begin{aligned} v_{\text {wm}}= & {} \frac{\sum {v_iw_i}}{\sum {w_i}}, \end{aligned}$$
(1)
$$\begin{aligned} u_{\text {wm}}= & {} \frac{1}{(\sum {w_i})^{1/2}}, \end{aligned}$$
(2)

where \(v_i\) is the i-th measured value and \(w_i=1/u_i^2\) is its weight in the averaging (\(u_i\) being the standard uncertainty of the i-th measurement).
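As a minimal illustration, the standard inverse-variance weighted mean can be sketched in Python for the two values of G from this example. The function name `weighted_mean` is introduced here for illustration only. The uncertainty of the mean is taken as the inverse square root of the summed weights, which is the form that yields an uncertainty of 0.00015 for these inputs.

```python
import math

def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty."""
    weights = [1.0 / u**2 for u in uncertainties]  # w_i = 1/u_i^2
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    return mean, 1.0 / math.sqrt(total)

# G in units of 1e-11 m^3 kg^-1 s^-2: CODATA value and the student result
g_values = [6.67430, 6.690]
g_uncs = [0.00015, 0.003]
g_mean, g_unc = weighted_mean(g_values, g_uncs)
print(f"{g_mean:.5f} +/- {g_unc:.5f}")  # prints 6.67434 +/- 0.00015
```

The imprecise value carries a weight 400 times smaller than the precise one, so it shifts the mean only in the last digit and leaves the uncertainty practically unchanged.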

The weighted mean of our two measurements turns out to be 6.67434(15) \(\times 10^{-11}\) m\(^3\) kg\(^{-1}\) s\(^{-2}\). We seem to have slightly changed G compared to the CODATA value. However, we want to be careful and account for the fact that the standard statistical formulae (1, 2) give biased results. A seemingly good formula for unbiased uncertainty of weighted mean of a series of measurements can be found in Wikipedia [14] (referring to [15]):

$$\begin{aligned} u_{\text {biased}}^2= & {} \frac{\sum {w_i(v_i - v_{\text {wm}})^2}}{V_1}, \end{aligned}$$
(3)
$$\begin{aligned} u_{\text {unbiased}}= & {} \frac{u_{\text {biased}}}{(1 - V_2/V_1^2)^{1/2}}, \end{aligned}$$
(4)

where \(V_1 = \sum {w_i}\) and \(V_2 = \sum {w_i^2}\).

From Eq. (3), we get \(u_{\text {biased}} = 0.0008\), and from Eq. (4), \(u_{\text {unbiased}} = 0.011\) (in the same units as G), 5 and 73 times greater than the CODATA uncertainty, respectively. This implies that the undergraduate experiment has hopelessly spoiled the joint effort of several leading laboratories!
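The numbers quoted above can be reproduced with a short Python sketch of Eqs. (3) and (4). The variable names are illustrative, and the input values are those of the G example.

```python
import math

# G in units of 1e-11 m^3 kg^-1 s^-2: CODATA value and the student result
values = [6.67430, 6.690]
uncs = [0.00015, 0.003]
weights = [1.0 / u**2 for u in uncs]

v1 = sum(weights)                  # V_1 = sum of weights
v2 = sum(w**2 for w in weights)    # V_2 = sum of squared weights
mean = sum(v * w for v, w in zip(values, weights)) / v1

# Eq. (3): weighted scatter of the values around the weighted mean
u_biased = math.sqrt(sum(w * (v - mean)**2 for v, w in zip(values, weights)) / v1)
# Eq. (4): bias correction; the denominator is small when one weight dominates
u_unbiased = u_biased / math.sqrt(1.0 - v2 / v1**2)

print(f"u_biased = {u_biased:.4f}, u_unbiased = {u_unbiased:.3f}")
```

The correction factor in Eq. (4) is large here because one weight dominates the sum, so the effective number of measurements is barely above one.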

The reason for this apparently wrong result is that Eqs. (3) and (4) are applicable only to the case where all data belong to a sample from a series of measurements having the same statistical distribution. This is obviously not true in this case, as our set of two measured values is inhomogeneous: each of them has its own statistical distribution and its own set of systematic uncertainties, about which there is no information at this stage of data processing. In this situation, the concept of dark uncertainties comes in very naturally. To rectify the result, we need to add in quadrature some unknown dark uncertainties to each of the values participating in the averaging. This is where the new statistical theory of Rukhin [4,5,6] is indispensable. Both his clustered maximum likelihood and clustered reduced maximum likelihood estimators (CMLE and CRMLE) unequivocally and correctly single out our undergraduate experiment as the culprit of the disagreement between the two values. They both assign a large dark uncertainty of 0.016 (in the same units as G) to our undergraduate experiment, which results in the weighted mean value and its uncertainty coinciding with CODATA.
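The effect of a dark uncertainty on the averaging can be checked with a few lines of Python. This sketch does not implement the CMLE or CRMLE estimators. It simply takes the dark uncertainty of 0.016 quoted above as given, adds it in quadrature to the stated uncertainty of the student result, and recomputes the weighted mean. The function name is hypothetical.

```python
import math

def weighted_mean_with_dark(values, stated, dark):
    """Weighted mean with dark uncertainties added in quadrature to the stated ones."""
    variances = [u**2 + d**2 for u, d in zip(stated, dark)]
    weights = [1.0 / var for var in variances]
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    return mean, 1.0 / math.sqrt(total)

# G in units of 1e-11 m^3 kg^-1 s^-2; dark uncertainty assigned only to the student result
mean, unc = weighted_mean_with_dark([6.67430, 6.690], [0.00015, 0.003], [0.0, 0.016])
print(f"{mean:.5f} +/- {unc:.5f}")  # prints 6.67430 +/- 0.00015
```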

However, the reader must be warned right away that the averaging procedure in this example is methodologically wrong. The result we got from the refined statistics with dark uncertainties should only be used to make a conclusion that one of our experiments was faulty (i.e., the measured value and/or its uncertainty were determined incorrectly) and must be investigated and corrected. It is appropriate to quote here the reference Guide to the Expression of Uncertainty in Measurement (GUM) [2] (section 3.4.7): “Blunders in recording or analyzing data can introduce a significant unknown in the result of a measurement... Measures of uncertainty are not intended to account for such mistakes”. The point I want to make here is that statistics helped us to identify the erroneous measurement, but it is wrong to average correct measurements together with incorrect ones; the latter should rather be corrected (from physical considerations) or excluded from the averaging.

Turning to wavelength measurements, I note that many of them are intrinsically heterogeneous even in the case when a large number of spectral lines are measured in the same experiment (i.e., using the same equipment and methodology). This is because the lines have different spectral profiles (e.g., some are broadened by self-absorption or saturation effects), and many lines are usually overlapping in a quasi-random fashion with other lines or are affected by hyperfine structure and/or isotope shifts.

Rukhin’s method can be used not only in direct comparisons of several measured wavelengths of the same spectral line but also in indirect comparisons, such as analysis of deviations between observed and Ritz wavelengths. Such analysis was suggested in [3] as an easy means to estimate the wavelength uncertainties where they are not available from measurement reports. However, at that time Rukhin’s theory had not yet been published. Now it is clear that it can nicely supplement this analysis by allowing one to easily detect abnormally deviating wavelengths. One has to keep in mind that no statistical theory can replace a thorough analysis of measurement errors. It can only serve as an indicator of problems in some measurements. Finding the causes of these problems and eliminating them is the task of the physicist.

As mentioned above, in the course of our joint work on Li-like ions [7] I have developed a statistical toolbox implementing Rukhin’s CMLE and CRMLE methods, as well as Mandel–Paule and DerSimonian–Laird dark uncertainty estimators (see [4]) and the functions necessary to make the so-called normal probability plots [16]. This toolbox is implemented in a Visual Basic module embedded in a Microsoft Excel (Footnote 4) file included as the Supplementary Information with this article. Its use is illustrated in the following subsection.

Fig. 1 Normal probability plots of normalized residuals (\(R_i\); see text) for calculations made with the Mandel–Paule (MP) and clustered maximum likelihood (CMLE) estimators of dark uncertainties. The quantity \(G(U_i)\) on the horizontal axes of these plots is the percent point function of the uniform order statistic median of the normal distribution [16]. The dotted lines are linear fits to the data points

2.1 Statistical toolbox

To understand the general ideas used in this statistical toolbox, the reader is advised to study Section 4 of Ref. [7], as well as Rukhin’s articles [4,5,6]. The content and usage of the Toolbox are explained in Appendix A.

The most useful of the many functions included in the Toolbox are dark_unc_MP (the Mandel–Paule estimator), which provides for an exact reduction of reduced \(\chi ^2\) (defined further below) to unity in the cases where the measurements participating in the averaging disagree beyond the stated uncertainties, and the two functions based on Rukhin’s clustered estimators, dark_unc_CMLE and dark_unc_CRMLE.
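The idea behind a Mandel–Paule-type estimator can be sketched in Python as follows. This is not the Visual Basic code of the toolbox, and the function names are hypothetical. The sketch searches by bisection for a single dark uncertainty, common to all measurements, that brings the reduced \(\chi ^2\) of the weighted mean down to unity. The input values are invented for illustration.

```python
import math

def reduced_chi2(values, uncs, tau2):
    """Reduced chi-square of the weighted mean when a common variance tau2 is added."""
    variances = [u * u + tau2 for u in uncs]
    weights = [1.0 / var for var in variances]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    chi2 = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return chi2 / (len(values) - 1)

def mandel_paule_tau(values, uncs):
    """Common dark uncertainty that makes the reduced chi-square equal to one."""
    if reduced_chi2(values, uncs, 0.0) <= 1.0:
        return 0.0  # data already consistent: no dark uncertainty needed
    lo, hi = 0.0, max(uncs) ** 2
    while reduced_chi2(values, uncs, hi) > 1.0:
        hi *= 2.0  # expand the bracket until chi-square drops below one
    for _ in range(200):  # bisection on the added variance
        mid = 0.5 * (lo + hi)
        if reduced_chi2(values, uncs, mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(0.5 * (lo + hi))

# Invented example: three consistent values and one outlier
tau = mandel_paule_tau([10.0, 10.2, 9.9, 12.0], [0.1, 0.1, 0.1, 0.1])
print(f"dark uncertainty = {tau:.3f}")
```

Note that the same dark uncertainty is assigned to every measurement, including the three mutually consistent ones.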

The example included in the toolbox is quoted from our work on Li-like ions [7]. It consists of a set of 15 measurements of the energy of the \(1\!s(^2\!S)2s2p(^3P^{\circ })^2P^{\circ }_{1/2}\) level of Li-like oxygen, O VI (for explanation of values and references to their sources, see [7]). To see the dark uncertainties and corresponding weighted mean values, click on the Run button. As explained in [7], there is only one value that significantly disagrees with the rest of the data. This is the energy of 4539260(470) cm\(^{-1}\) deduced from the measurement of an absorption feature reported by Liao et al. [17]. Abnormality of this value is unanimously detected by the CMLE and CRMLE estimators, which assign to it a dark uncertainty of about 1600 cm\(^{-1}\), more than three times the stated uncertainty (470 cm\(^{-1}\)). One can see that the Mandel–Paule estimator assigns equal dark uncertainties of about 200 cm\(^{-1}\) to all measurements, which results in a weighted mean of 4540570(220) cm\(^{-1}\). With either CMLE or CRMLE, the weighted mean is 4540880(230) cm\(^{-1}\), while the standard statistical formulae without dark uncertainties give 4540590(210) cm\(^{-1}\). One can see that the Mandel–Paule method gives a mean value close to that of the standard statistical formula, while the result of CMLE and CRMLE is significantly different (by 310 cm\(^{-1}\), or 1.4\(\sigma \)). As follows from the analysis of Azarov et al. [7], the result of the MP estimator disagrees with the recommended value, 4541169(17) cm\(^{-1}\), by 3\(\sigma \), while for the result of CMLE and CRMLE the discrepancy is much smaller, only 1.3\(\sigma \). Here, however, the main point is that different ways of estimating dark uncertainties result in different values of both the weighted mean and its uncertainty. Application of statistical methods should always be taken with a grain of salt: their underlying assumptions can lead you to a wrong result.

Note that the values of weighted means and their uncertainties can easily be calculated using the functions WMDU() and unc_WMDU() included in the toolbox. These functions take as input three vertical arrays of cells: values, stated uncertainties, and dark uncertainties. The latter can be specified as an array of empty cells for calculation without dark uncertainties.

The procedure called by the Run button of the toolbox also displays the normal probability (NP) plots (see [16]) corresponding to the three averaging procedures (MP, CMLE, and CRMLE). The MP and CMLE plots are shown in Fig. 1 (the plot obtained with CRMLE is omitted, as it is almost exactly the same as the CMLE plot). In these plots, \(R_i\) is the normalized residual of the i-th measurement (the difference of the measured value from the weighted mean divided by the stated uncertainty combined in quadrature with the dark uncertainty). One can see that the uniformly distributed dark uncertainty resulting from the MP estimator does not rectify the irregularity of the NP plot: there is one data point that strongly deviates from the straight line near which the rest of the data points are clustered. This data point corresponds to the faulty measurement discussed above. On the other hand, the NP plot drawn from the CMLE calculation is much closer to a straight line, but it has a slope of about 0.74, while one would expect this slope to be close to 1.0 for normally distributed measurements with correctly estimated uncertainties. The corresponding value of \(\chi ^2\) is 0.72, while for the MP calculation it is exactly 1.0, by design of this estimator. One has to make peace with this: the \(\chi ^2\) and the slope of the NP plot are allowed to be smaller than unity. This can happen when significant systematic uncertainties are present in the measurements.
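The construction of such a plot can be sketched in Python as follows. This is an illustration and not the toolbox code, and all function names are hypothetical. It is assumed here that the uniform order statistic medians \(U_i\) are computed with Filliben's approximation, which may differ in detail from the implementation in the toolbox.

```python
import math
from statistics import NormalDist

def order_statistic_medians(n):
    """Filliben's approximation to the medians of uniform order statistics."""
    last = 0.5 ** (1.0 / n)
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[0] = 1.0 - last   # special formulas for the first and last points
    medians[-1] = last
    return medians

def normal_probability_points(values, stated, dark, mean):
    """Return (G(U_i), sorted R_i): the x and y coordinates of the NP plot."""
    residuals = sorted(
        (v - mean) / math.hypot(u, d) for v, u, d in zip(values, stated, dark)
    )
    medians = order_statistic_medians(len(residuals))
    quantiles = [NormalDist().inv_cdf(p) for p in medians]
    return quantiles, residuals

def fitted_slope(x, y):
    """Slope of the ordinary least-squares straight line through the points (x, y)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx
```

For normally distributed residuals with correctly estimated uncertainties, `fitted_slope` applied to the output of `normal_probability_points` should return a value close to 1.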

One can see that the MP estimator is conceptually simpler and easier to use than the cumbersome CMLE and CRMLE estimators. However, it has an intrinsic deficiency: it treats all measurements on an equal footing. If the measurements are statistically inconsistent, it assigns the same dark uncertainty to all of them. If the measurements are, in fact, unequal (in the sense that they use different methods or different experimental conditions, or there are other factors leading to different statistical distributions in the measurements), this leads to both the weighted mean value and its uncertainty being wrong. The CMLE and CRMLE estimators, when they agree with each other (this is not always the case), give much better estimates for the weighted mean and its uncertainty, even if one chooses not to investigate the causes of the large dark uncertainty they assign to outlying measurements. However, one should understand their limitations. They are designed to be valid under certain assumptions, the most important of which is that there are only two classes of measurements, “good” (receiving no dark uncertainty) and “bad” (all of the latter being equally bad and receiving the same value of dark uncertainty). If one wants to be fast and efficient, the easiest way of getting a statistically consistent set of data is to simply discard those to which CMLE and CRMLE unanimously assign large dark uncertainties. However, there is a danger of accidentally throwing away a piece of gold together with the muck.

As noted above, the detection of a faulty measurement by a statistical procedure is in itself insufficient for a proper interpretation of the measurements. One has to look for physical reasons of the detected problem. In the case of the O VI line measurement by Liao et al. [17], the most probable cause is the intrinsic deficiency of the measurement method: it was designed to measure the positions of absorption peaks, while absorption to autoionizing levels, such as the one selected for this example, is described by a Fano profile, which is asymmetric and includes both an absorption and an emission feature. The true position of the level should be found by fitting a Fano profile to the observed feature in a significantly wider spectral area than just the width of the absorption peak. In this case, indeed, Figure 3 of Liao et al. [17] shows the presence of a weaker emission feature at a wavelength somewhat shorter than that of the absorption peak. A proper modeling of the spectral profile would give a higher energy, consistent with the observed deviation from the mean experimental value, but probably with a significantly larger uncertainty.

As mentioned above, Rukhin’s estimators of dark uncertainties can be used in the analysis of differences between observed and Ritz wavelengths (or wave numbers). Let us take as an example the measurements of Zr I wave numbers by Lawler et al. [18]. Although the results of their level optimization look generally good, there are many lines for which the observed wave numbers deviate from the Ritz values by more than twice the standard uncertainty. Among the total list of about 370 lines, the fraction of such strongly deviating lines is about 8 %, while for a normal statistical distribution one would expect 5 %. The simplest way to detect problematic lines is by calculating the Mandel–Paule dark uncertainty taking the differences of observed wave numbers from the Ritz values (\(\Delta \sigma _{\text {obs}-\text {Ritz}}\)) as the measured values with uncertainties equal to those of the observed wave numbers (\(u_{\text {obs}}\)). These dark uncertainties should be calculated for small subsets of lines originating from the same upper level. In this way, it is easy to see that the largest dark uncertainty gets assigned to lines from the \(4d^25s5p\) \(^5P^{\circ }_1\) level (25489.87 cm\(^{-1}\) in the NIST Atomic Spectra Database (ASD) [19]). The data associated with the four transitions from this level included in Lawler et al. [18] are given in Table 1.

Table 1 Selected spectral lines of Zr I

As noted above, the dark uncertainties given in the last three columns of Table 1 are estimated by treating the wave number difference \(\Delta \sigma _{\text {obs}-\text {Ritz}}\) as measured quantity with uncertainty equal to that of the observed wave number, \(u_{\text {obs}}\). Some people argue that, in the treatment of \(\Delta \sigma _{\text {obs}-\text {Ritz}}\), the uncertainty of the Ritz wave number should be added in quadrature to \(u_{\text {obs}}\). However, in the least-squares level optimization procedure (in all of its flavors known to me), the minimized quantity is the sum of squares of the ratio \(\Delta \sigma _{\text {obs}-\text {Ritz}}/u_{\text {obs}}\) (denoted as \(\Delta \sigma /u_{\text {obs}}\) in Table 1 for brevity). This sum is called residual sum of squares (RSS). Ideally, if the measurements are a sample from a normal statistical distribution, and all uncertainties \(u_{\text {obs}}\) are properly estimated, the optimization procedure should result in the ratio of RSS to the number of degrees of freedom (number of observed transitions minus the number of excited energy levels) equal to one. This ratio is otherwise known as reduced \(\chi ^2\). Conversely, if \(\chi ^2 > 1\), it means that one or both of the assumptions made above do not hold: some uncertainties are underestimated and/or the measurements are not normally distributed. Note that uncertainties of the Ritz wave numbers do not participate in weighting of the observed wave numbers. These properties of least-squares fitting follow from rigorous statistical considerations (see details in the LOPT article [20]).
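The two diagnostics used in this analysis, the reduced \(\chi ^2\) and the fraction of lines expected to deviate by more than twice the standard uncertainty, can be sketched in Python. The function name and the numerical inputs below are hypothetical and serve only to show the arithmetic. They are not data from Table 1.

```python
from statistics import NormalDist

def reduced_chi2(differences, u_obs, n_levels):
    """RSS divided by the number of degrees of freedom (lines minus fitted levels)."""
    rss = sum((d / u) ** 2 for d, u in zip(differences, u_obs))
    dof = len(differences) - n_levels
    return rss / dof

# Fraction of a normal sample expected to deviate by more than two standard uncertainties
expected_fraction = 2.0 * (1.0 - NormalDist().cdf(2.0))
print(f"{100 * expected_fraction:.1f} %")  # prints 4.6 %

# Hypothetical observed-minus-Ritz differences (cm^-1) for four lines from one upper level
chi2 = reduced_chi2([0.002, -0.001, 0.003, -0.004], [0.002] * 4, n_levels=1)
print(f"reduced chi2 = {chi2:.2f}")  # prints reduced chi2 = 2.50
```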

The particular transitions that are most likely causing the problem in determination of the upper level energy can be pinpointed by using Rukhin’s CMLE and CRMLE methods implemented in the present statistical toolbox. Both these estimators unanimously suggest (by assigning the large dark uncertainties of about 0.006 cm\(^{-1}\)) that the transitions listed in the first and last rows of Table 1 are the most likely causes of the problem. However, it is not a good practice to blindly use these estimated dark uncertainties to degrade the originally estimated uncertainties. This can lead to a physically unjustified shift of the resulting energy levels. As noted above, no statistical considerations can replace a careful analysis of physical reasons of observed problems. They can be used to pinpoint the likely culprits, but the reasons for large deviations should be investigated. In the case of the measurements of Lawler et al. [18], each measured wave number results from averaging of measurements in several spectra. The number of spectra used for this averaging is given in the column \(N_{\text {spectra}}\) in Table 1. Analyzing the statistics of individual measurements could give more insights about the problem; looking at the line profiles in each spectrum is even more informative. Typical causes of large disagreements between different measurements are blending with neighboring lines, asymmetries caused by partially resolved hyperfine structure or self-absorption, noise, and calibration errors. In typical atomic spectroscopy experiments, the number of observed lines can be very large, measured by thousands or even tens of thousands. It is very difficult to analyze profiles of every measured line. However, isolating the problems to a limited number of lines can make the analysis of outstanding errors manageable. The statistical toolbox given here can be a good help in this matter.

It should be noted that the optimized level values given in Table 1 result from my level optimization made with the LOPT code of Kramida [20]. They slightly differ from the values given by Lawler et al. [18] (by one or two units of the last decimal place). These small differences are due to rounding errors that are treated differently in LOPT and in the code used by Lawler et al. [18].

Regarding the use of the MP, CMLE, and CRMLE estimators of dark uncertainties, my current recommendation is to follow the procedure described above: first, use the fast MP estimator to find potentially problematic measurements, then use both CMLE and CRMLE to further narrow down the problem. The reason for using both these estimators is that they do not always agree with each other. If they do agree, it gives a strong indication that the measurements they endow with a large dark uncertainty are the culprit of the problem. If they do not agree, the problem widens. In any case, only a careful examination of all suspicious measurements can give a good solution to the problem of making all measurements statistically consistent with each other.

3 Uncertainties in calculated transition probabilities

There are two radically different approaches to estimation of uncertainties of theoretical calculations. One approach, which is used, e.g., by CODATA [13], is to explicitly include all neglected terms in the equations (such as the omitted high-order effects in a theory accounting only for a limited number of terms), estimate possible sizes of these omitted terms, and combine them in quadrature to arrive at an estimate of the total possible error. In calculation of transition matrix elements (determining transition probabilities), it is a very difficult task. However, it was done in several works. See, e.g., Safronova and Safronova [21]. It is fair to say that this approach is limited to a restricted number of atomic systems and requires a high level of expertise in atomic theory.

The second approach, which is conceptually simpler and technically easier to implement, is based on comparisons utilizing statistics in a way that is similar to evaluation of experimental results. The practical methodology of such evaluation was described in my review article [3], and its possible extensions were outlined in a subsequent article [9].

The main requirement for applicability of statistical evaluation of theoretical uncertainties is availability of a benchmark data set to compare with. In selection of benchmark data, the first priority should always be experimental data, if they are available. However, not all experimental methods are reliable, and it requires considerable expertise in both theory and experiment to detect unreliable experimental data. Some good guidelines for critical evaluation of transition probability data can be found in the review made by Wiese [22]. As one can see from that review, critical evaluation of published data (both experimental and theoretical) is not an easy task. Not many theorists possess the skills and knowledge required for it. Thus, one should use the published products of available critical evaluations. One common source of such critically evaluated data is the NIST ASD [19]. However, it is far from being complete; there are many spectra for which critically evaluated data on transition probabilities do not exist or are too sparse. (Note that the data compiled at NIST are not always the most accurate available. To extract the most reliable and accurate data, one should always consult the literature. Lists of relevant references are conveniently rendered in the links provided in the NIST ASD. These lists are extracted from the NIST bibliographic databases (see Sect. 4), which are fairly complete and up to date.) In the absence of reliable external benchmarks, one can resort to internal comparisons, in which different results of the same computer code can be compared with each other. The most commonly used method is to compare the results for transition rates computed in the length and velocity forms (in a non-relativistic approximation; these two forms roughly correspond to results obtained in the Babushkin and Coulomb gauges in a relativistic calculation). 
Some of the widely used modern atomic codes do produce transition rates in these two forms, e.g., the General Relativistic Atomic Structure Package (GRASP; see Froese Fischer et al. [23]) and the Flexible Atomic Code (FAC). The latter code was originally developed by Gu [24] and subsequently parallelized and extended to include many-body perturbation theory (MBPT) calculations [25].

To facilitate uncertainty estimation, the GRASP code outputs the so-called uncertainty indicator \(\textrm{d}T\) (see the GRASP2018 manual included in the Supplementary Materials for Ref. [23]):

$$\begin{aligned} \textrm{d}T = \frac{{\vert }A_l - A_v{\vert }}{\text {max}(A_l, A_v)}, \end{aligned}$$
(5)

where \(A_l\) and \(A_v\) are the length and velocity forms of transition rate.

This was apparently inspired by the articles of Froese Fischer [26] and Ekman et al. [27]. However, this definition of the uncertainty indicator differs from the one originally proposed in the above papers:

$$\begin{aligned} \frac{{\delta }A^{\prime }}{A^{\prime }} = ({\delta }E + {\delta }S), \end{aligned}$$
(6)

where \(A^{\prime }\) is the energy-adjusted transition rate (see Eq. (13) below) computed from the observed transition energy (\(E_{\text {obs}}\)), \({\delta }E = {\vert }E_{\text {calc}} - E_{\text {obs}}{\vert }/E_{\text {obs}}\) is the relative error in the transition energy, and \({\delta }S = {\vert }S_l - S_v{\vert } / \text {max}(S_l, S_v)\) is the relative discrepancy between the length and velocity forms of the line strength (see Eqs. (4–7) of Ekman et al. [27]).

The quantity \({\delta }E\) in the above equation practically never goes to zero, while its absence in Eq. (5) results in \(\textrm{d}T\) equal to zero in too many cases just due to chance coincidences of \(A_l\) and \(A_v\). However, in most cases on which theorists work nowadays, experimental energies are not known. This prompted Ekman et al. [27] to give a truncated recipe in their Eq. (8): \({\delta }A = ({\delta }S)A\), which ultimately reduces to Eq. (5).
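The difference between the two indicators can be illustrated with a short Python sketch of Eq. (6). The function name and the input numbers are hypothetical. The case shown is a chance coincidence of the length and velocity forms, where the \({\delta }S\) term vanishes but the \({\delta }E\) term does not.

```python
def indicator_eq6(e_calc, e_obs, s_l, s_v):
    """Relative uncertainty indicator of Eq. (6): delta_E + delta_S."""
    delta_e = abs(e_calc - e_obs) / e_obs          # relative error of transition energy
    delta_s = abs(s_l - s_v) / max(s_l, s_v)       # length-velocity discrepancy
    return delta_e + delta_s

# Hypothetical transition: energy off by 1 %, length and velocity forms coincide by chance
value = indicator_eq6(e_calc=101000.0, e_obs=100000.0, s_l=0.5, s_v=0.5)
print(f"{value:.3f}")  # prints 0.010, nonzero although delta_S is exactly zero
```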

It is important to understand that the above equations (5, 6) are meant to be only uncertainty indicators and not estimates of uncertainty. Many theorists forget about it and give \(\textrm{d}T\) in their tables describing it as “the uncertainties in the computed transition rates” (see, e.g., Atalay et al. [28] and Manai et al. [29]). As follows from the examples of application of \(\textrm{d}T\) to evaluation of uncertainties in Ekman et al. [27], the initially proposed use of \(\textrm{d}T\) is strictly statistical: the estimate of uncertainty of calculated rates for a certain group of transitions expected to be calculated with a similar accuracy is given as some average of the magnitude of \(\textrm{d}T\) for this group of transitions. How exactly to divide all transitions into groups with similar accuracy is still an open question (see below).

It must be noted that the original uncertainty estimator proposed by Froese Fischer [26], given by Eq. (6), has no statistical justification.

Furthermore, there is a big problem in the definition of \(\textrm{d}T\): due to the use of the max function in the denominator of Eq. (5), it systematically underestimates the uncertainties. This is because, for any statistically significant data set, there are similar numbers of cases when \(A_v < A_l\) and when \(A_v > A_l\). Always using the maximum value in the denominator results in roughly half the cases in which the value of \(\textrm{d}T\) is smaller than the actual relative discrepancy. When the differences between \(A_l\) and \(A_v\) are small, this underestimation is insignificant, but unfortunately, in most calculations of complex spectra, the vast majority of calculated transition rates have very poor accuracy, with discrepancies between \(A_l\) and \(A_v\) reaching orders of magnitude for very weak transitions. To some extent, this drawback of Eq. (5) can be rectified by using min instead of max in the denominator. This, however, would lead to overestimation of uncertainties. Arguably, the result would be a more prudent estimate of uncertainty, since Eq. (5) neglects the errors in the transition energies.
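The effect of the choice of denominator is easy to see numerically. In the following Python sketch, the function name and the rate values are hypothetical. With max in the denominator, the indicator can never exceed 1, no matter how strongly the two forms disagree.

```python
def relative_discrepancy(a_l, a_v, denominator=max):
    """|A_l - A_v| divided by max(A_l, A_v), as in Eq. (5), or by min(A_l, A_v)."""
    return abs(a_l - a_v) / denominator(a_l, a_v)

# Hypothetical pairs of length- and velocity-form rates (arbitrary units)
for a_l, a_v in [(1.0, 1.05), (1.0, 2.0), (1.0, 10.0)]:
    with_max = relative_discrepancy(a_l, a_v, max)
    with_min = relative_discrepancy(a_l, a_v, min)
    print(f"A_l={a_l}, A_v={a_v}: max form {with_max:.3f}, min form {with_min:.3f}")
```

For the pair differing by 5 %, the two forms nearly coincide. For the pair differing by a factor of 10, the max form gives 0.9, while the min form gives 9.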

A somewhat better statistical indicator of uncertainty could be given as

$$\begin{aligned} \textrm{d}L = \text {ln}(S_1/S_2), \end{aligned}$$
(7)

where \(S_1\) and \(S_2\) are any two forms of line strength of the same transition. For example, they can be the length and velocity forms (\(S_l\) and \(S_v\)) from the same calculation, or the results from two adjacent layers of a series of active-space calculations (for explanation of this terminology, see, e.g., Jönsson et al. [30], Section 3.2), or the results of two different calculations. To give a reasonable estimate of uncertainty, \(S_1\) and \(S_2\) in Eq. (7) should be calculated in sufficiently developed physical models. For example, if both \(S_1\) and \(S_2\) are calculated with a limited number of interacting configurations, they can be close to each other; then Eq. (7) would give an unreasonably small uncertainty estimate. An indication of a sufficiently good quality of the models can be obtained, e.g., by examining convergence trends in a series of calculations with increasing complexity.

For a group of transitions with similar accuracy, the relative uncertainty of the line strength can be estimated from the root mean square (rms) of dL, \(\langle dL \rangle \):

$$\begin{aligned} u_S \approx e^{\langle dL \rangle } - 1. \end{aligned}$$
(8)

In the limit of \(S_l\) being very close to \(S_v\), when they are used as \(S_1\) and \(S_2\) in Eq. (7), \(u_S\) tends to \(\textrm{d}T\), which is the main reason for using the natural logarithm and natural exponent in these equations.
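As a minimal sketch of Eqs. (7) and (8), the following Python snippet computes dL for a group of transitions, takes its rms, and converts it to a relative uncertainty. The function names and the numerical line strengths are hypothetical and serve only as an illustration; the form \(|S_l - S_v|/\max (S_l, S_v)\) used for \(\textrm{d}T\) is an assumption about Eq. (5), which is valid when both forms refer to the same wavelength.

```python
import math

def d_l(s1, s2):
    """Eq. (7): natural logarithm of the ratio of two forms of line strength."""
    return math.log(s1 / s2)

def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def u_s(rms_dl):
    """Eq. (8): relative uncertainty of S from the rms of dL."""
    return math.exp(rms_dl) - 1.0

def d_t(s_l, s_v):
    """Assumed form of the dT indicator: |S_l - S_v| / max(S_l, S_v)."""
    return abs(s_l - s_v) / max(s_l, s_v)

# Hypothetical (S_l, S_v) pairs for a group of transitions of similar accuracy.
pairs = [(1.02, 1.00), (0.48, 0.50), (2.10, 2.00), (0.095, 0.100)]
dl_values = [d_l(s_l, s_v) for s_l, s_v in pairs]
u_group = u_s(rms(dl_values))
print(f"rms(dL) = {rms(dl_values):.4f}, u_S = {u_group:.4f}")

# For a single pair of close values, u_S and dT nearly coincide.
u_single = u_s(abs(d_l(1.02, 1.00)))
print(f"u_S = {u_single:.4f}, dT = {d_t(1.02, 1.00):.4f}")
```

The rms is taken over the whole group, so a single estimate \(u_S\) is assigned to all transitions of that group rather than to each transition separately.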

Fig. 2 Line strength S (left), weighted oscillator strength gf (middle), and weighted transition probability gA (right) as a function of transition wavelength \(\lambda \) for the \(1s-np_{3/2}\) (\(n = 2\) to 6) resonance transitions of H-like Ne (Ne X). The data are from the NIST ASD [19] (originating from Jitrik and Bunge [37]). Smooth curves are quadratic interpolations.

To my knowledge, the first implementation of this method with the use of results of two successive layers of an active-space multiconfiguration Dirac–Hartree–Fock (MCDHF) calculation was made by El-Sayed in her work on Pd XLII [31]. In that work, as well as in several others, including Ref. [10] mentioned in the Introduction, she uses decimal logarithms instead of natural ones, but the difference from Eqs. (7, 8) is only technical. Her works illustrate an important fact: the dL estimator can be used for evaluating uncertainties not only of electric-dipole (E1) and electric-quadrupole (E2) transitions, but also of magnetic-dipole (M1) and magnetic-quadrupole (M2) transitions, where there is no velocity form to compare with.

In another large series of works, a similar method was employed to estimate the uncertainties of transition rates based on comparison of results of two different calculations, MCDHF and MBPT, implemented in GRASP and FAC, respectively; see, e.g., Zhao et al. [32].

One should keep in mind that Eqs. (7) and (8) are also statistically unjustified in the sense that there is no evidence for the statistical distribution of dL to be close to normal. In fact, my numerical experiments on M1 and E2 transitions in Fe V [9] showed that it is not the best choice of function from this point of view. Different groups of transitions have different shapes of statistical distributions even when the input parameters of the computing code [33, 34] are varied randomly with a normal statistical distribution. It was found that for most M1 and E2 transitions of Fe V the best function (i.e., having statistical distribution closest to normal) is not ln(\(S_1/S_2\)) but \([(S_1/S_2)^{1/3} - 1] / (1/3)\). Nevertheless, its asymptotic behavior for \(S_1 \rightarrow S_2\) is the same, while for large discrepancies, there is not much sense in exactly answering the question: is the uncertainty one or two orders of magnitude?
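The closeness of the two functions for small discrepancies, and their divergence for large ones, can be checked numerically. The sketch below is only an illustration with hypothetical function names and arbitrarily chosen ratios; it says nothing about which function has the more nearly normal distribution in a real data set.

```python
import math

def log_indicator(ratio):
    """ln(S1/S2), as in Eq. (7)."""
    return math.log(ratio)

def cube_root_indicator(ratio):
    """[(S1/S2)^(1/3) - 1] / (1/3), the alternative function quoted in the text."""
    return (ratio ** (1.0 / 3.0) - 1.0) / (1.0 / 3.0)

# Arbitrary illustrative ratios S1/S2, from nearly equal to very discrepant.
for ratio in (1.01, 1.1, 2.0, 10.0, 100.0):
    print(f"S1/S2 = {ratio:7.2f}  ln = {log_indicator(ratio):7.4f}  "
          f"cube-root form = {cube_root_indicator(ratio):7.4f}")
```

Expanding both functions around \(S_1/S_2 = 1 + \varepsilon \) shows that they agree to first order in \(\varepsilon \) and start to differ only in the term of order \(\varepsilon ^2\).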

As mentioned above, the big outstanding question is: how to divide all transitions into groups with similar expected uncertainties? In my article of 2013 [3], it was argued that the line strength S (in length form) is a natural choice on which such grouping should be made, because it does not explicitly depend on transition energies. However, this is true to some extent only for non-relativistic calculations, and probably not for all of them. For example, in several textbooks (see, e.g., Cowan [33] and Corney [35]) it was shown that for resonance lines of H-like ions, the oscillator strength f does not depend on nuclear charge Z, while the line strength is proportional to \(Z^{-2}\), which is close to the dependence of wavelength on Z (see also Wiese and Fuhr [36]). However, if one plots the dependencies of either S or f on transition energy along any chosen series of transitions with increasing principal quantum number for the same Z or for several values of Z, neither shows a constant behavior in practically any spectrum. This is illustrated in Fig. 2 showing the line strength S, weighted oscillator strength gf, and weighted transition probability gAFootnote 5 as a function of transition wavelength \(\lambda \) for the \(1s-np_{3/2}\) resonance transitions of H-like Ne (Ne X) with \(n = 2\)–6.

The dynamic range of variation of the three quantities plotted in Fig. 2 can be taken as the ratio of the \(n = 2\) and \(n = 6\) values (the rightmost and leftmost points on the plots, respectively). It turns out that it is the smallest for gA (32), while for gf and S it is 53 and 69, respectively. Thus, in this case, the line strength S displays the strongest dependence on transition energy.
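These dynamic ranges can be roughly reproduced without the tabulated data. The sketch below is not the procedure used for Fig. 2; it relies on two assumptions: the non-relativistic closed-form expression for the oscillator strengths of the hydrogenic \(1s-np\) series, and the Rydberg formula for the wavelengths, with relativistic corrections neglected. The statistical weights are the same for all members of the series, so they cancel in the ratios.

```python
def lyman_f(n):
    """Non-relativistic oscillator strength of the hydrogenic 1s-np transition
    (independent of Z): 2^8 n^5 (n-1)^(2n-4) / [3 (n+1)^(2n+4)]."""
    return 2**8 * n**5 * (n - 1)**(2 * n - 4) / (3 * (n + 1)**(2 * n + 4))

def relative_wavelength(n):
    """Wavelength of 1s-np in units of the series-limit wavelength."""
    return 1.0 / (1.0 - 1.0 / n**2)

def dynamic_range(quantity, n_low=2, n_high=6):
    """Ratio of the values of a quantity at the two ends of the series."""
    return quantity(n_low) / quantity(n_high)

# gf is proportional to f, S to f*lambda, and gA to f/lambda^2.
range_gf = dynamic_range(lyman_f)
range_s = dynamic_range(lambda n: lyman_f(n) * relative_wavelength(n))
range_ga = dynamic_range(lambda n: lyman_f(n) / relative_wavelength(n)**2)

print(f"gA: {range_ga:.1f}, gf: {range_gf:.1f}, S: {range_s:.1f}")
```

Within this approximation, the three ratios round to 32, 53, and 69, in agreement with the values quoted above.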

Nevertheless, it is an empirical fact that for the vast majority of calculated sets of transition rates, uncertainties in the calculated values are strongly correlated with line strength, S, while correlation with other quantities, such as gf, gA, and also the cancellation factor is much weaker. There are exceptions that can be found in the literature. Most notably, in the recent MCDHF calculations of transition rates for neutral nitrogen and oxygen [38, 39], the differences between the Babushkin and Coulomb gauges were found to be most strongly correlated with the A values rather than S. Thus, the general methodology outlined in [3] should be amended by an additional step: one should compare the distribution of computational errors when plotted against different quantities, such as S, gf, gA, or perhaps the branching fractions for radiative decay and choose the quantity that displays the strongest correlation with computational errors. This is illustrated in Fig. 3.

Fig. 3 Discrepancy between the line strengths computed in the Babushkin and Coulomb gauges (\(S_{\text {B}}\) and \(S_{\text {C}}\)) plotted against weighted transition probability computed in the Coulomb gauge (\(gA_{\text {C}}\); upper left), against \(S_{\text {B}}\) (upper right), and against \(S_{\text {C}}/\lambda ^2\) (bottom; \(\lambda \) is transition wavelength). The data are from the MCDHF calculation of atomic nitrogen [38].

As one can see in the top right panel of Fig. 3, discrepancies between the Babushkin- and Coulomb-gauge results (\(S_{\text {B}}\) and \(S_{\text {C}}\)) do not correlate with \(S_{\text {B}}\). Correlation with \(S_{\text {C}}\) is even worse and is not shown here. On the other hand, there is a good correlation with \(gA_{\text {C}}\) (top left), which justifies the use of this quantity as a guide to uncertainties. The bottom panel is discussed further below.

Most textbooks, as well as a majority of research articles, recommend using the length form (or Babushkin gauge) as producing more reliable values of transition rates. However, recent research indicates that this is not universally correct. For example, the MCDHF study of Papoulia et al. [40] indicated that in some cases, particularly for transitions involving high Rydberg states, the results in the Coulomb gauge may converge faster and be more reliable than the results in the Babushkin gauge, depending on the chosen computational scheme. One should also keep in mind that the line strengths computed in the length and velocity forms have different sensitivities to errors in the computed transition energy. From Eqs. (8) and (9) of Bilal et al. [41], one can see that the leading term of S in the length form (Babushkin gauge) does not depend on transition wavelength, while that in the velocity form (Coulomb gauge) is proportional to its square. This can be derived from relativistic formulae given by Grant [42]. Relativistic atomic codes, such as GRASP [23] and FAC [24, 25], do not directly compute line strengths. Instead, they compute reduced transition matrix elements, and then the transition rates (A) are derived from them. Formulae for them can be found, e.g., in the paper of Grant quoted above [42]. From these formulae, by using a standard relation between A and S [36], one can derive approximate expressions for line strength as a function of transition frequency (\(\omega \)). For magnetic transitions,

$$\begin{aligned} S^\text {m}_{\alpha \beta } \propto \left[ \int _0^{\infty }\left( P_{\alpha }Q_{\beta } - Q_{\alpha }P_{\beta }\right) r^L\text {d}r\right] ^2, \end{aligned}$$
(9)

where L is the multipolarity (1 for dipole transitions, 2 for quadrupole, etc.), while for electric transitions the formulae depend on gauge. In the Babushkin gauge (equivalent to length form),

$$\begin{aligned} S^\text {e}_{\alpha \beta }\left( B\right) \propto \left[ \int _0^{\infty }R_{\alpha }R_{\beta }r^L\text {d}r\right] ^2, \end{aligned}$$
(10)

and in the Coulomb gauge (equivalent to velocity form),

$$\begin{aligned} S^\text {e}_{\alpha \beta }\left( C\right) \propto \frac{1}{\omega ^2}\left[ \int _0^{\infty }R_{\beta }r^{\left( L-1\right) /2} \Bigl \{ \frac{\text {d}}{\text {d}r} + \right. \nonumber \\ \left. \frac{\left( l_{\alpha } - l_{\beta }\right) \left( l_{\alpha } + l_{\beta } + 1\right) }{2r}\Bigr \}r^{(L-1)/2}R_{\alpha }\text {d}r\right] ^2. \end{aligned}$$
(11)

In Eqs. (9), (10), and (11), \(\alpha \) and \(\beta \) denote the lower and upper states of a transition, P and Q are the large and small radial components of the relativistic wave function, and definitions of other quantities can be found either in Grant’s paper [42] or in his book [43]. As noted above, these equations are approximate. They were obtained by using the leading terms in the Taylor series expansion of Bessel functions of \(\frac{{\omega }r}{c}\) entering radial integrals in relativistic formulae of Grant [42].

Equations (9) and (10) seem to confirm the idea of Ref. [3]: the line strength (of any magnetic transition or of the length form of electric transitions) does not explicitly depend on transition energy and thus should be a good discriminating quantity in evaluation of uncertainties of line strengths. However, as follows from Eq. (11), the velocity form of line strength is not a good quantity in this regard, as it is inversely proportional to the square of the transition frequency. It seems reasonable to assume that, if we divide it by squared wavelength, this would cancel the energy dependence, so \(S_C/\lambda ^2\) should be a good discriminating quantity. This is confirmed to some extent by the lower panel of Fig. 3: it looks quite regular, with only one strongly deviating point having ln(\(S_B/S_C\)) \(\approx \) 12. This point corresponds to the transition having the longest wavelength in the tables of Ref. [38].

There is an important consequence of Eq. (11): in high-precision calculations, the results in the Coulomb gauge should be adjusted to experimental wavelengths (if known):

$$\begin{aligned} S^v_{\text {adj}} = S^v_{\text {calc}}\frac{\lambda _{\text {exp}}^2}{\lambda _{\text {calc}}^2}. \end{aligned}$$
(12)

To my knowledge, this adjustment was first introduced by Bilal et al. [41]. Note that this adjustment is additional to the well-known adjustment factors for A- and f-values (of E1 transitions, in the length form):

$$\begin{aligned} \begin{aligned} A^{\text {len}}_{\text {adj}}&= A^{\text {len}}_{\text {calc}}\frac{\lambda _{\text {calc}}^3}{\lambda _{\text {exp}}^3}, \\ f^{\text {len}}_{\text {adj}}&= f^{\text {len}}_{\text {calc}}\frac{\lambda _{\text {calc}}}{\lambda _{\text {exp}}}, \end{aligned} \end{aligned}$$
(13)

meaning that, for the velocity form, the right parts of Eq. (13) must be additionally multiplied by the same factor as used in Eq. (12), yielding

$$\begin{aligned} \begin{aligned} A^{\text {vel}}_{\text {adj}}&= A^{\text {vel}}_{\text {calc}}\frac{\lambda _{\text {calc}}}{\lambda _{\text {exp}}}, \\ f^{\text {vel}}_{\text {adj}}&= f^{\text {vel}}_{\text {calc}}\frac{\lambda _{\text {exp}}}{\lambda _{\text {calc}}}. \end{aligned} \end{aligned}$$
(14)
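The rescaling factors of Eqs. (12)–(14) can be collected in a few lines of Python. This is only a sketch: the function names, the `form` argument, and the numerical wavelengths are hypothetical and are not taken from any of the cited codes.

```python
def adjust_s_velocity(s_calc, lam_calc, lam_exp):
    """Eq. (12): velocity-form line strength rescaled to the experimental wavelength."""
    return s_calc * (lam_exp / lam_calc) ** 2

def adjust_a(a_calc, lam_calc, lam_exp, form="len"):
    """Eq. (13) for form='len', Eq. (14) for form='vel' (E1 transition probabilities)."""
    if form not in ("len", "vel"):
        raise ValueError("form must be 'len' or 'vel'")
    power = 3 if form == "len" else 1
    return a_calc * (lam_calc / lam_exp) ** power

def adjust_f(f_calc, lam_calc, lam_exp, form="len"):
    """Eq. (13) for form='len', Eq. (14) for form='vel' (E1 oscillator strengths)."""
    if form not in ("len", "vel"):
        raise ValueError("form must be 'len' or 'vel'")
    power = 1 if form == "len" else -1
    return f_calc * (lam_calc / lam_exp) ** power

# Hypothetical example: the calculated wavelength is 1 % longer than the observed one.
lam_calc, lam_exp = 1010.0, 1000.0
print(adjust_s_velocity(1.0, lam_calc, lam_exp))
print(adjust_a(1.0, lam_calc, lam_exp, "len"), adjust_a(1.0, lam_calc, lam_exp, "vel"))
print(adjust_f(1.0, lam_calc, lam_exp, "len"), adjust_f(1.0, lam_calc, lam_exp, "vel"))
```

Only the ratio of the two wavelengths enters, so any wavelength unit can be used as long as it is the same for the calculated and experimental values.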

Another interesting development in comparisons of length and velocity forms was recently introduced by Zhang et al. [11]. These authors have found that occasional closeness of results obtained in different gauges is not always random. Sometimes it is due to insensitivity of the calculation to the choice of gauge; then the closeness of results in different gauges indicates real high accuracy of these results. The idea stems from the work of Grant [42] whose treatment allows one to consider gauge not as a strictly discrete choice between a few options but as a continuously varying parameter G. If the wavefunctions are exact, the results have no dependence on G. However, in imprecise numerical calculations, there will be a difference depending on G. The dependence of line strength on gauge is parabolic, with a positive quadratic term [44, 45]:

$$\begin{aligned} S(G) = aG^2 + bG + c. \end{aligned}$$
(15)

Its coefficients can be determined from three data points. Two are provided by the values of S in the Coulomb (\(G = 0\)) and Babushkin (\(G = \sqrt{(L + 1)/L}\)) gauges, where L is the multipolarity (1 for E1 transitions, 2 for E2, etc.). The third point is provided by the non-trivial fact that S equals zero at exactly one point given by Eq. (9) of Rynkun et al. [45]:

$$\begin{aligned} G_{S = 0} = \frac{\sqrt{2}}{1 - (M_B/M_C)}, \end{aligned}$$
(16)

where \(M_C\) and \(M_B\) are reduced matrix elements computed in the Coulomb and Babushkin gauges. In the special case of \(M_B = M_C\), \(G_{S = 0}\) goes to infinity, and S becomes independent of G.
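For E1 transitions, the coefficients of Eq. (15) can be written explicitly in terms of \(M_C\) and \(M_B\). The sketch below assumes that the reduced matrix element varies linearly with G and that S is simply its square (the proportionality constant is set to 1); this linearity is my reading of the fact that S(G) touches zero at a single point and is not spelled out in the text above. The function names and numerical values are hypothetical.

```python
import math

G_BABUSHKIN_E1 = math.sqrt(2.0)  # Babushkin gauge for L = 1

def g_zero(m_c, m_b):
    """Eq. (16): value of G at which S(G) vanishes (infinite if M_B = M_C)."""
    if m_b == m_c:
        return math.inf
    return math.sqrt(2.0) / (1.0 - m_b / m_c)

def parabola_coefficients(m_c, m_b):
    """Coefficients a, b, c of Eq. (15), assuming S(G) = [M_C + slope * G]^2."""
    slope = (m_b - m_c) / G_BABUSHKIN_E1
    return slope**2, 2.0 * m_c * slope, m_c**2

def s_of_g(g, m_c, m_b):
    """Line strength (arbitrary units) in the gauge G."""
    a, b, c = parabola_coefficients(m_c, m_b)
    return a * g * g + b * g + c

# Hypothetical reduced matrix elements differing by 20 %.
m_c, m_b = 1.0, 1.2
print(parabola_coefficients(m_c, m_b))
print(g_zero(m_c, m_b), s_of_g(g_zero(m_c, m_b), m_c, m_b))
```

With these assumptions, the quadratic coefficient a is never negative, and the parabola passes through \(M_C^2\) at \(G = 0\) and \(M_B^2\) at \(G = \sqrt{2}\), as required.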

Radžiūtė and Gaigalas [46] have investigated the dependence of dL (with \(S_1 = S_B\) and \(S_2 = S_C\)) on the parameter \(G_{S = 0}\) in several As-like spectra from Br III to Sr VI. As seen from their Figures 8 and 10, in all these spectra, accuracy better than 50 % is achieved when \({\vert }G_{S = 0}\vert \gtrsim 5\) for both the E1 and E2 transitions, and the dependence of dL on \(G_{S = 0}\) looks very regular, which prompted a discussion of the use of this dependence in evaluation of uncertainties: “Function (S(G)) minimum position correlate[s] well with accuracy class: transitions with [greater] \({\vert }G_{S = 0}\vert \) values are in better accuracy class” [46]. However, this behavior is a simple consequence of definition of \(G_{S = 0}\) (see Eq. 16). Inverting this equation and noting that \(S_B/S_C = (M_B/M_C)^2\), one obtains

$$\begin{aligned} \frac{S_B}{S_C} = (1 - \frac{\sqrt{2}}{G_{S = 0}})^2, \end{aligned}$$
(17)

from which it follows that \(S_B \rightarrow S_C\) when \({\vert }G_{S = 0}\vert \) increases beyond \(\sqrt{2}\), and the precise boundaries of the 50 % accuracy (\(0.5< S_B/S_C < 1.5\)) are at \(G_{S = 0} = \frac{2}{\sqrt{2} - \sqrt{3}} \approx -6.29\) and \(G_{S = 0} = \frac{2}{\sqrt{2} - 1} \approx 4.83\). In my view, the methodology described by Radžiūtė and Gaigalas [46] reflects a simple fact: these authors are assuming that there is no randomness in cases when length and velocity forms give close results, and this closeness always implies a high accuracy of those results. This is not justified, as discussed below.
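The two boundary values quoted above follow directly from inverting Eq. (17). A short numerical check is sketched below; the function names are hypothetical, and the inversion is taken on the branch \(M_B/M_C > 0\).

```python
import math

def ratio_from_g_zero(g0):
    """Eq. (17): S_B/S_C as a function of G_{S=0}."""
    return (1.0 - math.sqrt(2.0) / g0) ** 2

def g_zero_from_ratio(ratio):
    """Inverse of Eq. (17) for M_B/M_C > 0; undefined for ratio = 1 (G_{S=0} infinite)."""
    return math.sqrt(2.0) / (1.0 - math.sqrt(ratio))

g_at_ratio_1_5 = g_zero_from_ratio(1.5)  # equals 2 / (sqrt(2) - sqrt(3))
g_at_ratio_0_5 = g_zero_from_ratio(0.5)  # equals 2 / (sqrt(2) - 1)
print(f"{g_at_ratio_1_5:.2f}, {g_at_ratio_0_5:.2f}")
```

The asymmetry of the two boundaries with respect to zero reflects the fact that the interval \(0.5< S_B/S_C < 1.5\) is not symmetric on a logarithmic scale.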

Various ideas described in the articles mentioned above [11, 45, 46] seem interesting, as they give new insights into possible ways to analyze the discrepancies between the length and velocity forms of calculated transition rates. However, at present I do not see any clear-cut recipe to decide whether occasional closeness of these two forms indicates real accuracy of the calculation or is a quasi-random computational artifact. In the literature, there are many examples of calculations yielding very close results in the length and velocity forms, both being very wrong. This was spelled out by Hibbert [47]: “However, even though exact agreement between the two forms is achieved in a local potential approximation, the common value is not necessarily correct. It is sometimes possible to achieve good length and velocity agreement even in the HF approximation (a non-local potential method), but again the common value can be incorrect” (here, ‘HF’ means Hartree–Fock). He further gives some examples of such computational artifacts.

One of the ideas tried by Rynkun et al. [45] and Gaigalas et al. [48] is investigation of the dependence of cancellation effects on gauge. The cancellation factor (CF) originally used in Cowan’s atomic structure codes [33] is a numerical measure of the extent to which contributions of different sign cancel each other in the computation of transition matrix elements (see Eq. (6) of Gaigalas et al. [48]). Its smallness (generally, being smaller than about 0.1) indicates a high degree of cancellation, resulting in unreliable computed values of S and all other related parameters. It was found in both these works [45, 48] that calculations in Babushkin gauge are less sensitive to cancellations than in Coulomb gauge. Rynkun et al. [45] also reported a few transitions of Ce IV for which the largest values of CF (hence, more reliable results for S) were achieved with \(G = 1\), which is in between the Coulomb (\(G = 0\)) and Babushkin (\(G = \sqrt{2}\)) gauges for E1 transitions. The extension of the GRASP code package, with which these computations were made, is not yet publicly available at the distribution site of GRASP2018 (https://www.github.com/compas/grasp2018), but it would be very beneficial for atomic physics research to have it there.

It should be kept in mind that the smallness of the cancellation factor does not always mean that the calculation is hopelessly unreliable. As demonstrated in my work on forbidden transitions of Fe V [9], there are many cases when transitions with very small CF values still have very accurate A values. For the spectrum studied in that work, uncertainties in A correlated much more strongly with S than with CF.

Figure 24 of Rynkun et al. [45] illustrates an important subtlety: distributions of errors in the computed line strengths may be different for certain subgroups of transitions. One of the most common reasons for some transitions to be less accurate than others is a sharp difference in the amount of correlation effects included in the calculation for energy levels with different principal quantum numbers n. For example, in the work of Rathi and Sharma [49] on Na-like Ar, Kr, and Xe, the active layers included in the calculations were restricted by \(n \le 11\), while the results include transitions from levels with n up to 9. It is clear that the wavefunctions of states with \(n = 8\) and 9 must be less accurate than those with smaller n, which was confirmed numerically by comparison of results of MCDHF and MBPT calculations performed by these authors. Thus, transitions from the \(n = 8, 9\) levels should be separated from the rest, which was done in that work in estimation of their uncertainties.

3.1 Uncertainties in computed lifetimes: error propagation

The best method to estimate the uncertainties of theoretical radiative lifetimes is to compare them with accurately measured experimental values. However, such benchmark data are available for only a limited number of energy levels in a limited number of spectra. Moreover, a database providing critically evaluated reference data for lifetimes does not exist. So, in estimating their uncertainties, theorists resort to scouring the literature and extracting the data they think are the most reliable. The most convenient way to do it is by using the NIST Atomic Transition Probability Bibliographic Database [50]. Even if the literature search provides some reference data, it is usually incomplete. So, theorists resort to alternative methods of uncertainty estimation.

The most commonly encountered incorrect practice is to compare the results in the length and velocity forms – similar to the method used for line strengths, but applied directly to lifetimes. One of the drawbacks of this method is that it cannot be complete: there exist metastable levels that decay only via magnetic transitions, for which the velocity form does not exist and cannot be used for comparisons. Second, comparison of length and velocity forms is controversial even for line strengths (see the previous section). Third, the lifetimes are computed from transition probability values, which have contributions not only from errors in line strengths, but also from wavelengths. These contributions can be eliminated when all transition probabilities are rescaled to experimental wavelengths. However, those are often unavailable for some levels, which requires careful estimation of uncertainties of calculated wavelengths. Many theorists do it incorrectly, by assuming a fixed percentage error assigned to all excited levels and applying the same percentage error to transition energies. Even for excitation energies, it is often incorrect, because excitation energies are small differences between two large quantities, the total energies of the two levels: the ground level and an excited level. The error in the total energy of the ground level is usually rather large and produces a systematic error in calculation of all excitation energies. An example of correct estimation of errors in calculated energy levels can be found in the work of Li et al. on N I [38]; see their Figure 1 and its discussion in the text. The work of that group on O I [39] provides a similar example of good practice.

Even when uncertainties of calculated excitation energies, \(E_{\text {calc}}\), can be expressed as percentage errors p, wave numbers of transitions between excited levels have uncertainties that are not proportional to wave number (\(\sigma \)). If the errors in \(E_{\text {calc}}\) can be treated as uncorrelated, uncertainties of the calculated wave numbers are combinations in quadrature of the uncertainties of the two levels. If the levels are close (i.e., there is a long-wavelength transition between two close levels), the wave number uncertainty, \(u_\sigma \), can easily be hundreds of times greater than \(p\sigma \):

$$\begin{aligned} u_\sigma \approx \sqrt{2}pE_{\text {calc}} \end{aligned}$$
(18)

(when \(\sigma \) is much smaller than both calculated levels).
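A numerical illustration of Eq. (18) is sketched below. The level energies and the fractional error p are hypothetical, and the errors of the two levels are assumed to be uncorrelated, as stated above.

```python
import math

def wavenumber_uncertainty(e_lower, e_upper, p):
    """Quadrature sum of the uncertainties p*E of two levels (uncorrelated errors)."""
    return math.hypot(p * e_lower, p * e_upper)

# Hypothetical close-lying excited levels (cm^-1) with a fractional error of 0.1 %.
e_lower, e_upper, p = 100000.0, 100500.0, 0.001
sigma = e_upper - e_lower

u_sigma = wavenumber_uncertainty(e_lower, e_upper, p)
u_sigma_approx = math.sqrt(2.0) * p * e_upper  # Eq. (18)
relative = u_sigma / sigma

print(f"u_sigma = {u_sigma:.1f} cm^-1, Eq. (18) gives {u_sigma_approx:.1f} cm^-1")
print(f"u_sigma / sigma = {relative:.3f}, i.e., {relative / p:.0f} times p")
```

In this example, a nominal accuracy of 0.1 % in the level energies translates into an uncertainty of tens of percent in the wave number of the transition between them.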

The work of Rathi and Sharma [49] gives an example of a correct account of contributions from uncertainties in line strengths S and in transition wavelengths in the total uncertainties of calculated transition probabilities; see their Eqs. (1) and (2).

If the computational errors in individual \(A_{ki}\) values (for transition from upper level k to lower level i) are statistically independent, the relative uncertainty in the computed lifetime of level k, \(u(\tau _k)/\tau _k\), can easily be determined from the absolute uncertainties \(u(A_{ki})\) using the standard formula for error propagation:

$$\begin{aligned} \frac{u(\tau _k)}{\tau _k} = \tau _k\sqrt{\sum _i u(A_{ki})^2}, \end{aligned}$$
(19)

where the summation goes over all transitions to lower levels. This formula was applied, e.g., in the work of Ruczkowski and Elantkowska [51], see their Eq. (14); the paper of Rathi and Sharma [49] is another good example, see their Eq. (3).
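Equation (19) is straightforward to apply once the uncertainties of the individual transition probabilities have been estimated. The following sketch uses hypothetical A values and relative uncertainties for three decay branches of one level; it assumes statistically independent errors, as stated above.

```python
import math

def lifetime(a_values):
    """Radiative lifetime of a level: inverse of the sum of A values of all decay branches."""
    return 1.0 / sum(a_values)

def relative_lifetime_uncertainty(a_values, u_a_values):
    """Eq. (19): u(tau)/tau = tau * sqrt(sum of squared absolute uncertainties of A)."""
    tau = lifetime(a_values)
    return tau * math.sqrt(sum(u * u for u in u_a_values))

# Hypothetical decay branches: A values in s^-1 and their relative uncertainties.
a_ki = [1.0e8, 2.0e7, 5.0e5]
rel_u = [0.05, 0.20, 0.50]
u_a = [a * r for a, r in zip(a_ki, rel_u)]

tau_k = lifetime(a_ki)
rel_u_tau = relative_lifetime_uncertainty(a_ki, u_a)
print(f"tau = {tau_k:.3e} s, u(tau)/tau = {rel_u_tau:.4f}")
```

The example shows that the weakest branch, despite its 50 % uncertainty, contributes almost nothing to the uncertainty of the lifetime, which is dominated by the strongest branches.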

An alternative method for estimation of lifetime uncertainties was used, e.g., by Singh et al. [52]. These authors have calculated the lifetimes of \(n \le 3\) levels of He-like W and Au for each successive layer of their active-space calculations, which included virtual states up to \(n=7\), and compared the lifetime values from the last two layers. Although there is no evidence that their calculation has converged with respect to line strengths and lifetimes, and the contribution of errors in the computed wavelengths was not accounted for, such comparison can provide a reasonable estimate of possible errors.

4 Conclusions and outlook

This work has reviewed several recent developments in evaluation of uncertainties of transition wavelengths and calculated transition rates, which appeared in the literature after the publication of my last review [3]. The fact that many new interesting ideas have been published in the last several years is very encouraging, as it testifies to a certain success of my attempts to promote good practices of uncertainty estimation in atomic physics. The greatly increased percentage of theoretical papers containing uncertainty estimates (from \(<2\) % seven years ago to about 10 % in the last few years) is also very gratifying. However, the remaining 90 % are still published without uncertainty evaluation. Considering the rate at which these papers are published (for the topic of transition probabilities, steady at about 150 papers per year, see Fig. 4Footnote 6), it is clear that no single person or agency is able to critically evaluate the massive amount of theoretical results continuing to appear in literature. I intentionally call them “results” rather than “data,” because without uncertainties they are not data. In my opinion, publication of any results, whether experimental or theoretical, without carefully analyzed uncertainties should be banned altogether, as their only effect is confusion.

Fig. 4 Number of new papers added to the three NIST bibliographic databases per publication year, see Refs. [50, 53, 54].

With experimental papers, there is a similar problem. Although they usually do contain estimates of measurement uncertainties for new reported measurements, most of them are limited to a small number of observed spectral lines in a narrow range of wavelengths. To be incorporated in updated sets of reference atomic data, these newly measured wavelengths must be considered together with all other previously published data on the studied spectrum to derive a consistent set of energy levels that fit all observations. This means that, if the authors are revising the previous data on energy levels by using their new line measurements, they should evaluate not only their own wavelength uncertainties, but also those of previous works, which may concern different sets of spectral lines not directly related to the authors’ measurements but affecting the energy levels involved. In such cases, which are very common, ensuring consistency requires careful assessment of old measurements, which often do not provide uncertainties for every measured line.

I hope that the methods of uncertainty evaluation described in the present work will help researchers to make progress in these matters, and deficiencies in the existing methodology will receive further treatment.