The winner of the data averaging challenge (published in volume 414 issue 1) is:

Cristhian Paredes, Instituto Nacional de Metrología de Colombia, Bogotá, D.C., Colombia.

The award entitles the winner to select a Springer book of his choice up to a value of €100,-.

Our Congratulations!

Summarizing data by averaging them somehow is one of the fundamental tasks of data analysis. The task of this Challenge [1] seems trivial at first, but consider this: the solutions received from readers all gave a different answer for the average atomic weight of tellurium, ranging from 126.3 to 126.6. Moreover, the uncertainties associated with these averages differed by more than an order of magnitude, ranging from 0.01 to 0.20!

Clearly, this topic is not trivial and requires numerous modelling assumptions. Are the data consistent with each other? Are all data reliable? Are the provided measurement uncertainties reliable? The answers to these questions determine the appropriate statistical model for reducing these data. For example, in the case of consistent data with reliable uncertainties, one could adopt the uncertainty-weighted mean, µ, given by the following statistical model:

$${A}_{i}(\mathrm{Te})=\mu +{e}_{i} \quad (i=1\dots 6),$$

where the $e_i$ are the measurement errors, modelled as random draws from a Gaussian distribution with zero mean and standard deviations $u(A_i)$. This simple model was indeed used by Clarke himself, who reported µ = 126.523 with the associated standard uncertainty u(µ) = 0.014 [2]. But the tellurium data are not consistent! The results from methods #1 and #3 are more than 20 standard uncertainties apart from one another (Fig. 1), and such a discrepancy cannot be explained by the provided measurement uncertainties alone.

Fig. 1 Summary of atomic weight determinations of tellurium from Clarke’s 1897 edition of Recalculation of the Atomic Weights with ± 2 standard uncertainty error bars [2]
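
To make the weighted-mean model concrete, the sketch below (an illustration, not part of the original Challenge or its solution) computes the uncertainty-weighted mean, its standard uncertainty, and the chi-squared statistic used to judge whether the data are consistent with that mean. The numerical arrays in the usage example are hypothetical placeholders, not the tellurium determinations plotted in Fig. 1.

```python
# Minimal sketch (illustration only): uncertainty-weighted mean of n results A_i
# with stated standard uncertainties u_i, plus a chi-squared consistency check.
import numpy as np

def weighted_mean(A, u):
    """Uncertainty-weighted mean and its standard uncertainty (weights 1/u_i^2)."""
    A, u = np.asarray(A, dtype=float), np.asarray(u, dtype=float)
    w = 1.0 / u**2
    mu = np.sum(w * A) / np.sum(w)
    u_mu = 1.0 / np.sqrt(np.sum(w))
    return mu, u_mu

def chi_squared(A, u):
    """Observed chi-squared statistic; compare against n - 1 degrees of freedom."""
    mu, _ = weighted_mean(A, u)
    return np.sum(((np.asarray(A) - mu) / np.asarray(u)) ** 2)

# Hypothetical placeholder values (NOT Clarke's tellurium data):
A = [126.52, 126.35, 126.61, 126.48, 126.55, 126.40]
u = [0.02, 0.03, 0.01, 0.05, 0.04, 0.06]
mu, u_mu = weighted_mean(A, u)
print(mu, u_mu, chi_squared(A, u))  # chi-squared far above n - 1 signals inconsistent data
```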

Given the large discrepancies observed between these results, three general approaches can be taken. First, one can declare the provided uncertainties unreliable and proceed without them by calculating the simple arithmetic average. Second, one can declare some of the data unreliable and proceed by finding the largest consistent subset, using a chi-squared test to assess consistency [3]. Third, one can retain all the data but use a statistical model that makes allowance for additional errors in the individual results [4]. This approach is formally known as the random effects model, given in its simplest form as

$${A}_{i}(\mathrm{Te})=\mu +\lambda_{i}+{e}_{i} \quad (i=1\dots 6),$$

where the additional variables $\lambda_i$ describe the laboratory effects, which can be modelled as random draws from a Gaussian distribution with zero mean and unknown standard deviation τ. One of the most popular solutions of this statistical model gives µ = 126.46 with the associated standard uncertainty u(µ) = 0.19 [5].
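
The random effects model can be fitted in several ways; one widely used frequentist choice is the DerSimonian–Laird moment estimator, sketched below purely as an illustration (it is not necessarily the method behind the value quoted from reference [5]).

```python
# Minimal sketch (illustration only): DerSimonian-Laird moment estimator for the
# random effects model A_i = mu + lambda_i + e_i, with between-laboratory
# variance tau^2 estimated from the data.
import numpy as np

def dersimonian_laird(A, u):
    """Random effects mean, its standard uncertainty, and the estimate of tau."""
    A, u = np.asarray(A, dtype=float), np.asarray(u, dtype=float)
    w = 1.0 / u**2
    mu_fixed = np.sum(w * A) / np.sum(w)        # fixed-effects (weighted) mean
    Q = np.sum(w * (A - mu_fixed) ** 2)         # Cochran's heterogeneity statistic
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(A) - 1)) / c)     # moment estimate of tau^2
    w_star = 1.0 / (u**2 + tau2)                # weights inflated by tau^2
    mu = np.sum(w_star * A) / np.sum(w_star)
    return mu, 1.0 / np.sqrt(np.sum(w_star)), np.sqrt(tau2)
```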

These are, of course, by no means the only choices. Even with the same statistical model at hand, one can adopt a variety of classical or Bayesian methods to fit the model to the data [6]. And there are many other statistical models developed for this simple task [7,8,9,10], with one of the most recent methods inspired by the Darwinian theory of evolution [11]. Some of these methods are daunting in their complexity, but in this particular example dismissing the provided measurement uncertainties might very well be the best way to reduce this dataset: the simplest of all averages, the arithmetic mean, is 126.44 with an associated standard uncertainty of 0.15.
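
For completeness, this last option, the plain arithmetic mean with its standard uncertainty taken as the standard deviation of the mean, can be written as the short sketch below (again illustrative; the six determinations themselves are those of Fig. 1 and are not reproduced here).

```python
# Minimal sketch (illustration only): arithmetic mean with its standard
# uncertainty taken as the standard deviation of the mean, s / sqrt(n),
# ignoring the stated measurement uncertainties altogether.
import numpy as np

def arithmetic_mean(A):
    A = np.asarray(A, dtype=float)
    return A.mean(), A.std(ddof=1) / np.sqrt(len(A))
```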