1 Introduction

Particle physics provides fertile ground for a vast number of methods to statistically compare theory and data, giving a quantitative footing to guide the prospects of a theory, or even to reveal important accomplishments or tensions in experimental efforts.

In general, when a measurement provides a large number of data points, a particular theory can be compared to data visually (and intuitively) by superposing the measured and theoretically expected numbers of events, together with the associated uncertainties. However, when the number of data points is low, although there are a number of efficient methods from which it is possible to draw rigorous quantitative conclusions, an intuitive comparison between theory and experimental results is less direct.

One interesting phenomenon, the neutrino burst detected from the supernova SN1987A, is particularly affected by low statistics, illustrating the above-mentioned difficulties.

In this paper we review a particular statistical method to interpret and extract scientific conclusions from experiments with a low number of data points, using the SN1987A data as an example, and propose a procedure to handle this difficulty through animations.

2 Statistical analysis

The quantitative comparison between theoretical predictions and experimental results is a major part of any scientific endeavour. When comparing theory and experimental data, some important features of the theory that can be tested quantitatively are: (i) how good the agreement between theoretical predictions and experimental data is – the goodness of fit, gof; (ii) which set of theoretical parameter values provides the best agreement between theory and observations – the parameters' best-fit point, bfp; and (iii) what the region in the theoretical parameter space is in which such agreement holds at some confidence level. These features can usually be presented graphically, in ways that drive our intuition to better comprehend visually the key concepts of the statistical method used. An interested reader can follow, among many good textbooks on statistics, Ref. [1] for basic concepts of statistical data analysis.

As in many other areas, in particle physics an important quantity around which such analyses are built is the detection rate of a specific event. As an example, theoretical models of solar neutrino production provide us with a steady theoretical neutrino flux, and solar neutrino experiments provide us with a detected event rate of solar neutrinos. The comparison between these two quantities can be done by translating the theoretical flux into an expected event rate, or inversely, by translating the detected event rate into a compatible expected flux.

Moreover, a lot of information can be extracted from the dependence of the neutrino flux on its energy or time of detection. The most straightforward way to include this information in a statistical analysis is to split the total data into bins of specific energy or time intervals. Perhaps the most widespread statistical tool to implement this kind of analysis is the following \(\chi ^2\):

$$\begin{aligned} \chi ^2=\sum _{i,j} (R^{th}_i - R^{ex}_i) (\sigma ^{-2})_{ij} (R^{th}_j - R^{ex}_j) \end{aligned}$$
(1)

where the indices i and j track the binning of the data, \(R^{th}_k\) is the theoretical prediction, \(R^{ex}_k\) is the experimental data and \(\sigma \) is the covariance matrix, which holds uncertainties and correlations (so that \((\sigma ^{-2})_{ij}\) is the ij element of the inverse of \(\sigma ^2\)). This analysis has one great advantage: it allows one to visually grasp how good the agreement between theory and experiment is in a figure where experimental data points, uncertainties and theoretical predictions are plotted together. If the theoretical predictions are contained inside the region around the experimental data points delimited by the uncertainties, then we expect a good fit.
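As a concrete illustration of Eq. (1), the minimal Python sketch below evaluates the \(\chi ^2\) for a handful of hypothetical binned rates, with an uncorrelated uncertainty plus a small fully correlated (normalization-like) component in the covariance matrix; all numbers are illustrative only.

```python
import numpy as np

# Hypothetical binned rates (arbitrary units), used only to illustrate Eq. (1).
R_th = np.array([1.05, 0.98, 1.10, 0.95])   # theoretical prediction per bin
R_ex = np.array([1.00, 1.02, 1.05, 0.90])   # measured rate per bin

# Hypothetical covariance: 5% uncorrelated errors plus a 2% fully correlated part.
sig_uncorr = 0.05 * R_ex
sig_corr = 0.02 * R_ex
cov = np.diag(sig_uncorr**2) + np.outer(sig_corr, sig_corr)

# Eq. (1): chi^2 = (R_th - R_ex)^T C^{-1} (R_th - R_ex)
diff = R_th - R_ex
chi2 = diff @ np.linalg.solve(cov, diff)
print(f"chi^2 = {chi2:.2f} for {len(diff)} bins")
```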

Several examples in neutrino physics can illustrate such a procedure. Again taking solar neutrinos as an example, the data presented by the neutrino detector Super-Kamiokande are divided both in energy and in the zenith angle that gives the Sun's position at the time of detection. In Fig. 1 the data from 1496 days of running experiment are presented, and the binning in energy and angle can be seen. The continuous lines represent the prediction for the best-fit point of the statistical analysis when flavour conversion is considered in two scenarios, the large mixing angle solution with large \(\Delta m^2\) (LMA) and with lower \(\Delta m^2\) (LOW).Footnote 1

Fig. 1

Data from 1496 days of Super-Kamiokande. Each day/night plot gives a binning in energy, with the LOW (light gray) and LMA (black) proposed solutions to be visually compared to the data points. Taken from [2]

As discussed, it is possible to get a visual feeling of how good the agreement between experimental data and theoretical predictions is by how the theoretical curves cross the region around the experimental data within the error bars. For this particular example, two solutions to the solar neutrino problem are presented. By visually inspecting the figure we can expect that both solutions would fit the data quite well, a fact that is confirmed by a more careful statistical analysis.

The problem with this visualization arises when there is no efficient way to group the data into bins, for instance, due to a low event rate. In particular, when the event rate is very low it is necessary to take the experimental data event by event.

In this context, what we propose here is to recover a way to visually assess how good the agreement between experimental data and theoretical predictions is in a particular scenario where the statistical analysis is done event by event: the neutrino data from Supernova 1987A.

3 Supernova 1987A

A core-collapse supernova is a remarkable end of the life of a star and one of the most peculiar astrophysical phenomena. Despite being a prominent optical event, its most outstanding property is the powerful release of \(\sim 99\%\) of the gravitational binding energy, generally of the order of \(\sim 10^{53}\) erg, from a \(m \gtrsim 10 M_\odot \) progenitor star in (anti)neutrinos of all flavors at the MeV scale.

Despite the high neutrino luminosity, a limitation for neutrino observation on Earth is the large distance D to the source, with the flux decreasing as \(D^{-2}\), restricting the possible region for a neutrino burst detection to the Milky Way or its neighbourhood, which has a low supernova rate of \(\sim 1\) per century [3].

Even so, in 1987, three detectors, Kamiokande II [4, 5], IMB [6, 7] and Baksan [8], were able to observe a neutrino signal associated with a SN in the Large Magellanic Cloud (\(\sim 50\) kpc). These data are presented in Fig. 2. In contrast with the Super-Kamiokande data in Fig. 1, these are individual events, and there is no obvious way to overlap a theoretical curve with them. Since theoretical models provide a flux density, and any kind of binning to convert this density into an event probability would be quite arbitrary, an unbinned maximum likelihood estimation is a robust alternative to confront the theoretical hypothesis with these individual events. In the next section we describe such a procedure.

Fig. 2

Positron energy and relative time from the IMB, Kamiokande II and Baksan detectors, with a total of 29 events

4 Modelling SN1987A event-by-event likelihood

Frequently the likelihood treatment in particle physics involves the use of the Poisson distribution \(P(\mu , n)\), which describes well phenomena that have a small probability to occur but a large number of trials. Given a measured variable set \(\mathbf{x}\), the Poisson likelihood is given by:

$$\begin{aligned} {\mathscr {L}} = \prod _{i=1}^{N_{bins}} \frac{{\mu (x_i)}^{n_i}}{n_i!} e^{-\mu (x_i)} \end{aligned}$$
(2)

where \(n_i\) is the number of events that occur in an interval \([x_i, x_i + \delta x_i]\) of our variable, out of a number N of intervals, or bins, and \(\mu (x_i)\) is the expected value in the same interval. It is convenient to write \(\mu (x_i)\) in terms of a distribution function of the variable \(x_i\), i.e. \(\mu (x_i) = R(x_i)\delta x_i\), with a given event rate \(R(x_i) = \frac{dN}{dx_i}\) in equally spaced bins of width \(\delta x_i\) and number of counts \(n_i\). Including it in Eq. (2):

$$\begin{aligned} {\mathscr {L}}= & {} \prod _{i=1}^{N_{bins}} \frac{[R(x_i) \delta x]^{n_i}}{n_i!} e^{- R(x_i) \delta x} \nonumber \\= & {} e^{- \sum _{j=1}^{N_{bins}} R(x_j) \delta x} \prod _{i=1}^{N_{bins}} \frac{[R(x_i) \delta x]^{n_i}}{n_i!} \; . \end{aligned}$$
(3)

However, binning the data in order to use a single expected value for a set of points requires assuming a given statistical distribution for each bin, which is generally taken to be Gaussian for a large number of entries. The low-statistics scenario does not allow this assumption, but it is possible to modify the likelihood (3) to account for each event separately. This can be done by taking the bins to an infinitesimal width, \(\delta x \rightarrow dx\), keeping only the infinitesimal bins that contain one event (\(n_i \rightarrow 1\)) and dropping the others, so that (3) becomes

$$\begin{aligned} {\mathscr {L}} \propto e^{- \int R(x) dx} \prod _{i=1}^{N_{obs}} R(x_i) \end{aligned}$$
(4)

where the total number of bins \(N_{bins}\) has been replaced by the total number of observed events \(N_{obs}\) and the index i now labels each individual event. The idea behind maximum likelihood is to maximize the quantity in (4) or, given the correspondence \({\mathscr {L}} = e^{-\chi ^2/2}\), to minimize \(\chi ^2(\mathbf{x})= -2 \log {\mathscr {L}}(\mathbf{x})\) with respect to a free set of parameters \(\mathbf{x}\). If we have a single event at \(x=\bar{x}\), this expression reduces to \(e^{- \int R(x) dx} R(\bar{x}) \). For different models with the same normalized expected number of events \(\int R(x)dx\), the likelihood is maximized for the model with the highest value of \(R(\bar{x})\). And letting the normalization run freely, it is maximized for \(\int R(x) dx=1\). It is straightforward to note that if we consider more than one event this maximum occurs when the normalization equals the total number of observed events.
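As an illustration of how Eq. (4) is used in practice, the sketch below fits a toy exponentially decaying rate \(R(t) = A\,e^{-t/\tau }\) to a few hypothetical event times by minimizing \(-2\log {\mathscr {L}}\). It is not the SN1987A analysis, but it makes the point above explicit: the fitted normalization \(\int R\,dt = A\tau \) converges to the observed number of events.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical individual event times in seconds (illustration only).
t_events = np.array([0.1, 0.3, 0.5, 0.9, 1.4, 2.2, 3.1])

def rate(t, A, tau):
    """Toy unbinned event rate R(t) = A * exp(-t/tau), in events per second."""
    return A * np.exp(-t / tau)

def neg2logL(params):
    """-2 log L from Eq. (4): twice (integral of R minus sum of log R at the events)."""
    A, tau = params
    if A <= 0 or tau <= 0:
        return np.inf
    expected = A * tau                      # analytic integral of R(t) over [0, infinity)
    return 2.0 * (expected - np.sum(np.log(rate(t_events, A, tau))))

fit = minimize(neg2logL, x0=[5.0, 1.0], method="Nelder-Mead")
A_fit, tau_fit = fit.x
print(f"best fit: A = {A_fit:.2f}/s, tau = {tau_fit:.2f} s, "
      f"expected events = {A_fit * tau_fit:.2f} (observed: {len(t_events)})")
```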

In a supernova detection, such as SN1987A, the variables x are the neutrino energy, the detection time and the event's scattering angle, i.e. \(R=R(E, t, \cos \theta )\) [9]:

$$\begin{aligned} R(E, t, \cos \theta )&= n_p \frac{d\sigma (E_\nu , \cos {\theta })}{d\cos {\theta }} \frac{d^2\phi _{\bar{\nu }_e} (E_\nu , t)}{dE_\nu dt} \nonumber \\&\quad \times \xi (\cos {\theta }) \eta (E_e) \frac{dE_\nu }{dE_e} \end{aligned}$$
(5)

with \(n_p\) being the number of free protons in each detector, \(\sigma (E_\nu , \cos {\theta })\) the inverse beta decay cross section [10], \(\phi _{\bar{\nu }_e}(E_\nu , t)\) the electron antineutrino flux on Earth, \(\xi (\cos {\theta })\) an angular bias of the IMB detector and \(\eta (E_e)\) an efficiency function taken from [8], which fits the efficiency points reported by each collaboration.

Then Eq. (4) becomes:

$$\begin{aligned} {\mathscr {L}}= & {} e^{- \int R(E, t, \cos \theta )\, dE\, dt \,d\cos \theta } \prod _{i=1}^{N_{obs}} R(E_i, t_i, \cos \theta _i)\nonumber \\&\times dE \,dt \,d\cos \theta \end{aligned}$$
(6)

where R is a triply differential rate, \(R=\frac{d^3N}{dE\,dt\,d\cos \theta }\), and N is the expected number of events at the detector. For simplicity we did not include the scattering angle dependence in the animations presented in the following, although it was used in the likelihood calculation. A complete analysis, including other details such as background and energy resolution, can be seen in [9, 11,12,13,14,15].
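Schematically, the two ingredients of Eq. (6) can be evaluated as in the sketch below: the exponent from a numerical quadrature of the rate over a grid in \((E, t, \cos \theta )\), and the product term from the rate evaluated at each observed event. The rate function and the event list here are placeholders with arbitrary shapes and units, not the full expression of Eq. (5).

```python
import numpy as np

def rate(E, t, cos_theta):
    """Placeholder for the triply differential rate d^3N/(dE dt dcos_theta) of Eq. (5);
    an arbitrary separable shape is used here instead of the full physical model."""
    return 1e-2 * E**2 * np.exp(-E / 4.0) * np.exp(-t / 4.0) * (1.0 + 0.1 * cos_theta)

# Grid for the integral in the exponent of Eq. (6).
E_grid = np.linspace(1.0, 60.0, 120)    # energy [MeV]
t_grid = np.linspace(0.0, 30.0, 150)    # time after the first event [s]
c_grid = np.linspace(-1.0, 1.0, 40)     # cosine of the scattering angle
E, T, C = np.meshgrid(E_grid, t_grid, c_grid, indexing="ij")
expected = np.trapz(np.trapz(np.trapz(rate(E, T, C), c_grid, axis=2),
                             t_grid, axis=1), E_grid, axis=0)

# Hypothetical observed events (E_i [MeV], t_i [s], cos_theta_i), illustration only.
events = [(20.0, 0.0, 0.1), (13.5, 0.5, -0.3), (8.0, 1.9, 0.7)]
log_prod = sum(np.log(rate(*ev)) for ev in events)

neg2logL = 2.0 * (expected - log_prod)
print(f"expected events = {expected:.2f}, -2 log L = {neg2logL:.2f}")
```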

Fig. 3

Theoretical event rate cumulatively integrated over time (Eq. 8) (blue) and normally distributed data as proposed in (9) (red), evolving with the relative detection time since the first measured neutrino from SN1987A. The time scale runs logarithmically in the first second and linearly afterwards to better show the data structure for early-time events (see the animation here: https://github.com/santosmv/Animations-visualizing-SN1987A-data-analysis/blob/main/events_rate.png)

5 Single event distribution

The main ingredient to construct the likelihood is the theoretical triply differential expected rate. However, since there is no way to convert the theoretical predictions into some quantity to be compared with individual events, we can instead modify the events to match the theoretical probability distribution. For instance, all SN1987A events are published with an uncertainty in energy, so the true information we can take from each event is a probability distribution around some most probable result. Assuming such a distribution to be Gaussian, a specific event with measured energy \(\bar{E}_\nu \pm \sigma _E\), where \(\sigma _E\) is the energy uncertainty, measured at time \(\bar{t}\pm \sigma _t\), with \(\sigma _t\) being the uncertainty in time, is related to the following probability distribution:

$$\begin{aligned} \frac{d^2P(E_\nu ,t)}{dE_\nu \,dt}&= \frac{1}{\sigma _E\sqrt{2\pi }} \exp {\left( -\frac{1}{2}\frac{(E_\nu -\bar{E}_\nu )^2}{\sigma _E^2}\right) }\nonumber \\&\quad \times \frac{1}{\sigma _t\sqrt{2\pi }} \exp {\left( -\frac{1}{2}\frac{(t-\bar{t})^2}{\sigma _t^2}\right) } \end{aligned}$$
(7)

where \(P(E_\nu ,t)\) is the probability that the event had a true energy between \(E_\nu \) and \(E_\nu +dE_\nu \), and occurred at a true time between t and \(t+dt\).

This can be compared with the theoretical probability of inducing an event on the detector:

$$\begin{aligned} \frac{d^2N}{dE_\nu \,dt}=A\,\frac{d^2\phi (E_\nu ,t)}{dE_\nu \,dt} \,\sigma (E_\nu ) \end{aligned}$$
(8)

where A is a normalization constant that takes into account the number of targets in the detector and its efficiency. The neutrino interaction cross section is given by \(\sigma (E_\nu )\), and \(\phi (E_\nu ,t)\) is the neutrino flux. The specific parameterization of these last two functions will be presented in what follows.

To properly visualize the data points being collected, we can create an animation with the detected event probability integrated over time. Since the uncertainty in time is very small, the time distribution converges to a \(\delta \)-function, and such an animation advances in steps as the data are collected:

$$\begin{aligned} \sum _i\frac{dP_i(E_\nu ,t)}{dE_\nu }&=\int _{t_0}^t dt \,\sum _i\frac{d^2P_i(E_\nu ,t)}{dE_\nu \,dt}\nonumber \\&=\sum _i\frac{1}{\sigma _{E_i}\sqrt{2\pi }}\nonumber \\&\quad \exp {\left( -\frac{1}{2}\frac{(E_\nu -\bar{E}_{\nu i})^2}{\sigma _{E_i}^2}\right) }\theta (t-t_i) \end{aligned}$$
(9)

Such an animation is presented in Fig. 3 (red curve). Since what is presented is the cumulative result after integrating over time, the final frame of this animation, when also integrated over energy, accounts for all the 29 events detected by the three experiments. The comparison with theoretical predictions can be made visually if we produce a similar animation for the expected number of events, integrating Eq. (8) over time, also presented in Fig. 3 (dashed curve). This method of a model-independent curve representing the spectrum has already been fully discussed in [16,17,18], where Refs. [17, 18] also bring a comparative analysis of neutrino emission models.Footnote 2
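The red curve of Fig. 3 is straightforward to reproduce: Eq. (9) is a sum of normalized Gaussians in energy, each switched on by the step function once its detection time has passed. The sketch below uses a short hypothetical event list; the actual animation uses the 29 published events with their reported energies and uncertainties.

```python
import numpy as np

# Hypothetical events: (detection time t_i [s], energy E_i [MeV], uncertainty sigma_i [MeV]).
events = [(0.0, 20.0, 2.9), (0.4, 13.5, 3.2), (1.9, 8.0, 2.0), (5.6, 35.0, 8.0)]

def cumulative_dP_dE(E, t):
    """Eq. (9): sum of normalized Gaussians in energy over all events with t_i <= t."""
    total = np.zeros_like(E, dtype=float)
    for t_i, E_i, sig_i in events:
        if t_i <= t:   # theta(t - t_i): an event contributes only after it is detected
            total += np.exp(-0.5 * ((E - E_i) / sig_i) ** 2) / (sig_i * np.sqrt(2.0 * np.pi))
    return total

E_axis = np.linspace(0.0, 60.0, 300)
for t_frame in (0.1, 1.0, 10.0):            # a few animation frames
    curve = cumulative_dP_dE(E_axis, t_frame)
    # Integrating over energy recovers the number of events detected up to t_frame.
    print(f"t = {t_frame:5.1f} s: integral over energy = {np.trapz(curve, E_axis):.2f} events")
```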

Our parameterization of the electron antineutrinoFootnote 3 flux \(\phi (E_\nu , t)\) in Eq. (8) follows the model of Ref. [9] and consists of a two-component emission (accretion + cooling) with nine free parameters, coming from the proposed flux \(\phi = \phi (t, E, \cos \theta , \mathbf{y})\), with \(\mathbf{y} = (T_c, R_c, \tau _c, T_a, M_a, \tau _a)\), where \(T_c\) (\(T_a\)) is the initial antineutrino (positron) temperature of the cooling (accretion) phase, \(R_c\) is the radius of the neutrinosphere, \(\tau _c\) (\(\tau _a\)) is the characteristic time of the cooling (accretion) phase and \(M_a\) is the initial accreting mass. The remaining three free parameters are time offsets \(t_{\mathrm{off}}\), adjusted independently for each detector. It was assumed that the neutrino flux was affected by mixing exclusively through the MSW effect, in the normal hierarchy scenario [19, 20], with mixing parameters taken from [21]. A more detailed discussion of fitting the SN1987A data in a similar way can be seen in the widely cited [22]. A complete comparison of distinct parameterizations is also discussed in [18].

These parameters are estimated from an event-by-event maximum likelihood, and the best fit values of our analysis, used in Fig. 3, are:

$$\begin{aligned}&T_c = 5.1 \,\text{ MeV }, ~R_c = 12 \,\text{ km }, ~\tau _c = 4.3 \,\text{ s }, \end{aligned}$$
(10)
$$\begin{aligned}&T_a = 1.7 \,\text{ MeV }, ~M_a = 1.2 \,M_\odot , ~\tau _a = 0.7 \,\text{ s } \end{aligned}$$
(11)

As described before, the maximization of the likelihood depends on two terms. The term in the exponential factor is related to the number of events, and drives the theoretical parameters towards those that provide the right expected number of events, i.e., the area under the curves at the end of the animation in Fig. 3. From this aspect it is quite easy to grasp whether our theoretical model fits the data well.

The second term assesses how close the theoretical curve is to the experimental one at the data central points, both in energy and in time. Since the uncertainty in time is negligible, we can visually compare the curves at the moments a new data point is collected, providing us with a visual tool for this second ingredient of the statistical analysis. By performing these two checks on Fig. 3, we can expect that, although not perfect, the theoretical prediction provides a reasonably good fit to the data.

Fig. 4

To visualize the impact of spectral distortion on the event rate, we used two sets of parameters for \(T_a\) and \(M_a\) (in green and in cyan) that are excluded at \(90\%\) C.L. according to our analysis. The green curve produces a distortion towards low-energy events, while the cyan one produces a distortion that favours high-energy events. In light grey we also present the best-fit point of our analysis, shown in Fig. 3 (see the animation here: https://github.com/santosmv/Animations-visualizing-SN1987A-data-analysis/blob/main/events_rate_worse_fit.png)

It is useful now to analyse a set of theoretical parameters that does not fit the data well. This is done in Fig. 4, where we chose two sets of parameters that are excluded at 90% C.L. according to our analysis. These parameters were chosen in such a way as not to change the total number of predicted events, so that we can focus on the energy spectrum information. It is clear, again using a visual comparison, that these new sets of parameters produce a worse fit to the data, a fact that is confirmed by a full statistical analysis.

As pointed out earlier, the two main neutrino observables that we are taking into consideration are the neutrino energy and the time of detection. Having discussed the first in the above analysis, we will now focus on the second, and the best way to do this is through the limits on the neutrino mass that can be achieved using this statistical method.

6 Neutrino mass limits

An important remark is that the spread in time of the neutrino detections is an important source of physical information, allowing us to probe both supernova explosion mechanisms and neutrino properties. The most important neutrino property that can be probed by such a time spread is its mass.

The first difficulty in this kind of analysis is that the data themselves do not allow us to correlate the time of arrival of the neutrino burst at the detectors with the unknown time at which the neutrinos left the supernova. The solution is to use the data themselves to establish, through the statistical analysis, the match between the theoretical neutrino flux prediction and the data, taking the time of arrival of the first neutrino event in each detector as a marker. The times of the following events, \(t_i\), are taken relative to the time of arrival of the first event, \(t_1\):

$$\begin{aligned} \delta _i = t_i - t_1 \end{aligned}$$

and \(t_1\) is left to vary freely to best match the theoretical prediction in a previously established time scale.

This simple picture arises when we assume massless neutrinos. In this case the relative time between events is identical to the relative time between the emissions of these detected neutrinos at the production site, since the time delay due to the travel between the supernova and the detectors does not depend on the neutrino properties. In Fig. 3 a vanishing neutrino mass is assumed, and the time shown in the animation corresponds to the time since the supernova onset.

Fig. 5

Effect of the neutrino mass delay on the SN1987A detected burst compared to the standard flux, for a neutrino mass excluded at \(3 \sigma \). The gray line corresponds to the fitted theory of Fig. 3 (see the animation here: https://github.com/santosmv/Animations-visualizing-SN1987A-data-analysis/blob/main/events_rate_mass_delay.png)

However, since neutrinos have mass, neutrinos with different energies have different velocities, which changes the described scenario. More energetic neutrinos travel faster than less energetic ones, meaning that the relative times between events do not correspond to the relative times of the neutrino emissions. The correction is done by a simple kinematic analysis:

$$\begin{aligned} t_{i,d}= & {} t_{i,p}+\frac{D}{v_i}=t_{i,p}+\frac{D}{c} \frac{1}{\sqrt{1-\frac{m^2}{E_i^2}}} \sim t_{i,p}\\&+\frac{D}{c}\left( 1+\frac{m^2}{2E_i^2}\right) \end{aligned}$$

where D is the distance to the supernova, and m and \(E_i\) are the neutrino mass and the event energy. The sub-index p (d) refers to the time at production (detection). The emission time of each event is then calculated from the relative times \(\delta _i\) and the kinematic corrections:

$$\begin{aligned} t_{i,d}= & {} t_{1,d}+\delta _i \nonumber \\ t_{i,p}= & {} \delta _i+\left( t_{1,p}+\frac{D}{c}\frac{m^2}{2E_1^2}\right) -\frac{D}{c}\frac{m^2}{2E_i^2} \end{aligned}$$
(12)

For more details, we refer to [22, 23].

Instead of making the correction on the production times, presented here to give proper credit to the authors who proposed and performed this analysis, we prefer to correct the theoretical prediction by a continuous spread in time of the neutrino flux spectrum at the detector. So, instead of converting the times of the detected events back to the supernova emission, we adjust the theoretical prediction to the detector site. Clearly both choices are equivalent, but with this second procedure we can use the same data animation presented in Fig. 3, and adjust the theoretical curve by making the replacement:

$$\begin{aligned} t\rightarrow t-\frac{D}{c}\frac{m^2}{2E^2} \end{aligned}$$

in Eq. (8).

An animation evidencing this model-independent limit is shown in Fig. 5, where we chose an exceedingly large neutrino mass of 30 eV, well beyond the astrophysical limits of \(\sim 5\) eV [22, 23], in order to effectively visualize the delay caused by the mass, with the same astrophysical parameters used to produce Fig. 3, and therefore the same neutrino flux at the source. But due to the different time lags of neutrinos traveling to Earth with different energies, the time history of the expected number of events changes significantly, allowing us to place a limit on the neutrino mass using a proper statistical analysis.
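A short numerical check of this time shift helps to see why a 30 eV mass is so strongly disfavoured: for the SN1987A distance the delay reaches a few tens of seconds at the lowest event energies, comparable to the duration of the burst itself, while a mass near the \(\sim 5\) eV limit gives shifts of order a second or less. The sketch below simply evaluates \((D/c)\,m^2/(2E^2)\), assuming D = 50 kpc.

```python
KPC_IN_M = 3.086e19        # one kiloparsec in metres
C_LIGHT = 2.998e8          # speed of light in m/s

def mass_delay(m_eV, E_MeV, D_kpc=50.0):
    """Arrival-time delay (D/c) * m^2 / (2 E^2), in seconds, for a neutrino of mass
    m_eV [eV] and energy E_MeV [MeV] travelling a distance D_kpc [kpc]."""
    m_over_E = (m_eV * 1e-6) / E_MeV          # dimensionless ratio, both in MeV
    return (D_kpc * KPC_IN_M / C_LIGHT) * 0.5 * m_over_E**2

for m in (30.0, 5.0):                         # the 30 eV of Fig. 5 and the ~5 eV limit
    for E in (10.0, 40.0):                    # typical SN1987A event energies
        print(f"m = {m:4.1f} eV, E = {E:4.1f} MeV -> delay ~ {mass_delay(m, E):6.2f} s")
```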

7 Conclusion

This paper intended to present a pedagogical view of how to understand the likelihood analysis when an event-by-event treatment is necessary. The detection of SN1987A is a perfect example of that, since a lot of physics can be extracted from the few events that were collected through neutrino detection. It also has the interesting feature that different information can be extracted from the total expected number of events, from its spectral distortion or from its time structure. We presented some animations as a visual tool to understand the statistical procedure and to produce a first impression of how different models fit the data.