Fragile detection of solar g modes by Fossat et al.

The internal gravity modes of the Sun are notoriously difficult to detect, and the claimed detection of gravity modes presented in Fossat et al. 2017 is thus very exciting. Given the importance of these modes for understanding solar structure and dynamics, the results must be robust. While Fossat et al. 2017 described their method and parameter choices in detail, the sensitivity of their results to several parameters was not presented. Therefore, we test the sensitivity to a selection of them. The most concerning result is that the detection vanishes when we adjust the start time of the 16.5 year velocity time series by a few hours. We conclude that this reported detection of gravity modes is extremely fragile and should be treated with utmost caution.


Introduction
Despite the revelations about the solar internal structure and dynamics from helioseismology in the past thirty years, the deep core of the Sun has remained invisible. This is because the most easily observed pressure modes (p modes) are predominantly sensitive to the near-surface layers of the Sun. Gravity modes (g modes), which probe the core of the Sun, are evanescent in the convection zone and have small amplitudes at the surface, making them difficult to detect (e.g. Appourchaux et al., 2010). There have been prior claims of g-mode detection (e.g. García et al., 2007), but to our knowledge the results have not been independently reproduced and remain controversial. Refreshingly, Fossat et al. (2017) provide their data publicly and describe their method sufficiently well for us to qualitatively reproduce their results. Fossat et al. (2017) present a method based on the principle that g modes perturb the solar core, changing the round-trip travel-time (the time taken to travel to the other side of the Sun and back) of the sound waves (p modes). They measure perturbations to the large separation (equivalent to the round-trip travel-time) between pairs of low-frequency p modes with even (ℓ = 0, 2) and odd (ℓ = 1, 3) harmonic degrees. The large separation is not as susceptible to convective noise and surface effects as individual mode frequencies, and is sensitive to the mean density of the Sun.
Applying their method to the long-term velocity time series from the Global Oscillations at Low Frequencies (GOLF) instrument (Gabriel et al., 1995), Fossat et al. (2017) inferred that the solar core rotates about 3.8 times faster than the envelope. Given the potential impact of this detection on solar structure and dynamics, it is important that these results can be independently reproduced and tested. A robust detection will also impact the theory of stellar evolution, and the method could potentially be applied to stellar observations of Sun-like stars, for example from PLATO (Rauer et al., 2014) observations.
Although Fossat et al. (2017) described their analysis method well enough to be qualitatively reproduced, they did not present a quantitative description of the sensitivity to the parameters of the method.
In this paper we first present our independent reproduction of the measurement of the rotational splitting of the g modes (Sect. 2). We then examine the sensitivity of the significance of the detection to four parameters in the analysis method: the method used to measure the round-trip travel-time (Sect. 3), the smoothing of the power spectrum of the round-trip travel-time (Sect. 4), the start-time of the GOLF time series (Sect. 5), and the cadence of the round-trip travel-time measurements (Sect. 6). We conclude in Sect. 7.

Reproduction of g mode detection
By following the analysis method described in Fossat et al. (2017) we were able to qualitatively reproduce the results of Fig. 10 in that paper. However, our results are slightly different, possibly due to differences in the input data or because some parts of the algorithm were not clearly described. Here we describe our procedure in relation to that described in Fossat et al. (2017).
We downloaded the 16.5 year-long time series of the solar global velocity observations from the GOLF instrument provided at https://www.ias.u-psud.fr/golf/templates/access.html and divided the data into 36130 segments, each 8 hr long, with a 4 hr cadence.
For each valid segment we first padded the data out to $10^6$ seconds before computing the power spectrum. We filtered this power spectrum for the frequency band between 2.32 and 3.74 mHz, and translated the beginning of this band to the zeroth frequency. We subtracted the mean of the power spectrum and zero-padded out to 125 mHz. We then computed the power spectrum of this padded, frequency-filtered power spectrum, to get the temporal power spectrum (which is the envelope of the autocorrelation of the GOLF time series in the selected frequency range). We did a least-squares fit of a quadratic function to the prominent peak in the range between 14000 s and 15600 s. The time of the maximum of the quadratic is the round-trip travel-time of the p modes. At this point, we have a time series of the round-trip travel-time of the p modes at a 4 hr cadence. If a segment had a duty-cycle less than 25% we set the measurement of the round-trip travel-time to zero.
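The steps above can be sketched for a single 8 hr segment as follows. This is a minimal illustration with NumPy, not the code used for the results in this paper: the function and constant names are hypothetical, gaps are assumed to be stored as exact zeros, and the quadratic is fitted in a time coordinate centred on the fit window purely for numerical conditioning.

```python
import numpy as np

DT = 80.0                          # GOLF sampling cadence in seconds
PAD_SECONDS = 1.0e6                # length the segment is zero-padded to
BAND_HZ = (2.32e-3, 3.74e-3)       # p-mode band kept from the power spectrum
PAD_HZ = 125.0e-3                  # length the filtered spectrum is padded to
FIT_WINDOW_S = (14000.0, 15600.0)  # range of the quadratic fit
MIN_DUTY = 0.25                    # minimum duty-cycle of a valid segment


def temporal_power_spectrum(segment):
    """Power spectrum of the band-limited power spectrum of one segment."""
    n_pad = int(round(PAD_SECONDS / DT))               # 12500 samples
    power = np.abs(np.fft.rfft(segment, n=n_pad)) ** 2
    freq = np.fft.rfftfreq(n_pad, d=DT)                # 1 microHz spacing
    df = freq[1] - freq[0]

    # Keep the band; copying it to index 0 translates it to zero frequency.
    band = power[(freq >= BAND_HZ[0]) & (freq <= BAND_HZ[1])]
    padded = np.zeros(int(round(PAD_HZ / df)))         # 125000 bins
    padded[:band.size] = band - band.mean()

    temporal_power = np.abs(np.fft.rfft(padded)) ** 2
    time = np.fft.rfftfreq(padded.size, d=df)          # seconds, 8 s spacing
    return time, temporal_power


def round_trip_travel_time(segment):
    """Vertex of a quadratic fitted to the peak between 14000 s and 15600 s."""
    segment = np.asarray(segment, dtype=float)
    # Assumption: missing samples are stored as exact zeros.
    duty = np.count_nonzero(segment) / segment.size
    if duty < MIN_DUTY:
        return 0.0                                     # flagged as invalid

    time, p = temporal_power_spectrum(segment)
    sel = (time >= FIT_WINDOW_S[0]) & (time <= FIT_WINDOW_S[1])
    t_mid = 0.5 * (FIT_WINDOW_S[0] + FIT_WINDOW_S[1])
    a, b, _ = np.polyfit(time[sel] - t_mid, p[sel], 2)
    return t_mid - b / (2.0 * a)
```

Calling `round_trip_travel_time` on each of the 360-sample segments, stepped by 180 samples, would give a travel-time series at the 4 hr cadence described above.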
The condition that the segments have a minimum 25% duty-cycle was a subjective choice on our part because the value was not specifically stated in Fossat et al. (2017), and it results in 1.5% more round-trip travel-time measurements than in Fossat et al. (2017). There are 34775 non-zero values with a root-mean-square (rms) uncertainty of 54 s, compared to Fig. 6 in Fossat et al. (2017), who have 34261 non-zero values with an rms of 52 s.
We then subtracted the mean value (14806 s, compared to Fossat et al. (2017), who obtain 14807 s) to get the round-trip travel-time perturbation as a function of time with a 4 hr cadence. We set any absolute value larger than 240 s to zero. We then computed the power spectrum of these round-trip travel-time perturbations, convolved it with a six-pixel-wide box-car window (11.5 nHz) to smooth the power spectrum, and calculated its autocorrelation up to 3500 nHz. We note that Fossat et al. (2017) used a seven-pixel-wide window with weighting [0.5, 1, 1, 1, 1, 1, 0.5] (private communication, E. Fossat). This makes little difference to the results. Figure 1 shows the result of our analysis, which can be qualitatively compared to Fig. 10 of Fossat et al. (2017). The peaks at 210 nHz, 630 nHz and 1260 nHz are claimed by Fossat et al. (2017) to represent the splittings of the g modes caused by rotation in the core. Our highest peak at 210 nHz has a significance of 4.0σ compared to 4.7σ in Fossat et al. (2017).
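The smoothing and autocorrelation steps can be sketched as below. The names are hypothetical, and the normalisation of the autocorrelation (mean-subtracted, scaled to one at zero lag) is an assumption, since the text does not fix it. The pixel size is taken as the inverse of the total duration, 1/(36130 × 4 hr) ≈ 1.92 nHz, which is consistent with the six-pixel width of 11.5 nHz quoted above. Both windows have a total weight of 6.

```python
import numpy as np

N_MEASUREMENTS = 36130                        # 4 hr travel-time samples
CADENCE_S = 4 * 3600.0
BIN_NHZ = 1e9 / (N_MEASUREMENTS * CADENCE_S)  # about 1.92 nHz per pixel

BOXCAR_6 = np.ones(6)                                # window used here
WEIGHTED_7 = np.array([0.5, 1, 1, 1, 1, 1, 0.5])     # Fossat et al. (2017)


def smooth(power, weights):
    """Convolve a power spectrum with a normalised smoothing window."""
    w = np.asarray(weights, dtype=float)
    return np.convolve(power, w / w.sum(), mode="same")


def autocorrelation(spectrum, max_lag_nhz=3500.0):
    """Autocorrelation of a spectrum as a function of frequency lag.

    Assumption: the spectrum is mean-subtracted and the result is
    normalised so that the zero-lag value is one.
    """
    x = np.asarray(spectrum, dtype=float)
    x = x - x.mean()
    max_lag = min(int(max_lag_nhz / BIN_NHZ), x.size - 1)
    norm = np.dot(x, x)
    ac = np.array([np.dot(x[:x.size - k], x[k:])
                   for k in range(max_lag + 1)]) / norm
    lags_nhz = BIN_NHZ * np.arange(max_lag + 1)
    return lags_nhz, ac
```

For example, `autocorrelation(smooth(power, BOXCAR_6))` gives the curve for the six-pixel box-car, and swapping in `WEIGHTED_7` gives the weighted seven-pixel variant.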
We defined the significance as the maximum value of the autocorrelation in a certain range of frequency lag, relative to the standard deviation of the autocorrelation in the full range of frequency lag. We defined the maximum of the first peak to lie within the range 189-231 nHz (see the blue part of the curve in Fig. 1).
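This definition of significance can be written as a short function. The sketch below is illustrative only: the function name is hypothetical, the standard deviation is taken as the population standard deviation over every lag that is passed in, and whether the zero-lag point belongs to the "full range" is left to the caller because the text does not specify it.

```python
import numpy as np

FIRST_PEAK_WINDOW_NHZ = (189.0, 231.0)   # search range for the first peak


def peak_significance(lags_nhz, autocorr, window_nhz=FIRST_PEAK_WINDOW_NHZ):
    """Maximum of the autocorrelation inside the window, in units of the
    standard deviation of the autocorrelation over all supplied lags."""
    lags = np.asarray(lags_nhz, dtype=float)
    ac = np.asarray(autocorr, dtype=float)
    lo, hi = window_nhz
    in_window = (lags >= lo) & (lags <= hi)
    return ac[in_window].max() / ac.std()
```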
Here, we simply aimed to first qualitatively reproduce the results, and in the following sections we will analyse the sensitivity to some parameters in the analysis method.
Initially, we did not interpret the description of "smoothed over 6 bins" and "6-bin smoothing" of the power spectrum correctly and had difficulty acquiring the large significance of the main peak. Through private communication with D. Salabert and T. Corbard we established that this phrase means a convolution of a six-pixel-wide window with the power spectrum.
We found that the significance of our main peak could increase by relaxing the criterion that each 8 hour segment of data has a duty-cycle of at least 25%. However, we chose to keep a well defined value here rather than try to tune any of the criteria to increase the significance.

Sensitivity to the measurement of the round-trip travel-time
Fossat et al. (2017) claim that "the (second-order) polynomial fit made on a range of ±800 s around 14 800 s minimizes the scatter of T" (where T is the round-trip travel-time) and that this fit is "not at all the best fit on the peak profile, but it is the least noisy estimate of the peak centroid". Naively, it looks like a Gaussian would be a good fit to measure the peak of this curve (see Fig. 2); however, the centroid may be the key parameter. Therefore we compare the least-squares fit of the quadratic function used in Fossat et al. (2017), $q(t) = a t^2 + b t + c$ (where t is time), with a non-linear least-squares fit of a three-parameter Gaussian function, $g(t) = A\,e^{-(t - t_0)^2 / (2\sigma^2)}$ (with amplitude $A$, centre $t_0$ and width $\sigma$), and with a direct measurement of the centroid $c = \int_{t_1}^{t_2} t\,P(t)\,\mathrm{d}t \,/ \int_{t_1}^{t_2} P(t)\,\mathrm{d}t$, where $P(t)$ is the temporal power spectrum, $t_1$ = 14000 s, $t_2$ = 15600 s, and we evaluated the integrals discretely using the trapezoidal rule. Figure 2 shows an example of the different fits to the round-trip travel-time peak. The root-mean-square of the round-trip travel-time perturbations (e.g. similar to Fig. 1) is 53 s for the quadratic fit, 72 s for the Gaussian fit and 64 s for the centroid measurement, compared to 52 s in Fossat et al. (2017).
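Two of the three estimators can be sketched with NumPy alone, as below. The function names are hypothetical, and the Gaussian fit is left out because it needs a non-linear least-squares routine. The trapezoidal rule is written out explicitly and the quadratic is fitted in a centred time coordinate for numerical conditioning. These are choices made for this sketch and are not taken from Fossat et al. (2017).

```python
import numpy as np

T1, T2 = 14000.0, 15600.0   # limits of the fit and of the centroid integrals


def quadratic_peak(t, p):
    """Time of the maximum of a least-squares quadratic fitted to p(t)."""
    t_mid = 0.5 * (T1 + T2)
    a, b, _ = np.polyfit(t - t_mid, p, 2)
    return t_mid - b / (2.0 * a)


def trapezoid(y, t):
    """Trapezoidal-rule integral of y over the sample points t."""
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(t))


def centroid(t, p):
    """Centroid of p(t) between the first and last sample of t."""
    return trapezoid(t * p, t) / trapezoid(p, t)


# Usage on a synthetic, symmetric peak sampled every 8 s (illustrative only).
t = np.arange(T1, T2 + 1.0, 8.0)
peak = np.exp(-0.5 * ((t - 14800.0) / 300.0) ** 2)
t_quadratic = quadratic_peak(t, peak)
t_centroid = centroid(t, peak)
```

For a peak that is symmetric about the centre of the window, both estimators return the centre. They differ once the peak is asymmetric, noisy, or offset from the window centre, which is the regime relevant to the comparison above.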
This quantitatively shows that the quadratic fit does indeed have the lowest noise compared to the two other methods of measuring the round-trip travel-time. The Pearson correlation coefficient between the time series of the quadratic and Gaussian round-trip travel-time measurements is 0.89, and between the quadratic and centroid measurements it is 0.93, showing that the different methods do not measure something wildly different. The significance of the first peak in the autocorrelation function for the Gaussian is similar to the quadratic, but the second peak is not as significant and the third peak is not clear at all (Fig. 3). The first peak is much less significant in the centroid measurements than in the other two cases.

Sensitivity to the smoothing of the power spectrum
Fossat et al. (2017) state that the significance of the peaks is maximised by convolving the power spectrum of the round-trip travel-times with a six-pixel window (11.5 nHz, 1 pixel corresponds to a frequency bin of 1.92 nHz) to smooth the data. They also claim that the smoothing accounts for the imprecision of the observed mode frequencies in the power spectra. We computed the autocorrelation for power spectra convolved with box-car windows up to twelve pixels wide. Figure 4 shows the autocorrelation of the power spectrum without smoothing, and Figure 5 shows the autocorrelation of the power spectrum smoothed by a twelve-pixel-wide (23 nHz) box-car window.
We computed the significance of the peaks in the autocorrelation by measuring the maximum value in the range 189-231 nHz for the first peak, 598.5-661.5 nHz for the second peak and 1234.8-1285.2 nHz for the third peak (see, for example, the blue parts of the autocorrelation curve in Fig. 3). Figure 6 shows the significance of each of the three peaks for different sizes of the smoothing window and for the different methods of measuring the round-trip travel-time. We quantitatively show that the significance of the first and second peaks is a maximum for a six-pixel-wide smoothing window for the quadratic case as claimed by Fossat et al. (2017).

Sensitivity to start-time of the GOLF time series
The GOLF velocity time series is 16.5 years long with an 80 s cadence. We changed the start-time of the data by removing different amounts of data from the beginning of the time series to test the stability of the results. Figure 7 shows the autocorrelation function for four different start times. When we removed 2 hrs or 10 hrs of data, there were no significant peaks. On the other hand, removing whole segments (e.g. 4 or 24 hours) does not have a large effect. That the significance is so sensitive to shifting the start time by a couple of hours out of 16.5 years clearly shows that the result is fragile. To further investigate this, Fig. 8 shows the significance of the three peaks as a function of the start time. This shows a 4 hr oscillation period, equal to the cadence, confirming the fragility of the results.
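The difference between the 4 hr or 24 hr offsets and the 2 hr or 10 hr offsets can be made concrete by listing the segment boundaries. The sketch below uses hypothetical names and assumes that segments are always counted from the first retained sample. It shows that an offset which is a multiple of the 4 hr cadence reuses exactly the original segments (minus the first few), whereas a 2 hr or 10 hr offset moves every boundary by half a cadence, so that no segment is shared with the original analysis.

```python
DT_S = 80                               # sampling cadence in seconds
SEGMENT_SAMPLES = 8 * 3600 // DT_S      # 360 samples per 8 hr segment
STEP_SAMPLES = 4 * 3600 // DT_S         # 180 samples per 4 hr step


def segment_starts(n_samples, offset_hours):
    """First-sample index of every segment after removing offset_hours."""
    offset = int(round(offset_hours * 3600 / DT_S))
    return list(range(offset, n_samples - SEGMENT_SAMPLES + 1, STEP_SAMPLES))


def shared_fraction(n_samples, offset_hours):
    """Fraction of shifted segments that coincide with original segments."""
    original = set(segment_starts(n_samples, 0))
    shifted = segment_starts(n_samples, offset_hours)
    return sum(start in original for start in shifted) / len(shifted)


# Illustrative series of 10000 samples (about 9 days), not the GOLF data.
fractions = {hours: shared_fraction(10000, hours) for hours in (2, 4, 10, 24)}
```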

Sensitivity to cadence of round-trip travel-time measurement
Fossat et al. (2017) measured round-trip travel-times at a cadence (4 hrs) of half the segment length (8 hrs). These seem like reasonable choices, but were not justified. We repeated the analysis using different cadences (while keeping the segment length equal to twice the cadence) and scaling the cut-off value of 240 s by the relative rms of the non-zero round-trip travel-time. Longer (shorter) segment lengths make the round-trip travel-time peak narrower (wider) and less (more) noisy. Figure 9 shows that the signal in the autocorrelation vanishes for a 3.9 hr cadence. Why such a small change has such a large impact is difficult to understand, as there is no particular physical relevance of exactly 4 hours.
To further illustrate the robustness of the results we explored what would happen if we had initially selected a different cadence. To do this, we chose the largest peak in the autocorrelation between 30 and 500 nHz and used this to compute where we would expect the other two peaks to be. To calculate where the other peaks should be we used the asymptotic splitting of the g modes, $s_{\ell,m} = m\,[\beta_\ell \Omega_g - \Omega_p]$, where $\beta_\ell = 1 - 1/(\ell(\ell + 1))$, $\Omega_g$ is the mean rotation rate felt by the g modes, and we defined $\Omega_p$ = 433 nHz as the mean rotation rate felt by the p modes. For some cases we found peaks where we would expect them to be, albeit with less significance than for the 4 hr cadence case. In other cases we did not find any peaks.
As an example, we show the case of measuring the round-trip travel-time at a 5 hr cadence. We assumed that the largest peak we found, at 498 nHz, is the ℓ = 1, m = 1 mode, and so $\Omega_g$ = 1862 nHz (4.3 times the rotation rate felt by the p modes). Then we used this value of $\Omega_g$ to predict where the ℓ = 2 mode splittings should be. These values, $s_{2,1}$ = 1118 nHz and $s_{2,2}$ = 2237 nHz, are shown as vertical dashed lines in the top panel of Fig. 10, and indeed there are peaks nearby. This demonstrates that if this arbitrary cadence had been chosen from the beginning, then a different answer would have been found.
In addition, we adjusted the start time and repeated the analysis for the 5 hr cadence (Fig. 10). The bottom panel of Fig. 10 shows the autocorrelation for a start time offset by 5 hrs (the cadence). We see that the locations of the peaks are similar to the case with no offset (top panel). Similarly to the 4 hr cadence case, Fig. 11 shows that the significances are highest when removing multiples of the cadence. Fig. 8 and 11 together suggest that the peaks found by Fossat et al. (2017) could be an artefact of the analysis method and the choice of cadence.
Figure 7. The autocorrelation function for start-times offset by 2 hrs (top), 4 hrs, 10 hrs and 24 hrs (bottom) relative to the original start-time. The blue curve indicates the range within which we find the maxima to compute the significance of the peak. The vertical red lines indicate where Fossat et al. (2017) identified the original three g-mode peaks. Peaks at the location of the purported g modes are only evident at multiples of four hours.
Figure 8. Significance of the three g-mode peaks as a function of start-time relative to the start-time of the original GOLF time series. The black crosses are the significance of the maximum value of the autocorrelation near the first peak, the red stars are the significance near the second peak, and the blue diamonds are the significance near the third peak. The first points at zero start-time correspond to the three peaks in Fig. 1.
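The arithmetic of the 5 hr cadence example can be checked with a few lines of Python. The function names are hypothetical; the formula and the value $\Omega_p$ = 433 nHz are those given in the text.

```python
OMEGA_P_NHZ = 433.0   # mean rotation rate felt by the p modes


def beta(ell):
    """Asymptotic factor beta_ell = 1 - 1 / (ell (ell + 1))."""
    return 1.0 - 1.0 / (ell * (ell + 1))


def splitting(ell, m, omega_g_nhz):
    """Asymptotic g-mode splitting s_{ell,m} in nHz."""
    return m * (beta(ell) * omega_g_nhz - OMEGA_P_NHZ)


def omega_g_from_peak(peak_nhz, ell=1, m=1):
    """Invert the splitting formula for the rotation rate felt by g modes."""
    return (peak_nhz / m + OMEGA_P_NHZ) / beta(ell)


omega_g = omega_g_from_peak(498.0)   # 1862 nHz for the ell = 1, m = 1 peak
ratio = omega_g / OMEGA_P_NHZ        # about 4.3
s_21 = splitting(2, 1, omega_g)      # about 1118.7 nHz
s_22 = splitting(2, 2, omega_g)      # about 2237.3 nHz
```

The predicted ℓ = 2 splittings agree with the quoted 1118 nHz and 2237 nHz to within 1 nHz.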

Conclusions
We have shown that the most recent detection of g modes by Fossat et al. (2017) is extremely fragile. In particular, the claimed detection is sensitive to
• the start-time of the GOLF data series (Fig. 7),
• the cadence of the round-trip travel-time measurements (Fig. 9),
• the technique used to measure the round-trip travel times (Fig. 3), and
• the smoothing of the power spectrum (Fig. 6).
The first point is particularly worrying because it means that the inclusion or exclusion of a very small fraction of the data makes a substantial difference, and in general because the parameter is unrelated to the properties of the Sun. This is further illustrated in Fig. 8, which shows that the significance varies with a period equal to the cadence of the data segments used. The latter two points, on the other hand, are less worrying as they could in principle be attributed to properties of g-mode oscillations, even if the reason for the high sensitivity is not understood in detail. Overall, we conclude that the claimed detection of g modes must be treated with extreme caution until these issues are understood.
Figure 9. The autocorrelation function for different cadences of the round-trip travel-time measurement. From the top panel down we show results for a 3 hr cadence, 3.5 hr cadence, 3.9 hr cadence, 4.5 hr cadence and 5 hr cadence. The blue curve indicates the range within which we find the maxima to compute the significance of the peaks. The vertical red lines indicate where Fossat et al. (2017) identified the original three g-mode peaks. The signal of the g-mode splittings has vanished in all cases.
Figure 10. The autocorrelation function of round-trip travel-times at a 5 hr cadence. The autocorrelation for no start-time offset in the top panel is the same as the bottom panel of Fig. 9, followed by start-times offset by 2 hrs, 4 hrs and 5 hrs. The vertical solid lines indicate where Fossat et al. (2017) identified the original three g-mode peaks. The vertical dashed lines show the location of the g-mode splittings we identified for zero start time offset at 498 nHz, 1118 nHz, and 2237 nHz. The blue curve indicates the range within which we find the maxima to compute the significance of the peaks.
It would be valuable to repeat the analysis using other observational data sets from space (MDI, HMI) and from the ground (BiSON, GONG). We note that we have not investigated the further analysis by Fossat et al. (2017) and Fossat and Schmider (2018), where they identify a model of the g-mode power spectrum in the observed power spectrum.