# Tests of Sunspot Number Sequences: 3. Effects of Regression Procedures on the Calibration of Historic Sunspot Data


## Abstract

We use sunspot-group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups \([R_{\mathrm{B}}]\) above a variable cut-off threshold of observed total whole spot area (uncorrected for foreshortening) to simulate what a lower-acuity observer would have seen. The synthesised annual means of \(R_{\mathrm{B}}\) are then re-scaled to the full observed RGO group number \([R_{\mathrm{A}}]\) using a variety of regression techniques. It is found that a very high correlation between \(R_{\mathrm{A}}\) and \(R_{\mathrm{B}}\) (\(r_{\mathrm{AB}} > 0.98\)) does not prevent large errors in the intercalibration (for example sunspot-maximum values can be over 30 % too large even for such levels of \(r_{\mathrm{AB}}\)). In generating the backbone sunspot number \([R_{\mathrm{BB}}]\), Svalgaard and Schatten (*Solar Phys.*, 2016) force regression fits to pass through the scatter-plot origin, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot-cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile–Quantile (“Q–Q”) plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least-squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot-group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar-cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar–terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.

### Keywords

Sunspot number, Historic reconstructions, Calibration, Regression techniques

## 1 Introduction

Articles 1 and 2 of this series (Lockwood *et al.*, 2016a, 2016b) provide evidence that the new “backbone” group sunspot number \([R_{\mathrm{BB}}]\) proposed by Svalgaard and Schatten (2016) overestimates sunspot numbers as late as Solar Cycle 17 and that this overestimation increases as one goes back in time. There is also some evidence that most of the overestimation grows in discrete steps, which could imply a systematic problem with the ordinary linear-regression techniques used by Svalgaard and Schatten to “daisy-chain” the calibration from modern values back to historic ones. This daisy-chaining is unavoidable in this context unless a method is used to calibrate historic (pre-photographic) data with modern data without relating both to data taken during the interim. (Note that one such method, which avoids both regressions and daisy-chaining, has recently been developed by Usoskin *et al.* (2016).) As discussed in Articles 1 and 2, the regressions used are of particular concern because the daisy-chaining means that both random and systematic errors are amplified as one goes back in time.

As one reads the article by Svalgaard and Schatten (2016), one statement stands out and raises immediate concerns in this context: “Experience shows that the regression line almost always very nearly goes through the origin, so we force it to do so …” To understand the implications of this, consider two observers A and B, recording annual mean sunspot-group numbers \(R_{\mathrm{A}}\) and \(R_{\mathrm{B}}\), respectively. If observer B has lower visual acuity than A, then \(R_{\mathrm{B}} \leq R_{\mathrm{A}}\). This may be caused by B having a lower resolution and/or less well-focused telescope, or one that gives higher scattered-light levels. It may also be caused by the keenness of observer B’s eyesight and how conservative he/she was in making the subjective decisions to define spots and/or spot groups from what he/she saw. In addition, the local atmospheric conditions may also have hindered observer B (greater aerosol concentrations, more mists or thin cloud). Forcing the fits through the origin means that \(R_{\mathrm{A}} = 0\) when \(R_{\mathrm{B}} = 0\) and *vice versa*. When the higher-acuity observer A sees no spot groups, the lower-acuity observer B should not detect any either, and so \(R_{\mathrm{A}}\) and \(R_{\mathrm{B}}\) should indeed both be zero in this case. However, there will, in general, have been times when observer A could detect groups but observer B could not, and so \(R_{\mathrm{A}} > 0\) when \(R_{\mathrm{B}} = 0\). Thus any linear-regression fit used to scale \(R_{\mathrm{B}}\) to \(R_{\mathrm{A}}\) should not, in general, pass through zero as Svalgaard and Schatten (2016) forced all of their fits to do. There is no advantage gained by forcing the fits through the origin (if anything, fits are easier to make without this restriction) but, as discussed in this article, it introduces the potential for serious error.
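The argument above can be illustrated numerically. The following is a minimal sketch using invented values (not RGO data): a hypothetical observer B is assumed to see 80 % of observer A's groups minus a constant loss of one group, so that \(R_{\mathrm{B}} = 0\) while \(R_{\mathrm{A}} > 0\) at low activity. The gain, the offset, and all variable names are assumptions made purely for illustration.

```python
import numpy as np

# Invented annual means for observer A (illustrative only, not RGO data).
r_a = np.linspace(0.5, 12.0, 23)

# Hypothetical observer B: assumed gain of 0.8 and a loss of one group,
# clipped at zero so that R_B = 0 while R_A > 0 at low activity.
r_b = np.clip(0.8 * r_a - 1.0, 0.0, None)

# Linear fit R_A = s * R_B + c with a free intercept (verticals minimised).
s_free, c_free = np.polyfit(r_b, r_a, 1)

# Linear fit R_A = s * R_B forced through the origin (verticals minimised).
s_zero = np.sum(r_a * r_b) / np.sum(r_b**2)

# Re-scale observer B's values to observer A's scale with each fit.
cal_free = s_free * r_b + c_free
cal_zero = s_zero * r_b

print(f"free intercept: slope={s_free:.3f}, intercept={c_free:.3f}")
print(f"through origin: slope={s_zero:.3f}")
print(f"peak:    true={r_a[-1]:.2f}, free={cal_free[-1]:.2f}, forced={cal_zero[-1]:.2f}")
print(f"minimum: true={r_a[0]:.2f}, free={cal_free[0]:.2f}, forced={cal_zero[0]:.2f}")
```

In this toy case the forced fit has the steeper slope, so the re-scaled peak lies above the true peak while the re-scaled minimum lies below the true minimum. The size of the effect depends entirely on the assumed gain and offset.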

*et al.*, 2013a, 2013b) to show that it can be a highly significant effect, especially when one considers that the effect will be compounded by successive intercalibrations in the daisy chain.

Other concerns are that the errors in the data do not meet the requirements set by the assumptions of ordinary least-squares (OLS) fitting algorithms, and this possibility should always be tested for using the fit residuals. Failure of these tests means an inappropriate fitting procedure has been used or the noise in the data is distorting the fit. In addition, OLS can be applied by minimising the perpendiculars to the best-fit line or by minimising the verticals to the fit line. It can be argued that this choice should depend on the relative magnitudes of the errors in the fitted parameters. Another possibility that we consider here is that the effect of reduced acuity of observer B may vary with the level of solar activity leading to non-linearity in the relationship between \(R_{\mathrm{A}}\) and \(R_{\mathrm{B}}\) (see Usoskin *et al.*, 2016 for evidence of this effect). We here also investigate the effects of using the linear ordinary least-squares fits used by Svalgaard and Schatten (2016) under such circumstances.
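The distinction between minimising verticals and minimising perpendiculars can be made concrete with a short sketch. The closed-form slopes below are the standard textbook expressions for a straight-line fit with a free intercept. The function names and the synthetic data are assumptions for illustration and are not taken from the analysis in this article.

```python
import numpy as np

def fit_verticals(x, y):
    """Line y = s*x + c minimising the sum of squared vertical distances."""
    xm, ym = x.mean(), y.mean()
    sxx = np.sum((x - xm) ** 2)
    sxy = np.sum((x - xm) * (y - ym))
    s = sxy / sxx
    return s, ym - s * xm

def fit_perpendiculars(x, y):
    """Line y = s*x + c minimising the sum of squared perpendicular distances.

    Assumes the covariance term sxy is non-zero.
    """
    xm, ym = x.mean(), y.mean()
    sxx = np.sum((x - xm) ** 2)
    syy = np.sum((y - ym) ** 2)
    sxy = np.sum((x - xm) * (y - ym))
    s = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4.0 * sxy**2)) / (2.0 * sxy)
    return s, ym - s * xm

# Invented data: a linear relation plus Gaussian noise (not RGO data).
rng = np.random.default_rng(1)
r_b = np.linspace(0.5, 8.0, 23)
r_a = 1.3 * r_b + 1.0 + rng.normal(0.0, 0.5, r_b.size)

s_vert, c_vert = fit_verticals(r_b, r_a)
s_perp, c_perp = fit_perpendiculars(r_b, r_a)
print(f"verticals:      slope={s_vert:.3f}, intercept={c_vert:.3f}")
print(f"perpendiculars: slope={s_perp:.3f}, intercept={c_perp:.3f}")
```

For positively correlated data the perpendicular fit never has a smaller slope than the vertical fit, and the two coincide only when the points lie exactly on a line. That is one reason the two choices can give different re-scaled peak values.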

Figure 1b illustrates the effects of using a linear fit if observer B’s lower acuity has more effect at low sunspot numbers than at high ones, giving a non-linear (quadratic) relationship. In this case, a linear regression with non-zero intercept causes inflation of both the highest and the lowest values but lowers those around the average. Figure 1c shows the effects of both using a linear fit and making it pass through the origin, as employed by Svalgaard and Schatten (2016): in this case the effects are as in Figure 1a but the non-linearity makes them more pronounced.

Non-linearity between the two variables is just one of the main pitfalls in OLS regression. These can arise because the data violate one of the four basic assumptions that are inherent in the technique and that justify the use of linear regression for purposes of inference or prediction. The other pitfalls are a lack of statistical independence of the errors in the data; heteroscedasticity in the errors (they vary systematically with the fit parameters); and cases for which the errors are not normally distributed (about zero). In particular, one or more large-error datapoints can exert undue “leverage” on the regression fit. If one or more of these assumptions is violated (*i.e.* if there is a nonlinear relationship between the variables or if their errors exhibit correlation, heteroscedasticity, or a non-Gaussian distribution) then the forecasts, confidence intervals, and scientific insights yielded by a regression model may be seriously biased or misleading. If the fit is correct, then the fit residuals will reflect the errors in the data and so we can apply tests to the residuals to check that none of the assumptions has been invalidated. Non-linearity is often evident as a systematic pattern when one plots the fit residuals against either of the regressed variables. For regression of time-series data, lack of independence of the errors is seen as high persistence of the fit residual time series. Lack of homoscedasticity is apparent from scatter plots because the scatter increases systematically with the variables. A normal distribution of fit residuals can be readily tested for using a Quantile–Quantile (“Q–Q”) plot (*e.g.* Wilk and Gnanadesikan 1968). This is a graphical technique for determining if two datasets come from populations with a common distribution; hence by making one of the datasets normally distributed we can test the other to see if it also has a normal distribution. 
Erroneous outliers and lack of linearity can also be identified from such Q–Q plots. If outliers are at large or small values they can have a very large influence on a linear regression fit – such points can be identified because they have a large Cook’s-D (leverage) factor (Cook 1977) and should be removed and the data re-fitted. There is no one standard approach to regression that can be applied and implicitly trusted. There are many options that must be investigated, and the above tests must be applied to ensure that the best option is used and that the results are statistically robust. In addition to OLS, we here employ non-linear regression (using second-order and third-order polynomials), Median Least Squares (MLS) and Bayesian Least Squares (BLS). The MLS and BLS procedures were discussed by Lockwood *et al.* (2006).
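As a rough illustration of the two diagnostics just described, the sketch below computes the points of a normal Q–Q plot for a set of fit residuals and the Cook's distance of each data point for a straight-line fit. It uses the standard textbook definitions. The function names, the plotting positions \((i - 0.5)/n\), and the synthetic data with one artificial outlier are assumptions for illustration only.

```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical normal quantiles, ordered standardised residuals)."""
    z = np.sort((residuals - residuals.mean()) / residuals.std(ddof=1))
    n = z.size
    probs = (np.arange(1, n + 1) - 0.5) / n          # assumed plotting positions
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theo, z

def cooks_distance(x, y):
    """Cook's distance of each point for the fit y = c + s*x (verticals minimised)."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    p = design.shape[1]                               # number of fitted parameters
    inv_xtx = np.linalg.inv(design.T @ design)
    h = np.einsum("ij,jk,ik->i", design, inv_xtx, design)  # leverages (hat-matrix diagonal)
    mse = resid @ resid / (x.size - p)
    return resid**2 / (p * mse) * h / (1.0 - h) ** 2, resid

# Invented data with one artificial outlier at the largest x value.
rng = np.random.default_rng(3)
x = np.linspace(1.0, 10.0, 20)
y = 1.2 * x + 0.5 + rng.normal(0.0, 0.2, x.size)
y[-1] += 5.0

cook_d, resid = cooks_distance(x, y)
theo, ordered = qq_points(resid)
print("index of largest Cook's D:", int(np.argmax(cook_d)))
print("Q-Q straightness (correlation):", np.corrcoef(theo, ordered)[0, 1])
```

Plotting `ordered` against `theo` gives the Q–Q plot: points close to the diagonal indicate approximately normal residuals, whereas tails that bend away from it indicate a non-Gaussian distribution.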

The results presented in this article show that linear regression fits in the context of intercalibrating sunspot-group numbers can violate the inherent assumptions and lead to some very large errors, even though the correlation coefficients are high. In Section 2, we present one example in which intercalibration over two full sunspot cycles (1953 – 1975) can produce an inflation of sunspot peak values of over 30 % even when the correlation between \(R_{\mathrm{A}}\) and \(R_{\mathrm{B}}\) exceeds 0.98. This is a significant error. To put it into some context, Svalgaard (2011) pointed out a probable discontinuity in sunspot numbers around 1945 that has been termed the “Waldmeier discontinuity”. Svalgaard quantified it as a 20 % change but Lockwood, Owens, and Barnard (2014) and Lockwood *et al.* (2016a) find it is \(11.9 \pm 0.6\) % and Lockwood, Owens, and Barnard (2016) find it to be 10 %. (The latter estimate is lower because it is the only one not to assume proportionality.) Hence 30 % is a very significant number for one intercalibration, let alone when it is combined with the effect of others in a series of intercalibrations. In Section 3 we present a second example interval (1923 – 1945, when solar activity was lower) to see if it reveals the same effects.

Lastly, we note that we here employ annual means to be consistent with Svalgaard and Schatten (2016). We do not test for any effects of this in the present article but it does cause additional concerns when the data are sparse. This is because observers A and B may have been taking measurements on different days and, because of factors such as regular annual variations in cloud obscuration, their data could even mainly come from different phases of the year. This may therefore not be a random error, which would again invalidate the assumptions of ordinary least-squares regression. Usoskin *et al.* (2016) show this effect can be highly significant for sparse data and Willis, Wild, and Warburton (2016) show it even needs to be considered when using the earliest (before 1885) data from the Royal Observatory, Greenwich.

In the present article, we make use of the photo-heliographic measurements from the Royal Observatory, Greenwich and the Royal Greenwich Observatory (here collectively referred to as the “RGO” data). We employ the version of the RGO data made available by the Solar Physics website (solarscience.msfc.nasa.gov/greenwch.shtml) of the Marshall Space Flight Center (MSFC), which has been compiled, maintained and corrected by D. Hathaway. These data were downloaded in June 2015. As noted by Willis *et al.* (2013b), there are some small differences between these MSFC data and versions of the RGO data stored elsewhere (notably those in the National Geophysical Data Center, NGDC, Boulder). We here use only data for 1923 – 1976, for which these differences are minimal. The use of this interval also avoids all times when the calibration of the RGO data has been questioned (Cliver and Ling 2016; Willis, Wild, and Warburton 2016).

## 2 Study of 1953 – 1975

### 2.1 Distribution of Sunspot Group Areas

*et al.*, 2013a) in the interval 1953 – 1975. \(A\) is uncorrected for foreshortening and so is the area that the observer actually sees on the solar disc. The right-hand plot is a detail of the left-hand plot and shows the peak of the distribution. The large number of small-area groups mainly arises from near the solar limb where the foreshortening effect is large. These areas are those recorded by the RGO observers, who are here collectively termed “Observer A”. To simulate what a lower-acuity Observer “B” would have seen, we here assume that he/she would only detect groups for which the observed area \([A]\) exceeded a threshold \([A_{\mathrm{th}}]\). The number of groups seen on each day by the RGO observers and by the virtual observer B [\(R_{\mathrm{A}}\) and \(R_{\mathrm{B}}\) respectively] were counted. Annual means of both \(R_{\mathrm{A}}\) and \(R_{\mathrm{B}}\) were then evaluated to be compatible with the procedure used to generate the backbone data series \([R_{\mathrm{BB}}]\). This was repeated for a wide range of \(A_{\mathrm{th}}\) thresholds.
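The thresholding procedure described above can be sketched as follows. This is not the processing code used for this article: the daily group areas are randomly generated stand-ins for the RGO catalogue, the area units and threshold values are arbitrary, and the function names are hypothetical.

```python
import numpy as np

def count_groups(areas_per_day, a_th):
    """Daily number of groups whose observed area exceeds the threshold a_th."""
    return np.array([np.count_nonzero(a > a_th) for a in areas_per_day])

def annual_means(years, daily_counts):
    """Mean of the daily group counts within each calendar year."""
    uniq = np.unique(years)
    return uniq, np.array([daily_counts[years == y].mean() for y in uniq])

# Invented stand-in for the catalogue: one array of observed group areas per day.
rng = np.random.default_rng(2)
years = np.repeat([1953, 1954, 1955], 365)
areas_per_day = [
    rng.lognormal(mean=4.0, sigma=1.2, size=rng.poisson(5))
    for _ in range(years.size)
]

# Observer A sees every group (all areas are positive, so a zero threshold keeps them all).
year_list, r_a = annual_means(years, count_groups(areas_per_day, 0.0))

# Virtual observer B sees only groups above each assumed threshold.
r_b_by_threshold = {}
for a_th in (25.0, 50.0, 100.0):
    _, r_b_by_threshold[a_th] = annual_means(years, count_groups(areas_per_day, a_th))
    print(a_th, np.round(r_b_by_threshold[a_th], 2), np.round(r_a, 2))
```

By construction \(R_{\mathrm{B}} \leq R_{\mathrm{A}}\) in every year, and \(R_{\mathrm{B}}\) can only decrease as the threshold is raised.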

### 2.2 Variations of \(R_{\mathrm{A}}\) and \(R_{\mathrm{B}}\) and Fits of \(R_{\mathrm{B}}\) to \(R_{\mathrm{A}}\)

Fit procedures employed.

| Fit | Line colour in figures | Fit type | Assumed variation | Parameter minimised | Treatment of intercept |
|---|---|---|---|---|---|
| 1 | Blue | OLS | linear | r.m.s. of perpendiculars | Not forced through origin |
| 2 | Green | OLS | linear | r.m.s. of verticals | Not forced through origin |
| 3 | Red | OLS | linear | r.m.s. of perpendiculars | Forced through origin |
| 4 | Orange | OLS | linear | r.m.s. of verticals | Forced through origin |
| 5 | Brown | Polynomial | 2nd-order polynomial | r.m.s. of perpendiculars | Not forced through origin |
| 6 | – | MLS | linear | r.m.s. of perpendiculars | Not forced through origin |
| 7 | – | BLS | linear | r.m.s. of perpendiculars | Not forced through origin |
| 8 | Cyan | Polynomial | 3rd-order polynomial | r.m.s. of perpendiculars | Not forced through origin |

Fits using median least squares (MLS, fit 6) and Bayesian least squares (BLS, fit 7) were also made but were no better than the comparable OLS fit (fit 1). We also attempted successive removal of the largest outliers to try to make the fits converge to a stable result, but again no improvement was obtained for any of these linear fits. This left just one assumption to test, namely that the variation of \(R_{\mathrm{B}}\) with \(R_{\mathrm{A}}\) is linear. A least-squares fit of a second-order polynomial was carried out (fit 5): this is shown by the brown lines in Figures 3 and 4. This appears to remove the problem of the exaggerated peak values. Note that for this fit one outlier data point has been removed (see below). In addition, a third-order polynomial fit was carried out (fit 8): the Q–Q plot for this fit is shown by the cyan points in Figure 5f and it can be seen that this fit generates some non-Gaussian tails to the distribution.

In Figure 5e, the open triangles show the results for the second-order polynomial fit to all datapoints and the point in the upper tail of the distribution is seen to be non-Gaussian. This arises from the outlier data point that can be seen in Figure 3 at \(R_{\mathrm{A}} \approx 9.3\), \(R_{\mathrm{B}} \approx 3.2\). The solid circles are for the fit after this outlier has been removed and the remaining points can now be seen to give an almost perfect Gaussian distribution of residuals, and so the fit is robust. The brown lines in Figures 3 and 4 show the results of this fit with the outlier removed. The largest outlier was also removed for all other fits but fit 5 was the only one for which the Q–Q plot was significantly improved. Note that for the test done here, the fits are never used outside the range of values that were used to make the fit. However, this would not necessarily be true of an intercalibration between two daisy-chained data segments and very large errors could occur if there is non-linearity and one is extrapolating to values outside the range used for calibration fitting.
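The sequence described above (fit a second-order polynomial, remove the largest outlier, re-fit, and beware of extrapolation) can be sketched as below. This is only an illustration under stated assumptions: the data are invented, `np.polyfit` minimises the verticals rather than the perpendiculars listed for fit 5 in the table, and the coefficients of the assumed relation are arbitrary.

```python
import numpy as np

# Invented, noise-free quadratic relation between R_B and R_A (not RGO data).
r_b = np.linspace(0.5, 8.0, 23)
r_a = 1.0 + 1.3 * r_b + 0.03 * r_b**2
r_a[10] += 3.0                      # one artificial outlier

# Second-order polynomial fit to all points (verticals minimised).
coef_all = np.polyfit(r_b, r_a, 2)
resid = r_a - np.polyval(coef_all, r_b)

# Remove the point with the largest absolute residual and re-fit.
worst = int(np.argmax(np.abs(resid)))
keep = np.arange(r_b.size) != worst
coef_clean = np.polyfit(r_b[keep], r_a[keep], 2)

# Compare with a linear fit inside (R_B = 8) and outside (R_B = 16) the fitted range.
lin_clean = np.polyfit(r_b[keep], r_a[keep], 1)
diff_inside = abs(np.polyval(coef_clean, 8.0) - np.polyval(lin_clean, 8.0))
diff_outside = abs(np.polyval(coef_clean, 16.0) - np.polyval(lin_clean, 16.0))

print("removed point index:", worst)
print("quadratic coefficients after removal:", np.round(coef_clean, 4))
print(f"linear vs quadratic: inside range {diff_inside:.2f}, outside range {diff_outside:.2f}")
```

Even with a small quadratic term, the linear and quadratic fits agree closely inside the fitted range and diverge once they are extrapolated beyond it, which is the behaviour the text warns about.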

### 2.3 Effect of the Threshold \(A_{\mathrm{th}}\)

## 3 Study of 1923 – 1945

## 4 Discussion and Conclusions

Our tests of regression procedures, comparing the original RGO sunspot-group area data with a deliberately degraded version of the same data, show that there is no one definitive method that ensures the regressions derived are robust and accurate. Certainly the correlation coefficient is not a reliable indicator: a very high correlation is necessary but very far from sufficient.

The one definitive statement that we can make is that forcing fits through the origin is a major mistake. It causes solar-cycle amplitudes to be inflated so that peak values in the lower-acuity data are too high and both minimum and mean values are too low. This is the method used by Svalgaard and Schatten (2016), and our findings show that it will have contributed to a false upward drift in their backbone group number reconstruction \([R_{\mathrm{BB}}]\) values as one goes back in time. At the time of writing we do not have the original data to check the effect on both the regressions used to intercalibrate backbones and any regressions used to combine data into backbones. Both will be subject to this effect. Hence we cannot tell whether or not this explains all of the differences between, for example, the long term changes in \(R_{\mathrm{BB}}\) and the terrestrial data (ionospheric, geomagnetic, and auroral) discussed in Articles 1 and 2. However, it will have contributed to these differences. Note that all of the above also applies to any technique based on the ratio \(R_{\mathrm{A}} /R_{\mathrm{B}}\) as that also forces the fit through the origin.
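The remark about ratio-based techniques can be made explicit with a short worked equation. The notation here (\(k\), \(w_{i}\), and the index \(i\) running over the annual means in the calibration interval) is introduced only for this illustration. Any ratio method re-scales observer B's data as \(R_{\mathrm{A}}' = k R_{\mathrm{B}}\), where the factor \(k\) can be written as a weighted mean of the individual ratios:

\[
k = \frac{\sum_{i} w_{i}\,(R_{\mathrm{A},i}/R_{\mathrm{B},i})}{\sum_{i} w_{i}},
\qquad
w_{i} =
\begin{cases}
1 & \text{(mean of ratios)}\\
R_{\mathrm{B},i} & \text{(ratio of means, } k = \sum_{i} R_{\mathrm{A},i} / \sum_{i} R_{\mathrm{B},i}\text{)}\\
R_{\mathrm{B},i}^{2} & \text{(least-squares fit through the origin, } k = \sum_{i} R_{\mathrm{A},i} R_{\mathrm{B},i} / \sum_{i} R_{\mathrm{B},i}^{2}\text{)}
\end{cases}
\]

In every case \(R_{\mathrm{A}}' = 0\) when \(R_{\mathrm{B}} = 0\), so the choice of weights changes the slope but not the fact that the relation is forced through the origin.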

Lastly it is not clear which procedure should be used to daisy-chain the calibrations. Ordinary least-squares fits work well only when the Q–Q plots show a good normal distribution of residuals. Even then, minimising the verticals gives the best answer for the mean values but minimising the perpendiculars gives the best answer for the peak values. The failures in the Q–Q plots appear to be mainly because the dependence is not linear and a non-linear fit then works well. We used a second-order polynomial and the fitted \(R_{\mathrm{B}}^{2}\) term is found to be relatively small (meaning it is a near-linear fit) and hence this seems to have been adequate, at least for the cases we studied. However, we note that this should not be used for values that are outside the range seen during the intercalibration interval because the dependence of the extrapolation on the polynomial used is then extremely large.

## Notes

### Acknowledgements

The authors are grateful to David Hathaway and the staff of the Solar Physics Group at NASA’s Marshall Space Flight Center for maintaining the on-line database of RGO data used here. The work of M. Lockwood, M.J. Owens, and L. Barnard at Reading was funded by STFC consolidated grant number ST/M000885/1 and that of I.G. Usoskin was done under the framework of the ReSoLVE Center of Excellence (Academy of Finland, project 272157).

### References

- Cliver, E., Ling, A.G.: 2016, The discontinuity circa 1885 in the group sunspot number. *Solar Phys.*
- Cook, R.D.: 1977, Detection of influential observations in linear regression. *Technometrics* **19**, 15.
- Lockwood, M., Rouillard, A., Finch, I., Stamper, R.: 2006, Comment on “The IDV index: its derivation and use in inferring long-term variations of the interplanetary magnetic field strength” by Leif Svalgaard and Edward W. Cliver. *J. Geophys. Res.* **111**, A09109.
- Lockwood, M., Owens, M.J., Barnard, L.: 2014, Centennial variations in sunspot number, open solar flux, and streamer belt width: 1. Correction of the sunspot number record since 1874. *J. Geophys. Res.* **119**, 5193.
- Lockwood, M., Scott, C.J., Owens, M.J., Barnard, L., Willis, D.M.: 2016a, Tests of sunspot number sequences. 1. Using ionosonde data. *Solar Phys.*, accepted.
- Lockwood, M., Scott, C.J., Owens, M.J., Barnard, L., Nevanlinna, H.: 2016b, Tests of sunspot number sequences. 2. Using geomagnetic and auroral data. *Solar Phys.*, submitted.
- Lockwood, M., Owens, M.J., Barnard, L.A.: 2016, Tests of sunspot number sequences: 4. Discontinuities around 1945 in various sunspot number and sunspot group number reconstructions. *Solar Phys.*, submitted.
- Svalgaard, L.: 2011, How well do we know the sunspot number? *Proc. Int. Astron. Union* **7**, 27.
- Svalgaard, L., Schatten, K.H.: 2016, Reconstruction of the sunspot group number: the backbone method. *Solar Phys.*
- Usoskin, I.G., Kovaltsov, G.A., Lockwood, M., Mursula, K., Owens, M.J., Solanki, S.K.: 2016, A new calibrated sunspot group series since 1749: statistics of active day fractions. *Solar Phys.*, in press.
- Wilk, M.B., Gnanadesikan, R.: 1968, Probability plotting methods for the analysis of data. *Biometrika* **55**, 1.
- Willis, D.M., Coffey, H.E., Henwood, R., Erwin, E.H., Hoyt, D.V., Wild, M.N., Denig, W.F.: 2013a, The Greenwich photo-heliographic results (1874 – 1976): summary of the observations, applications, datasets, definitions and errors. *Solar Phys.* **288**, 117.
- Willis, D.M., Henwood, R., Wild, M.N., Coffey, H.E., Denig, W.F., Erwin, E.H., Hoyt, D.V.: 2013b, The Greenwich photo-heliographic results (1874 – 1976): procedures for checking and correcting the sunspot digital datasets. *Solar Phys.* **288**, 141.
- Willis, D.M., Wild, M.N., Warburton, J.S.: 2016, Re-examination of the daily number of sunspot groups for the Royal Observatory, Greenwich (1874 – 1885). *Solar Phys.*, accepted.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.