Tests of Sunspot Number Sequences: 2. Using Geomagnetic and Auroral Data
Abstract
We compare four sunspot-number data sequences against geomagnetic and terrestrial auroral observations. The comparisons are made for the original Solar Influences Data Center (SIDC) composite of the Wolf/Zürich/International sunspot number [\(R_{\text{ISNv}1}\)], the group sunspot number [\(R_{\mathrm{G}}\)] by Hoyt and Schatten (Solar Phys. 181, 491, 1998), the new “backbone” group sunspot number [\(R_{\mathrm{BB}}\)] by Svalgaard and Schatten (Solar Phys., DOI, 2016), and the “corrected” sunspot number [\(R_{\mathrm{C}}\)] by Lockwood, Owens, and Barnard (J. Geophys. Res. 119, 5172, 2014a). Each sunspot number is fitted with terrestrial observations, or with parameters derived from terrestrial observations to be linearly proportional to sunspot number, over a 30-year calibration interval of 1982 – 2012. The fits are then used to compute test sequences, which extend further back in time and which are compared to \(R_{\text{ISNv}1}\), \(R_{\mathrm{G}}\), \(R_{\text{BB}}\), and \(R_{\mathrm{C}}\). To study the long-term trends, comparisons are made using averages over whole solar cycles (minimum-to-minimum). The test variations are generated in four ways: i) using the IDV(1d) and IDV geomagnetic indices (for 1845 – 2013) fitted over the calibration interval using the various sunspot numbers and the phase of the solar cycle; ii) from the open solar flux (OSF) generated for 1845 – 2013 from four pairings of geomagnetic indices by Lockwood et al. (Ann. Geophys. 32, 383, 2014a) and analysed using the OSF continuity model of Solanki, Schüssler, and Fligge (Nature 408, 445, 2000), which employs a constant fractional OSF loss rate; iii) the same OSF data analysed using the OSF continuity model of Owens and Lockwood (J. Geophys. Res. 117, A04102, 2012), in which the fractional loss rate varies with the tilt of the heliospheric current sheet and hence with the phase of the solar cycle; iv) the occurrence frequency of low-latitude aurorae for 1780 – 1980 from the survey of Legrand and Simon (Ann. Geophys. 5, 161, 1987). For all cases, \(R_{\mathrm{BB}}\) exceeds the test terrestrial series by an amount that increases as one goes back in time.
Keywords
Sunspots, Statistics; Magnetosphere, Geomagnetic Disturbances
1 Introduction
The article by Svalgaard and Schatten (2016) contains a new sunspot-group number composite. The method used to compile this data series involves combining data from available observers into segments that the authors call “backbones”, which are then joined together by linear regressions. We here call this the “backbone” sunspot-group number [\(R_{\mathrm{BB}}\)] to distinguish it from other estimates of the sunspot-group number. What is different about the \(R_{\mathrm{BB}}\) composite is that instead of the recent grand maximum being the first since the Maunder minimum (circa 1650 – 1710), as it is in other sunspot data series, it is the third; there being one maximum of approximately the same magnitude in each century since the Maunder minimum. In itself, this is not such a fundamental change, as it arises only from early values of \(R_{\mathrm{BB}}\) being somewhat higher than for the previous sunspot-number or sunspot-group number records. However, the new series does suggest a flipping between two states rather than a more sustained rise from the Maunder minimum to the recent grand maximum, with implications for solar-dynamo theory and for reconstructed parameters, such as total and UV solar irradiances. We note that the \(R_{\mathrm{BB}}\) data composite is quite similar to the second version of the composite of the Wolf/Zürich/International sunspot number [\(R_{\text{ISNv2}}\)], recently generated by the Solar Influences Data Centre (SIDC) of the Solar Physics Research department of the Royal Observatory of Belgium. Specifically, annual \(R_{\text{ISNv2}}\) values also show three longer-term maxima since the Maunder minimum that are more equal in magnitude than for earlier series such as \(R_{\mathrm{G}}\), \(R_{\mathrm{C}}\), and \(R_{\text{ISNv}1}\), but not as equal as they are for \(R_{\mathrm{BB}}\); and, unlike \(R_{\mathrm{BB}}\), the most recent of those three maxima remains the largest in \(R_{\text{ISNv2}}\).
\(R_{\mathrm{BB}}\) is extreme in that it is the only sequence for which the highest solar-cycle means are not for Cycle 19 (uniquely, being larger for Solar Cycles 0 and 3 than for Cycle 19), whereas the highest annual value is during Cycle 19 for all data series, including \(R_{\mathrm{BB}}\).
The standard approach to calibrating historic sunspot data is “daisy-chaining”, whereby the calibration is passed from one data series (be it a backbone or the data from an individual observer) to an adjacent one, usually using linear regression over a period of overlap between the two. Svalgaard and Schatten (2016) claim that daisy-chaining was not used in compiling \(R_{\mathrm{BB}}\), positing that the backbone method is an alternative to daisy-chaining rather than a different form of it. However, avoiding daisy-chaining requires the deployment of a method to calibrate early sunspot data, relative to modern data, without comparing both to data taken in the interim: because no such method is presented in the description of the compilation of \(R_{\mathrm{BB}}\), it is evident that daisy-chaining was employed. Another new sunspot-group number data series has recently been published by Usoskin et al. (2016): these authors describe and employ a method that genuinely does avoid daisy-chaining because all data are calibrated by direct comparison with a single reference data set, independently of the calibration of any other data.
As discussed in Article 3 (Lockwood et al. 2016b), there are major concerns about the use of daisy-chaining. Firstly, rigorous testing of all regressions used is essential, and Lockwood et al. (2016b) show that the assumptions about linearity and proportionality of data series made by Svalgaard and Schatten (2016) when compiling \(R_{\mathrm{BB}}\) cause both random and systematic errors. The use of daisy-chaining means that these errors accumulate over the duration of the data series. Another problem has been addressed by Usoskin et al. (2016) and Willis, Wild, and Warburton (2016), namely that the day-to-day variability of sunspot-group data makes it vital to compare only data from two observers that were taken on the same day. Hence the use of annual means by Svalgaard and Schatten (2016) is another potential source of error.
Other sunspot-data composites are also compiled using daisy-chaining, such as the original sunspot-group number [\(R_{\mathrm{G}}\)] generated by Hoyt, Schatten, and Nesme-Ribes (1994) and Hoyt and Schatten (1998); versions 1 and 2 of the composite of the Wolf/Zürich/International sunspot number [\(R_{\text{ISNv}1}\) and \(R_{\text{ISNv2}}\)]; and the corrected \(R_{\text{ISNv}1}\) series [\(R_{\mathrm{C}}\)] proposed by Lockwood, Owens, and Barnard (2014a, 2014b). Some of these series also employ linear regressions of annual data. Hence these data series, like \(R_{\mathrm{BB}}\), have not been compiled with the optimum and most rigorous procedures and so also require critical evaluation. We note that, as for \(R_{\mathrm{BB}}\), there are specific concerns about \(R_{\mathrm{G}}\) and \(R_{\mathrm{C}}\) that have been expressed in the literature and which, as for \(R_{\mathrm{BB}}\), arise from the use of daisy-chaining. For example, Cliver and Ling (2016) find an error in the earliest RGO data, and the daisy-chaining construction of \(R_{\mathrm{G}}\) means that all values of \(R_{\mathrm{G}}\) before 1874 would be too low. Similarly, the intercalibration of the datasets of Schwabe and Wolf that was used in the construction of \(R_{\mathrm{C}}\) has been questioned (e.g., Clette and Lefèvre 2016), and with daisy-chaining this would mean that all values of \(R_{\mathrm{C}}\) before 1850 would be too low.
These problems give the potential for calibration drifts and systematic errors, which means that uncertainties (relative to modern values) necessarily increase in magnitude as one goes back in time. By comparing with early ionospheric data, Article 1 (Lockwood et al. 2016a) finds evidence that such calibration drift is present in \(R_{\mathrm{BB}}\) as late as Solar Cycle 17, raising concerns that there are even larger drifts at earlier times.
There are two major concerns in relation to the different behaviour of \(R_{\mathrm{BB}}\) evident in Figure 1. The first is the stability of the calibration of each backbone over the interval it covers, and the second is the regression fits used to daisy-chain the backbones. Even for very highly correlated data segments, the best-fit regression can depend on the regression procedure used (see Article 3; Lockwood et al. 2016b), and it is vital to ensure that the most appropriate procedure is employed (Ryan 2008). Options include median least squares, Bayesian least squares, minimum-distance estimation, non-linear fits, and the ordinary least squares (OLS) that was used to generate \(R_{\mathrm{BB}}\). Even OLS fits can be carried out in different ways: they can minimise the sum of the squares of the vertical distances (appropriate when the \(x\)-parameter is fixed or of small uncertainty, such that the dominant uncertainty is in the \(y\)-parameter) or they can minimise the sum of the squares of the perpendicular distances (usually more appropriate when there are uncertainties of comparable magnitude in both \(x\) and \(y\)). It is very important to test that fits are robust and that the data do not violate the assumptions of the OLS fitting procedure: Q–Q plots can be used to check that the residuals are normally distributed, Cook's D leverage parameter can test for data points that have undue influence on the overall fit, and the fit residuals can be checked to ensure that they are “homoscedastic” (i.e. that the dependent variable exhibits similar variance across the range of values of the other variable). Any of these checks can invalidate a fit because the data violate one or more of the assumptions of the regression technique used (Lockwood et al. 2006).
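These diagnostic checks can be sketched in code. The following is a minimal illustration (not the analysis used in this article) of an OLS fit with Cook's distance for influential points and a crude split-sample measure of homoscedasticity; the function name and the variance-ratio check are our own simplifications.

```python
import numpy as np

def ols_diagnostics(x, y):
    """Fit y = a*x + b by ordinary least squares and return the fit
    together with simple checks of the OLS assumptions: Cook's
    distance (undue influence of single points) and a crude
    homoscedasticity measure (residual variance in the upper half
    of the x range divided by that in the lower half)."""
    X = np.column_stack([x, np.ones_like(x)])      # design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS solution [a, b]
    resid = y - X @ beta
    n, p = X.shape
    s2 = resid @ resid / (n - p)                   # residual variance
    # leverage values h_ii = diag( X (X'X)^-1 X' )
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    # Cook's distance for each data point
    cooks_d = resid**2 / (p * s2) * h / (1.0 - h)**2
    # split-sample variance ratio as a rough homoscedasticity check
    order = np.argsort(x)
    lo, hi = np.array_split(resid[order], 2)
    var_ratio = np.var(hi) / np.var(lo)
    return beta, cooks_d, var_ratio
```

A variance ratio far from unity, or any point with a large Cook's distance, signals that the fit (and hence any intercalibration built on it) should be treated with suspicion.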
Any daisy-chaining used to generate a long-term sunspot-number sequence is of particular concern because if the random fractional uncertainty of the \(i\)th intercalibration is \(\delta_{i}\), then the total random fractional uncertainty will be \((\sum_{i=1}^{n} \delta_{i}^{2})^{1/2}\), where \(n\) is the number of intercalibrations (provided the uncertainties \(\delta_{i}\) are uncorrelated). Even more significantly, systematic fractional errors at each intercalibration [\(\varepsilon_{i}\)] compound multiplicatively, giving a total systematic fractional error of \(\prod_{i=1}^{n}(1+\varepsilon_{i}) - 1\). Both will inevitably grow larger as one goes back in time. Hence considerable uncertainties and systematic deviations are both possible for the earliest data compared to the modern data in any sunspot-number sequence compiled by daisy-chaining. The potential for these uncertainties to become amplified as one goes back in time makes it vital to check that the regressions are not influenced by an inappropriate fit procedure. None of the compilers of daisy-chained data series have investigated these potential effects, for example by using a variety of regression procedures, and instead implicitly trust the one procedure that they adopt. In the absence of tests against other procedures, comparison with other solar-terrestrial parameters becomes important as a check that the daisy-chained calibrations have not led to a false drift in the sunspot calibration.
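The error accumulation described above can be illustrated numerically. The per-calibration uncertainties below are arbitrary illustrative values chosen for the example, not estimates for any actual sunspot series:

```python
import numpy as np

# Hypothetical chain of n = 8 daisy-chained intercalibrations
# (illustrative values only).
random_frac = np.full(8, 0.05)   # 5 % random uncertainty in each
systematic  = np.full(8, 0.02)   # 2 % systematic error in each, same sign

# Uncorrelated random errors add in quadrature along the chain:
total_random = np.sqrt(np.sum(random_frac**2))

# Systematic errors compound multiplicatively along the chain:
total_systematic = np.prod(1.0 + systematic) - 1.0

print(f"total random fractional uncertainty : {total_random:.3f}")
print(f"total systematic fractional error   : {total_systematic:.3f}")
```

With these example numbers, eight 5 % random uncertainties combine to about 14 %, and eight 2 % systematic errors compound to about 17 %, showing how quickly both grow with the number of chained calibrations.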
2 Analysis
In this article, we compare the long-term drifts inherent in sunspot-data series with indices derived from terrestrial measurements that have been devised to vary in a manner that is as close to linear as possible with sunspot numbers over a 30-year “training” interval of 1982 – 2012. Linearity between the test metric and sunspot number is important because non-linearity would generate a difference in their long-term trends, especially for periods such as the Dalton and Maunder minima, when values are outside the range seen during the training interval. Because of the concerns about the compounding effect of uncertainties in daisy-chained regressions and the potential differences between the results of different regression techniques, we here try to avoid using regression in making this comparison. Where regression techniques have to be used, they are used only in the training interval and the coefficients derived are then applied uniformly to the whole interval (1845 – 2013), such that 1845 – 1982 forms a fully independent test period. A probability \(p\)-value for every combination of fitted values is quantified and used in the uncertainty analysis.
In this article, we apply the same analysis as in Figure 2 to indices derived from terrestrial measurements that have been designed, or found, to vary monotonically, and as closely as possible to linearly, with sunspot numbers. This enables us to compare like-with-like when we assess the long-term variations. We use the IDV (Svalgaard and Cliver 2005) and IDV(1d) (Lockwood et al. 2013a, 2013b, 2014a, 2014b) geomagnetic indices. The first application of these geomagnetic indices exploited here is an empirical, statistical relationship with sunspot number (one that varies with the phase of the solar cycle, so allowance must be made for that factor) (Lockwood, Owens, and Barnard 2014b). More satisfactory are comparisons that employ the open solar flux (OSF) reconstruction of Lockwood et al. (2014b) (derived from the combination of four different pairings of geomagnetic indices) using two different theoretical formulations of the physical OSF continuity equation to relate OSF to sunspot numbers. A recent graphic demonstration of why these reconstructions of sunspot numbers from geomagnetic activity are valid and valuable has been presented by Owens et al. (2016). These authors showed that the statistical and theoretical relationships between the geomagnetic-activity indices and sunspot numbers mean that the sunspot numbers and the geomagnetic-activity indices, including both IDV and IDV(1d), give reconstructions of the near-Earth interplanetary magnetic field that are almost identical. Lastly, we look at the annual occurrence of low-latitude aurorae [\(N_{\mathrm{A}}\)] compiled by Legrand and Simon (1987). In this case we have no quantitative theoretical relationship to exploit, although we do have a good qualitative understanding (Lockwood and Barnard 2015), and we simply compare the variations in the normalised averages of \(N_{\mathrm{A}}\) and sunspot numbers.
2.1 Tests Using the IDV(1d) and IDV Geomagnetic Indices
The IDV and IDV(1d) indices are both based on Bartels’ \(u\)-index (Bartels 1932), which employs the difference between successive daily values of the horizontal or vertical component of the geomagnetic field (whichever yields the higher value). There are differences in the construction of these two indices. IDV employs the hourly means (or spot values) that are closest to local midnight for the station in question and uses as many stations as are available (the number of which therefore declines as one goes back in time) (Svalgaard and Cliver 2005). The IDV(1d) index uses the \(u\)-values as defined by Bartels (i.e. the differences in daily means) from just one station at any one time. Only three specified and intercalibrated stations are used, with allowance for the effect of the secular drift in their geomagnetic latitude on the \(u\)-values (Lockwood 2013; Lockwood, Owens, and Barnard 2014a, 2014b). The stations were selected to make the IDV(1d) composite as long and as homogeneous as possible, but with the minimum number of intercalibrations, and each gave the smallest root-mean-square deviation from the data from all other available sub-auroral stations. To cover the full time interval, three different stations are required, but their calibration is not done by daisy-chained regressions. Instead, the values are all normalised to the Eskdalemuir station in the year 2000 using the results of a survey of the dependence of \(u\) on geomagnetic latitude, along with paleomagnetic and empirical model predictions of the variation of each station’s geomagnetic latitude (Lockwood 2013). Eskdalemuir was chosen because it provided the most stable long-term data (giving the lowest fit residuals against the data from the other 49 available sub-auroral stations), and the year 2000 as a convenient and memorable date in modern times. Regressions are then used to check the intercalibrations, but were not used to derive them.
Because it is homogeneous in its construction and does not depend on daisy-chained calibrations, we here show results for IDV(1d), but the results were very similar indeed when IDV was used.
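For concreteness, the basic construction of Bartels' \(u\)-values from daily means, before any station normalisation (which is omitted here), can be sketched as follows; the function names are our own:

```python
import numpy as np

def u_index(daily_means):
    """Bartels' u-value series: the absolute difference between
    successive daily means of a geomagnetic field component (nT).
    The result has one fewer element than the input series."""
    daily_means = np.asarray(daily_means, dtype=float)
    return np.abs(np.diff(daily_means))

def idv1d_annual(daily_means, days_per_year=365):
    """Annual means of the u-values, as would feed (after the station
    normalisation omitted here) into an IDV(1d)-style index.
    Trailing partial years are discarded."""
    u = u_index(daily_means)
    n_full = len(u) // days_per_year * days_per_year
    return u[:n_full].reshape(-1, days_per_year).mean(axis=1)
```

The difference-of-daily-means construction makes the index sensitive to the recurrent and transient disturbances that correlate with interplanetary conditions, while daily averaging suppresses the regular diurnal variation.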
Correlation coefficients [\(r\)] between sunspot numbers inferred from geomagnetic activity and the various sunspot-number sequences [\(R_{\text{ISNv}1}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), \(R_{\mathrm{BB}}\), and \(R_{\text{ISNv2}}\)] over the training period of 1982 – 2012. (a) \(R_{\mathit{IDV}(1\mathrm{d})}\), (b) \(R_{\text{OSF}1}\), and (c) \(R_{\text{OSF}2}\) are generated (a) from the IDV(1d) index, (b) using the Solanki, Schüssler, and Fligge (2000) OSF model with the geomagnetic OSF reconstruction of Lockwood, Owens, and Barnard (2014b), and (c) by the Owens and Lockwood (2012) OSF model using the same OSF reconstruction. Training of the algorithms employs the same sunspot-number sequence with which the \(R_{\mathit{IDV}(1\mathrm{d})}\), \(R_{\text{OSF}1}\), and \(R_{\text{OSF}2}\) sequences are then correlated. The significance level of each correlation, evaluated against the AR1 red-noise model [\(S\)], is given in brackets in each case. Note that \(R_{\mathrm{G}}\) is only available up to 1995, and the training period is therefore 1982 – 1995 (resulting in lower \(S\)-values), and that \(R_{\text{ISNv}1}\) and \(R_{\mathrm{C}}\) are identical over the training interval.
                                          | \(R_{\mathit{IDV}(1\mathrm{d})}\) | \(R_{\text{OSF}1}\) | \(R_{\text{OSF}2}\)
\(R_{\text{ISNv}1}\) & \(R_{\mathrm{C}}\) | 0.976 (93.5%)                     | 0.955 (97.4%)       | 0.966 (92.8%)
\(R_{\mathrm{G}}\)                        | 0.982 (87.0%)                     | 0.975 (92.5%)       | 0.976 (99.6%)
\(R_{\mathrm{BB}}\)                       | 0.979 (97.6%)                     | 0.907 (97.4%)       | 0.952 (91.1%)
\(R_{\text{ISNv2}}\)                      | 0.979 (96.1%)                     | 0.937 (97.6%)       | 0.977 (99.4%)
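The significance levels [\(S\)] quoted in the table are evaluated against an AR1 red-noise model. One way such a test can be implemented is by Monte-Carlo surrogates that share the lag-1 autocorrelation of the data; the sketch below illustrates the idea, although the exact formulation behind the tabulated values may differ:

```python
import numpy as np

def ar1_significance(x, y, n_surrogates=2000, seed=1):
    """Estimate the significance of corr(x, y) against an AR(1)
    red-noise null: surrogate series sharing the lag-1
    autocorrelation of x are correlated with y, and the fraction of
    surrogate |r| values below the observed |r| is returned.
    (Monte-Carlo sketch; the paper's AR1 test may be formulated
    differently.)"""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    r_obs = np.corrcoef(x, y)[0, 1]
    xa = x - x.mean()
    phi = (xa[:-1] @ xa[1:]) / (xa @ xa)   # lag-1 autocorrelation of x
    n = len(x)
    r_null = np.empty(n_surrogates)
    for k in range(n_surrogates):
        e = rng.normal(size=n)
        s = np.empty(n)
        s[0] = e[0]
        for t in range(1, n):
            s[t] = phi * s[t - 1] + e[t]   # AR(1) surrogate series
        r_null[k] = np.corrcoef(s, y)[0, 1]
    return np.mean(np.abs(r_null) < abs(r_obs))   # significance level S
```

Because red noise is autocorrelated, it produces spuriously high correlations far more often than white noise does, so this null model gives a more conservative (and more honest) significance level than the standard \(t\)-test.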
We need to present a very important caveat about this test. It is based on a purely empirical relationship between IDV(1d), sunspot number, and the phase of the solar cycle. The relationship appears to work well for the interval for which we have IDV(1d) data (1845 – present) and over which we here apply the test. However, because it is a purely empirical relationship, this does not mean that it would necessarily work well for other intervals (and the Maunder minimum in particular). The same is equally true for any application of the purely empirical relationships between the IMF \(B\) and \(R^{n}\) and between the IDV index and \(R^{n}\). We note that the test presented here was also carried out using the IDV index (not shown), and the results were the same on all important points.
2.2 Test Using OSF Derived from Geomagnetic Indices and a Continuity Model
The black line in Figure 5b gives the optimum value of normalised cycle averages of \(R_{\text{OSF}1}\) and the grey area around it the \(\pm1\sigma\) uncertainty band, in the same format and derived in the same way as for \(R_{\mathit{IDV}(1\mathrm{d})}\) in the previous section.
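The OSF continuity model of Solanki, Schüssler, and Fligge (2000), on which \(R_{\text{OSF}1}\) is based, balances an emergence source against a constant fractional loss. A minimal forward-integration sketch of that continuity equation is given below; the source parameterisation and all coefficient values are illustrative placeholders, not the fitted values used in this analysis:

```python
import numpy as np

def osf_model_constant_loss(R, source_coeff=1.0e11, loss_frac=0.2, F0=2.0e14):
    """One-year-step integration of the OSF continuity equation with a
    constant fractional loss rate, in the spirit of Solanki,
    Schuessler, and Fligge (2000):
        F(t+1) = F(t) + S(R) - chi * F(t),
    with the source S here taken simply proportional to the sunspot
    number R.  All coefficients are illustrative placeholders."""
    F = np.empty(len(R) + 1)
    F[0] = F0                              # initial OSF (Wb)
    for t, r in enumerate(R):
        source = source_coeff * r          # emergence source term S(R)
        loss = loss_frac * F[t]            # constant fractional loss chi*F
        F[t + 1] = F[t] + source - loss
    return F[1:]
```

In the reconstruction, this relation is inverted: the geomagnetically derived OSF series constrains the source term, and hence the sunspot number, in each time step.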
2.3 Test Using OSF Derived from Geomagnetic Indices and a Second Continuity Model
2.4 Tests Using Occurrence Frequency of Low-Latitude Aurora
Correlation coefficients [\(r\)] between low-latitude auroral activity, quantified by the number of auroral nights per year at geomagnetic latitudes below 55° [\(N_{\mathrm{A}}\)], and the various sunspot-number sequences [\(R_{\text{ISNv}1}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), \(R_{\mathrm{BB}}\), and \(R_{\text{ISNv2}}\)] over the whole interval of the auroral data (1770 – 1980). The significance level of each correlation, evaluated against the AR1 noise model [\(S\)], is given in brackets. The columns are for different data subsets determined by the phase of the solar cycle [\(\Phi\)]: all the data [\(0 \leq \Phi < 2\pi\)], the half of the cycle around solar maximum [\(\pi/2 \leq \Phi < 3\pi/2\)], and the half of the cycle around solar minimum [\(\Phi < \pi/2\) or \(\Phi \geq 3\pi/2\)]. (a) is for annual means, (b) is for solar-cycle averages. Note that there are only 18 data points for the cycle means (panel b), which is too few to compute meaningful significance levels.
                     | All cycle \(0 \leq \Phi < 2\pi\) | Solar maximum \(\pi/2 \leq \Phi < 3\pi/2\) | Solar minimum \(\Phi < \pi/2\) or \(\Phi \geq 3\pi/2\)
(a) Annual averages
\(R_{\mathrm{C}}\)   | 0.672 (80.6%) | 0.666 (74.5%) | 0.718 (79.6%)
\(R_{\mathrm{G}}\)   | 0.663 (64.7%) | 0.618 (73.4%) | 0.665 (60.3%)
\(R_{\mathrm{BB}}\)  | 0.645 (93.8%) | 0.568 (90.7%) | 0.685 (36.2%)
\(R_{\text{ISNv}1}\) | 0.678 (86.9%) | 0.677 (83.1%) | 0.718 (29.7%)
\(R_{\text{ISNv2}}\) | 0.661 (90.2%) | 0.593 (84.5%) | 0.708 (26.0%)
(b) Solar-cycle averages
\(R_{\mathrm{C}}\)   | 0.955 | 0.906 | 0.916
\(R_{\mathrm{G}}\)   | 0.906 | 0.881 | 0.918
\(R_{\mathrm{BB}}\)  | 0.882 | 0.797 | 0.823
\(R_{\text{ISNv}1}\) | 0.956 | 0.860 | 0.924
\(R_{\text{ISNv2}}\) | 0.919 | 0.829 | 0.864
3 Discussion
Figure 5 shows that \(R_{\mathrm{BB}}\) becomes increasingly larger than the other sunspot-number estimates as one goes back in time. None of the series derived here from geomagnetic or auroral activity [\(R_{\mathit{IDV}(1\mathrm{d})}\), \(R_{\text{OSF}1}\), \(R_{\text{OSF}2}\), and \(N_{\mathrm{A}}\)] reproduces this behaviour. In each case, extrapolating back in time from the algorithm training period (1982 – 2012) gives a time series that lies closest to the variations for \(R_{\mathrm{C}}\) and \(R_{\text{ISNv}1}\). In each case, \(R_{\mathrm{BB}}\) lies above the extrapolation in almost all years by an amount that exceeds the two-\(\sigma\) uncertainty (the grey bands). This trend is seen for all series back to the start of the geomagnetic-activity data in 1845 and is consistent with the findings for Cycle 17 in Article 1 (Lockwood et al. 2016a).
The scatter plots for the training interval indicate that the best proxy sunspot number, in terms of the correlation coefficient, is \(R_{\mathit{IDV}(1\mathrm{d})}\). However, this is a purely empirical relationship. It is useful to compare with the results from the proxies \(R_{\text{OSF}1}\) and \(R_{\text{OSF}2}\), which are based on the physical continuity equation for OSF. \(R_{\text{OSF}1}\) and \(R_{\text{OSF}2}\), like \(R_{\mathit{IDV}(1\mathrm{d})}\), both depend on empirical fit parameters, but the use of the continuity equation means that the fits are more constrained than is the case for \(R_{\mathit{IDV}(1\mathrm{d})}\). In addition, OSF is more satisfactory because it is a global solar parameter, like the sunspot number, whereas IDV(1d), and hence \(R_{\mathit{IDV}(1\mathrm{d})}\), are local parameters related to the near-Earth heliosphere.
In addition, whereas using \(R_{\mathit{IDV}(1\mathrm{d})}\) means that we have to assume that the IDV(1d) geomagnetic index depends only on the simultaneous sunspot number, \(R_{\text{OSF}1}\) and \(R_{\text{OSF}2}\) both allow for the effect of persistence in the data series (see Lockwood et al. 2011; Lockwood 2013), whereby the current value also depends upon recent history, to a degree that is defined by the best-fit parameters. For the training period, the correlation of all sunspot numbers with \(R_{\text{OSF}1}\) is consistently slightly lower than with \(R_{\text{OSF}2}\) (Table 1), and \(R_{\text{OSF}2}\) reveals lower scatter and heteroscedasticity (shown in Figure 6b for the comparison of \(R_{\text{OSF}2,\text{BB}}\) with \(R_{\mathrm{BB}}\), but this is also true for all other series tested). Hence \(R_{\text{OSF}2}\) provides the most satisfactory test, which is shown in Figure 5c. We note that the training procedures for \(R_{\text{OSF}2}\), \(R_{\text{OSF}1}\), and \(R_{\mathit{IDV}(1\mathrm{d})}\) all employed four sunspot-number series [\(R_{\mathrm{BB}}\), \(R_{\mathrm{C}}\), \(R_{\mathrm{G}}\), and \(R_{\text{ISNv2}}\)] that give almost identical variations. All are here given equal statistical weight.
The auroral data show that the same tendency extends back to 1780, which means that it covers the Dalton minimum (around Solar Cycle 6) and before. Dividing these data by solar-cycle phase reveals an interesting feature of the data (Figure 8): for both the solar-maximum and solar-minimum data, the long-term variation in \(N_{\mathrm{A}}\) is closer to those in \(R_{\mathrm{C}}\) and \(R_{\text{ISNv}1}\), while \(R_{\mathrm{BB}}\) is consistently larger. It is noticeable that the variations for sunspot minimum and sunspot maximum have similar forms for \(R_{\mathrm{C}}\), \(R_{\text{ISNv}1}\), \(R_{\mathrm{G}}\), and \(N_{\mathrm{A}}\). However, \(R_{\mathrm{BB}}\) is different. For cycles before the Dalton minimum (Solar Cycles 5 and before), the sunspot-minimum values exceed those seen in modern times (the normalised cycle-average values frequently exceed unity, whereas the same cycles give values near unity for solar maximum). Thus the drift to higher values in \(R_{\mathrm{BB}}\) is greater in the solar-minimum values than it is in the solar-maximum values. This implies that the cause of the drift in \(R_{\mathrm{BB}}\) is more than just the effect of the observer calibration \(k\)-factors, as these would influence the values around solar minimum and around solar maximum to the same fractional extent.
We note that, self-evidently, if we normalised using another cycle, then all values would be the same for that cycle and the values for Cycle 19 would be different. But we recall that the point of Figure 5 is to evaluate, for each tested and test data series, how earlier solar cycles compare in amplitude with Cycle 19, i.e. to study the ratio \(\langle R\rangle_{\text{Cn}} /\langle R\rangle_{19}\), as is shown.
The consistency with which the geomagnetic and auroral data give lower values (normalised to modern values) than \(R_{\mathrm{BB}}\), and the way that the difference grows as one goes back in time, strongly suggests that there may be calibration drift in the values of \(R_{\mathrm{BB}}\). In particular, this calls for a check on the compilation of \(R_{\mathrm{BB}}\). This could be done by repeating it with different regression procedures, because the necessary daisy-chaining of calibrations means that both systematic and random errors will be amplified as one goes back in time. Article 3 in this series (Lockwood et al. 2016b) shows that the inflation of \(R_{\mathrm{BB}}\) as one goes back in time is consistent with the effect of the regressions and the assumptions made by Svalgaard and Schatten (2016), in particular that the sunspot-group counts by different observers are proportional. This assumption of proportionality was initially made by Wolf (1861) when he devised sunspot numbers because he envisaged the \(k\)-factors as being constant for each combination of observer and observing instrument. However, in 1872 he realised that this was an invalid assumption (Wolf 1873), and thereafter observer \(k\)-factors were computed either quarterly or annually (using daily data) at the Zürich observatory; Wolf also recalculated all prior calibrations in the same way (see Friedli 2016). It is also important to recognise that the common practice of taking ratios of different sunspot numbers or sunspot-group numbers, either to make or to test calibrations of sunspot observers, inherently assumes proportionality and can also give misleading values.
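The bias introduced by wrongly assuming proportionality can be demonstrated with a synthetic example: when the true observer-to-observer relation has a non-zero offset, a forced proportional fit leaks that offset into the scale factor \(k\). All numbers below are invented for illustration:

```python
import numpy as np

# Synthetic "observer comparison": the true relation between two
# observers' group counts has slope 0.8 and a non-zero offset 1.5.
rng = np.random.default_rng(2)
x = rng.uniform(2.0, 10.0, 200)   # reference observer's counts
y = 0.8 * x + 1.5                 # second observer: linear, NOT proportional

# Forcing proportionality (y = k*x) gives the least-squares k:
k_forced = np.sum(x * y) / np.sum(x * x)

# An unconstrained linear fit recovers the true slope and offset:
a, b = np.polyfit(x, y, 1)

# The forced k exceeds the true slope because the offset leaks into it,
# so a proportional calibration over-scales low counts in particular.
print(k_forced)   # > 0.8
print(a, b)       # ~0.8 and ~1.5
```

The same leakage occurs whenever ratios of two observers' counts are averaged to form a \(k\)-factor, which is why ratio-based calibration tests inherit the proportionality assumption criticised above.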
Acknowledgements
The authors are grateful to staff and funders of the World Data Centres from where data were downloaded: particularly the WDC for the sunspot index, part of the Solar Influences Data Analysis Centre (SIDC) at the Royal Observatory of Belgium. The work of M. Lockwood, C.J. Scott, M.J. Owens, and L.A. Barnard at Reading was funded by STFC consolidated grant number ST/M000885/1. The work of I.G. Usoskin was done under the framework of the ReSoLVE Center of Excellence (Academy of Finland, project 272157).
Disclosure of Potential Conflicts of Interest
The authors declare that they have no conflicts of interest.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.