Tests of Sunspot Number Sequences: 1. Using Ionosonde Data
DOI: 10.1007/s11207-016-0855-8
Cite this article as: Lockwood, M., Scott, C.J., Owens, M.J. et al. Sol Phys (2016) 291: 2785. doi:10.1007/s11207-016-0855-8
Abstract
More than 70 years ago, it was recognised that ionospheric F2-layer critical frequencies [foF2] had a strong relationship to sunspot number. Using historic datasets from the Slough and Washington ionosondes, we evaluate the best statistical fits of foF2 to sunspot numbers (at each Universal Time [UT] separately) in order to search for drifts and abrupt changes in the fit residuals over Solar Cycles 17 – 21. This test is carried out for the original composite of the Wolf/Zürich/International sunspot number [\(R\)], the new “backbone” group sunspot number [\(R_{\mathrm{BB}}\)], and the proposed “corrected sunspot number” [\(R_{\mathrm{C}}\)]. Polynomial fits are made both with and without allowance for the white-light facular area, which has been reported as being associated with cycle-to-cycle changes in the relationship between sunspot number and foF2. Over the interval studied here, \(R\), \(R_{\mathrm{BB}}\), and \(R_{\mathrm{C}}\) largely differ in their allowance for the “Waldmeier discontinuity” around 1945 (the correction factor for which for \(R\), \(R_{\mathrm{BB}}\), and \(R_{\mathrm{C}}\) is, respectively, zero, effectively over 20 %, and explicitly 11.6 %). It is shown that for Solar Cycles 18 – 21, all three sunspot data sequences perform well, but that the fit residuals are lowest and most uniform for \(R_{\mathrm{BB}}\). We here use foF2 for those UTs for which \(R\), \(R_{\mathrm{BB}}\), and \(R_{\mathrm{C}}\) all give correlations exceeding 0.99 for intervals both before and after the Waldmeier discontinuity. The error introduced by the Waldmeier discontinuity causes \(R\) to underestimate the fitted values based on the foF2 data for 1932 – 1945, but \(R_{\mathrm{BB}}\) overestimates them by almost the same factor, implying that the correction for the Waldmeier discontinuity inherent in \(R_{\mathrm{BB}}\) is too large by a factor of two.
Fit residuals are smallest and most uniform for \(R_{\mathrm{C}}\), and the ionospheric data support the optimum discontinuity multiplicative correction factor derived from the independent Royal Greenwich Observatory (RGO) sunspot group data for the same interval.
Keywords: Sunspots, Statistics

1 Introduction
1.1 Definitions of Sunspot Numbers
We note that the observer calibration factors \(k\) in Equation (1) are relative and not absolute, independently determined factors, being defined for an interval \(T\) as \(\langle R_{\mathrm{W}} /R_{\mathrm{O}} \rangle_{\mathrm{T}}\), where \(R_{\mathrm{W}}\) is Wolf’s sunspot number from a central reference observatory (for which \(k\) is assumed to be constant and unity) and \(R_{\mathrm{O}}\) is that derived by the observer in question. Because the \(k\)-values in the modern era vary by a factor of up to three with location, equipment, and observer, all of which change over time, in general we must expect \(k\)-values for historic observations to have the potential to vary with time by at least this factor, and probably more (Shapley 1947). The same is true for the \(k'\)-factors used in the compilation of \(R_{\mathrm{G}}\).
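For reference, Equation (1), referred to above, is the standard Wolf formula (restated here from its standard definition; the full equation appears earlier in the article):

\[ R = k \,( 10\,N_{\mathrm{G}} + N_{\mathrm{S}} ), \]

where \(N_{\mathrm{G}}\) is the number of sunspot groups on the visible disk, \(N_{\mathrm{S}}\) is the number of individual spots, and \(k\) is the observer calibration factor discussed above.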
Another point about the definitions of \(R\) and \(R_{\mathrm{G}}\) is that they both inevitably require subjective decisions to be made by the observer to define both spots and groups of spots on the visible solar disk. Hence observer bias is a factor. Furthermore, the nature of the subjective decisions required has changed with observing techniques and as new guidelines and algorithms were established to try to homogenise the observations, and it may even have changed for one observer over their lifetime. Some of the effects of these subjective decisions are subsumed into the \(k\)-values, but others are not because they change with time. The assumption that \(k = 1\) at all times for the reference station must also be challenged. With modern digital white-light images of the solar disk, it is possible to deploy fixed and objective algorithms to deconvolve all instrumental effects and define what constitutes a spot and what constitutes a group of spots. For such data, the main subjective decision needed is when obscuration by clouds, mists, or atmospheric aerosols is too great for a given site: with sufficient observatories around the globe, unobscured observations are always available from some locations, but a decision is needed as to which to employ to ensure that average sunspot numbers are not influenced by inclusion of data from observatories suffering from partial obscuration. Before the availability of digital images, the data exist as photographic plates. For these, there are additional considerations about image contrast, telescope focus, scattered-light levels, image exposure time, and resolution (collectively giving net observer acuity).
The most important subjective decision required of observers is what constitutes a group, which is of crucial importance given the weighting given to \(N_{\mathrm{G}}\) in Equation (1). However, there are other subjective decisions that influence both \(N_{\mathrm{S}}\) and \(N_{\mathrm{G}}\). For example, sunspots must be distinguished from pores, which are smaller than sunspots (typically 1 – 6 Mm, compared to 6 – 40 Mm for sunspots) and sometimes, but not always, develop into sunspots (Sobotka 2003). Their intensity range overlaps with that for sunspots, at their centre being \(0.2 I_{\mathrm{ph}}\) – \(0.7 I_{\mathrm{ph}}\) (where \(I_{\mathrm{ph}}\) is the mean photospheric intensity) compared to \(0.05 I_{\mathrm{ph}}\) – \(0.3 I_{\mathrm{ph}}\) for sunspots. In images with sufficiently high resolution, sunspots and pores are distinguished by the absence of a sunspot penumbra around pores (although some pores show unstable filamentary structures that can be confused with a sunspot penumbra).
The original photographic glass plates acquired by the Royal Observatory, Greenwich, and the Royal Greenwich Observatory (collectively here referred to as “RGO”) during the interval 1918 – 1976 still survive. These are currently stored in the “Book Storage Facility” in South Marston, near Swindon, UK, as part of the Bodleian Libraries, Oxford. The RGO glass plates for the earlier interval 1873 – 1917 are thought to have been destroyed during the First World War. However, contact prints (photographs) were made of some, but certainly not all, of these earlier glass plates before they were lost (in particular, plates not showing any obvious sunspots were not copied). The fraction of days for which there are no contact prints is considerably higher before 1885 (Willis, Wild, and Warburton 2016). The extant contact prints form part of the official RGO Archives, which are stored in the Cambridge University Library (Willis et al. 2013a, 2013b).
Most of the information available before 1874 is in the form of sketches of the solar disk and/or tabulated sunspot and/or sunspot-group counts compiled by observers using a telescope. (However, we note that even after 1918 sunspot numbers were frequently compiled without the use of photographic images.) It is for these non-photographic records that the subjective nature of sunspot-number data is greatest and the \(k\)- and \(k'\)-factors are most uncertain and least stable. Because observers will have used different criteria to define both spots and spot groups (and even a given observer’s criteria may have changed with time) and because observer acuity varies from observer to observer and with time, intercalibrations of data are required (e.g. Chernosky and Hagan 1958). All long-term sunspot-number data sequences are therefore an observational composite: this is true of the much-used original Wolf/Zürich/International sunspot-number data sequence (version 1 of the International Sunspot Number, here termed \(R\)) as published by the Solar Influences Data Analysis Center (SIDC, the solar physics research department of the Royal Observatory of Belgium) and hence of all sunspot series based on \(R\) with corrections for known or putative discontinuities, for example, the corrected sequence [\(R_{\mathrm{C}}\)] suggested by Lockwood, Owens, and Barnard (2014). This is equally the case for the new (second) version of the Wolf/Zürich/International composite recently published by SIDC, the sunspot-group number [\(R_{\mathrm{G}}\)] (Hoyt and Schatten 1994, 1998), and the “backbone” group-number data series [\(R_{\mathrm{BB}}\)] proposed by Svalgaard and Schatten (2016).
To compile the backbone series [\(R_{\mathrm{BB}}\)], a primary observation source was selected to cover a given interval, and the quality of other observers was judged by how well they correlate with the chosen backbone. Sequences put together in this way were then “daisy-chained” using intercalibrations of successive segments, derived from their interval of overlap, to give \(R_{\mathrm{BB}}\). Obviously, the choices of which data sequences were chosen to be backbones are critical. It is important to note that the intercalibration of observers should be done on a daily basis because sunspot groups can appear and disappear in as little as one day (Willis, Wild, and Warburton 2016). Cloud cover means that observers do not, in general, make observations on the same days, and this will introduce errors if intercalibration is carried out on annual, or even monthly, means of the two incomplete data sets. Hence intercalibrations carried out on daily data, such as those by Usoskin et al. (2016), are much more reliable than those done on annual means, as used to generate \(R_{\mathrm{BB}}\).
The differences in sunspot numbers caused by the subjective decisions required of the observers, by their instrumentation performance, and by local cloud and atmospheric air-quality conditions make definitive calibration of individual observers extremely difficult, if not impossible. The term “daisy-chaining” refers to all methods for which the calibration is passed from one data segment to the next using a relationship between the two, derived from the period of overlap. Usually this relationship has been obtained using some form of regression fit. However, as noted by Lockwood et al. (2006) and by Article 3 in this series (Lockwood et al. 2016b), there is no definitively correct way of making a regression fit, and tests of fit residuals are essential to ensure that the assumptions made by the regression have not been violated, as this can render the fit inaccurate and misleading for the purposes of scientific deduction or prediction. Article 3 shows that large intercalibration errors from regression techniques (\({>}\,30~\%\)) can arise even for correlations exceeding 0.98 and that no one regression method is always reliable in this context: use of regression frequently gives misleading results that amplify the amplitude of solar cycles in data from lower-acuity observers. The problem with daisy-chaining is that any errors (random and systematic) in an intercalibration apply to all data before it (assuming modern data are the most accurate), and if there is a systematic bias, the systematic errors are in the same sense and will compound, such that very large deviations can result by the start of the data sequence. The intercalibrations also depend upon subjective decisions about which data to rely on most, over which intervals to intercalibrate, and on the sophistication and rigour of the chosen statistical techniques. Thus daisy-chaining of different data is a likely source of spurious long-term drift in the resulting composite.
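The compounding of systematic inter-calibration biases under daisy-chaining can be illustrated with a minimal numerical sketch (the +2 % bias per join, the scatter, and the segment count are arbitrary assumptions for illustration, not values taken from any real composite):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical calibration chain of ten segments, each inter-calibrated
# against its neighbour with a small systematic bias (+2 %) plus a
# random error (sigma = 3 %).
n_segments = 10
factors = 1.02 + 0.03 * rng.standard_normal(n_segments)

# The factor applied to the oldest segment is the product of all the
# individual join factors, so the systematic part compounds.
cumulative = np.cumprod(factors)
drift_at_start = cumulative[-1]

# With a +2 % bias per join, the expected net drift after ten joins is
# 1.02**10, i.e. roughly a 22 % spurious inflation of the earliest data.
expected_systematic = 1.02 ** n_segments
```

Random errors alone partly cancel along the chain, but a bias that is always in the same sense grows geometrically with the number of joins, which is the point made above.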
Most observational composites until now have been assembled using some form of daisy-chaining and so are prone to the propagation of errors (this is certainly true of \(R\), \(R_{\mathrm{BB}}\), and \(R_{\mathrm{C}}\)). An important exception, which avoids both daisy-chaining and regression, is the new composite of group numbers \([R_{\mathrm{UEA}}]\) assembled by Usoskin et al. (2016), who compared data probability distribution functions for any interval with those for a fixed standard reference set (the RGO data after 1920 were used). In particular, they used the fraction of observed days that revealed no spots to obtain a calibration rather than passing the calibration from one data segment to the next. These authors assumed that the calibration of each observer remained constant over their observing lifetime; however, their method could be refined and applied to shorter intervals to allow for the drift in each observer’s calibration factor over time.
For a number of reasons, it is highly desirable that sunspot data series are compiled using only sunspot observations. Other data, such as geomagnetic observations, the frequency of occurrence of low-latitude aurorae, or cosmogenic isotope abundance measurements, correlate with sunspot numbers on a range of timescales, but it cannot be assumed that the regression coefficients are independent of timescale. Hence using such data to calibrate the sunspot data on centennial timescales may introduce long-term differences. An example, in the context of the present article, is that ionospheric F2-layer critical frequencies [foF2] and sunspot numbers correlate very well on decadal timescales. However, it has been proposed that anthropogenic warming of the troposphere by greenhouse gases and the associated cooling of the stratosphere, mesosphere, and thermosphere cause lowering of ionospheric layers (through atmospheric contraction) and could potentially influence ionospheric plasma densities and critical frequencies (Roble and Dickinson 1989; Rishbeth 1990; Ulich and Turunen 1997). Furthermore, any such effects will be complicated by changes in the local geomagnetic field (Cnossen and Richmond 2008). If sunspot calibration were to be based on foF2 values, such effects, if present, would not be apparent because they would be included in the sunspot-number intercalibrations, and the sunspot data sequence would contain a spurious long-term drift introduced by the atmospheric and geomagnetic effects. This is just one of many potential examples where using ionospheric data to calibrate sunspot data could seriously harm ionospheric studies by undermining the independence of the two datasets. However, we note that these studies are also damaged if an incorrect sunspot data series is used.
It must always be remembered that sunspot numbers have applications only because they are an approximate proxy indicator of the total magnetic flux threading the photosphere and hence can be used to estimate and reconstruct terrestrial influences such as the received shortwave Total Solar Irradiance (TSI) and UV irradiance (Krivova, Balmaceda, and Solanki 2007; Krivova et al. 2009, respectively), the open solar magnetic flux (Solanki, Schüssler, and Fligge 2000; Lockwood and Owens 2014a), and hence also the near-Earth solar-wind speed (Lockwood and Owens 2014b), mass flux (Webb and Howard 1994), and interplanetary magnetic-field strength (Lockwood and Owens 2014a). Sunspot numbers also provide an indication of the occurrence frequency of transient events, in particular coronal mass ejections (Webb and Howard 1994; Owens and Lockwood 2012), and the phase of the decadal-scale sunspot cycle is used to quantify the tilt of the heliospheric current sheet (Altschuler and Newkirk 1969; Owens and Lockwood 2012) and hence the occurrence of fast solar-wind streams and corotating interaction regions (Smith and Wolf 1976; Gazis 1996). Because all of the above factors influence the terrestrial space environment, sunspot numbers are useful in providing an approximate quantification of terrestrial space-weather and space-climate phenomena, and hence it is vital that the \(k\)-factor intercalibrations inherent in all sunspot-number composites are such that their centennial drifts correctly reflect trends in the terrestrial responses.
From the above arguments, we do not advocate using correlated data to calibrate sunspot numbers, but we do think it important to evaluate any one sunspot-number data sequence against the trends in terrestrial effects because it is these effects that give sunspot numbers much of their usefulness.
1.2 The “Waldmeier Discontinuity”
In this article, we look at the long-term relationship between the sunspot-number data sequences \(R\), \(R_{\mathrm{C}}\), and \(R_{\mathrm{BB}}\) and the ionospheric F2-region critical frequency [foF2], for which regular measurements are available since 1932. This interval is of interest because there has been discussion about a putative inhomogeneity in the calibration of sunspot data series around 1945 that has been termed the “Waldmeier discontinuity” (Svalgaard 2011; Aparicio, Vaquero, and Gallego 2012; Cliver, Clette, and Svalgaard 2013). This is thought to have been caused by the introduction of a weighting scheme for sunspot counts according to their size and a change in the procedure used to define a group (including the so-called “evolutionary” classification that considers how groups evolve from one day to the next); both changes may have been introduced by the then director of the Zürich observatory, Max Waldmeier, when he took over responsibility for the production of the Wolf sunspot number in 1945. We note that these changes affect both sunspot numbers and sunspot-group numbers, but not necessarily by the same amount. Svalgaard (2011) argues that these corrections were not applied before this date, despite Waldmeier’s claims to the contrary. By comparison with other long time series of solar and solar-terrestrial indices, Svalgaard makes a compelling case that this discontinuity is indeed present in the data. Svalgaard argues that sunspot-number values before 1945 need to be increased by a correction factor of 20 %, but it is not clear how this value was arrived at beyond visually inspecting a plot of the temporal variation of the ratio \(R_{\mathrm{G}} /R\) (neglecting low \(R\)-values below an arbitrarily chosen threshold, as these can generate very high values of this ratio). We note that this assumes that the correction required is purely multiplicative, i.e. that before the discontinuity the corrected value is \(R' = f_{\mathrm{R}} \times R\) (and Svalgaard estimates \(f_{\mathrm{R}} = 1.2\)), which makes the pre-discontinuity values consistent with modern ones.
Lockwood, Owens, and Barnard (2014) studied fit residuals when \(R\) is fitted to a number of corresponding sequences. These were i) the independent sunspot-group number from the RGO dataset, ii) the total group-area data from the RGO dataset, and iii) functions of geomagnetic-activity indices that had been derived to be proportional to sunspot numbers. For each case, they studied the difference between the mean residuals before and after the putative Waldmeier discontinuity and quantified the probability of any one correction factor with statistical tests. These authors found that the best multiplicative correction factor [\(f_{\mathrm{R}}\)] required by the geomagnetic data was consistent with that for the RGO sunspot-group data, but that the correction factor was very poorly constrained by the geomagnetic data. Because both the sample sizes and the variances are not the same for the two data subsets (before and after the putative discontinuity), these authors used Welch’s t-test to evaluate the probability p-values of the difference between the mean fit residuals for before and after the putative discontinuity. This two-sample t-test is a parametric test that compares two independent data samples (Welch 1947). It was not assumed that the two data samples are from populations with equal variances, so the test statistic under the null hypothesis has an approximate Student’s t-distribution with a number of degrees of freedom given by Satterthwaite’s approximation (Satterthwaite 1946). The distributions of residuals were shown to be close to Gaussian, and so application of non-parametric tests (specifically, the Mann–Whitney U (Wilcoxon) test of the medians and the Kolmogorov–Smirnov test of the overall distributions) gave very similar results. These tests yielded a correction factor of 11.6 % (\(f_{\mathrm{R}} = 1.116\)) with an uncertainty range of 8.1 – 14.8 % at the \(2\sigma\) level.
The probability of the factor being as large as the 20 % estimated by Svalgaard (2011) was found to be minuscule (\(1.6 \times 10^{-5}\)). Lockwood, Owens, and Barnard (2014) carried out these tests in two ways. The “before” period was 1874 – 1945 (i.e. all of the pre-RGO data were used) in both cases, but two “after” periods were used: 1945 – 2012 and 1945 – 1976. The former uses data from both the RGO and the Solar Optical Observing Network (SOON), with some data gaps that are filled using the “Solnechniye Danniye” (Solar Data, SD) Bulletins issued by the Pulkovo Astronomical Observatory in Russia. These data need to be intercalibrated with the RGO data (for example, the RGO and SD records were photographic, whereas the SOON data are based on sketches) (Foukal 2013). In the second analysis, for the shorter “after” interval, only the RGO data were used.
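The statistical machinery described above can be sketched with SciPy's implementation of Welch's two-sample t-test, together with an explicit evaluation of Satterthwaite's approximate degrees of freedom (the residual samples below are synthetic stand-ins, not the actual fit residuals of the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the fit residuals before and after a putative
# discontinuity: different sample sizes and different variances, as in
# the case discussed in the text.
before = rng.normal(loc=-2.0, scale=5.0, size=60)
after = rng.normal(loc=0.0, scale=3.0, size=120)

# Welch's two-sample t-test: equal_var=False selects the unequal-variance
# form, whose null distribution is an approximate Student's t with
# degrees of freedom given by Satterthwaite's approximation.
result = stats.ttest_ind(before, after, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

# Satterthwaite's approximate degrees of freedom, written out explicitly.
v1 = before.var(ddof=1) / before.size
v2 = after.var(ddof=1) / after.size
dof = (v1 + v2) ** 2 / (v1 ** 2 / (before.size - 1)
                        + v2 ** 2 / (after.size - 1))
```

A small p-value indicates that the difference between the "before" and "after" mean residuals is unlikely under the null hypothesis of no discontinuity; scanning a trial correction factor until this difference vanishes is the essence of the test described above.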
In relation to this analysis by Lockwood, Owens, and Barnard (2014), it has been argued that the RGO data are not homogeneous, particularly before about 1915 (Clette et al. 2015; Cliver and Ling 2016). To be strictly rigorous, the RGO count of the number of sunspot groups on the solar disk is inhomogeneous essentially by definition, since this count is based on information derived from photographs acquired at different solar observatories, which use different solar telescopes, experience different seeing conditions, and employ different photographic processes (Willis et al. 2013a, 2013b). With this rigorous definition, the RGO count of the number of sunspot groups is also inhomogeneous after 1915. It can be shown that the RGO count of the number of sunspot groups in the interval 1874 – 1885 behaves as a “quasi-homogeneous” time series (Willis, Wild, and Warburton 2016), but the correct decisions have to be taken about how to deal with days of missing data. Moreover, changes in the metadata do not appear to invalidate the integrity of the time series. The stability of the RGO sequence calibration is of relevance here because any drift in the RGO group data could, it has been argued, be at least part of the reason why Lockwood, Owens, and Barnard (2014) derived a lower correction factor for the Waldmeier discontinuity than Svalgaard (2011). The argument is that because they used all of the RGO data, extending back to 1874, this may have introduced some poorly calibrated data. In the present article, as well as studying the relationship to ionospheric data, we repeat the analysis of Lockwood, Owens, and Barnard (2014), but using shorter intervals and RGO data only; namely, 1932 – 1945 for the “before” interval and 1947 – 1976 for the “after” interval.
The choice of 1932 is set by the availability of ionospheric data that can be used to make the corresponding tests (the results of which are therefore directly comparable with the tests against the RGO data presented here), but 1932 is also well after the interval of any postulated RGO data calibration drift. The shorter periods mean fewer data points, which necessarily broadens the uncertainty band around the optimum correction-factor estimates.
The red-dashed line in Figure 1 shows the corresponding deviation for \(R_{\mathrm{BB}^{*}}\), which is \(R_{\mathrm{BB}}\) with a 12 % correction applied to data up to 1945 to allow for an overestimation of the Waldmeier discontinuity in the compilation of \(R_{\mathrm{BB}}\). It can be seen that this correction, which is derived in the present article, brings \(R_{\mathrm{BB}}\) broadly into line with \(R_{\mathrm{C}}\) for the interval studied here. (We note that \(R_{\mathrm{C}}\), by definition, is the same as \(R\) after the Waldmeier discontinuity, but \(R_{\mathrm{BB}}\) differs from them because it contains some corrections to the Locarno data, which were used as the standard reference (\(k = 1\)) for much of this interval – those corrections will also be tested in the present article.)
Lastly, we note that the corrections proposed by both Svalgaard (2011) and Lockwood, Owens, and Barnard (2014) assume that the corrected values are proportional to the uncorrected ones, so that a single multiplicative factor can be used (i.e. \(R' = f_{\mathrm{R}} R\)). However, Article 3 in this series (Lockwood et al. 2016b) shows that this assumption can be very misleading, and Article 4 (Lockwood, Owens, and Barnard 2016) carries out a number of tests assuming linearity but not proportionality by also allowing for a zero-level offset \(\delta\) (i.e. \(R' = f_{\mathrm{R}} R + \delta\)).
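The difference between the two assumptions can be sketched numerically: fitting a purely multiplicative factor (\(R' = f_{\mathrm{R}} R\)) versus a linear correction with a zero-level offset (\(R' = f_{\mathrm{R}} R + \delta\)). The data below are synthetic and purely illustrative, built with a known slope and offset:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic, purely illustrative "pre-discontinuity" data: sunspot
# numbers R and corresponding target values R_fit from an independent
# series, generated with slope 1.116 and offset 2.0 plus noise.
R = rng.uniform(10.0, 150.0, size=50)
R_fit = 1.116 * R + 2.0 + rng.normal(scale=3.0, size=R.size)

# Purely multiplicative correction, R' = f_R * R: the least-squares
# estimate of f_R forces the fitted line through the origin.
f_mult = np.sum(R * R_fit) / np.sum(R * R)

# Linear correction allowing a zero-level offset, R' = f_R * R + delta.
f_lin, delta = np.polyfit(R, R_fit, 1)
```

When the true relationship has a non-zero offset, forcing proportionality biases the recovered factor, which is the sense in which Article 3 warns that the purely multiplicative assumption can mislead.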
1.3 Ionospheric F-Region Critical Frequency
Because foF2 is the largest ordinary-wave mode HF radio frequency that can be reflected by the ionosphere at vertical incidence, it is where the pulse time-of-flight (and hence virtual reflection height) goes to infinity and hence is readily scaled from ionograms generated by ionosondes (vertical sounders with co-located transmitter and receiver). Under the “spread-F” condition, which at middle latitudes occurs predominantly at night, echoes at frequencies above foF2 can be received, caused by reflections off ionospheric-plasma irregularities; however, rules for scaling foF2 under these conditions were soon established under international standards (e.g. Piggott and Rawer 1961), and foF2 can be readily scaled from the asymptotic limit of the lower edge of the spread in the ionogram trace. Other problems, such as external radio interference, can make the trace hard to define at all frequencies. These problems are greater if transmitter power is low (although much lower powers can be used if advanced pulse-coding techniques are deployed). The main instrumental uncertainty is the accuracy of the transmitter carrier-wave frequency at the relevant point of each frequency sweep, and this varies with the manufacture of the ionosonde in use. Most of the time, especially at middle latitudes during the day, foF2 is a straightforward, objective measurement.
 i) Comparison with data from Washington for Cycle 19 shows that the drift in the foF2–\(R\) relationship continued after the Waldmeier discontinuity (giving the 7.5 % difference between Cycles 18 and 19 in Figure 2).
 ii) Smith and King (1981) studied the changes in the foF2–\(R\) relationship at a number of stations (at times after the Waldmeier discontinuity). For all of the stations that they studied, these authors found that foF2 varied with the total area of white-light faculae on the Sun, as monitored until 1976 by the Royal Greenwich Observatory, as well as with sunspot number. Furthermore, these authors showed that the sensitivity to the facular effect was a strong function of location and that, of the six stations that they studied, it was greatest for Washington, DC, and lowest for Slough.
The location-dependent behaviour found by Smith and King (1981) is common in the ionospheric F-region. Modelling by Millward et al. (1996) and Zou et al. (2000) has shown that the variation of foF2 over the year at a given station is explained by changes to two key influences: i) thermospheric composition (which is influenced by a station’s proximity to the geomagnetic pole) and ii) ion-production rate (which is influenced by solar zenith angle and the level of solar activity). The composition changes are related to other location-dependent effects, such as thermospheric winds, which blow F2-layer plasma up or down field lines to where loss rates are lower or higher, and this effect depends on the geomagnetic dip. For Slough, the annual variability in composition dominates the zenith-angle effect, resulting in the variation of foF2 being predominantly annual. However, at other locations, at similar geographic latitudes but different longitudes, a strong semi-annual variation is both observed and modelled, caused by the compositional changes between equinox and winter months being relatively small compared with the effect of the change in solar zenith angle. A method to determine and analyse the ratio of powers in the annual and semi-annual variations has been presented by Scott, Stamper, and Rishbeth (2014) and used by Scott and Stamper (2015). We have extended this study to the Washington data and find, as for nearby stations studied by Scott and Stamper (2015), that the semi-annual variation dominates at Washington (and the variation of the annual or semi-annual power ratio there is almost uncorrelated with that at Slough). Thus ionising solar EUV irradiance is more important in controlling foF2 at Washington than it is at Slough, where the composition effect (on loss rates) dominates.
EUV emission (particularly at the softer end of the spectrum) is enhanced through the presence around sunspots of plages and faculae (Dudok de Wit et al. 2008), and hence foF2 is expected to be more dependent on both sunspot numbers and facular area at Washington than at Slough.
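A minimal sketch of how the relative powers of the annual and semi-annual variations might be estimated from monthly means (an illustrative Fourier-amplitude calculation on synthetic data, not the actual procedure of Scott, Stamper, and Rishbeth, 2014):

```python
import numpy as np

# Synthetic monthly-mean foF2 series (MHz) for one station: a constant
# level plus annual and semi-annual harmonics with known amplitudes.
# With real data, the monthly means for each station would be used here.
months = np.arange(120)  # ten years of monthly values
fof2 = (7.0
        + 1.5 * np.cos(2 * np.pi * months / 12.0)   # annual term
        + 0.4 * np.cos(4 * np.pi * months / 12.0))  # semi-annual term

# Fourier amplitudes at the annual (12-month) and semi-annual (6-month)
# periods; their power ratio characterises which seasonal variation
# dominates at the station.
spectrum = np.fft.rfft(fof2 - fof2.mean())
freqs = np.fft.rfftfreq(months.size, d=1.0)  # cycles per month
annual = np.abs(spectrum[np.argmin(np.abs(freqs - 1.0 / 12.0))])
semiannual = np.abs(spectrum[np.argmin(np.abs(freqs - 1.0 / 6.0))])
power_ratio = (semiannual / annual) ** 2
```

A ratio well below one corresponds to the Slough-like case where the annual variation dominates; a ratio above one corresponds to the Washington-like, semi-annually dominated case.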
The results of Smith and King (1981) also help to explain the nonlinearity of the foF2–\(R\) variations that can be seen in Figure 2 (often called the saturation effect; see also Sethi, Goel, and Mahajan 2002). This is because the RGO facular areas increase with \(R\) at lower \(R\), but reach a maximum and then fall again at the largest \(R\) (Foukal 1993). Of all the sites studied by Smith and King (1981), Slough had the lowest sensitivity to facular area. The Slough data also show the lowest solar-cycle hysteresis in the foF2–\(R\) relationship for a given UT. Indeed, analysis by Bradley (1994) found that for Slough there were no detectable cycle-to-cycle changes in the average foF2 variations with \(R\) (at a given UT), in that they were smaller than the solar-cycle hysteresis effect (which was not systematic) and both geophysical and observational noise.
2 Slough Ionosonde Data
Figure 1c shows the Slough ionosonde foF2 data, retrieved from the UK Space Science Data Centre (UKSSDC) at RAL Space, Chilton (URL given in Section 3). In 2004, a more complete set of scaled and tabulated hourly data for 1932 – 1943 was rediscovered in the archives of World Data Centre C1 at Chilton. These data have been digitised and checked wherever comparisons are possible and by rescaling a few selected ionograms from the surviving original photographic records. A few soundings were not usable because the metadata revealed that the ionosonde was operated in a mode unsuitable for foF2 determination. Regular soundings at Noon began in February 1932, and after January 1933, the sounder was operated six days a week until September 1943, when regular hourly soundings every day began. Before 1943, values for Noon were available every day, but for other UTs only monthly medians were tabulated (of a variable number of samples, but always exceeding 15). Interference was not a problem for the earliest data as the HF radio spectrum was not heavily utilised, but some data carry a quality flag “C” that appears to stand for “cows”, which caused a different kind of interference by breaking through the fence surrounding the neighbouring farm and disrupting performance by scratching themselves against the receiver aerials. The hardware used (at least until later in the data series) was constructed in-house and evolved from the first sounder made by L.H. Bainbridge-Bell, to the 249 Pattern, the Union Radio Mark II, and the KEL IPS42. In the present article, annual means of foF2 were compiled for each of the 24 UTs separately: for regular hourly values a total of at least 280 soundings in a year (\({\approx}\,75~\%\)) were required to make a usable annual mean, and for monthly median data ten values per year were required. In Figure 1c it can be seen that the noise in the annual mean data is considerably greater before 1943 for most UTs.
This could be due to the use of monthly medians rather than the monthly means of daily values and the fact that data were recorded only six days per week, but it might also be associated with the stability of the sounder and with observer scaling practices. However, the values for noon (the black line) show the same year-to-year consistency before and after 1943, implying that the use of medians and the reduced sampling are the main causes of the increased noise in the earliest data. The grey lines in Figure 1c are for UTs at which the correlation coefficient between \(R\) and \(R_{\mathrm{fit}}\), the best third-order-polynomial fit of foF2 to \(R\) for data after 1950 (see the next section), does not exceed 0.99, whereas the coloured lines are for UTs (mainly during the daytime) for which this correlation does exceed 0.99.
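The UT-selection criterion just described (retain only those UTs for which the correlation of \(R\) with \(R_{\mathrm{fit}}\) exceeds 0.99) can be illustrated with a short sketch. The arrays below are synthetic stand-ins, and select_uts is a hypothetical helper, not a routine from this study:

```python
import numpy as np

# Keep only those Universal Times whose fitted series correlates with R
# above a threshold (0.99 in the article). Synthetic data for illustration.
def select_uts(R, R_fit_by_ut, threshold=0.99):
    """Return the UT hours whose fit correlates with R above threshold."""
    kept = []
    for ut, r_fit in R_fit_by_ut.items():
        r = np.corrcoef(R, r_fit)[0, 1]   # Pearson correlation coefficient
        if r > threshold:
            kept.append(ut)
    return kept

R = np.array([10.0, 50.0, 120.0, 80.0, 30.0])
fits = {
    12: R * 1.02 - 1.0,                              # near-perfect daytime fit
    3: np.array([40.0, 20.0, 60.0, 10.0, 90.0]),     # poor night-time fit
}
print(select_uts(R, fits))  # only the well-correlated UT survives
```

In the article this screening is applied per UT to the annual-mean foF2 fits, and only the (mainly daytime) hours passing it are carried into the later tests.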
After 1990, the ionosonde at Slough was relocated to Chilton, Oxfordshire. To avoid the need for an intercalibration of the data between these two sites, and any potential effects that this may have, we here only consider data up to an end date of 1990.
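The annual-mean screening described in Section 2 can be sketched as follows. This is a minimal illustration: annual_mean_foF2 is a hypothetical helper, and the thresholds are those quoted above (at least 280 hourly soundings, or ten monthly-median values, per year):

```python
# Screening rule for usable annual means of foF2 (hypothetical helper):
# hourly data need >= 280 soundings in the year (about 75 % coverage);
# monthly-median data need >= 10 values in the year.
def annual_mean_foF2(values, monthly_medians=False):
    """Return the annual mean of foF2 samples, or None if coverage
    falls below the acceptance threshold used in this study."""
    valid = [v for v in values if v is not None]
    threshold = 10 if monthly_medians else 280
    if len(valid) < threshold:
        return None  # insufficient coverage for a usable annual mean
    return sum(valid) / len(valid)

# Example: a year of hourly noon soundings with ~80 % coverage passes
samples = [7.5 + 0.01 * (i % 30) for i in range(300)]
print(annual_mean_foF2(samples) is not None)  # True
```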
3 Analysis
Fits were made using the Nelder–Mead search procedure to minimise the r.m.s. deviation of \(R_{\mathrm{fit}}\) from the sunspot number in question (\(R\), \(R_{\mathrm{BB}}\), or \(R_{\mathrm{C}}\)). We note that the analysis presented below was repeated using a second-order polynomial (\(\alpha = 0\)) and a linear fit (\(\alpha = \beta = 0\)). The results were very similar in all three cases, the largest difference being that uncertainties are smallest using the full third-order polynomial, because its fit residuals were smaller and had a distribution closer to a Gaussian. In the remainder of this article we show the results for the third-order polynomial, but the overall results for the lower-order polynomials will also be given.
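The fitting procedure can be sketched as follows. This is a minimal illustration with synthetic data, not the actual Slough series; the model is the third-order polynomial \(R_{\mathrm{fit}} = \alpha\,\mathrm{foF2}^{3} + \beta\,\mathrm{foF2}^{2} + \gamma\,\mathrm{foF2} + \delta\), and seeding Nelder–Mead with a least-squares start is a convergence aid of this sketch rather than a step taken from the article:

```python
import numpy as np
from scipy.optimize import minimize

# Third-order polynomial model of the foF2-to-sunspot-number relation.
def r_fit(coeffs, foF2):
    alpha, beta, gamma, delta = coeffs
    return alpha * foF2**3 + beta * foF2**2 + gamma * foF2 + delta

# Cost function: r.m.s. deviation of the fit from the sunspot-number series.
def rms_deviation(coeffs, foF2, R):
    return np.sqrt(np.mean((r_fit(coeffs, foF2) - R) ** 2))

# Synthetic stand-ins for the annual-mean foF2 and sunspot-number data.
rng = np.random.default_rng(0)
foF2 = np.linspace(5.0, 13.0, 40)
R_true = 0.1 * foF2**3 + 2.0 * foF2 - 10.0        # assumed underlying relation
R_obs = R_true + rng.normal(0.0, 0.5, foF2.size)  # add observational noise

# Least-squares start (sketch-only convenience), then Nelder-Mead refinement
# of the r.m.s. cost, as in the article.
x0 = np.polyfit(foF2, R_obs, 3)
result = minimize(rms_deviation, x0=x0, args=(foF2, R_obs),
                  method="Nelder-Mead")
print(round(float(result.fun), 3))  # final r.m.s. residual
```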
As expected from the results of Smith and King (1981), we found that in some cases the facular term was needed, but in others it was not. Specifically, the fits for Washington were statistically poorer if the facular term was omitted, and so it was necessary to use \(\varepsilon \neq 0\). On the other hand, for Slough there was no statistically significant difference between the fits with (\(\varepsilon \neq 0\)) and without (\(\varepsilon = 0\)) the facular term; to demonstrate this, we here discuss both the Washington and the Slough fits, both with and without the facular area. In Section 3.1 the fits employ a third-order polynomial in Slough foF2 only (i.e. \(\varepsilon = 0\)), whereas in Section 3.2 we fit the same data using the third-order polynomial in foF2 plus a linear term in the RGO white-light facular area, \(A_{\mathrm{f}}\) (i.e. \(\varepsilon \neq 0\)). The latter fits only use data before 1976, when the RGO measurements ceased. Both \(\varepsilon = 0\) and \(\varepsilon \neq 0\) fits can be carried out for the Slough data (and are shown to give similar results) because the dependence on \(A_{\mathrm{f}}\) is low. In Section 3.3 we study the Washington data and find that the greater dependence on facular area means that this factor must be included. (Without the \(\varepsilon A_{\mathrm{f}}\) term, the correlations between \(R_{\mathrm{fit}}\) and sunspot numbers for Washington fall short of the required threshold that we here adopt.) We note that the fitted \(\alpha\)-values make the \(\alpha\,\mathrm{foF2}^{3}\) term small, and inclusion of the \(\varepsilon A_{\mathrm{f}}\) term makes the \(\beta\,\mathrm{foF2}^{2}\) term small also, such that \(R_{\mathrm{fit}}\) is approximately a combination of linear terms in foF2 and \(A_{\mathrm{f}}\), as was found by Smith and King (1981).
The sources of the data used in the following sections are as follows: the Slough foF2 data and the Greenwich white-light facular area data were downloaded from the World Data Centre (WDC) for Solar-Terrestrial Physics, which is part of the UK Space Science Data Centre (UKSSDC) at RAL Space, Chilton, UK (www.ukssdc.ac.uk/wdcc1/ionosondes/secure/iono_data.shtml); the Washington foF2 data were downloaded from Space Weather Services in Sydney, Australia (formerly known as IPS and the WDC for Solar-Terrestrial Science) within the Australian Bureau of Meteorology (ftp://ftpout.ips.gov.au/wdc/iondata/medians/foF2/7125.00); and the standard sunspot numbers [\(R\)] are the old data series published (until July 2015) by the WDC for the sunspot index, part of the Solar Influences Data Analysis Center (SIDC) at the Royal Observatory of Belgium (sidc.oma.be/silso/versionarchive). The corrected sunspot-number series [\(R_{\mathrm{C}}\)] is given in the supplementary data to the article by Lockwood et al. (2014), and the backbone sunspot-group data [\(R_{\mathrm{BB}}\)] were digitised from the article by Svalgaard and Schatten (2016) that accompanied the call for articles for this special issue. We employ the version of the RGO sunspot-group data made available by the Space Physics website of the Marshall Space Flight Center (MSFC), which has been compiled, maintained, and corrected by D. Hathaway. These data were downloaded in June 2015 from solarscience.msfc.nasa.gov/greenwch.shtml. As noted by Willis et al. (2013b), there are some differences between these MSFC data and versions of the RGO data stored elsewhere (notably those in the National Geophysical Data Center, NGDC, Boulder, www.ngdc.noaa.gov/nndc/struts/results?op_0=eq&v_0=Greenwich&t=102827&s=40&d=8&d=470&d=9), but these are very minor.
3.1 Using Slough Data and Polynomial Fits in foF2 Only (\(\varepsilon = 0\))
In the top panel, the blue line shows that for the standard sunspot number [\(R\)] the fit residuals are small and reasonably constant for the calibration interval (and after 1990), but become persistently negative before then. This means that \(R\) in this interval is systematically smaller than the best-fit extrapolation based on foF2. The deviation is slightly smaller than the \(2\sigma\) uncertainty (but exceeds the \(1\sigma\) uncertainty). The sense of this persistent deviation is consistent with the Waldmeier discontinuity. The second panel is for \(R_{\mathrm{BB}}\). In this case, the red line shows even better fits during the calibration interval and after it, but \(R_{\mathrm{BB}}\) before 1945 becomes consistently greater than the best-fit extrapolation from the calibration interval. Again the deviation is slightly smaller than the \(2\sigma\) uncertainty, but is almost as large in magnitude as that for \(R\). Thus the Slough foF2 data imply that the effective correction for the Waldmeier discontinuity in \(R_{\mathrm{BB}}\) is roughly twice what it should be for the 20 % correction postulated by Svalgaard (2011), as was found by Lockwood, Owens, and Barnard (2014). In the third panel, the green line shows the results for \(R_{\mathrm{C}}\), which uses the 11.6 % best-fit correction found by Lockwood, Owens, and Barnard (2014). In this case the fit residuals before the calibration interval are similar to those during it.
As implied by Figure 6, the values of \(R\) before 1945 are, on average, slightly lower than those in the “after” interval at the same foF2. Otherwise the variations of foF2 with \(R\) for the two intervals are very similar. The solid cyan line in each panel is the best-fit third-order polynomial to the calibration data. It can be seen that most mauve points lie above the cyan line, consistent with \(R\) being underestimated before the Waldmeier discontinuity. The blue dot-and-dash line and the dashed orange line are this best-fit \(R\) divided by correction factors \(f_{\mathrm{R}}\) of 1.116 and 1.200, on which the mauve points should all lie (if the real \(R\)–foF2 variation has remained the same) if the correction needed for the Waldmeier discontinuity were 11.6 % (as derived by Lockwood, Owens, and Barnard 2014) or 20 % (as derived by Svalgaard 2011), respectively. The separations of the lines are small, but inspection shows that the “before” interval test points (in mauve) are most clustered around the blue dot-and-dash lines (51.5 % of all the mauve points in all the panels of Figure 7 lie below the blue dot-and-dash lines, whereas 48.5 % lie above them). In contrast, 73 % of all the points lie below the orange lines and only 27 % above them, strongly implying that \(f_{\mathrm{R}} = 1.200\) is an overestimate of the correction needed.
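The above/below counting test can be sketched as follows (synthetic numbers; fraction_below is a hypothetical helper). A near-even split of points about the corrected curve indicates the candidate correction factor is about right, whereas a strongly uneven split flags an over- or under-sized correction:

```python
import numpy as np

# Fraction of "before" points lying below the calibration best fit divided
# by a candidate Waldmeier correction factor f_R. Values are illustrative.
def fraction_below(R_before, R_fit_before, f_R):
    """Fraction of 'before' points lying below the best fit divided by f_R."""
    corrected = np.asarray(R_fit_before) / f_R
    return np.mean(np.asarray(R_before) < corrected)

R_fit = np.array([50.0, 80.0, 110.0, 140.0])          # calibration-fit values
noise = np.array([1.5, -1.5, 2.0, -2.0])              # illustrative scatter
R_before = R_fit / 1.12 + noise                       # points low by ~12 %
print(fraction_below(R_before, R_fit, 1.116))         # near-balanced split
print(fraction_below(R_before, R_fit, 1.200))         # strongly unbalanced
```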
The p-value for each difference between the two means is computed using the procedure described by Lockwood, Owens, and Barnard (2014) and in Section 1.2. This p-value peaks when the difference falls to zero, but it also gives the probability for all other values of \(f_{\mathrm{R}}\). All p-value distributions are normalised so that the area below the curve is unity.
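The normalisation can be sketched as follows (an illustrative Gaussian shape stands in for a real p-value distribution; trapezoidal integration is an assumption of this sketch):

```python
import numpy as np

# Scale a p-value distribution so the area under the curve is unity,
# using simple trapezoidal integration over the f_R grid.
def normalise_pvalues(f_R, p):
    """Scale p(f_R) so that its integral over f_R equals one."""
    area = np.sum((p[1:] + p[:-1]) * np.diff(f_R)) / 2.0
    return p / area

f_R = np.linspace(1.05, 1.25, 401)
p = np.exp(-0.5 * ((f_R - 1.12) / 0.005) ** 2)  # illustrative peak near 1.12
p_norm = normalise_pvalues(f_R, p)
area = np.sum((p_norm[1:] + p_norm[:-1]) * np.diff(f_R)) / 2.0
print(round(float(area), 6))  # unit area after normalisation
```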
In addition to carrying out this test using the Slough foF2 data, we have repeated it for the RGO sunspot-group number [\(N_{\mathrm{G}}\)] and the RGO sunspot-group area [\(A_{\mathrm{G}}\)]. This test is the same as that carried out by Lockwood, Owens, and Barnard (2014), except that here we use shorter intervals: the “before” interval is 1932 – 1945 (the same as for the foF2 data used here) and the “after” interval is 1945 – 1976 (data between 1976 and 1990 were not used because they come from the SOON network and would require intercalibration with the RGO data). This eliminates any possibility that either the drift in the early RGO data (before 1932) or the RGO–SOON calibrations are influencing our estimate of the optimum \(f_{\mathrm{R}}\).
The results are shown in Figure 8. In both parts of the figure, the black line is for the foF2 data, the green line is for \(A_{\mathrm{G}}\), and the mauve line is for \(N_{\mathrm{G}}\). The blue line is the combination of the \(A_{\mathrm{G}}\) and \(N_{\mathrm{G}}\) probability variations. The distribution for foF2 is very much narrower than those for \(A_{\mathrm{G}}\) and \(N_{\mathrm{G}}\) (meaning that the optimum value is much better constrained), and the peak p-value is therefore much greater. The vertical dashed line marks the peak for the foF2 test at \(f_{\mathrm{R}} =1.121\) (i.e. a 12.1 % correction) and the grey band marks the uncertainty band of \({\pm}\,2\sigma\) of the p-value distribution (between 1.1110 and 1.1298, i.e. a correction of \(11.10\,\mbox{--}\,12.98~\%\)). This result was obtained by employing a third-order-polynomial fit to the Slough foF2 data: if a second-order polynomial was used, the optimum value was 12.6 % with a \({\pm}\,2\sigma\) uncertainty range of \(11.11\,\mbox{--}\,14.17~\%\), and hence the optimum value is slightly higher and the uncertainty band considerably wider. To within the uncertainties, use of the second- and third-order polynomials gives the same result. If a linear variation was used, the optimum value was 13.85 % with a \({\pm}\,2\sigma\) uncertainty range of \(12.33\,\mbox{--}\,15.38~\%\), which is a significantly higher value with an uncertainty band that does not overlap with that for the third-order-polynomial analysis: however, this value is here discounted because the linear variation cannot reproduce the marked “rollover” in the foF2–\(R\) plots presented in Figures 3 and 7. The solid blue vertical line marks the optimum value from the combination of the \(A_{\mathrm{G}}\) and \(N_{\mathrm{G}}\) p-value distributions (at \(f_{\mathrm{R}} = 1.1360\), i.e. a 13.60 % correction) for the same intervals.
This is slightly higher than the \(11.6\pm3.3~\%\) correction found by Lockwood, Owens, and Barnard (2014) using the same test, but applied to the RGO data extending back to 1874. This shows that the early RGO data had reduced the optimum correction factor derived from the RGO data somewhat, but only by 2 %. This difference is comfortably within the \({\pm}\,3.35~\%\) uncertainty band estimated by Lockwood, Owens, and Barnard (2014). The dotted line is the 20 % correction proposed by Svalgaard (2011), which is also inherent in \(R_{\mathrm{BB}}\). Because the p-value distributions for \(N_{\mathrm{G}}\) and \(A_{\mathrm{G}}\) are broad, the correction factor for \(R\) of \(12.0\pm1.0~\%\) derived here using foF2 is consistent with them, but the foF2 data provide a much better-defined test value than the RGO sunspot data. The reason why the foF2 test constrains the required correction to a much greater extent than do the RGO data is twofold: firstly, the correlations for both the “before” and “after” intervals are so high (\({\geq}\,0.99\)); and secondly, the use of the six UTs that met this criterion means that there are six times the number of data points available per year compared to either \(N_{\mathrm{G}}\) or \(A_{\mathrm{G}}\). The p-value from the foF2 test for a 20 % correction is lower than \(10^{-20}\).
3.2 Using Slough Data with Polynomial Fits in foF2 and a Linear Dependence on Facular Area
As shown by Figure 2, the test presented in Section 3.1 will not work at all ionosonde stations, in particular those where Smith and King (1981) found a greater dependence of the \(R\)–foF2 relationship on facular area [\(A_{\mathrm{f}}\)]. To test that this factor has not altered the results for the Slough data, we here repeat the analysis of Section 3.1 using a multivariate fit with a third-order polynomial in foF2 and a linear term in the RGO white-light facular area [\(A_{\mathrm{f}}\)] (i.e. \(\varepsilon\) in Equation (4) is no longer assumed to be zero). Because the RGO white-light facular measurements ceased in 1976, we here use 1945 – 1976 for the “after” calibration interval. Otherwise the test is conducted as in the last section. To maximise the number of data points after the Waldmeier discontinuity, the interval 1947 – 1976 is used here.
3.3 Using Washington Data with Polynomial Fits in foF2 and a Linear Dependence on Facular Area
3.4 Comparing and Combining Slough and Washington Ionospheric Data and RGO Data
Panels a and b of Figure 11 show that the \(R_{\mathrm{fit}}\) values from foF2 data imply very similar \(f_{\mathrm{R}}\) factors (which need to be applied to \(R\) to allow for the Waldmeier discontinuity) to those derived from the RGO sunspot data. We combine the results of all of the data that give \(r > 0.99\) from the last three sections by multiplying independent p-value distributions. The black line in the bottom panel shows the product of the results for Slough foF2 with \(\varepsilon = 0\); Washington foF2 with \(\varepsilon \neq 0\); and RGO \(N_{\mathrm{G}}\) and RGO \(A_{\mathrm{G}}\). Again the grey band is the uncertainty around the peak at the \({\pm}\,2\sigma\) level. If we assume that the differences between the “before” and “after” ionospheric data are due only to changes in the calibration of \(R\), Figure 11c shows that the combination of all of the independent p-value distributions gives an optimum correction of 12.0 % (the peak of the black line in Figure 11c), and the \(2\sigma\) uncertainty band around this optimum value (in grey in Figure 11c) is 11.16 – 12.57 %.
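The combination of independent tests can be sketched as follows. The two Gaussian curves are illustrative stand-ins (for, say, the narrow foF2 distribution and the broad RGO one), and the product is renormalised to unit area before the peak is read off:

```python
import numpy as np

# Multiply independent p-value distributions point-by-point over a common
# grid of candidate correction factors f_R, then renormalise to unit area.
def combine(f_R, distributions):
    """Multiply independent p-value distributions and renormalise."""
    combined = np.ones_like(f_R)
    for p in distributions:
        combined = combined * p
    area = np.sum((combined[1:] + combined[:-1]) * np.diff(f_R)) / 2.0
    return combined / area

f_R = np.linspace(1.05, 1.25, 2001)
p1 = np.exp(-0.5 * ((f_R - 1.121) / 0.005) ** 2)  # narrow (foF2-like) test
p2 = np.exp(-0.5 * ((f_R - 1.136) / 0.020) ** 2)  # broad (RGO-like) test
p_comb = combine(f_R, [p1, p2])
peak = f_R[np.argmax(p_comb)]
print(round(float(peak), 4))  # peak sits close to the narrower distribution
```

Because the distributions are multiplied, the narrow distribution dominates the location of the combined peak, which is why the foF2 test controls the final optimum \(f_{\mathrm{R}}\).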
4 Conclusions
We conclude that the ionosonde data give an extremely accurate test for the Waldmeier discontinuity correction factor and that the best value (maximum p-value with a \({\pm}\,2\sigma\) uncertainty) from a combination of the Slough and Washington ionospheric data is \(12.0\pm1.0~\%\), which is very similar to the results obtained from the RGO sunspot-group data. In general, allowance for the dependence of the foF2–\(R\) relationship on the facular area [\(A_{\mathrm{f}}\)] is required, but it is sufficiently small for Slough (where foF2 is dominated by the composition effect) that the results are essentially the same if it is neglected. For Washington (where foF2 is dominated by the solar-illumination effect), the \(A_{\mathrm{f}}\) factor cannot be neglected. The probability that the Waldmeier correction is as large as the \({\approx}\,20~\%\) adopted by Svalgaard (2011), or the \({>}\,20~\%\) that is inherent in the backbone sunspot number \(R_{\mathrm{BB}}\), is, by this test, essentially zero. The results show that \(R_{\mathrm{BB}}\) is \(12.0\pm1.0~\%\) too large for 1945 and before. The dashed red line in the middle panel of Figure 1 shows that applying this correction to \(R_{\mathrm{BB}}\) makes it almost identical to \(R_{\mathrm{C}}\) for the interval studied here.
The fact that \(R_{\mathrm{BB}}\) matches the best-fit ionospheric data better than the other series after the Waldmeier discontinuity has a very important implication. This improvement is possible because \(R_{\mathrm{BB}}\) corrects for a drift in the \(k\)-values for the Locarno Wolf numbers (Clette et al. 2015). This drift was found by research aimed at explaining why the relationship between the \(\mathrm{F}_{10.7}\) radio flux and the international sunspot number (Johnson 2011) broke down so dramatically just after the long and low activity minimum between Cycles 23 and 24. The Locarno \(k\)-values were reassessed using the average of sixteen other stations (out of a total of about eighty) that provided near-continuous data over the 32-year interval studied. The results showed that the Locarno \(k\)-factors had varied between \({+}\,15~\%\) in 1987 and \({-}\,15~\%\) in 2009. Before these tests were made, the Locarno \(k\)-values formed the “backbone” of the international sunspot number series and were assumed to be constant. We note that this drift of 30 % occurred, and went undetected, in this key backbone for twenty-two years in the modern era, despite there being at least eighty observatories available, with defined and agreed procedures, and with related test data, such as \(\mathrm{F}_{10.7}\), available. We have to be aware that in earlier times, with fewer stations, less well-defined procedures, less stable instrumentation, and fewer (if any) data to check against, larger drifts will almost certainly have occurred in the earlier “backbone” data series that are daisy-chained to generate \(R_{\mathrm{BB}}\).
Using ionosonde data, we can only test the sunspot-number series back to 1932. But even at this relatively late date, the tests using the Slough and Washington ionosonde data indicate that \(R_{\mathrm{BB}}\) is significantly too large. Given the daisy-chaining of intercalibrations involved in the construction of \(R_{\mathrm{BB}}\), all values before 1945 need to be 12 % lower (relative to modern values) to make proper allowance for the Waldmeier discontinuity. However, the difference between \(R\) (or \(R_{\mathrm{C}}\)) and \(R_{\mathrm{BB}}\) also grows increasingly large as one goes back in time (see Article 2, Lockwood et al. 2016a): from the study presented here we cannot tell whether this trend has the same origin as the difference detected during Cycle 17; however, Cycle 17 is consistent with the longer-term trend. That an error as large as 12 % can be found in \(R_{\mathrm{BB}}\) as late as 1945 does not give confidence that there are not much larger errors in \(R_{\mathrm{BB}}\) at earlier times.
Acknowledgements
The authors wish to thank the staff of a number of data centres: the Slough foF2 data and the Greenwich white-light facular area data were downloaded from the World Data Centre (WDC) for Solar-Terrestrial Physics, which is part of the UK Space Science Data Centre (UKSSDC) at RAL Space, Chilton, UK; the Washington foF2 data were downloaded from Space Weather Services in Sydney, Australia, part of the Australian Bureau of Meteorology; and the international sunspot numbers (version 1) were downloaded (after July 2015 and from the archive section) from the WDC for the sunspot index, part of the Solar Influences Data Analysis Center (SIDC) at the Royal Observatory of Belgium. The RGO sunspot-group data were made available by the Space Physics website of the Marshall Space Flight Center (MSFC). The work of M. Lockwood, C.J. Scott, M.J. Owens, and L.A. Barnard at Reading was funded by STFC consolidated grant number ST/M000885/1.
Disclosure of Potential Conflicts of Interest
The authors declare that they have no conflicts of interest.
Funding information
Funder: Science and Technology Facilities Council (grant number ST/M000885/1).
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.