Tests of Sunspot Number Sequences: 4. Discontinuities Around 1946 in Various Sunspot Number and Sunspot-Group-Number Reconstructions
Abstract
We use five test data series to search for, and quantify, putative discontinuities around 1946 in five different annual-mean sunspot-number or sunspot-group-number data sequences. The data series tested are the original and new versions of the Wolf/Zürich/International sunspot number composite [\(R_{\text{ISNv1}}\) and \(R_{\text{ISNv2}}\)] (respectively Clette et al. in Adv. Space Res. 40, 919, 2007 and Clette et al. in The Solar Activity Cycle 35, Springer, New York, 2015); the corrected version of \(R_{\text{ISNv1}}\) proposed by Lockwood, Owens, and Barnard (J. Geophys. Res. 119, 5193, 2014a) [\(R_{\mathrm{C}}\)]; the new “backbone” group-number composite proposed by Svalgaard and Schatten (Solar Phys. 291, 2016) [\(R_{\text{BB}}\)]; and the new group-number composite derived by Usoskin et al. (Solar Phys. 291, 2016) [\(R_{\text{UEA}}\)]. The test data series used are the group-number [\(N_{\mathrm{G}}\)] and total sunspot area [\(A_{\mathrm{G}}\)] from the Royal Observatory, Greenwich/Royal Greenwich Observatory (RGO) photoheliographic data; the Ca K index from the recent reanalysis of Mount Wilson Observatory (MWO) spectroheliograms in the Calcium II K ion line; the sunspot-group-number from the MWO sunspot drawings [\(N_{\text{MWO}}\)]; and the dayside ionospheric F2-region critical frequencies measured by the Slough ionosonde [foF2]. These test data all vary in close association with sunspot numbers, in some cases nonlinearly. The tests are carried out using both the before-and-after fit-residual comparison method and the correlation method of Lockwood, Owens, and Barnard, applied to annual mean data for intervals iterated to minimise errors and to eliminate uncertainties associated with the precise date of the putative discontinuity. It is not assumed that the correction required is by a constant factor, nor even linear in sunspot number.
It is shown that a nonlinear correction is required by \(R_{\mathrm{C}}\), \(R_{\mathrm{BB}}\), and \(R_{\text{ISNv1}}\), but not by \(R_{\text{ISNv2}}\) or \(R_{\text{UEA}}\). The five test datasets give very similar results in all cases. By multiplying the probability distribution functions together, we obtain the optimum correction for each sunspot dataset that must be applied to pre-discontinuity data to make them consistent with the post-discontinuity data. It is shown that, on average, values for 1932 – 1943 are too low (relative to later values) by about 12.3 % for \(R_{\text{ISNv1}}\) but are too high for \(R_{\text{ISNv2}}\) and \(R_{\mathrm{BB}}\) by 3.8 % and 5.2 %, respectively. The correction that was applied to generate \(R_{\mathrm{C}}\) from \(R_{\text{ISNv1}}\) reduces this average factor to 0.5 % but does not remove the nonlinear variation with the test data, and other errors remain uncorrected. A valuable test of the procedures used is provided by \(R_{\text{UEA}}\), which is identical to the RGO \(N_{\mathrm{G}}\) values over the interval employed.
Keywords
Sunspot number, Historic reconstructions, Calibration, Long-term variation
1 Introduction
The sunspot-group number [\(R_{\mathrm{G}}\)] was introduced by Hoyt, Schatten, and Nesme-Ribes (1994) and Hoyt and Schatten (1998). After about 1900 it matches quite well the behaviour of sunspot numbers, such as Version 1 of the Wolf/Zürich/International sunspot number composite [\(R_{\text{ISNv1}}\): Clette et al. 2007], but is well known to be significantly lower for earlier years (e.g. Lockwood, Owens, and Barnard, 2014a; 2014b). This topical issue includes two articles detailing two new sunspot-group-number series that are intended to be homogeneous and of stable calibration. Svalgaard and Schatten (2016) have proposed the “backbone” sunspot-group series [\(R_{\text{BB}}\)], and Usoskin et al. (2016) proposed a group-number series that is here termed \(R_{\text{UEA}}\). Compared to the (suitably scaled) original \(R_{\mathrm{G}}\), both of these new group-number data series give higher values before 1900, but in the case of \(R_{\text{BB}}\), they are radically higher. The main differences between \(R_{\text{BB}}\) and \(R_{\text{UEA}}\) arise from the method used to calibrate the historic data. The backbone series passes the calibration from one dataset to an adjacent one using a relationship between the two (usually a regression fit for the period of overlap between the two). This is called “daisy-chaining”, and the problem with this method is that both systematic and random errors, compared to modern values, compound as one goes back in time. Furthermore, as discussed in Article 3 of this series (Lockwood et al. 2016c), there are problems and pitfalls with regression techniques in general, and there are particular concerns about the way that they were implemented by Svalgaard and Schatten (2016) in the generation of \(R_{\text{BB}}\) (specifically, the assumption that data from different observers are proportional to each other is not generally correct in either principle or practice). Usoskin et al. (2016) avoided all of these pitfalls, and the potential for error propagation inherent in daisy-chaining, by devising a method that calibrates all data against one standard dataset. Note that, in general, observed group-numbers from different observers vary nonlinearly (Usoskin et al. 2016; Lockwood et al. 2016c).
In addition to these new group-number series, a new version of the Wolf/Zürich/International sunspot-number composite (ISN Version 2, \(R_{\text{ISNv2}}\)) has recently been issued by the Solar Influences Data Analysis Center (SIDC, the Solar Physics research department of the Royal Observatory of Belgium). Like \(R_{\text{BB}}\), this uses daisy-chaining of calibrations and, also like \(R_{\text{BB}}\), gives higher values for the eighteenth and nineteenth centuries (Clette et al. 2015). A less “root-and-branch” approach to correcting \(R_{\text{ISNv1}}\) was taken by Lockwood, Owens, and Barnard (2014a), who made simple corrections for errors to generate a “corrected” series [\(R_{\mathrm{C}}\)]. It should be noted that because \(R_{\mathrm{C}}\) makes corrections at only two dates in the series, other errors in \(R_{\text{ISNv1}}\), such as the recently revealed error in modern data that is due to the drift in the Locarno standard (Clette et al. 2015), are carried forward and not corrected.
This article concentrates on differences between these sunspot-number and sunspot-group-number data series in the twentieth century, specifically around 1946. Larger differences, inferred from geomagnetic-activity data, low-latitude auroral sightings, and cosmogenic isotope abundances in ice sheets, tree trunks, and meteorites, are found for earlier years, which are discussed in Article 2 (Lockwood et al. 2016b) and in the article by Asvestari et al. (2016). Changes around 1946 are of interest as there has been discussion about a putative inhomogeneity in the calibration of the original Zürich sunspot-number data series [\(R_{\text{ISNv1}}\)] that has been termed the “Waldmeier discontinuity”, as discussed in Article 1 (Lockwood et al. 2016a). This is thought to have been caused by the introduction of a weighting scheme for sunspot counts according to their size, a change in the procedure used to define a group, and, in particular, the “evolutionary” aspect of the new sunspot-group classification scheme (called the Zürich scheme) introduced by Waldmeier (Waldmeier 1947; Kiepenheuer 1953). This raises two important questions: i) What is the correct quantification of this effect? ii) Which datasets employed the Zürich classification scheme and so would be subject to any such effect or may have been recalibrated using the Zürich data? It is now agreed that \(R_{\text{ISNv1}}\) needs correcting for this effect, but it is unclear if, why, and how it influences other data series. Tests comparing against ionospheric data (Lockwood et al. 2016a), auroral sightings, and geomagnetic data (Lockwood et al. 2016b) all suggest that, somehow, an excessive or inappropriate allowance for the Waldmeier discontinuity has been introduced into \(R_{\text{BB}}\).
In the past, corrections to sunspot numbers have often been applied by taking ratios, which implicitly assumes that proportionality between the different data applies. This is often not the case (Lockwood et al. 2016c). A particular problem occurs when sunspot numbers are small because the errors in such ratios become highly asymmetric, and both the ratio and its error tend to infinity if the denominator approaches zero. Two ways of avoiding this (in its most extreme form) have been employed. The first is to consider ratios only when the denominator exceeds an arbitrarily chosen threshold (e.g. Svalgaard 2011), but this preferentially removes sunspot-minimum values, which do not always go to zero. The second way is to employ averages over one or more solar cycles so that the denominator remains large (outside grand minima): this matches long-term average values, but loses information about cycle amplitudes (because values at sunspot minimum do not always fall to zero). Consequently, Lockwood, Owens, and Barnard (2014a) devised two different procedures to test for discontinuities. The first fits the same polynomial form of a proxy or test dataset to two intervals, one before the putative error, one after it, and studies the probability of the difference in the mean fit residual for the “before” and “after” intervals. The second method looks at the effect of a full range of assumed discontinuities on the correlation between the data and the test data. Generally the methods provide similar answers, but uncertainties are lower for the fit-residual procedure, so that it is the more stringent test. We here make a number of improvements to the implementation of the Lockwood, Owens, and Barnard (2014a) methods.
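The second (correlation) procedure can be illustrated with a minimal sketch. It assumes a single multiplicative factor applied before an assumed discontinuity date, and it uses synthetic data with a known factor of 1.2 built in; the function names and the data are illustrative assumptions, not taken from Lockwood, Owens, and Barnard (2014a).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_factor_by_correlation(years, tested, test, t_d, factors):
    """Try each factor on the pre-t_d tested data; keep the one that
    maximises the correlation with the test series."""
    best_f, best_r = None, -2.0
    for f in factors:
        corrected = [f * v if yr < t_d else v for yr, v in zip(years, tested)]
        r = pearson(corrected, test)
        if r > best_r:
            best_f, best_r = f, r
    return best_f, best_r

# Synthetic example (hypothetical values): an 11-year cycle in the test series,
# and a tested series that is too low by a factor 1.2 before 1946.
years = list(range(1930, 1962))
test = [50.0 + 40.0 * math.sin(2.0 * math.pi * (yr - 1930) / 11.0) for yr in years]
tested = [v / 1.2 if yr < 1946 else v for yr, v in zip(years, test)]
factors = [0.8 + 0.01 * k for k in range(61)]  # 0.80 to 1.40
f_opt, r_opt = best_factor_by_correlation(years, tested, test, 1946, factors)
```

In this idealised case the correlation peaks at the factor that was built into the synthetic data; with real data the peak is broader, which is consistent with the larger uncertainties noted for this method.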
In the original analysis of the Waldmeier discontinuity by Svalgaard (2011), it was assumed that the correction required was a single multiplicative (“inflation”) scaling factor [\(f_{\mathrm{R}}\)], such that before the discontinuity the data were adjusted by multiplying by \(f_{\mathrm{R}}\) (i.e. the corrected sunspot number is \(R' = f_{\mathrm{R}} R\)). This assumption was also used by Lockwood, Owens, and Barnard (2014a) and Lockwood et al. (2016a). In general, it is not clear what the functional form of the correction for the Waldmeier discontinuity should be, and it will be different for different sunspot-number and group-number series, depending on how they were compiled. Svalgaard, Cagnotti, and Cortesi (2016) and Clette and Lefèvre (2016) have analysed the effect on Zürich sunspot numbers by applying both the pre-1946 and post-1946 procedures to modern data. The effects depend on timescale and, in general, are nonlinear in \(R\). The effect on annual averages is not as clear as for daily or monthly means.
In addition, Clette and Lefèvre (2016) make the valuable point that there are other factors that may have influenced the correction factor derived by Lockwood, Owens, and Barnard (2014a). The first is that other errors in the data series may be influencing the optimum correction for the Waldmeier discontinuity. The second is that the precise date of the discontinuity [\(t_{\mathrm{d}}\)] has an effect and is not known because Waldmeier’s documentation is not clear on when the changes were actually implemented. Clette and Lefèvre (2016) made use of the ratio \(R/R_{\mathrm{G}}\) to define \(t_{\mathrm{d}}\), something that had been avoided by Lockwood, Owens, and Barnard (2014a) because the error in such ratios tends to infinity when \(R_{\mathrm{G}}\) tends to zero and \(R_{\mathrm{G}}\) has a minimum in 1944, just before the putative discontinuity: hence changes would naturally become more apparent as sunspots began to rise in the subsequent cycle. From the \(R/R_{\mathrm{G}}\) ratio, Clette and Lefèvre (2016) placed the discontinuity in 1946 (whereas Lockwood, Owens, and Barnard 2014a and Lockwood et al. 2016a used 1945), although they noted that there is some documentary evidence that at least some of the new procedures that are thought to be the cause of the discontinuity were in use earlier than this date. Clette and Lefèvre (2016) analysed the effects of both the start date of the comparison and the assumed discontinuity date [\(t_{\mathrm{d}}\)] on the \(R_{\text{ISNv1}}\) correction. They reproduced the Lockwood, Owens, and Barnard (2014a) values when using the same dates; however, they found that the required correction could be larger if other dates were adopted. The analysis presented in this article makes improvements to the procedure of Lockwood, Owens, and Barnard (2014a) to remove these potential uncertainties.
2 Analysis
The analysis presented here employs five test data series and is applied to five tested sunspot reconstructions.
2.1 Tested Sunspot Data Series
Table 1 Sunspot data series tested.
| Symbol | Name | Brief description | Reference(s) |
| --- | --- | --- | --- |
| \(R_{\text{ISNv1}}\) | Wolf/Zürich/International sunspot number, Version 1 | Sunspot-number composite compiled at Zürich observatory and then the Royal Observatory of Belgium. Used as the standard series until July 2015 | Clette et al. (2007) |
| \(R_{\mathrm{C}}\) | Corrected sunspot number | \(R_{\text{ISNv1}}\) with simple corrections for discontinuities at 1945 and 1849 | Lockwood, Owens, and Barnard (2014a) |
| \(R_{\text{ISNv2}}\) | Wolf/Zürich/International sunspot number, Version 2 | Sunspot-number composite from the same data as used to generate \(R_{\text{ISNv1}}\) with a number of corrections. Used as the standard series after July 2015 | Clette et al. (2015) |
| \(R_{\text{BB}}\) | Backbone sunspot-group number | Sunspot-group-number composite compiled from various observers using the “backbone” method | Svalgaard and Schatten (2016) |
| \(R_{\text{UEA}}\) | Usoskin et al. sunspot-group number | Sunspot-group-number composite compiled from various observers using the statistics of active-day fractions. It equals the RGO group-number [\(N_{\mathrm{G}}\)] for the interval tested here. | Usoskin et al. (2016) |
2.1.1 The Original Composite of the Wolf/Zürich/International Sunspot Number [\(R_{\text{ISNv1}}\)]
\(R_{\text{ISNv1}}\) is still available in the archive section of the SIDC website, but has not been updated since 01 July 2015. This is a composite of sunspot numbers, initially generated by Wolf and continued at the Zürich observatory until 1980 and then subsequently compiled by SIDC (until July 2015, when it was replaced by Version 2). This is the dataset that moved to the Zürich classification scheme and so will show all aspects of the Waldmeier discontinuity. As for all the tested data series, with the exception of that by Usoskin et al. (2016), the calibration is by daisy-chaining, i.e. the calibration is passed from one observer to the next (or previous) one by comparison of simultaneous data from both observers.
2.1.2 The New SIDC Composite of the Wolf/Zürich/International Sunspot Number [\(R_{\text{ISNv2}}\)]
\(R_{\text{ISNv2}}\) became SIDC’s default series on 01 July 2015. It corrects for a number of causes of long-term change in \(R_{\text{ISNv1}}\), including the Waldmeier discontinuity and the correction of a drift in the calibration of the main station (Locarno), which had varied by \({\pm}\,15~\%\) between 1987 and 2009 (Clette et al. 2015). Note that this no longer uses the traditional scaling factor of 0.6 employed in \(R_{\text{ISNv1}}\).
2.1.3 The New “Backbone” Group-Sunspot Number [\(R_{\mathrm{BB}}\)]
\(R_{\text{BB}}\) was proposed by Svalgaard and Schatten (2016). This group-number composite differs in its long-term variation from the Hoyt and Schatten (1998) group-number [\(R_{\mathrm{G}}\)] and dispenses with the scaling factor of 12.08 introduced by Hoyt, Schatten, and Nesme-Ribes (1994) and Hoyt and Schatten (1998) (to make means of \(R_{\mathrm{G}}\) and \(R_{\text{ISNv1}}\) the same in modern data). \(R_{\text{BB}}\) is the mean of the results of two different methods: taking such a mean has the problem that, although errors can be halved, any error in either method is propagated into the final result, something that can be avoided if a probabilistic combination technique is applied. The main method employed in the construction of \(R_{\text{BB}}\) involves daisy-chaining of compiled “backbone” data sequences using linear regression. The exception to this is the earliest join, for which a different method is used: this is not within the interval studied here, however, and so this inhomogeneity in the series compilation is not a factor for this article. The assembly of the backbones assumes proportionality, and although their use reduces the number of linear regressions between backbones, it makes no difference to the number of observers through which the calibration is passed in the daisy-chaining. The second method involves taking the largest group number defined by any observer in each year and scaling this to a backbone series. That four such intervals are required implies the relationship of the highest value to the optimum values changes over time, and the calibration of this is again passed from one sequence to the previous one and hence this is also daisy-chaining. The daisy-chaining calibrations in \(R_{\text{BB}}\) assume not only linearity of the data between different observers, but also proportionality, which is not in principle correct and generated errors in the tests carried out by Lockwood et al. (2016c).
2.1.4 The “Corrected” Sunspot Number [\(R_{\mathrm{C}}\)]
\(R_{\mathrm{C}}\) was proposed by Lockwood, Owens, and Barnard (2014a) to provide a sensitivity analysis of the effect of different inputs to the modelling of open solar flux and streamer-belt width by Lockwood and Owens (2014). \(R_{\mathrm{C}}\) is based on \(R_{\text{ISNv1}}\) with the best estimate by Lockwood, Owens, and Barnard (2014a, 2014b) of the correction required for the Waldmeier discontinuity plus, for earlier times, the correction derived by Leussu, Usoskin, and Mursula (2013) using data by Schwabe, which applies to all data before 1848. The sequence was extended back to before the Maunder minimum using linearly regressed \(R_{\mathrm{G}}\) values. This series contains no correction for any other errors that have subsequently been revealed, such as the Locarno calibration error.
2.1.5 The Usoskin et al. Group-Number Composite [\(R_{\text{UEA}}\)]
The composite group-number series assembled at the University of Oulu by Usoskin et al. (2016) [\(R_{\text{UEA}}\)] directly calibrates all data to the number of groups [\(N_{\mathrm{G}}\)] defined by observers of the RGO for 1900 – 1976 from the photoheliographic plates. Note that, like \(R_{\text{BB}}\), it does not employ the 12.08 scaling factor that was used in the generation of \(R_{\mathrm{G}}\). This series is unique in that it avoids using either daisy-chaining or regression techniques, and it makes no assumptions about linearity or proportionality between different datasets. For the test interval presented here (1920 – 1976), \(R_{\text{UEA}}\) and \(N_{\mathrm{G}}\) are identical. Note also that the original group-number by Hoyt and Schatten (1998) [\(R_{\mathrm{G}}\)] is of the same form as \(N_{\mathrm{G}}\) over this interval (being \(12.08N_{\mathrm{G}}\) for the interval analysed here) and so tests of \(R_{\mathrm{G}}\) are not performed here as they would give identical answers to \(R_{\text{UEA}}\).
2.1.6 Summary of the Tested Data Series
The tested data series are summarised in Table 1. Figure 1a shows the five tested data series. Some are group numbers, while others are sunspot numbers, and they employ different scaling factors, as discussed above: hence, so that they can be compared in Figure 1, each has been regressed against the RGO sunspot-group number [\(N_{\mathrm{G}}\)] over the interval 1921 – 1945. The start date of this interval is chosen to be after any interval when there are some concerns over the calibration of the RGO data (Cliver and Ling 2016); the end date is just before the Waldmeier discontinuity (Svalgaard 2011; Clette and Lefèvre 2016). Figure 1a shows that before 1946 (the vertical dot–dashed line) all the series are either identical (other than the scaling factors) or very similar indeed. In fact, \(R_{\mathrm{C}}\) is, by its definition, identical to \(R_{\text{ISNv1}}\) between 1848 and 1945, and the scaled \(R_{\text{ISNv2}}\) is found to also be virtually identical to \(R_{\text{ISNv1}}\) for the interval 1921 – 1946. After 1946 it can be seen that these scaled variations diverge. Because some of the differences are rather small in Figure 1a, Figure 1b shows the deviations of each from the mean of the five scaled sequences [\(\Delta [R_{\mathrm{G}}]_{\text{fit}}\)]. The Waldmeier discontinuity is clear in \(R_{\text{ISNv1}}\) because after 1946 there are high positive values of this deviation around each sunspot maximum. Both \(R_{\text{ISNv2}}\) and \(R_{\text{BB}}\) show similar variations, but of the opposite sense to those for \(R_{\text{ISNv1}}\); the variations for \(R_{\text{BB}}\) being larger than those for \(R_{\text{ISNv2}}\). These deviations for \(R_{\mathrm{C}}\) and \(R_{\text{UEA}}\) oscillate around zero.
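The scaling used for display in Figure 1 can be sketched as follows: each series is linearly regressed against \(N_{\mathrm{G}}\) over 1921 – 1945, the fit is applied to the whole series, and the deviation of each scaled series from their mean is taken. This is a minimal illustration on synthetic data; the function names, the two synthetic series, and the 20 % step after 1946 are assumptions made for the example only.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def scale_to_reference(years, series, reference, start=1921, end=1945):
    """Regress `series` onto `reference` over start..end (inclusive),
    then apply that fit to every year of `series`."""
    sel = [i for i, yr in enumerate(years) if start <= yr <= end]
    slope, intercept = linear_fit([series[i] for i in sel],
                                  [reference[i] for i in sel])
    return [slope * v + intercept for v in series]

def deviations_from_mean(scaled):
    """Deviation of each scaled series from the mean of all scaled series."""
    names = list(scaled)
    length = len(scaled[names[0]])
    mean = [sum(scaled[k][i] for k in names) / len(names) for i in range(length)]
    return {k: [scaled[k][i] - mean[i] for i in range(length)] for k in names}

# Synthetic example (hypothetical): series_a is proportional to N_G throughout;
# series_b has a different scale and offset and is inflated by 20 % from 1946.
years = list(range(1921, 1961))
n_g = [5.0 + 4.0 * math.sin(2.0 * math.pi * (yr - 1921) / 11.0) for yr in years]
series_a = [12.08 * v for v in n_g]
series_b = [(20.0 * v + 3.0) * (1.2 if yr >= 1946 else 1.0)
            for yr, v in zip(years, n_g)]
scaled = {"a": scale_to_reference(years, series_a, n_g),
          "b": scale_to_reference(years, series_b, n_g)}
dev = deviations_from_mean(scaled)
```

In this example the deviations vanish over the regression interval and become non-zero after the step, which is the signature sought in Figure 1b. As the text goes on to stress, such linear regressions are suitable for display only.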
Great care must be taken when using linear regressions. For example, errors caused by inadequate and/or inappropriate regression techniques were discussed by Lockwood et al. (2006) in relation to differences between reconstructions of the magnetic field in near-Earth space from geomagnetic-activity data. Nau (2016) has neatly summarised the problems: “If any of the assumptions is violated (i.e., if there are nonlinear relationships between dependent and independent variables or the errors exhibit correlation, heteroscedasticity, or nonnormality), then the forecasts, confidence intervals, and scientific insights yielded by a regression model may be (at best) inefficient or (at worst) seriously biased or misleading.” In the context of sunspot numbers and sunspot-group numbers, Lockwood et al. (2016c) found that the most complex problems were associated with nonnormal distributions of data errors (especially if linearity or proportionality was inappropriately assumed), which violate the assumptions made by most regression techniques: such errors should always be tested for before a correlation is used for any scientific inference or prediction (Lockwood et al., 2006, 2016c). A normal distribution of fit residuals can be readily tested for using a quantile–quantile (Q – Q) plot (e.g. Wilk and Gnanadesikan 1968). This is a graphical technique for determining whether two datasets come from populations with a common distribution; hence by making one of the datasets normally distributed, we can test the other to see if it also has a normal distribution. The left-hand panel of Figure 2 gives the corresponding Q – Q plots in which the ordered standardised fit residuals [\(\mathrm{e}_{(\mathrm{i}\mathrm{n})}/\sigma \), where \(\sigma \) is their standard deviation] are plotted as a function of quantiles of a standard normal distribution [\(\mathrm{F}_{\mathrm{N}}^{-1}((i-0.5)/n)\)].
To be a reliable and usable regression fit, the points in a Q – Q plot should form a straight line along the diagonal as this shows the errors in the fitted data form a Gaussian distribution, which is one of the assumptions of least-squares regression fitting. It can be seen that this condition is reasonably well met for \(R_{\text{ISNv1}}\) (and hence \(R_{\mathrm{C}}\)) and \(R_{\text{ISNv2}}\) (which are almost identical in form over the interval used) but not for \(R_{\text{BB}}\) (Figure 2d). Hence the error distribution for \(R_{\text{BB}}\) is not Gaussian. The form of Figure 2d suggests that the \(R_{\text{BB}}\) distribution has a different kurtosis (sharpness of peak) compared to \(N_{\mathrm{G}}\) and is asymmetric. This applies for all of the \(R_{\text{BB}}\) data series, but Figure 2 shows that it even applies for the interval of the regression shown here (1921 – 1945), over which \(R_{\text{BB}}\) and the other data series appear, at least visually, to be very similar (see Figure 1a). Hence Figure 2 stresses that although some linear regressions give valid Q – Q plots, others do not. In general, linear regression fits therefore cannot be relied upon and are used here in Figure 1 for illustrative purposes only in displaying the tested data series.
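The coordinates of such a Q – Q plot can be computed with a short sketch like the one below. It pairs the ordered, standardised residuals with standard-normal quantiles evaluated at \((i-0.5)/n\). The function name is illustrative, and the use of the sample standard deviation for \(\sigma\) is an assumption, since the text does not specify which estimator was used.

```python
from statistics import NormalDist, stdev

def qq_points(residuals):
    """Return (normal quantile, ordered standardised residual) pairs.

    Points lying close to the diagonal indicate residuals that are
    consistent with a Gaussian distribution.
    """
    n = len(residuals)
    sigma = stdev(residuals)  # sample standard deviation (assumed choice)
    ordered = sorted(r / sigma for r in residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

# Hypothetical residuals, used only to show the call.
points = qq_points([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Plotting the second element of each pair against the first, together with the diagonal, gives a panel of the kind described for Figure 2.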
2.2 Test Data Series
Test data series used and the coefficients of the best-fit second-order polynomial fit of test series \(x\) to \(N_{\mathrm{G}}\), given by \([N_{\mathrm{G}}]_{\mathrm{fit}} = ax^{2} + bx + c\).
| Symbol | Brief description | Reference(s) | Units | a | b | c |
| --- | --- | --- | --- | --- | --- | --- |
| \(N_{\mathrm{G}}\) | The number of sunspot groups identified from photographic plates by RGO observers | | Annual mean of daily number | 0 | 1 | 0 |
| \(A_{\mathrm{G}}\) | Corrected (for limb foreshortening) total sunspot area identified from photographic plates by RGO observers | | 10^{−6} of a solar hemisphere | −4.8253 × 10^{−7} | 5.6452 × 10^{−3} | 0.4232 |
| \(N_{\text{MWO}}\) | The number of sunspot groups identified from solar drawings by MWO observers | | Number of distinct groups in ten months | −5.1202 × 10^{−4} | 0.2070 | −0.5617 |
| CaKi | The Ca K line index from MWO observations | Bertello, Ulrich, and Boyden (2010) | – | 5.5864 × 10^{−7} | 2.5681 × 10^{−2} | −7.0114 |
| foF2 | The mean dayside ionospheric F2-layer critical frequency from the Slough ionosonde | | MHz | 6.5004 × 10^{−3} | 2.2589 | −10.9406 |
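The tabulated coefficients can be applied directly to convert a test-series value into the fitted group number \([N_{\mathrm{G}}]_{\mathrm{fit}} = ax^{2} + bx + c\). The sketch below simply transcribes the coefficients from the table; the dictionary keys and the function name are illustrative choices.

```python
# (a, b, c) for [N_G]_fit = a*x**2 + b*x + c, transcribed from the table.
FIT_COEFFICIENTS = {
    "N_G":   (0.0, 1.0, 0.0),
    "A_G":   (-4.8253e-7, 5.6452e-3, 0.4232),
    "N_MWO": (-5.1202e-4, 0.2070, -0.5617),
    "CaKi":  (5.5864e-7, 2.5681e-2, -7.0114),
    "foF2":  (6.5004e-3, 2.2589, -10.9406),
}

def fitted_group_number(series, x):
    """Evaluate the second-order polynomial fit for test series `series`
    at the value `x`, given in that series' own units."""
    a, b, c = FIT_COEFFICIENTS[series]
    return a * x * x + b * x + c

# Example call: an annual-mean sunspot area of 1000 millionths of a hemisphere.
example = fitted_group_number("A_G", 1000.0)
```

The negative quadratic coefficients for \(A_{\mathrm{G}}\) and \(N_{\text{MWO}}\) mean that the fitted group number rises less steeply at high activity, so these fits should be used only within the range of the data from which they were derived.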
2.2.1 Total Spot Area from the Greenwich Photoheliographic Results [\(A_{\mathbf{G}}\)]
The total sunspot area was computed (corrected for limb foreshortening) [\(A_{\mathrm{G}}\)] from the RGO dataset (also called the Greenwich Photoheliographic Results: GPR) (Baumann and Solanki 2005; Willis et al. 2013a, 2013b). This dataset was compiled using white-light photographs (photoheliograms) of the Sun from a small network of observatories to produce a dataset of daily observations between 17 April 1874 and the end of 1976, thereby covering nine solar cycles. The observatories used were The Royal Observatory, Greenwich (until 02 May 1949); the Royal Greenwich Observatory, Herstmonceux (03 May 1949 – 21 December 1976); the Royal Observatory at the Cape of Good Hope, South Africa; the Dehra Dun Observatory, in the North–West Provinces (Uttar Pradesh) of India; the Kodaikanal Observatory, in southern India (Tamil Nadu); and the Royal Alfred Observatory in Mauritius. Any remaining data gaps were filled using photographs from many other solar observatories, including the Mount Wilson Observatory, the Harvard College Observatory, Melbourne Observatory, and the US Naval Observatory. The sunspot areas were measured from the photographs with the aid of a large position micrometer (see Willis et al. 2013a, 2013b and references therein). The \(A_{\mathrm{G}}\) values are the total sunspot area (umbrae plus penumbrae) and have been corrected for the effect of foreshortening, which increases as sunspots are closer to the limb of the solar disc.
2.2.2 The Number of Sunspot Groups from the Greenwich Photoheliographic Results [\(N_{\mathbf{G}}\)]
The number of groups [\(N_{\mathrm{G}}\)] was computed from the same RGO photographs as were used to generate \(A_{\mathrm{G}}\). The RGO data did not employ the Zürich group-classification scheme so that \(N_{\mathrm{G}}\) is not influenced by the Waldmeier discontinuity. It is well known that the RGO group-numbers show a drift relative to the Zürich sunspot numbers (e.g. Jakimcowa 1966). This is not necessarily a calibration error as there are a number of ways in which it could have arisen from real changes in solar activity. The most obvious is that there has been a drift in the ratio of the number of individual spots to the number of spot groups, which would influence \(N_{\mathrm{G}}\) and sunspot numbers differently. However, in addition, over the same interval there has been a drift in the lifetimes of spot groups, giving an increase in the number of recurrent groups (groups that are sufficiently long-lived to be seen for two or more traversals of the solar disc as seen from Earth) (Henwood, Chapman, and Willis 2010). This has the potential to have influenced group numbers derived using different classification schemes in different ways.
2.2.3 The Mount Wilson Ca K Index [CaKi]
Spectroheliograms in the ionized calcium K line, Ca II K (393.37 nm), were obtained between 1915 and 1985 using the 60-foot solar tower at Mount Wilson Observatory as part of their solar-monitoring programme. Calibration of these images is, however, not straightforward. A new and homogeneous index quantifying the area of plages and active network in the Ca II K line has been derived from the digitization of almost 40,000 photographic solar images by Bertello, Ulrich, and Boyden (2010) (here referred to as the Ca K index: CaKi). Although these data are available up to 1985, there were changes to the calibration procedure employed, with step-wedge exposures used from 09 October 1961. Because we wish to exclude effects of inhomogeneities in the data caused by such changes, and because for the purposes of this article the later data are not required, we here only employ CaKi data from before this date. Note that the Ca K index has a pronounced nonlinear variation with sunspot numbers (e.g. Foukal et al. 2009).
2.2.4 The Slough F2 Layer Critical Frequencies [foF2]
Ionospheric F2 region critical frequencies are observed at Slough [foF2]. As discussed in Article 1 (Lockwood et al. 2016a), the location of Slough means that the variation over each year is dominated by the plasma loss rate (and so by thermospheric composition), giving a dominant annual variation, as opposed to the semiannual variation that dominates at some other stations (Scott and Stamper 2015), and a close variation with sunspot numbers. Additional effects, quantified by the area of white-light faculae, are small for the Slough data (Smith and King 1981), and Article 1 shows that the main effect of including them in quantifying the Waldmeier discontinuity is to increase noise levels. Hence in this article, Slough foF2 values are used without allowance for facular areas. In Article 1 (Lockwood et al. 2016a), nine dayside Universal Times (UTs) were identified for which the correlation of foF2 with sunspot numbers (after the Waldmeier discontinuity) exceeds 0.99 for all of the sunspot-data series tested. Rather than treat these as independent data series, we average the nine together in the present article.
2.2.5 The Mount Wilson Observatory (MWO) Sunspot-Group Number [\(N_{\text{MWO}}\)]
\(N_{\text{MWO}}\) has been compiled routinely from January 1917 onwards using the 150-foot solar tower telescope from sketches of the solar disc. These data did not use the Zürich group classification scheme, employing instead the scheme originally developed by Hale and coworkers (Hale et al. 1919). Thus \(N_{\text{MWO}}\) will not be influenced by the Waldmeier discontinuity. Because of different equipment and procedures, \(N_{\text{MWO}}\) does not vary linearly with \(N_{\mathrm{G}}\).
2.2.6 Summary of the Test Data Series
2.3 Analysis
Article 3 (Lockwood et al. 2016c) shows that it is important not to force linear regression fits between different sunspot-number sequences through the origin of the scatter plot. Doing so means that proportionality between the sequences is assumed and results in the inflation of solar-cycle amplitudes in data from a lower-acuity observer. Furthermore, Lockwood et al. (2016c) and Usoskin et al. (2016) showed that results from different observers often have a nonlinear dependence. Most previous studies of the Waldmeier discontinuity (Svalgaard 2011; Lockwood, Owens, and Barnard 2014a; Lockwood et al. 2016a; Clette and Lefèvre 2016) implicitly made the assumption of proportionality because they assumed that correction for the Waldmeier discontinuity could be achieved using a single multiplicative factor. In this article, we do not make this assumption; instead, we evaluate a correction for before the Waldmeier discontinuity from \(R\) to \(R'\) that is given by Equation (1). Adjusting the values before the putative Waldmeier discontinuity with the optimum \(f_{\mathrm{R}}\), \(n\), and \(\delta \) means that the sequence of older data is made consistent with the post-discontinuity data.
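A correction of this kind can be sketched in code. The sketch assumes that Equation (1) has the form \(R' = f_{\mathrm{R}} R^{n} + \delta\); this form is inferred from the roles described for \(f_{\mathrm{R}}\) (a multiplicative factor), \(n\) (an exponent), and \(\delta\) (an offset) and should be checked against the equation itself. The function names and example values are illustrative only.

```python
def corrected_value(R, f_R, n=1.0, delta=0.0):
    """Assumed form of the correction: R' = f_R * R**n + delta.

    With n = 1 and delta = 0 this reduces to the single multiplicative
    factor R' = f_R * R used in earlier studies.
    """
    return f_R * R ** n + delta

def correct_before(years, values, last_before_year, f_R, n=1.0, delta=0.0):
    """Apply the correction only to years up to and including
    `last_before_year`; later values are left unchanged."""
    return [corrected_value(v, f_R, n, delta) if yr <= last_before_year else v
            for yr, v in zip(years, values)]

# Hypothetical annual means, used only to show the call.
years = [1942, 1943, 1949, 1950]
values = [30.0, 16.0, 135.0, 84.0]
adjusted = correct_before(years, values, 1943, f_R=1.1, n=1.0, delta=0.0)
```

Setting \(n \neq 1\) or \(\delta \neq 0\) allows the correction to be nonlinear or non-proportional in \(R\), which is the generalisation that the text describes.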
Clette and Lefèvre (2016) made the valuable point that the precise date of the Waldmeier discontinuity is not known, and this can influence the results if the “before” and “after” intervals used in the method of Lockwood, Owens, and Barnard (2014a) end and start, respectively, at an assumed date for the discontinuity. (This is because, if that date were wrong, some data from before the discontinuity could be placed in the “after” interval, or vice versa.) Here we remove this dependency by ending the “before” interval in 1943 and starting the “after” interval in 1949. Thus, neither the precise date nor the waveform of the discontinuity has an effect, provided that the bulk of it lies within the six-year interval around 1946, which is the most likely date defined by Clette and Lefèvre (2016). The length of the “before” and “after” intervals was varied until an optimum was achieved, as discussed below.
The procedure used was to first determine the exponent [\(n\)] and offset [\(\delta \)] required by Equation (1). Because these relate to the correction needed for a given tested sunspot-number series, the same values of \(n\) and \(\delta \) are used when testing against all five test series. These values were obtained using the Nelder–Mead search procedure to find the optimum combination of \(n\), \(\delta \), and \(f_{\mathrm{R}}\) that made \(R '\) correlate best with each of the test data series for the period between the start of the “before” interval and the end of the “after” interval. Because the test series are so similar (see Figure 3), they gave very similar optimum \(n\), \(\delta \), and \(f_{\mathrm{R}}\) values, and the values of \(n\) and \(\delta \) adopted here were those for the test series that gave the highest correlation (which was invariably the RGO sunspot-group number [\(N_{\mathrm{G}}\)]). Having defined the optimum values of \(n\) and \(\delta \), the procedure used was to vary the factor \(f_{\mathrm{R}}\) between 0.5 and 1.3 (in steps of 0.001) to evaluate the mean fit residuals in the “before” and “after” intervals.
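The second step, the scan over \(f_{\mathrm{R}}\), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Equation (1) is not reproduced in this section, so the form \(R' = f_{\mathrm{R}}(R + \delta)^{n}\) used in `correct` is an assumption. Calibrating the linear fit on the “after” interval only, and taking the optimum to be the factor that equalises the mean residuals, are also assumptions. The function names and the noise-free synthetic series are hypothetical. The Nelder–Mead step for \(n\) and \(\delta\) is not shown.

```python
# Sketch only. ASSUMPTIONS: the form of Equation (1), the after-only linear
# fit, and all numbers below (synthetic, noise-free) are illustrative.

def correct(r, f_r, n=1.0, delta=0.0):
    """Assumed form of Equation (1): R' = f_R * (R + delta) ** n."""
    return f_r * (r + delta) ** n


def linear_fit(x, y):
    """Least-squares y = slope * x + intercept (not forced through the origin)."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def mean_residual(x, y, slope, intercept):
    """Mean of (observed test value minus fitted value)."""
    return sum(b - (slope * a + intercept) for a, b in zip(x, y)) / len(x)


def scan_f_r(r_before, r_after, t_before, t_after, n=1.0, delta=0.0):
    """Return the f_R on the 0.5-1.3 grid (step 0.001) that best equalises
    the mean fit residuals of the "before" and "after" intervals."""
    slope, intercept = linear_fit(r_after, t_after)   # calibrate on "after"
    res_after = mean_residual(r_after, t_after, slope, intercept)
    best_f, best_gap = None, float("inf")
    for k in range(801):                              # 0.500, 0.501, ..., 1.300
        f_r = round(0.5 + 0.001 * k, 3)
        r_corr = [correct(r, f_r, n, delta) for r in r_before]
        gap = abs(mean_residual(r_corr, t_before, slope, intercept) - res_after)
        if gap < best_gap:
            best_f, best_gap = f_r, gap
    return best_f


# Synthetic example: the "before" values of the tested series are too low by
# a factor 1.12, and the test series depends linearly on the true value.
true_before = [10, 5, 10, 35, 80, 115, 110, 90, 70, 50, 30, 15]
true_after = [135, 85, 70, 30, 15, 5, 40, 140, 190, 185, 160, 110]
t_before = [2.0 + 0.05 * r for r in true_before]
t_after = [2.0 + 0.05 * r for r in true_after]
r_before = [r / 1.12 for r in true_before]

best = scan_f_r(r_before, true_after, t_before, t_after)
print(best)
```

With real data the residual difference would not vanish exactly, and the spread of residuals would set the width of the distribution around the optimum factor.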
Another valuable point made by Clette and Lefèvre (2016) is that if the “before” and “after” intervals are too long in duration, then other errors (such as the Locarno calibration error in the case of \(R_{\text{ISNv1}}\)) can enter into both the tested and test series and so influence the estimate of the discontinuity correction. On the other hand, if these intervals are too short, then the inter-annual variability that is due to “geophysical noise” in both the test and tested data will also degrade the final value. Hence an optimum compromise is needed. To reduce the number of variables, the “before” and “after” intervals were assigned the same duration [\(T\)]. The value of \(T\) was then varied between 1 year and 23 years (the latter using all the test data shown in Figure 3, except for the six-year interval around the putative Waldmeier discontinuity). As expected from the above, both the lowest and the highest values of \(T\) gave low peak values [\(p_{\mathrm{m}}\)], and hence broad distributions [\(p_{0}(f _{\mathrm{R}})\)]. The narrowest \(p_{0}(f_{\mathrm{R}})\) distribution, giving the highest peak value [\(p_{\mathrm{m}}\)], was for \(T = 11\) years (approximately one full solar cycle). Hence we used a “before” interval of 1932–1943 and an “after” interval of 1949–1960, as this minimised the width of the overall probability distribution function obtained, and hence the uncertainties. This is the optimum compromise between having sufficient data points and minimising the potential to introduce other errors and discontinuities present in either data series.
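The interval construction described above can be written down compactly. The sketch below is illustrative: the function names are hypothetical, and the peak values passed to `best_duration` are placeholders, not results from this article.

```python
# Sketch: build the "before" and "after" intervals for a given duration T,
# keeping the years around the putative discontinuity out of both.
BEFORE_END = 1943   # last year of the "before" interval (from the text)
AFTER_START = 1949  # first year of the "after" interval (from the text)


def intervals(t_years):
    """Return ((before_start, before_end), (after_start, after_end))."""
    return ((BEFORE_END - t_years, BEFORE_END),
            (AFTER_START, AFTER_START + t_years))


def select(years, values, span):
    """Annual values whose year lies in the inclusive span."""
    lo, hi = span
    return [v for y, v in zip(years, values) if lo <= y <= hi]


def best_duration(peak_by_t):
    """T giving the highest peak p_m, i.e. the narrowest distribution."""
    return max(peak_by_t, key=peak_by_t.get)


before, after = intervals(11)
print(before, after)   # (1932, 1943) (1949, 1960)

# Placeholder peak values for three durations, purely to show the selection.
print(best_duration({5: 0.2, 11: 0.9, 23: 0.4}))   # 11
```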
3 Results
3.1 Results for \(R_{\text{ISNv1}}\)
The middle panel of Figure 5 shows the statistical significances of the difference between the \(r\) at general \(f_{\mathrm{R}}\) and the peak value using the same colour scheme. The black line shows the overall significance \(S_{0}(f_{\mathrm{R}})\), given by Equation (3).
Optimum fitted values of \(\delta \), \(n\), and \(f_{\mathrm{R}}\) in Equation (1) for the five tested sunspot data series.

| Series | \(\delta \) | \(n\) | Optimum \(f_{\mathrm{R}}\) | Change required to “before” interval (%) |
|---|---|---|---|---|
| \(R_{\text{ISNv1}}\) | 2.7309 | 1.0884 | 0.7350 ± 0.0231 | +12.2787 ± 3.3692 |
| \(R_{\mathrm{C}}\) | 3.4957 | 1.0950 | 0.6240 ± 0.0198 | +0.4396 ± 3.0098 |
| \(R_{\text{BB}}\) | 0.3108 | 1.0932 | 0.7410 ± 0.0191 | −5.7380 ± 2.2532 |
| \(R_{\text{ISNv2}}\) | \(1.4938 \times 10^{-4}\) | 0.9967 | 0.9760 ± 0.0295 | −3.7960 ± 2.9081 |
| \(R_{\text{UEA}}\) | 0.0000 | 1.0000 | 1.0000 ± \(4.7568 \times 10^{-4}\) | +0.0050 ± 0.0476 |
3.2 Results for \(R_{\text{ISNv2}}\)
This test finds that \(R_{\text{ISNv2}}\) overestimates the mean for the “before” interval by \(3.80 \pm 2.91~\%\). Thus the Waldmeier discontinuity has been slightly overestimated in \(R_{\text{ISNv2}}\). Note that the ideal value of zero is (just) outside the \(2\sigma \) uncertainty for \(R_{\text{ISNv2}}\). The very small \(\delta \) and the closeness of \(n\) to unity mean that the correction needed is very close to being proportional. Hence the correction in \(R_{\text{ISNv2}}\), although slightly too large, has removed the nonlinearity introduced by the changes made by Waldmeier.
3.3 Results for \(R_{\mathrm{C}}\)
3.4 Results for \(R_{\mathrm{BB}}\)
3.5 Results for \(R_{\text{UEA}}\)
This test of \(R_{\text{UEA}}\) shows that the procedure works well, and that when presented with one dominant correlation the other test series, which give slightly different optimum \(f_{\mathrm{R}}\), do not degrade the result.
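Why one dominant test series controls the combined result can be seen from a small numerical sketch of multiplying distributions together. The Gaussian shapes, centres, and widths below are assumptions chosen for illustration; they are not the distributions derived in this article.

```python
import math

# Grid of candidate factors, as in the f_R scan (0.500 to 1.300, step 0.001).
grid = [round(0.5 + 0.001 * k, 3) for k in range(801)]


def log_gaussian(f, centre, sigma):
    """Log of an unnormalised Gaussian; logs avoid underflow in the product."""
    return -0.5 * ((f - centre) / sigma) ** 2


# (centre, sigma) per test series. ASSUMED values for illustration: one very
# narrow distribution at 1.000 and four broad ones centred slightly elsewhere.
series = [(1.000, 0.0005), (0.98, 0.03), (1.02, 0.03), (0.99, 0.03), (1.03, 0.03)]

# Multiplying the distributions is the same as summing their logarithms.
log_product = [sum(log_gaussian(f, c, s) for c, s in series) for f in grid]
peak = grid[max(range(len(grid)), key=log_product.__getitem__)]

# Relative height of the product one grid step away from the peak.
i = grid.index(peak)
print(peak, math.exp(log_product[i + 1] - log_product[i]))
```

In this sketch the peak of the product stays at the centre of the narrow distribution, because the broad distributions vary little across its width.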
4 Conclusions
We have tested five sunspot data series for the putative Waldmeier discontinuity in sunspot numbers around 1946 using five diverse test datasets that are all completely independent of the Zürich sunspot number, which is the source of this discontinuity. The test data are the sunspot-group number from the RGO dataset [\(N_{\mathrm{G}}\)], the total sunspot area from the RGO dataset (corrected for foreshortening) [\(A _{\mathrm{G}}\)], the Mount Wilson Ca K index [CaKi], the Mount Wilson sunspot-group number [\(N_{\text{MWO}}\)], and the ionospheric F2-region critical frequency observed at Slough [foF2]. We have tested the sunspot data series in two ways, using the fit residuals and using the correlation coefficient. In all cases, the results of these two methods are remarkably consistent, but the uncertainties are lower for the fit-residual method. The most persistent difference between the two methods occurs for the ionospheric foF2 data, which are here not included in overall tests but are nevertheless plotted to show that these terrestrial data still give results that are consistent with those for the solar test data to within the \(2\sigma \) uncertainties. The diversity of the derivations and sources of these test series means that the chances that all suffer from the same error around 1946 are negligible, and comparison shows random noise differences between them (Figure 3) and not systematic errors.
Figure 10b is for \(R_{\text{ISNv1}}\), and the Waldmeier discontinuity is clearly visible in the blue line as low values during Solar Cycle 17. The red line demonstrates how effective the correction is, and this is true for all of the tested series. Figure 10c is for \(R_{\text{BB}}\), and the blue line shows that values in Cycle 17 are persistently too high. It is not at all clear how this has occurred, because \(R_{\text{BB}}\) was compiled from various observers, most of whom did not change practices in defining groups when such changes were made at Zürich. However, it appears that \(R_{\text{BB}}\) has somehow been adjusted to allow for the Waldmeier discontinuity, and this adjustment is either not warranted or excessive. Figure 10d shows \(R_{\text{ISNv2}}\), and the Waldmeier discontinuity is much reduced compared to \(R_{\text{ISNv1}}\). However, there appears to be a slight overcorrection for the discontinuity, as values for Cycle 17 are slightly too high. This is consistent with the estimated inflation factor used to correct \(R_{\text{ISNv1}}\), which was 18 % (Clette and Lefèvre 2016) and so is higher than the value of \(12.28 \pm 3.37~\%\) derived here for the mean of \(R_{\text{ISNv1}}\) over Cycle 17. Figure 10d confirms the effect of the mean of \(R_{\text{ISNv2}}\) for Cycle 17 being too large by the \(3.80 \pm 2.91~\%\) that was derived in this article. Figure 10e shows the results for \(R_{\mathrm{C}}\) and, although a good match to the mean for Cycle 17 is obtained, the effects of the residual nonlinearity can be seen, with values at both sunspot minimum and sunspot maximum being slightly low in \(R_{\mathrm{C}}\). Figure 10f shows the results for \(R_{\text{UEA}}\); because the tested series and one of the test series are the same here, the blue and red lines are essentially identical and both match the main test series very well.
Table 3 gives the optimum corrections needed for the five tested sunspot data series. Direct and careful allowance for this discontinuity has been made in Version 2 of the Wolf/Zürich/International sunspot number [\(R_{\text{ISNv2}}\)]. We here show that the correction applied is slightly too large, but that it does remove the nonlinearity inherent in \(R_{\text{ISNv1}}\). Note that because \(R_{\text{ISNv2}}\) is compiled by daisy-chaining of calibrations, this systematic error will be passed to all prior data. The correction used in the “backbone” sunspot-group series [\(R_{\text{BB}}\)] of Svalgaard and Schatten (2016) is also too large. A large part of this is likely to be the 7 % correction introduced by Svalgaard and Schatten to allow for the “evolutionary” aspect of Waldmeier’s classification scheme, but it is not at all obvious that this is required for the data used to compile \(R_{\text{BB}}\). The backbone series is the only one not to give usable Q–Q plots when regressed against other sunspot series. From the analysis presented in Article 3 (Lockwood et al. 2016c), some of the error has probably arisen from the use of linear intercorrelation of segments of annual mean data (when in general the relationship is nonlinear) and because fits were unnecessarily forced through the origin, which tends to amplify solar-cycle amplitudes in fitted data. As for \(R_{\text{ISNv2}}\), \(R_{\text{BB}}\) uses daisy-chaining of calibrations, so this error will be passed to prior data, and such errors will accumulate as one goes back in time.
The correction applied by Lockwood, Owens, and Barnard (2014a) to \(R_{\text{ISNv1}}\) to generate \(R_{\mathrm{C}}\) is designed to remove the Waldmeier discontinuity on average. These tests show that this is achieved, but that the nonlinear variation with the test data, as also found for \(R_{\text{ISNv1}}\), has not been removed. In addition, \(R_{\mathrm{C}}\) only considered two known errors and others certainly exist; for example, the modern values were not corrected for the drift in the Locarno calibration values (Clette et al. 2015).
Acknowledgements
The authors are grateful to staff and funders of the World Data Centres from where data were downloaded: specifically, the Slough foF2 data were obtained from WDC for Solar Terrestrial Physics, part of the UK Space Science Data Centre (UKSSDC) at RAL Space, Chilton, and the \(R_{\text{ISNv1}}\) and \(R_{\text{ISNv2}}\) data from the WDC for the sunspot index, part of the Solar Influences Data Analysis Center (SIDC) at the Royal Observatory of Belgium. We also thank David Hathaway and the staff of the Solar Physics Group at NASA’s Marshall Space Flight Center for maintaining the online database of RGO data used here. Other sunspot data used [\(R_{\text{BB}}\), \(R_{\mathrm{C}}\), \(N_{\text{MWO}}\), and \(R_{\text{UEA}}\)] were taken from the respective cited publications. This work has been funded by STFC consolidated grant number ST/M000885/1.
Disclosure of Potential Conflicts of Interest
The authors declare that they have no conflicts of interest.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.