1 Introduction

The total number of sunspots seen on the Sun varies with an approximately 11-year cycle. This cycle is not a regular one: its amplitude varies with time with no particular regularity and occasionally goes through extreme phases of high or low activity (Hathaway, 2015; Saha, Chandra, and Nandy, 2022). The solar cycle plays an important role in governing space weather, which in turn has major impacts on our modern society. These include disruptions of satellite operations that impact telecommunication networks and global-positioning systems, and geomagnetic storms that lead to electric power grid failures and air-traffic disruptions over polar routes (Schrijver et al., 2015). The economic cost of a severe magnetic storm, say of the magnitude of the great magnetic storm of 1859 – the Carrington event – is estimated to be greater than that of Hurricane Katrina (Committee on the Societal and Economic Impacts of Severe Space Weather Events: A Workshop, 2008). Long-term solar-activity variations have stimulated the growth of the field of space climate (Nandy and Martens, 2007), with the emerging understanding that solar-activity variations have profound impacts on planetary space environments, atmospheric evolution (Bharati Das et al., 2019; Basak and Nandy, 2021), and habitability (Nandy, Valio, and Petit, 2017; Nandy et al., 2021).

Such considerations have spurred the field of solar-cycle forecasting, with diverse techniques employed to forecast upcoming solar cycles (Nandy, 2021); the problem is deemed one of the most outstanding challenges in heliophysics (Daglis et al., 2021). Petrovay (2020) classified such techniques into three groups: model-based methods, precursor methods, and extrapolation methods. Each has its strengths and weaknesses. Most importantly, the first two are closely connected with physical insight into the solar dynamo that determines the solar cycle, whereas the last is model-agnostic and data-based. The solar magnetic cycle is thought to originate in a dynamo mechanism through complex, nonlinear interactions between magnetic fields and plasma flows in the Sun’s convection zone (Charbonneau, 2010). It is believed to be weakly chaotic in nature (Petrovay, 2020). The extreme conditions and turbulent nature of the solar convection zone, combined with a lack of observational constraints, make computational modeling of the solar dynamo mechanism quite challenging. There have been a few model-based forecasts for Solar Cycle 24, which has just concluded (Dikpati, De Toma, and Gilman, 2006; Dikpati and Gilman, 2006; Dikpati et al., 2007; Choudhuri, Chatterjee, and Jiang, 2007; Jiang, Chatterjee, and Choudhuri, 2007). However, these model-based forecasts were highly inconsistent – a result of differing assumptions regarding the turbulent nature of the Sun’s convection zone (Yeates, Nandy, and Mackay, 2008). A NASA-NOAA panel that typically attempts to generate a consensus prediction before the start of a sunspot cycle made an early forecast of a very strong Solar Cycle 24, which proved to be incorrect. In fact, this panel revised its forecast to a weak Cycle 24 a few years after its first forecast. This indicates the uncertainty and challenges in predicting solar cycles. We note that terrestrial weather forecasting follows a similar route. Crucially, however, for terrestrial weather forecasting the computational models are well established, observations provide much stronger constraints, and massive computational resources are utilized; the solar dynamo model and its parameters are not nearly as well constrained by observations.

A recently developed physical model-based forecasting scheme relied on coupling two distinct models of magnetic-flux transport on the Sun, namely a solar surface flux-transport model and a solar convection-zone dynamo model (Bhowmik and Nandy, 2018). This physics-based modeling technique was quite successful in hindcasting and matching nearly a century of solar-cycle observations, and predicted a weak, but not insignificant, Solar Cycle 25, similar to or slightly stronger than the previous cycle, peaking in 2024 – 2025. Independent observations indicate that Solar Cycle 25 initiated in late 2019 (Nandy, Bhatnagar, and Pal, 2020). Given major advances both in understanding the theory of solar-cycle predictability and in applying data-based machine-learning techniques to solar-cycle forecasting, it is interesting to assess at this juncture whether the best of these very diverse techniques result in predictions that are consistent with each other.

A recent review on progress in solar-cycle predictions by Nandy (2021) points out that divergences and inconsistencies in solar-cycle forecasts persist across sunspot Cycles 24 and 25; however, physical model-based predictions have converged for Solar Cycle 25. Nandy (2021) argues this convergence is based on insights into solar-cycle predictability gleaned in recent times. These insights include the understanding that long-term solar-cycle forecasts are not possible as theoretical processes limit the cycle memory across only one solar cycle (Karak and Nandy, 2012; Hazra, Brun, and Nandy, 2020). This is borne out by observations (Muñoz-Jaramillo et al., 2013). However, the analysis by Nandy (2021) indicates divergence across these physics-based and data-based, model-agnostic forecasting techniques. Can we bridge this gap? This is one of the central motivations of our work.

Over the last decade, machine-learning techniques have proven extraordinarily successful in making forecasts. They are particularly useful where a mechanistic model is either unavailable or poorly constrained, as is the case in many astrophysical and geophysical problems. They have played an increasingly important role in making data-based forecasts in several problems in solar physics (Bobra and Couvidat, 2015; Bobra and Ilonidis, 2016; Dhuri, Hanasoge, and Cheung, 2019; Sinha et al., 2022). A recent, comprehensive analysis by Sinha et al. (2022), in particular, indicates the fidelity of machine-learning models in the domain of flare forecasting. Starting with Koons and Gorney (1990) and Fessant, Bengio, and Collobert (1996), different neural networks have been used with varying degrees of success to forecast solar cycles (Pesnell, 2012), including a few recent studies (Pala and Atici, 2019; Covas, Peixinho, and Fernandes, 2019; Benson et al., 2020) that made forecasts for the ongoing Cycle 25.

Let us note that machine-learning techniques, particularly those based on deep neural networks, although sometimes spectacularly successful, are mostly treated as black boxes by practitioners. This may lead to mistaken conclusions, particularly when they are applied to limited data (Riley, 2019) – as is the case for the solar cycle. Hence, it is necessary to choose machine-learning algorithms with care and to critically review their forecasts. In this paper, we show how to make such a choice with the specific example of solar-cycle forecasting.

We use four different machine-learning algorithms, all within a general framework called Recurrent Neural Networks (RNNs). The characteristic feature of RNNs is that their connection topology possesses cycles (Jaeger, 2001). They are well known for their ability to model sequential data. In addition, we use a simple linear Autoregressive (linear AR) algorithm as a reference against which to compare the performance of the different RNNs. In total, we use five different algorithms:

  • Linear Autoregressive (linear AR).

  • Echo State Network (ESN).

  • Vanilla Recurrent Neural Network (vanilla RNN) (Chen, 2016).

  • Long Short-Term Memory Networks (LSTMs) (Hochreiter and Schmidhuber, 1997).

  • Gated Recurrent Units (GRUs) (Chung et al., 2014).

A detailed mathematical treatment of all of these RNN architectures can be found in Goodfellow, Bengio, and Courville (2016). In our problem, the ESN architecture works the best. For this reason, in the next section, we concentrate on ESN. We mention the other algorithms only for comparison purposes with ESN.
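As a concrete illustration of the reference baseline, a linear AR(p) model can be fit by ordinary least squares. The sketch below is ours (the function name `fit_ar` and the toy recurrence are illustrative, not necessarily the exact implementation used here):

```python
import numpy as np

def fit_ar(z, p):
    """Least-squares fit of a linear AR(p) model with intercept:
    z(t) ~ a1*z(t-1) + ... + ap*z(t-p) + c."""
    # Design matrix: each row holds the p most recent past values plus a 1
    X = np.array([np.r_[z[t - p : t][::-1], 1.0] for t in range(p, len(z))])
    coeffs, *_ = np.linalg.lstsq(X, z[p:], rcond=None)
    return coeffs  # [a1, ..., ap, c]

# Data generated exactly by an AR(2) recurrence is recovered exactly
z = [1.0, 2.0]
for _ in range(30):
    z.append(0.5 * z[-1] + 0.3 * z[-2] + 1.0)
coeffs = fit_ar(np.array(z), p=2)
```

Once fitted, the model forecasts by repeatedly applying the learned recurrence to its own outputs, which is also how the RNN forecasts below proceed.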

The rest of this paper is organized in the following manner: in Section 2 we give a brief introduction to the ESN algorithm. To study its limitations we use data from simulations of a stochastic dynamo model, which allows us to generate an arbitrarily large amount of data. We describe this model in Section 3. In Section 4, we apply the ESN to the real sunspot data. Finally, we present the conclusions. Important implementation details regarding how to manually choose certain parameters of the ESN are listed in Appendix A, and Appendix B describes how we train the remaining RNN models.

2 Echo State Networks

In this paper we obtain the best forecasts with a particular technique called the Echo State Network (ESN) (Jaeger, 2001; Maass, Natschläger, and Markram, 2002; Jaeger and Haas, 2004; Lukoševičius and Jaeger, 2009). Within the machine-learning community the name ESN is more popular, whereas in the physics community reservoir computing is the most common name used to describe this network.

It has been used successfully to forecast delay differential equations, low-dimensional chaotic systems, and even large spatiotemporally chaotic systems (Pathak et al., 2018). This method has so far not been used to forecast the solar cycle, although theoretical considerations suggest that the solar dynamo mechanism can be represented by a system of delay differential equations (Wilmot-Smith et al., 2006).

Let us first briefly introduce the idea of the ESN as applied to the problem of forecasting the next solar cycle; see Figure 1. At the heart of the algorithm lies a neural network with a large number of nodes – the reservoir. Every node of this network is called a neuron. The state of the reservoir is given by the state vector \(\boldsymbol {x}\) of dimension \(N\). Every neuron receives its input from all the neurons in the network (including itself), from a constant input signal \(\mathit {u_{\mathrm {in}}}\), and from a feedback neuron. Each neuron is updated by applying a nonlinear function (often the hyperbolic tangent) to a linear combination of its inputs, each multiplied by a different random weight:

$$ \boldsymbol {x}(\mathit {t}+1) = \tanh \left ( \mathit {u_{\mathrm {in}}}\boldsymbol {w}_{\mathrm {in}}+ \mathbf {W}_{\mathrm {res}}\boldsymbol {x}( \mathit {t}) + \mathbf {W}_{\mathrm {fb}}\boldsymbol {y}(\mathit {t}) \right ). $$
(1)

The random weights connecting the neurons of this network are given by the corresponding elements of \(\mathbf {W}_{\mathrm {res}}\), a large, sparse, random matrix whose spectral radius is less than unity. The connection weights between the feedback neuron and the reservoir are given by the feedback matrix \(\mathbf {W}_{\mathrm {fb}}\). The linear combination of the outputs of the individual neurons, weighted by another set of weights, \(\mathbf {W}_{\mathrm {out}}\), is the output of the reservoir:

$$ \hat {\boldsymbol {y}}(\mathit {t}) = \mathbf {W}_{\mathrm {out}}\boldsymbol {x}(\mathit {t}) + \boldsymbol {b}\mathit {u_{\mathrm {in}}}. $$
(2)
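For concreteness, the reservoir construction and one update step of Equation (1), followed by the readout of Equation (2), can be sketched in a few lines of NumPy. The reservoir size, sparsity, and weight ranges below are illustrative choices, not necessarily those used in this work (the paper uses \(N = 1000\)):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200          # reservoir size (illustrative; the paper uses N = 1000)
rho = 0.9        # target spectral radius, kept below unity
sparsity = 0.95  # fraction of entries of W_res set to zero

# Sparse random reservoir matrix, rescaled to spectral radius rho
W_res = rng.uniform(-1, 1, (N, N))
W_res[rng.random((N, N)) < sparsity] = 0.0
W_res *= rho / np.max(np.abs(np.linalg.eigvals(W_res)))

w_in = rng.uniform(-1, 1, N)   # input weights for the constant signal u_in
W_fb = rng.uniform(-1, 1, N)   # feedback weights (scalar y here)
u_in = 1.0

def update(x, y):
    """One reservoir step, Equation (1), for a scalar feedback value y."""
    return np.tanh(u_in * w_in + W_res @ x + W_fb * y)

x = np.zeros(N)
x = update(x, 0.5)  # drive the reservoir with one feedback value
```

The readout of Equation (2) is then simply `W_out @ x + b * u_in` for a trained weight matrix `W_out` and bias `b`.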

First, we must train the reservoir. This proceeds in the following manner. A time series \(\boldsymbol {y}(1),\boldsymbol {y}(2),\ldots ,\boldsymbol {y}(\mathit {T})\) is fed into the feedback neuron sequentially. At every time instant an output of the reservoir, \(\hat {\boldsymbol {y}}(\mathit {t})\), is obtained. The weights \(\mathbf {W}_{\mathrm {out}}\) and the bias \(\boldsymbol {b}\) are chosen so as to minimize the ridge regression loss function (Shalev-Shwartz and Ben-David, 2014):

$$ \mathcal{C} \equiv \frac{1}{\mathit {T}}\sum _{\mathit {t}=1}^{\mathit {T}}|| \boldsymbol {y}(\mathit {t}) - \hat {\boldsymbol {y}}(\mathit {t})||_{2}^{2} + \mathit{\beta} || \mathbf {W}_{\mathrm {out}}||_{2}^{2}. $$
(3)

We can see that the loss function \(\mathcal{C}\) has two contributions. The first is the Mean Squared Error (MSE) between the true signal \(\boldsymbol {y}\) and the forecast \(\hat {\boldsymbol {y}}\). We use the symbol \(||\cdot ||_{2}^{2}\) to indicate the square of the second norm. The second term, \(\mathit{\beta} ||\mathbf {W}_{\mathrm {out}}||_{2}^{2}\), is a penalty on the second norm of \(\mathbf {W}_{\mathrm {out}}\) – this is called L2 regularization. The constant \(\mathit{\beta}\) controls the strength of this penalty term. Regularization helps to avoid overfitting: if we optimize \(\mathbf {W}_{\mathrm {out}}\) such that the output of the reservoir is a very good approximation to the training data, the forecast may actually become less reliable. A central feature of machine-learning techniques in general, and the ESN in particular, is that a very large amount of training data is often necessary to obtain a reliable forecast. The forecast is expected to improve the longer the training period is, but there is no a priori constraint on how long a training period is appropriate. This is true for almost any problem in the natural sciences, particularly so for the forecasting of solar cycles.
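Because Equation (3) is quadratic in \(\mathbf {W}_{\mathrm {out}}\), the training step is a linear least-squares problem with a closed-form solution. A minimal sketch, with the bias term \(\boldsymbol {b}\) omitted for brevity and `train_readout` being our illustrative name:

```python
import numpy as np

def train_readout(X, Y, beta):
    """Minimize (1/T)||Y - W X||^2 + beta ||W||^2 in closed form
    (bias term omitted).  X: (N, T) reservoir states over the training
    period; Y: (d, T) target outputs.  Setting the gradient to zero
    gives  W = Y X^T (X X^T + beta*T*I)^{-1}."""
    N, T = X.shape
    return Y @ X.T @ np.linalg.inv(X @ X.T + beta * T * np.eye(N))

# Toy check: with negligible regularization we recover a known linear map
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 100))
W_true = rng.normal(size=(2, 5))
W_out = train_readout(X, W_true @ X, beta=1e-12)
```

In practice a larger \(\mathit{\beta}\) trades a worse fit on the training data for a readout that generalizes better, which is the overfitting trade-off described above.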

Figure 1
figure 1

The reservoir is a collection of \(N\) nodes. The state of the reservoir is given by the state vector \(\boldsymbol {x}\). The connections between nodes of the reservoir, \(\mathbf {W}_{\mathrm {res}}\), depicted by red arrows, are taken from a large, sparse, random matrix with a spectral radius less than unity. During training, the time series, \(\boldsymbol {y}(1),\boldsymbol {y}(2),\ldots ,\boldsymbol {y}(\mathit {T})\), is fed into the feedback neuron. The update rule for each node is \(\boldsymbol {x}(\mathit {t}+1) = \tanh \left ( \mathit {u_{\mathrm {in}}}\boldsymbol {w}_{\mathrm {in}}+ \mathbf {W}_{\mathrm {res}}\boldsymbol {x}( \mathit {t}) + \mathbf {W}_{\mathrm {fb}}\boldsymbol {y}(\mathit {t}) \right )\), where \(\boldsymbol {w}_{\mathrm {in}}\) and \(\mathbf {W}_{\mathrm {fb}}\) are random weights and \(u_{\mathrm{in}}\) is a constant. The output of the reservoir, \(\hat {\boldsymbol {y}}\), is a weighted sum – with weights \(\mathbf {W}_{\mathrm {out}}\) – over the state of the reservoir. To forecast, the output of the reservoir is fed into the feedback neuron.

The sunspot data is a one-dimensional time series \(\mathit {z}(1),\mathit {z}(2),\ldots ,\mathit {z}(\mathit {t}),\ldots , \mathit {z}(\mathit {T})\). It can be used directly in the ESN if we treat \(\boldsymbol {y}(t)\) as a scalar, i.e., \(\boldsymbol {y}(\mathit {t}) = \mathit {z}(\mathit {t})\). In this case the output \(\hat {\boldsymbol {y}}(\mathit {t}) = \hat{\mathit {z}}(\mathit {t})\) is also a scalar. We show in Section 3 (see Figure 2) that this direct method generates decent, but not very good, forecasts at later times. To improve the forecast at later times, we develop a variation on the standard ESN algorithm, which we call the Modified Echo State Network (MESN). This algorithm introduces three changes in data preparation and use. One, instead of feeding the input signal one data point at a time, at time \(t\) we construct a \(\mathit{p}\)-dimensional vector \(\boldsymbol {y}(t) = [\mathit {z}(\mathit {t}), \mathit {z}(\mathit {t}-1), \ldots , \mathit {z}(\mathit {t}-\mathit{p}+1)]^{\top}\) that contains a history of \(\mathit{p}\) values. This vector is fed to the feedback neuron as per Equation (1). Two, we change the dimension of the output of the reservoir, such that we no longer forecast one time instant after every update of the reservoir, but instead forecast a \(\mathit{q}\)-dimensional vector \(\hat {\boldsymbol {y}}(\mathit {t}) = [\hat{\mathit {z}}(\mathit {t}),\hat{\mathit {z}}(\mathit {t}+1), \ldots ,\hat{\mathit {z}}(\mathit {t}+\mathit{q}-1)]^{\top}\) at time \(t\) as per Equation (2). Three, the target in Equation (3) is no longer \(\boldsymbol {y}(\mathit {t})\) but rather the vector of future points \(\boldsymbol {y}^{\ast }(\mathit {t}) = [\mathit {z}(\mathit {t}),\mathit {z}(\mathit {t}+1),\ldots , \mathit {z}(\mathit {t}+\mathit{q}-1)]^{\top}\).
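The three data-preparation changes can be sketched as a windowing of the scalar series. The function name and toy series below are illustrative; Section 4 reports the values \(p = 15\) and \(q = 129\) used for the actual forecasts:

```python
import numpy as np

def mesn_windows(z, p, q):
    """Build MESN inputs and targets from a scalar series z.
    At each usable time t: input  y(t)  = [z(t), z(t-1), ..., z(t-p+1)]
                           target y*(t) = [z(t), z(t+1), ..., z(t+q-1)]."""
    T = len(z)
    inputs, targets = [], []
    for t in range(p - 1, T - q + 1):
        inputs.append(z[t - p + 1 : t + 1][::-1])  # history, newest first
        targets.append(z[t : t + q])               # future, oldest first
    return np.array(inputs), np.array(targets)

z = np.arange(20.0)         # toy series z(t) = t, so windows are easy to check
U, Y = mesn_windows(z, p=3, q=4)
```

Each row of `U` is fed to the feedback neuron as per Equation (1), and the matching row of `Y` is the training target in Equation (3).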

Figure 2
figure 2

In (a), we show the dynamo signal from Hazra, Passos, and Nandy (2014). We train the reservoir with this signal except for the last four cycles, which we leave for testing the forecasts obtained. The dashed line separates the training signal from the test signal. In (b), the green dotted line represents the average of an ensemble of 10 independent forecasts of the last four cycles. The standard deviation of the ensemble is plotted in orange. The red line represents the test signal. The figure shows that the variance of the ensemble grows after the first cycle and that the forecast is no longer accurate. In (c), we show a zoomed-in plot of the forecast obtained for the first cycle.

3 Forecast for a Dynamo Model of Solar Cycle

To study how the algorithm performs when it is not constrained by too limited data, we first apply it to a model of the solar dynamo. There are several low-dimensional, stochastic models that reproduce the same qualitative features as the global sunspot data, namely oscillations whose frequency and amplitude may change abruptly from one cycle to another. We use a widely studied model for this purpose (Wilmot-Smith et al., 2006; Hazra, Passos, and Nandy, 2014; Tripathi, Nandy, and Banerjee, 2021). Specifically, we follow the prescription in Hazra, Passos, and Nandy (2014) and construct a solar dynamo model consisting of two coupled time-delay differential equations:

$$\begin{aligned} \frac{\mathrm {d}\mathit {B}(\mathit {t})}{\mathrm {d}\mathit {t}} = \frac{\mathit{\omega}}{\mathit{L}}\mathit {A}(\mathit {t}-\mathit {T}_{0}) - \frac{\mathit {B}(\mathit {t})}{\mathit{\tau}}, \end{aligned}$$
(4)
$$\begin{aligned} \frac{\mathrm {d}\mathit {A}(\mathit {t})}{\mathrm {d}\mathit {t}} = \alpha (\mathit {t}) f_{1}( \mathit {B}(\mathit {t}-\mathit {T}_{1}))\mathit {B}(\mathit {t}-\mathit {T}_{1})- \frac{\mathit {A}(\mathit {t})}{\mathit{\tau}} . \end{aligned}$$
(5)

The square of the signal \(\mathit {B}(\mathit {t})\) corresponds to the global sunspot number, that is \(\mathit {z}(\mathit {t})=\mathit {B}^{2}(\mathit {t})\), and \(\mathit {A}(\mathit {t})\) is the poloidal field strength. The parameters \(\mathit {T}_{0}\) and \(\mathit {T}_{1}\) control the time delay of the equations and the function \(\alpha (\mathit {t})\) is stochastic:

$$ \alpha (\mathit {t}) = \alpha _{0}\left [1+\frac{\delta}{100}\sigma (\mathit {t}, \mathit {\tau _{c}})\right ] . $$
(6)

Here, \(\sigma (\mathit {t}, \mathit {\tau _{c}})\) is a uniform random number in the interval \([-1, +1]\). We draw a new random number from this distribution after every time interval \(\mathit {\tau _{c}}\). The parameter \(\mathit{\delta} \in [0, 100]\) controls the strength of the noise. The function \(f_{1}\) is called the quenching factor; it involves the error function (\(\mathrm {erf}\)) and two thresholds, \(\mathit {B}_{\mathrm {min}}\) and \(\mathit {B}_{\mathrm {max}}\):

$$\begin{aligned} f_{1}(\mathit {B}(\mathit {t}-\mathit {T}_{1})) = \left [ \frac{1+\mathrm {erf}(\mathit {B}^{2}(\mathit {t}-\mathit {T}_{1})-\mathit {B}_{\mathrm {min}}^{2})}{2}\right ] \\ \left [ \frac{1-\mathrm {erf}(\mathit {B}^{2}(\mathit {t}-\mathit {T}_{1})-\mathit {B}_{\mathrm {max}}^{2})}{2}\right ] . \end{aligned}$$
(7)

We use the model parameters listed in Table 1 and choose the initial conditions \(A(\mathit {t}\leq 0) = B(\mathit {t}\leq 0) = (1/2)(\mathit {B}_{\mathrm {min}}+\mathit {B}_{\mathrm {max}})\). We solve the differential equations from \(\mathit {t}=0\) to \(\mathit {t}=1100\) with a time step \(\mathrm {d}\mathit {t}=10^{-3}\). We use the trapezoidal rule to integrate the delay terms and a second-order Runge–Kutta method for the nondelay parts. Once we obtain the solution \(B(t)\), we downsample it so that the final time series has a time step of \(10^{-1}\). This is necessary, as the ESN algorithm cannot forecast far in time if it learns a time series with a very short time step. We divide the time series into two parts: a long training signal that contains 52 cycles and a short test signal comprising the next four cycles, as shown in Figure 2(a). Before feeding the training signal into the reservoir, we normalize it by its maximum amplitude. This helps to prevent saturation of the tanh function in Equation (1). We use a reservoir size of \(N=1000\) neurons and a regularization parameter \(\mathit{\beta}=10^{-5}\). As the connections \(\mathbf {W}_{\mathrm {res}}\) are initialized randomly, we obtain a different forecast every time we run the algorithm. We run it 10 times, generating an ensemble of 10 independent forecasts, and take the mean of the ensemble as the final result that we compare against the test data. Figures 2(b) and (c) show that we obtain good agreement for the first cycle only; for the next three cycles the variance of the ensemble grows and the mean falls far from the true signal. We note, though, that the forecast for the first cycle is accurate, with a test MSE of 8.89.
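To illustrate how Equations (4) to (7) can be integrated, here is a minimal sketch. It uses a plain forward-Euler step with a constant initial history, whereas the paper uses the trapezoidal rule for the delay terms and a second-order Runge–Kutta method; all parameter values below are placeholders, not the values of Table 1:

```python
import math
import random

random.seed(42)

# Illustrative parameter values only; the values actually used are
# those listed in Table 1 of Hazra, Passos, and Nandy (2014).
omega_L = 10.0                 # omega/L in Equation (4)
tau = 10.0                     # diffusive decay time
T0, T1 = 2.0, 0.5              # time delays
alpha0, delta, tau_c = 1.0, 50.0, 1.0
Bmin, Bmax = 1.0, 7.0
dt = 1e-3

def f1(B):
    """Quenching factor of Equation (7)."""
    return (1 + math.erf(B**2 - Bmin**2)) / 2 \
         * (1 - math.erf(B**2 - Bmax**2)) / 2

n0, n1 = int(T0 / dt), int(T1 / dt)    # delays measured in steps
hist = max(n0, n1) + 1
A = [0.5 * (Bmin + Bmax)] * hist       # constant history for t <= 0
B = [0.5 * (Bmin + Bmax)] * hist
sigma = random.uniform(-1, 1)
for i in range(20000):                 # integrate 20 time units
    if i % int(tau_c / dt) == 0:       # redraw the noise every tau_c
        sigma = random.uniform(-1, 1)
    alpha = alpha0 * (1 + delta / 100 * sigma)    # Equation (6)
    Bd = B[-1 - n1]                               # B(t - T1)
    dB = omega_L * A[-1 - n0] - B[-1] / tau       # Equation (4)
    dA = alpha * f1(Bd) * Bd - A[-1] / tau        # Equation (5)
    B.append(B[-1] + dt * dB)          # forward-Euler step
    A.append(A[-1] + dt * dA)
z = [b**2 for b in B]                  # sunspot-number proxy, z = B^2
```

The quenching factor shuts off poloidal-field generation once \(B\) exceeds \(B_{\mathrm{max}}\), which is what bounds the oscillation amplitude and, together with the noise in \(\alpha\), produces cycle-to-cycle variability.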

Table 1 Parameters of the dynamo model from Hazra, Passos, and Nandy (2014).

Qualitatively, this result is not a peculiarity of the particular test signal we chose to forecast. For example, if we train with 60 cycles instead of 52, we again obtain an accurate forecast for cycle 61 only. On this basis we regard our ESN as a physical-model-validated recurrent neural network.

4 Application to Solar Cycle

Next, we apply all the RNN algorithms to forecast solar cycles. To be specific, consider the case of forecasting one particular cycle, say Cycle 23. We train the algorithms with the sunspot data, smoothed with a thirteen-month running average, until the end of Cycle 22. Then, we continue running the network to produce the forecast.

In Figure 3 we show the forecasts for Solar Cycles 22, 23, and 24 using the five different algorithms. In red, we show the thirteen-month running average of the sunspot data, plotted with some width to distinguish it better from the other curves. Clearly, the linear Autoregressive (linear AR) algorithm performs the worst. For Cycle 22 the ESN is the best in forecasting the peak, followed by the vanilla RNN, LSTM, and GRU. For Cycle 23 the ESN and the vanilla RNN are able to capture the first peak of the data reasonably well. The other algorithms forecast a significantly lower number of sunspots near the peak. For Cycle 24 the ESN again makes the best forecast. Both the vanilla RNN and LSTM make reasonable forecasts, but the GRU forecasts a significantly larger number of sunspots near the peak than actually observed. Overall, the forecast for Cycle 22 is the least accurate. This may be because the earlier the cycle, the less data we have to train the algorithms. In Appendix B we provide a detailed quantitative comparison that supports the qualitative discussion here. Note that both LSTM and GRU have more fitting parameters than the ESN and also require more computing resources. In general, in machine-learning problems with limited data it is often observed that algorithms with too many parameters perform badly due to overfitting, whereas algorithms with too few parameters perform badly due to underfitting. We conclude that the ESN is not only a physical-model-validated network but also the appropriate algorithm for our purpose.

Figure 3
figure 3

Forecast for Solar Cycles 22 (a), 23 (b), and 24 (c) using five different algorithms: MESN, linear AR, vanilla RNN, GRU, and LSTM.

Next, we concentrate on our forecasts for Cycles 23, 24, and the ongoing Cycle 25 using the ESN. In Figure 4, we show the mean (green) and standard deviation (orange) of an ensemble of 10 independent forecasts for each case. For Cycles 23 and 24 we compare our forecast with the thirteen-month running average of the sunspot data (red). We also indicate in light blue how noisy the original sunspot data was by shading the standard deviation of the running average. The forecast for Cycle 23, obtained with regularization \(\mathit{\beta}=10^{-7}\), is quite accurate until the first peak. The standard deviation of the forecast increases with time until the first peak, but decreases after it. For Cycle 24, also with \(\mathit{\beta}=10^{-7}\), the overall agreement is better but the standard deviation is larger near the peak.

Figure 4
figure 4

Our forecasts for Cycles 23 (a, d), 24 (b, e), and 25 (c, f). For the past Cycles 23 and 24, the red line shows the number of observed sunspots as a function of time with a thirteen-month smoothing window. The shaded-blue region shows the standard deviation of the observation gleaned from the smoothing window. The top row (a, b, c) shows our forecasts with the ESN algorithm. The green dotted line is the forecast obtained by calculating the mean of the ensemble of forecasts gleaned from ten independent realizations. The orange-shaded region shows the standard deviation of the ensemble. The bottom row (d, e, f) shows the forecasts with the MESN.

By construction, the reliability of the forecast decreases as time progresses. This is because the forecast of one point depends on the forecast of the previous one. Consequently, small errors at early times are gradually magnified as the forecast progresses. As described in Section 2, we develop a variation on the standard ESN algorithm, which we call the Modified ESN (MESN), to improve the forecasts. All forecasts given by the MESN are done with \(\mathit{\beta}=10^{-10}\), \(\mathit{p}=15\), and \(\mathit{q}=129\). Note that the value of \(\mathit{q}\) is approximately the number of months in an 11-year cycle (\(11\times 12 = 132\)), which means we forecast a whole cycle in one go. The forecasts are plotted in the bottom panel of Figure 4. Compared to the standard algorithm, our modified algorithm not only gives better results when tested against the observations for Cycles 23 and 24, it also gives more robust forecasts, as the standard deviation of the ensemble is smaller.
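The error accumulation of the standard algorithm arises because forecasting runs in closed loop: each output is fed back as the next feedback value. A toy sketch with untrained, illustrative weights (input and bias terms dropped for brevity) makes the feedback structure explicit:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
# Toy, untrained reservoir and readout; the point here is the feedback
# structure of the forecast loop, not the forecast quality.
W_res = 0.05 * rng.uniform(-1, 1, (N, N))
W_fb = rng.uniform(-1, 1, N)
W_out = 0.01 * rng.uniform(-1, 1, (1, N))

def closed_loop(x, y0, steps):
    """Iterate Equations (1) and (2) in closed loop: each prediction is
    fed back as the next feedback value, so any error at an early step
    propagates into all later outputs."""
    y, out = y0, []
    for _ in range(steps):
        x = np.tanh(W_res @ x + W_fb * y)  # reservoir update
        y = (W_out @ x).item()             # readout becomes next feedback
        out.append(y)
    return out

forecast = closed_loop(np.zeros(N), y0=0.3, steps=50)
```

The MESN sidesteps part of this accumulation by emitting \(q\) future values from a single reservoir state instead of iterating the loop point by point.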

Our forecasts for the ongoing Cycle 25 with both algorithms are shown in the right-most column of Figure 4. Using the standard algorithm with \(\mathit{\beta}=10^{-6}\), we forecast that the maximum of Cycle 25 will occur between May and June 2024 and that the maximum number of sunspots will be \(113\pm 15\). Our modified algorithm gives a flatter maximum, almost constant between June 2023 and August 2024, with a maximum number of \(124\pm 2\) sunspots. Note that the averaged sunspot data shows a distinctive two-peak behavior in both Cycles 23 and 24 – a feature not present in all sunspot cycles – that is not captured by either of our algorithms. We expect the same for Cycle 25: neither of our algorithms can forecast whether it will have this two-peak feature. Both forecasts show that Cycle 25 is expected to reach a minimum near the beginning of the year 2030. Qualitatively, Cycle 25 is going to be weaker than Cycle 23 but stronger than Cycle 24. We note that this qualitative result differs from other machine-learning-based forecasts (Pala and Atici, 2019; Benson et al., 2020): the first forecasts that Cycle 25 will have similar strength to Cycle 23, while the second forecasts that Cycle 25 will be slightly weaker than Cycle 24. However, our result agrees with recent physics-based forecasts (Bhowmik and Nandy, 2018).

Next, we show how robust our forecasts are with respect to when we stop training and start forecasting. We expect the following: near the minima of the cycles, the signal-to-noise ratio in the sunspot data is the lowest. Hence, if we stop training at the lowest point of a cycle, we expect the worst result, provided the level of noise is roughly constant as a function of time. A look at the sunspot data shows that the noise is not constant but is actually significantly larger near the peak of the cycle – the noise increases as the signal increases. Nevertheless, the signal-to-noise ratio is less than unity near the minima of the cycles. In Figure 5 we show four representative cases with the standard ESN algorithm for Cycle 24. The best forecast is obtained when training stops in the rising phase of the cycle. We also find that the standard ESN algorithm becomes unstable if the training is stopped very close to the present minimum. Since our forecast for Cycle 25 was generated in 2020, very near its minimum, we expect it to improve if we recalculate our results using the new data available now, in the middle of 2022.

Figure 5
figure 5

The forecasts depend crucially on when we stop training and start forecasting. We show four representative cases with the ESN for Solar Cycle 24. In each plot, the vertical dashed line on the left represents where we stop training and start forecasting, and the right dashed line indicates where we stop forecasting. The best forecast (d) is obtained when we stop training at the rising phase of the cycle.

We show the recalculated forecasts for each ESN algorithm in Figure 6. The ESN (a) gives a higher peak when trained with the new data: the maximum amplitude changes from \(113 \pm 15\) to \(131 \pm 14\) sunspots. The MESN (b) also increases in amplitude, from \(124 \pm 2\) to \(137 \pm 2\) sunspots, reaching better agreement with the ESN.

Figure 6
figure 6

New forecasts for the ESN (a) and MESN (b) algorithms when additional training data is available. The orange dashed line gives the mean and standard deviation of the ensemble of forecasts when data until 2020 is used. The green dotted line shows the new forecast when additional data until 2022 is available and utilized in the training.

5 Summary and Conclusion

We use five different algorithms based on Recurrent Neural Networks (RNNs) to forecast Solar Cycle 25. First, we use them to forecast Cycles 23 and 24 and find that the Echo State Network performs the best. We note that the ESN performs fairly well for the last five cycles (20 – 24), except for Cycle 21. But the forecasts are inaccurate beyond one cycle, even in the case of the dynamo model, for which we can generate very large amounts of data. This corroborates what physical model-based studies have already demonstrated – theoretical processes limit the cycle memory to only one solar cycle (Karak and Nandy, 2012; Muñoz-Jaramillo et al., 2013; Hazra, Brun, and Nandy, 2020).

Although we have validated the ESN with a solar dynamo model, note that we have treated the dynamo model and the solar data separately: we have not first trained the neural network on simulations of the dynamo model and then fine-tuned it on the solar data. We leave such a technique, called transfer learning, for future work; it may improve our forecasts, although not beyond one cycle.

This one-cycle limitation motivates the design of the Modified ESN algorithm. The MESN produces more accurate and more robust forecasts, but it too remains limited to forecasting only one cycle ahead. We also note that the forecasts given by both algorithms depend strongly on when we stop training and start forecasting, and that the best forecasts are obtained in the rising phase of the cycle. Therefore, we use the data until 2022 to forecast sunspot Cycle 25. The two algorithms agree that Cycle 25 is going to last for about 10 years. They also agree on the approximate timing of the peak: the ESN places the peak in July 2024, while the MESN places it in April 2024. As for the maximum number of sunspots, the ESN forecasts \(131 \pm 14\), whereas the MESN forecasts \(137 \pm 2\). Hence, both algorithms forecast that Cycle 25 will be slightly stronger than Cycle 24, but weaker than Cycle 23.

We have taken a novel approach towards bridging the gap between physics-based and machine-learning-based solar-cycle forecasts by first applying our techniques with simulated data from a physics-inspired solar dynamo model and subsequently utilizing the same algorithm on observational data. Our forecast is consistent with physical model-based approaches.