Background

The present study aimed to develop tools for ionospheric parameter analysis and anomaly detection during ionospheric disturbances. The Earth’s ionosphere is part of the atmosphere, stretching from 80 to 1000 km and affecting radio wave propagation (Kato et al. 2009; Nakamura et al. 2009; Watthanasangmechai et al. 2012). Its structure is changeable and heterogeneous, and its investigation is based on the variation analysis of environmental registered parameters. The ionospheric parameters clearly change with the height, depend on the solar activity cycle, geomagnetic conditions, and geographic coordinates, and have characteristic diurnal and seasonal variations (Afraimovich et al. 2000, 2001; Nakamura et al. 2009; Watthanasangmechai et al. 2012; Danilov 2013). Ionospheric anomalies appear as significant deviations (increase or decrease) of the electron concentration in relation to the background level. During anomalies, local features of different shapes and durations are observed in the registered ionospheric parameters (Mandrikova et al. 2014a). In most cases, the ionospheric disturbances result from an increased solar and geomagnetic activity and, in seismically active regions, they can be observed during increased seismic activity (Afraimovich et al. 2000, 2001; Nakamura et al. 2009; Maruyama et al. 2011; Klimenko et al. 2012a, 2012b; Watthanasangmechai et al. 2012).

The most important tasks in ionospheric parameter processing and analysis are the monitoring of the ionospheric conditions and the detection of anomalies (Afraimovich et al. 2000, 2001; Liu et al. 2008a, 2008b; Nakamura et al. 2009; Watthanasangmechai et al. 2012; Danilov 2013; Ezquer et al. 2014; Zhao et al. 2014), which affect many aspects of our life and have a negative impact on satellite system operation and radio communication propagation. The problems associated with the analysis of ionospheric conditions and detection of anomalies have been addressed by many authors (Bilitza and Reinisch 2007; Liu et al. 2008a, 2008b; Nakamura et al. 2009; Maruyama et al. 2011; Klimenko et al. 2012a, 2012b; Oyekola and Fagundes 2012; Watthanasangmechai et al. 2012; Ezquer et al. 2014; Zhao et al. 2014). The main approaches include the traditional moving median method (Mikhailov et al. 1999; Afraimovich et al. 2000, 2001; Kakinami et al. 2010), ionosphere empirical models (Bilitza and Reinisch 2007; Nakamura et al. 2009; Klimenko et al. 2012b; Oyekola and Fagundes 2012; Watthanasangmechai et al. 2012), the application of adaptive algorithms based on neural networks (Martin et al. 2005; Nakamura et al. 2007, 2009; Mandrikova et al. 2012a, 2012b; Wang et al. 2013; Zhao et al. 2014), and the wavelet transform (Hamoudi et al. 2009; Kato et al. 2009; Mandrikova et al. 2012a, 2012b, 2013a, 2014a). At present, the International Reference Ionosphere (IRI) model (Jee et al. 2005; Bilitza and Reinisch 2007; Klimenko et al. 2012b; Oyekola and Fagundes 2012) is the best ionospheric empirical model. It is based on a wide range of ground and space data and, since its parameter estimation accuracy for a particular region depends significantly on the availability of local registered data, its results can largely deviate from the experimental data (Bilitza and Reinisch 2007; Ezquer et al. 2014). 
Therefore, the IRI-based forecasts are more accurate for mid-latitudes than for equatorial and auroral latitudes. Previous studies also showed that the accuracy of the IRI model depends on the level of solar activity and decreases as solar activity increases (Jee et al. 2005; Nakamura et al. 2009; Oyekola and Fagundes 2012). The recent development of empirical models using pattern recognition techniques and neural networks (Nakamura et al. 2007, 2009; Wang et al. 2013; Zhao et al. 2014) significantly improved the forecast quality in comparison with the IRI model; these models are flexible and easy to automate. However, they belong to the “black box” model class: they require long training samples to describe the feature space, are prone to overfitting, and can give unexpected results with very noisy data. The proposed multicomponent model (MCM) is based on autoregressive integrated moving average (ARIMA) models (Box and Jenkins 1970), which yield fairly accurate estimates from limited samples and, after the model identification phase, can be easily automated. Their main advantage is their mathematical basis and the consequent ability to obtain results with a given confidence probability.

Previous investigation of the ionospheric parameters variation in the Kamchatka region showed a complex non-stationary structure, which significantly impedes the application of traditional classical methods for modeling and analysis of data. As the latest research (Huang et al. 1998; Odintsov et al. 2000; Rilling 2003; Huang and Wu 2008; Klionsky et al. 2008, 2009; Hamoudi et al. 2009; Kato et al. 2009; Yu et al. 2010; Akyilmaz et al. 2011; He et al. 2011; Mandrikova et al. 2012a, 2012b, 2014a; Ghamry et al. 2013; Zaourar et al. 2013) shows, the most natural and effective way of representing such data is the construction of non-linear adaptive approximating schemes. As a result, methods of empirical mode decomposition (Huang et al. 1998; Rilling 2003; Klionsky et al. 2008, 2009; Huang and Wu 2008; Yu et al. 2010) and adaptive wavelet decomposition (Hamoudi et al. 2009; Kato et al. 2009; Akyilmaz et al. 2011; He et al. 2011; Mandrikova et al. 2012a, 2012b, 2013a, 2014a; Ghamry et al. 2013; Zaourar et al. 2013) are being intensively developed at present. Given the large variety of orthogonal basis wavelets with compact support and the presence of numerically stable fast algorithms for data transformation, wavelet decomposition provides many possibilities for the analysis of data with a complex structure (Chui 1992; Daubechies 1992; Mallat 1999), including geophysical data (Hamoudi et al. 2009; Kato et al. 2009; Akyilmaz et al. 2011; He et al. 2011; Mandrikova et al. 2012a, 2012b, 2013a, 2014a; Ghamry et al. 2013; Zaourar et al. 2013). In this paper, a multiscale wavelet decomposition (MSA) of an ionospheric parameter time series was used. Based on the MSA, the time series was presented as different scale components with a simpler structure than the original series. This representation allowed the distinction of stationary components and the application of classical methods of time series modeling and analysis for their identification. 
As mentioned above, an ARIMA model class (Box and Jenkins 1970; Kay and Marple 1981; Basseville and Nikiforov 1993; Huang et al. 2013) was used in this study. Practical research has confirmed the power and flexibility of the ARIMA method in solving many applied problems (Box and Jenkins 1970; Basseville and Nikiforov 1993; Huang et al. 2013). At present, these methods are being developed in geophysical studies (Mabrouk et al. 2008; Huang et al. 2013; Mandrikova et al. 2013a, 2014a). However, there are some restrictions regarding their application to separate time series and determined regularities (Kay and Marple 1981; Huang et al. 2013; Mandrikova et al. 2013a, 2014a). The estimation, diagnostics, and optimization of ARIMA model parameters are based on the assumption that the data have a standard distribution, which is not always correct. Extending the application of these methods, we suggested a new MCM, based on the combination of wavelets with ARIMA models. This approach was proposed for the first time to reveal anomalies in subsoil radon data and proved to be efficient (Geppener and Mandrikova 2003). The present paper describes a method to construct and estimate an MCM. The efficiency of the suggested model was assessed using ionospheric data. The model allowed the elimination of noise, the simplification of the data structure, and the detection of stationary components liable for identification. We compared the obtained MCM with the IRI model and the moving median method, widely applied in the modeling and analysis of ionospheric parameters. The comparison showed promising results for the newly proposed method. To study ionospheric parameters in detail, we used the suggested modeling method combined with a continuous wavelet transform. The computational solutions allowed the detection of different scale anomalies in the ionosphere and the estimation of their occurrence time, duration, and intensity was based on the continuous wavelet transform.

Methods

MCM identification

Considering a random time series \( {f}_0 \) containing stationary components and noise, based on the multiscale wavelet decomposition up to the mth level, the \( {f}_0 \) time series was presented as a linear combination of multiscale components (Chui 1992; Daubechies 1992):

$$ {f}_0(t)={\displaystyle \sum_{j=-1}^{-m}}g\left[{2}^jt\right]+f\left[{2}^{-m}t\right] $$
(1)

where \( f\left[{2}^{-m}t\right]={\displaystyle \sum_k}{c}_{-m,k}{\varphi}_{-m,k}(t) \) is the smoothed component of the time series, with coefficients \( {c}_{-m,k}=\left\langle f,{\varphi}_{-m,k}\right\rangle \) and scaling functions \( {\varphi}_{-m,k}(t)={2}^{-m/2}\varphi \left({2}^{-m}t-k\right) \); \( g\left[{2}^jt\right]={\displaystyle \sum_k}{d}_{j,k}{\varPsi}_{j,k}(t) \) are the detailing components of the time series, with coefficients \( {d}_{j,k}=\left\langle f,{\varPsi}_{j,k}\right\rangle \) and wavelet basis functions \( {\varPsi}_{j,k}(t)={2}^{j/2}\varPsi \left({2}^jt-k\right) \).
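As a minimal illustration of Eq. 1, the following sketch splits a series into m detailing components and one smoothed component and lets one check that they sum back to the original series. It uses the Haar wavelet purely for brevity, which is a simplification introduced here and not necessarily the basis used in the study; all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one Haar level, doubling the length."""
    out = []
    for c, d in zip(approx, detail):
        out.extend([(c + d) / SQRT2, (c - d) / SQRT2])
    return out

def _to_time_domain(coeffs, level, is_detail):
    """Rebuild the time-domain component carried by one coefficient band."""
    zeros = [0.0] * len(coeffs)
    signal = haar_inverse_step(zeros, coeffs) if is_detail else haar_inverse_step(coeffs, zeros)
    for _ in range(level - 1):
        signal = haar_inverse_step(signal, [0.0] * len(signal))
    return signal

def multiscale_components(x, m):
    """Return ([g_1, ..., g_m], f_m): detailing components and the smoothed one."""
    if len(x) % (2 ** m) != 0:
        raise ValueError("series length must be divisible by 2**m")
    approx, bands = list(x), []
    for _ in range(m):
        approx, detail = haar_step(approx)
        bands.append(detail)
    details = [_to_time_domain(d, j + 1, True) for j, d in enumerate(bands)]
    smooth = _to_time_domain(approx, m, False)
    return details, smooth
```

For a series of length 8 and m = 3, the smoothed component is constant and equal to the series mean, and adding the three detailing components restores the input.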

By changing the decomposition level m, we could obtain various representations of a time series. Our task was to determine the best representation that allowed the extraction of the stationary components from the noise and the acquisition of an adequate ARIMA model. The smoothed components of the wavelet decompositions \( f\left[{2}^{-m}t\right] \) were less affected by the random factor than the detailing components \( g\left[{2}^jt\right] \). Therefore, the solution was based on the analysis of the smoothed components as follows:

Step 1. We performed multiscale wavelet decompositions of the time series to levels \( m=\overline{1,M} \) (the maximum acceptable decomposition level M was determined by the length N of the time series: \( M\le { \log}_2N \)) and obtained a set of smoothed components: \( f\left[{2}^{-m}t\right]={\displaystyle \sum_k}{c}_{-m,k}{\varphi}_{-m,k}(t) \), \( m=\overline{1,M} \).

Step 2. We determined the stationary components from the set of \( f\left[{2}^{-m}t\right] \) components, \( m=\overline{1,M} \). Applying the traditional approaches (Box and Jenkins 1970; Marple 1987), we determined models from the ARIMA model class for the approximation of the stationary \( f\left[{2}^{-m}t\right] \) components. Each component was represented as:

$$ {f}_{-m}(t)={\displaystyle \sum_k}{s}_{-m,k}{\varphi}_{-m,k}(t), $$

where \( {s}_{-m,k}={\displaystyle \sum_{l=1}^p{\gamma}_{-m,l}{\omega}_{-m,k-l}}-{\displaystyle \sum_{n=1}^h{\theta}_{-m,n}{a}_{-m,k-n}} \) is an estimated smoothed component, \( {\omega}_{-m,k}={\nabla}^{\nu }{c}_{-m,k} \), \( {\nabla}^{\nu } \) is the difference operator of order ν, p and \( {\gamma}_{-m,l} \) are the order and parameters of the autoregression of the smoothed component, h and \( {\theta}_{-m,n} \) are the order and parameters of the moving average of the smoothed component, and \( {a}_{-m,k} \) are the residual errors of the model.

Step 3. We estimated the component model errors as:

$$ {E}_m={\displaystyle \sum_{k=1}^K}{\displaystyle \sum_{q=1}^Q}{e}_{k+q}^m $$

where \( {e}_{k+q}^m={\left({s}_{-m,k+q}^{\mathrm{actual}}-{s}_{-m,k+q}^{\mathrm{predict}}\right)}^2 \) is the component model error at point k with time step q, \( {s}_{-m,k+q}^{\mathrm{actual}} \) are the actual values of a time series component, \( {s}_{-m,k+q}^{\mathrm{predict}} \) are the model values of a time series component, Q is the length of the data time step, and K is the length of a time series component.

Step 4. We considered that the best representation of a time series was the one corresponding to a multiscale wavelet decomposition to level m*, where \( {m}^{*}:{E}_{m^{*}}=\underset{m}{ \min }{E}_m \).

Step 5. We determined the stationary components from the set of detailing components \( g\left[{2}^jt\right] \), \( j=\overline{-1,-{m}^{*}} \). Applying the traditional approaches (Box and Jenkins 1970; Marple 1987), we determined models from the ARIMA model class for the approximation of the stationary components \( g\left[{2}^jt\right] \).

Step 6. The components \( g\left[{2}^jt\right] \) that were not stationary contained local features and noise and were investigated by another method.

Step 7. Using Eq. 1, we combined the obtained component models in a joint multi-component construction:

$$ {f}_0(t)={\displaystyle \sum_{\mu =\overline{1,T}}}{\displaystyle \sum_{k=\overline{1,{N}_j^{\mu }}}}{s}_{j,k}^{\mu }{b}_{j,k}^{\mu }(t) $$
(2)

where \( {s}_{j,k}^{\mu }={\displaystyle \sum_{l=1}^{p_j^{\mu }}}{\gamma}_{j,l}^{\mu }{\omega}_{j,k-l}^{\mu }-{\displaystyle \sum_{n=1}^{h_j^{\mu }}}{\theta}_{j,n}^{\mu }{a}_{j,k-n}^{\mu } \) is an estimated μth component, \( {p}_j^{\mu } \) and \( {\gamma}_{j,l}^{\mu } \) are the order and parameters of the μth component autoregression, \( {h}_j^{\mu } \) and \( {\theta}_{j,k}^{\mu } \) are the order and parameters of a moving average of the μth component, \( {\omega}_{j,k}^{\mu }={\nabla}^{\nu^{\mu }}{\beta}_{j,k}^{\mu } \), ν μ is the difference order of the μth component, \( {\beta}_{j,k}^1={c}_{j,k} \), \( {\beta}_{j,k}^{\mu }={d}_{j,k},\mu =\overline{2,T} \), Τ is the number of modeled components, \( {a}_{j,k}^{\mu } \) are the residual errors of the μth component model, \( {N}_j^{\mu } \) is the length of the μth component, \( {b}_{j,k}^1={\varphi}_{j,k} \) is a scaling function, and \( {b}_{j,k}^{\mu }={\varPsi}_{j,k},\mu =\overline{2,T} \) is a wavelet basis of the μth component.
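Steps 3 and 4 of the procedure above reduce to computing a summed squared error for each candidate level and taking the level with the smallest value. A minimal sketch, assuming the actual and model values of each smoothed component are already available as plain lists (the function names and the dictionary layout are hypothetical):

```python
def component_error(actual, predicted):
    """E_m: sum of squared differences between actual and model values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def best_level(errors_by_level):
    """m*: the decomposition level whose component model error E_m is smallest."""
    return min(errors_by_level, key=errors_by_level.get)
```

For instance, `best_level({1: 5.0, 2: 3.2, 3: 1.1, 4: 2.0})` selects level 3; the error values here are made up for illustration.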

For the prediction of \( {s}_{j,k+q}^{\mu } \), q ≥ 1 determines the prediction of \( {s}_{j,k}^{\mu } \) for point k and time step q. The \( {s}_{j,k+q}^{\mu } \) was determined based on the μth component model as follows:

$$ {s}_{j,k+q}^{\mu }={\displaystyle \sum_{l=1}^{p_j^{\mu }}}{\gamma}_{j,l}^{\mu }{\omega}_{j,k+q-l}^{\mu }-{\displaystyle \sum_{n=1}^{h_j^{\mu }}}{\theta}_{j,n}^{\mu }{a}_{j,k+q-n}^{\mu }. $$

The residual errors of the μth component model were determined as the difference between the actual and predicted values for point k + q: \( {a}_{j,k+q}^{\mu }={s}_{j,k+q}^{\mu, \mathrm{actual}}-{s}_{j,k+q}^{\mu, \mathrm{predict}} \).
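The one-step version of the prediction formula and the residual definition can be sketched as follows. The argument names are hypothetical: `omega` holds the past differenced coefficients (most recent last), `resid` the past residual errors, and `gamma` and `theta` the autoregression and moving average parameters.

```python
def predict_next(omega, resid, gamma, theta):
    """Autoregressive sum minus moving-average sum, as in the component model."""
    if len(omega) < len(gamma) or len(resid) < len(theta):
        raise ValueError("not enough history for the given model orders")
    ar_part = sum(g * omega[-l] for l, g in enumerate(gamma, start=1))
    ma_part = sum(t * resid[-n] for n, t in enumerate(theta, start=1))
    return ar_part - ma_part

def residual(actual, predicted):
    """Residual error: actual value minus predicted value."""
    return actual - predicted
```

With illustrative values `gamma = [0.5, 0.25]`, `theta = [0.4]`, `omega = [1.0, 2.0]`, and `resid = [0.5]`, the prediction is 0.5·2.0 + 0.25·1.0 − 0.4·0.5 = 1.05.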

Equation 2 represents the typical data changes. During abnormal data changes, the absolute residual errors of the component models rise. For this reason, the anomaly detection was based on the following conditional test:

$$ {\varepsilon}_{\mu }={\displaystyle \sum_{q=1}^{Q_{\mu }}}\left|{a}_{j,k+q}^{\mu}\right|>{T}_{\mu } $$
(3)

where \( {Q}_{\mu } \) is the length of the data time step based on the μth component model and \( {T}_{\mu } \) is the threshold value of the μth component defining the presence of an anomaly.

The \( {T}_{\mu } \) threshold in Eq. 3 was determined by the variance estimation of the data prediction errors (Box and Jenkins 1970):
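The conditional test of Eq. 3 can be sketched as a sliding sum of absolute residual errors compared with a threshold. This is an illustrative reading in which the test is evaluated at every admissible starting point k; the function name is hypothetical.

```python
def anomaly_flags(residuals, Q, threshold):
    """For each start k, flag True when the sum of |a| over Q steps exceeds the threshold."""
    if Q < 1 or Q > len(residuals):
        raise ValueError("Q must be between 1 and the number of residuals")
    flags = []
    for k in range(len(residuals) - Q + 1):
        eps = sum(abs(a) for a in residuals[k:k + Q])
        flags.append(eps > threshold)
    return flags
```

With residuals `[0.1, -0.2, 0.1, 2.0, -1.5, 0.1]`, `Q = 2`, and a threshold of 1.0, only the windows that include the two large residuals are flagged.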

$$ {T}_{\mu}\left({Q}_{\mu}\right)={\left\{1+{\displaystyle \sum_{q=1}^{Q_{\mu }-1}}{\left({\psi}_{j,q}^{\mu}\right)}^2\right\}}^{1/2}{\sigma}_{a_{j,k+q}^{\mu }}, $$
(4)

where \( {\psi}_{j,q}^{\mu } \) are the weighting coefficients of the μth component model, which can be determined by \( \left(1-{\gamma}_{j,1}^{\mu }B-{\gamma}_{j,2}^{\mu }{B}^2-\dots -{\gamma}_{j,{p}_j^{\mu }+{\nu}^{\mu}}^{\mu }{B}^{p_j^{\mu }+{\nu}^{\mu }}\right)\left(1+{\psi}_{j,1}^{\mu }B+{\psi}_{j,2}^{\mu }{B}^2+\dots \right)=\left(1-{\theta}_{j,1}^{\mu }B-{\theta}_{j,2}^{\mu }{B}^2-\dots -{\theta}_{j,{h}_j^{\mu}}^{\mu }{B}^{h_j^{\mu }}\right), \) where B is a backward shift operator: \( {B}^l{\omega}_{j,k}^{\mu }(t)={\omega}_{j,k-l}^{\mu }(t), \) \( {\psi}_{j,0}^{\mu }=0. \)

It is also possible to use the following probability limits:

$$ {T}_{\mu}\left({Q}_{\mu}\right)={u}_{\varepsilon /2}{\left\{1+{\displaystyle \sum_{q=1}^{Q_{\mu }-1}}{\left({\psi}_{j,q}^{\mu}\right)}^2\right\}}^{1/2}{\sigma}_{a_{j,k+q}^{\mu }}, $$
(5)

where \( {u}_{\varepsilon /2} \) is the quantile of the 1 − ε/2 level of the standard normal distribution.
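The thresholds of Eqs. 4 and 5 can be computed with a short recursion for the weighting coefficients. The sketch below matches powers of B in the operator identity given above, taking the leading term of the ψ expansion as 1, as written in that product. Here `gamma` is assumed to hold the coefficients of the full autoregressive operator (including differencing), and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def psi_weights(gamma, theta, count):
    """Return [psi_1, ..., psi_count] from the AR and MA operator coefficients."""
    psi = [1.0]  # leading coefficient of the psi(B) expansion
    for j in range(1, count + 1):
        value = -(theta[j - 1] if j <= len(theta) else 0.0)
        for i in range(1, min(j, len(gamma)) + 1):
            value += gamma[i - 1] * psi[j - i]
        psi.append(value)
    return psi[1:]

def prediction_threshold(gamma, theta, Q, sigma, eps=None):
    """Eq. 4 when eps is None; Eq. 5 (scaled by the normal quantile) otherwise."""
    psi = psi_weights(gamma, theta, Q - 1)
    scale = math.sqrt(1.0 + sum(p * p for p in psi)) * sigma
    if eps is None:
        return scale
    return NormalDist().inv_cdf(1.0 - eps / 2.0) * scale
```

For a first-order autoregression with parameter 0.5 and no moving average part, the weights are 0.5, 0.25, 0.125, and so on, so the threshold grows with the time step Q and levels off.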

Results and discussion

Construction of the MCM for the Kamchatka region

Model identification

For the model construction, we used hourly data of the ionospheric critical frequency \( {f}_oF2 \) (Paratunka station, 52° 58′ N, 158° 15′ E, Kamchatka, Russia, Institute of Cosmophysical Research and Radio Wave Propagation FEB RAS (IKIR FEB RAS)) from 1968 to 2013. To determine the degree of geomagnetic disturbance, we used the K-index based on the Paratunka station geomagnetic data. To model the ionospheric parameters for a quiet period, we used time intervals with a relatively calm geomagnetic field (sum of the daily K-indices ΣK < 24) and without strong seismic events (no earthquakes of Ks ≥ 12 within a 300 km radius from the station).

Considering the seasonality of ionospheric processes, the different seasons were modeled separately. The level of solar activity was also considered. A detailed description of the model identification and diagnosis for winter and summer is given below. The time intervals used for identification are shown in Table 1. The solar activity was estimated from the monthly average solar radio flux at a wavelength of 10.7 cm (f10.7 index). For f10.7 < 100, the activity was considered low, while for f10.7 > 100, it was considered high.

Table 1 Time intervals of the \( {f}_oF2 \) data used for the construction of the multicomponent model (MCM)

The model identification was performed using the method described in the “MCM identification” section.

The multiscale wavelet decomposition of the \( {f}_oF2 \) data (Eq. 1) was performed using third-order Daubechies wavelets. This orthogonal wavelet basis allowed us to perform a numerically stable multiscale wavelet decomposition of the data (Daubechies 1992). To determine the type of orthogonal wavelet, we applied the criterion suggested by Mallat (1999), which minimizes the number of approximating summands and the approximation error. In the dictionary \( D={\displaystyle \underset{\lambda \in \varLambda }{\cup }}{W}^{\lambda } \) of orthonormal bases, the basis \( {W}^{\alpha }={\left\{{q}_z^{\alpha}\right\}}_{1\le z\le N} \) is better than the basis \( {W}^{\gamma }={\left\{{q}_z^{\gamma}\right\}}_{1\le z\le N} \) for approximating a function f if it gives an error that is no larger for the same number of approximating summands, i.e., for all Z ≥ 1,

$$ {\varepsilon}^{\alpha}\left[Z\right]\le {\varepsilon}^{\gamma}\left[Z\right] $$

where \( \varepsilon \left[Z\right] \) is the approximation error determined as \( {\varepsilon}^{\lambda}\left[Z\right]={\displaystyle \sum_{z\notin {I}_Z^{\lambda }}}{\left|\left\langle f,{q}_z^{\lambda}\right\rangle \right|}^2={\left\Vert f\right\Vert}^2-{\displaystyle \sum_{z\in {I}_Z^{\lambda }}}{\left|\left\langle f,{q}_z^{\lambda}\right\rangle \right|}^2 \), where \( {I}_Z^{\lambda } \) is a set of indices of cardinality Z.
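The basis comparison can be sketched numerically once the expansion coefficients of f in each candidate basis are known. The sketch assumes that the index set holds the Z largest-magnitude coefficients, as in nonlinear approximation, which is an assumption made here; the function names are hypothetical.

```python
def approximation_error(coeffs, Z):
    """Energy of the coefficients left out when the Z largest ones are kept."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    return sum(energies[Z:])

def basis_is_better(coeffs_alpha, coeffs_gamma):
    """True if basis alpha gives an error no larger than basis gamma for every Z >= 1."""
    n = max(len(coeffs_alpha), len(coeffs_gamma))
    return all(
        approximation_error(coeffs_alpha, Z) <= approximation_error(coeffs_gamma, Z)
        for Z in range(1, n + 1)
    )
```

A basis that concentrates the energy of f in a few coefficients, such as `[3, 0, 0, 1]`, is preferred over one that spreads the same energy, such as `[2, 2, 1, 1]`.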

Using steps 1–4 from the “MCM identification” section, we determined that the best representation of the time series corresponded to the multiscale wavelet decomposition to level \( {m}^{*}=3 \):

$$ {f}_0(t)=f\left[{2}^{-3}t\right]+g\left[{2}^{-3}t\right]+{\displaystyle \sum_{j=-1}^{-2}}g\left[{2}^jt\right] $$
(6)

where \( f\left[{2}^{-3}t\right]={\displaystyle \sum_k}{c}_{-3,k}{\varphi}_{-3,k}(t) \) is the smoothed stationary component containing periods of more than 8 h, \( g\left[{2}^{-3}t\right]={\displaystyle \sum_k}{d}_{-3,k}{\varPsi}_{-3,k}(t) \) is the detailing stationary component containing periods of 8–16 h, and \( g\left[{2}^jt\right]={\displaystyle \sum_k}{d}_{j,k}{\varPsi}_{j,k}(t) \), \( j=\overline{-1,-2} \), are the detailing components containing the local features and noise.

The obtained approximation (Eq. 6) agrees with Shi et al. (2015), who showed that the largest variance of the ionospheric periodic oscillations ranged 2–4 days and decreased with the period increase.

Equation 6 allowed us to conclude that the initial \( {f}_oF2 \) series and the smoothed components \( f\left[{2}^{-m}t\right] \), m = 1, 2, had a complex structure that could not be approximated by an ARIMA model. Figure 1 shows the autocorrelation functions (ACFs) of the original series, of its first difference, and of the extracted stationary components, confirming that the direct application of ARIMA methods would not adequately model the time series. The extracted components \( f\left[{2}^{-3}t\right] \) and \( g\left[{2}^{-3}t\right] \) had damped autocorrelation functions and partial autocorrelation functions of third order (Fig. 1), which allowed the identification of an autoregressive model of third order (Box and Jenkins 1970) and confirmed the efficiency of the suggested method.

Fig. 1

a Autocorrelation function (ACF) of the \( {f}_oF2 \) time series, b ACF of the first difference of the \( {f}_oF2 \) time series, c ACF of the \( f\left[{2}^{-3}t\right] \) component, d partial ACF of the \( f\left[{2}^{-3}t\right] \) component, e ACF of the \( g\left[{2}^{-3}t\right] \) component, and f partial ACF of the \( g\left[{2}^{-3}t\right] \) component for 19 December 2011–08 January 2012. The x-axis shows the lags used to calculate the autocorrelation coefficients

The estimation of the model parameters for the extracted stationary components \( f\left[{2}^{-3}t\right] \) and \( g\left[{2}^{-3}t\right] \) was performed using the traditional method (Box and Jenkins 1970; Marple 1987) and showed a dependence on the season and the level of solar activity (Tables 2, 3, 4, and 5).

Table 2 Estimated parameters for the smoothed \( f\left[{2}^{-3}t\right] \) component model (winter)
Table 3 Estimated parameters for the smoothed \( f\left[{2}^{-3}t\right] \) component model (summer)
Table 4 Estimated parameters for the detailing \( g\left[{2}^{-3}t\right] \) component model (winter)
Table 5 Estimated parameters for the detailing \( g\left[{2}^{-3}t\right] \) component model (summer)

Using Tables 2 and 4 and Eq. 2, we obtained the following models for winter. Without taking the first differences, the models depended on the solar activity as follows.

For a high solar activity:

\( {s}_{3,k}^1=16-0.22\cdot {c}_{3,k-1}-0.22\cdot {c}_{3,k-2}+0.77\cdot {c}_{3,k-3}+{a}_{3,k}^1(t) \) for the estimated component \( f\left[{2}^{-3}t\right] \) and \( {s}_{3,k}^2=-0.14\cdot {d}_{3,k-1}-0.14\cdot {d}_{3,k-2}+0.83\cdot {d}_{3,k-3}+{a}_{3,k}^2(t) \) for the estimated component \( g\left[{2}^{-3}t\right] \).

For a low solar activity:

\( {s}_{3,k}^1=11-0.19\cdot {c}_{3,k-1}-0.21\cdot {c}_{3,k-2}+0.75\cdot {c}_{3,k-3}+{a}_{3,k}^1(t) \) for the estimated component \( f\left[{2}^{-3}t\right] \) and \( {s}_{3,k}^2=-0.29\cdot {d}_{3,k-1}-0.26\cdot {d}_{3,k-2}+0.69\cdot {d}_{3,k-3}+{a}_{3,k}^2(t) \) for the estimated component \( g\left[{2}^{-3}t\right] \).

For a general model for high and low solar activities, obtained by considering the first difference, we obtained:

\( {s}_{3,k}^1=-0.62\cdot {\omega}_{3,k-1}^1-0.63\cdot {\omega}_{3,k-2}^1+0.36\cdot {\omega}_{3,k-3}^1+{a}_{3,k}^1(t) \), where \( {\omega}_{3,k}^1=\nabla {c}_{3,k} \), for the estimated component \( f\left[{2}^{-3}t\right] \) and \( {s}_{3,k}^2=-0.97\cdot {\omega}_{3,k-1}^2-0.93\cdot {\omega}_{3,k-2}^2+{a}_{3,k}^2(t) \), where \( {\omega}_{3,k}^2=\nabla {d}_{3,k} \), for the estimated component \( g\left[{2}^{-3}t\right] \).

Our results were based on the general model for high and low solar activities. According to this model, to obtain a winter forecast, four preceding forecasts were required, taking into account the difference of order ν = 1. For the initial hourly data and a decomposition level of m = 3, this corresponded to 32 h.

According to Tables 3 and 5, we obtained the following models for summer according to Eq. 2. For a high solar activity, we obtained:

\( {s}_{3,k}^1=-0.50\cdot {\omega}_{3,k-1}^1-0.58\cdot {\omega}_{3,k-2}^1+{a}_{3,k}^1(t) \), where \( {\omega}_{3,k}^1=\nabla {c}_{3,k} \), for the estimated component \( f\left[{2}^{-3}t\right] \) and \( {s}_{3,k}^2=-0.88\cdot {\omega}_{3,k-1}^2-0.80\cdot {\omega}_{3,k-2}^2+{a}_{3,k}^2(t) \), where \( {\omega}_{3,k}^2=\nabla {d}_{3,k} \), for the estimated component \( g\left[{2}^{-3}t\right] \).

For a low solar activity, we obtained:

\( {s}_{3,k}^1=-0.83\cdot {\omega}_{3,k-1}^1-0.73\cdot {\omega}_{3,k-2}^1+{a}_{3,k}^1(t) \), where \( {\omega}_{3,k}^1=\nabla {c}_{3,k} \), for the estimated component \( f\left[{2}^{-3}t\right] \) and \( {s}_{3,k}^2=-0.95\cdot {\omega}_{3,k-1}^2-0.86\cdot {\omega}_{3,k-2}^2+{a}_{3,k}^2(t) \), where \( {\omega}_{3,k}^2=\nabla {d}_{3,k} \), for the estimated component \( g\left[{2}^{-3}t\right] \).

According to these models, to obtain a summer forecast, three preceding forecasts were required, taking into account the difference of order ν = 1. For the initial hourly data and a decomposition level of m = 3, this corresponded to 24 h.
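The 32 h and 24 h intervals quoted for winter and summer can be reproduced by a short calculation. It assumes, as our reading of the text, that the number of required preceding values equals the autoregression order plus the difference order, \( p+\nu \), and that one coefficient at decomposition level m = 3 spans \( {2}^3=8 \) hourly samples:

$$ \left(p+\nu \right)\cdot {2}^m=\left(3+1\right)\cdot {2}^3=32\ \mathrm{h}\ \left(\mathrm{winter}\right),\kern2em \left(p+\nu \right)\cdot {2}^m=\left(2+1\right)\cdot {2}^3=24\ \mathrm{h}\ \left(\mathrm{summer}\right). $$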

Model diagnostics

The diagnostics of the MCM was based on the adequacy of the constituent component models, using two residual error analysis methods. Based on the goodness-of-fit (Box and Jenkins 1970), a fitting model was adequate if

$$ {Q}^{\mu }=n{\displaystyle \sum_{z=1}^Z}{r}_z^2\left({a}_{\mu}\right) $$
(7)

had approximately a \( {\chi}^2\left(Z-{h}_j^{\mu }-{p}_j^{\mu}\right) \) distribution, where Z is the number of first autocorrelations of the μth component model residual errors taken into account, \( {r}_z\left({a}_{\mu}\right) \) is the autocorrelation at lag z of the residual errors of the μth component model, and n = N − ϑ, where N is the time series length of the μth component and ϑ is the difference order of the μth component model.
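The statistic of Eq. 7 can be sketched as follows. The input is assumed to be the residual error series of one component model, already of length n; the sample autocorrelation estimator used here is the usual one and the function names are hypothetical. The result would then be compared with the chi-square critical value for the appropriate degrees of freedom.

```python
def autocorrelation(a, lag):
    """Sample autocorrelation of the series a at the given lag."""
    n = len(a)
    mean = sum(a) / n
    c0 = sum((x - mean) ** 2 for x in a)
    ck = sum((a[t] - mean) * (a[t + lag] - mean) for t in range(n - lag))
    return ck / c0

def portmanteau_q(residuals, Z):
    """Q = n * (sum of the first Z squared residual autocorrelations)."""
    n = len(residuals)
    return n * sum(autocorrelation(residuals, z) ** 2 for z in range(1, Z + 1))
```

A strongly alternating residual series gives a large Q even for Z = 1, which signals an inadequate model, whereas residuals close to white noise give small values.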

Based on the normalized cumulative periodogram, we also used:

$$ {C}^{\mu}\left({f}_{\beta}\right)=\frac{{\displaystyle \sum_{i=1}^{\beta }}I\left({f}_i\right)}{n{s}^2}, $$

where \( I\left({f}_i\right)=\frac{2}{n}\left[{\left({\displaystyle \sum_{k=1}^n}{a}_k^{\mu } \cos 2\pi {f}_ik\right)}^2+{\left({\displaystyle \sum_{k=1}^n}{a}_k^{\mu } \sin 2\pi {f}_ik\right)}^2\right] \) is the periodogram of the residual errors \( {a}_k^{\mu },k=\overline{1,n} \), of the μth component model, n is the length of the residual error series, \( {f}_i=i/n \) is the frequency, and \( {s}^2 \) is the estimate of the variance \( {\sigma}_{a^{\mu}}^2 \) of the residual errors of the μth component model.
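The normalized cumulative periodogram can be sketched directly from the definitions. The sketch assumes that the variance estimate is the mean squared deviation of the residuals and evaluates only frequencies strictly below 0.5; both choices, like the function name, are assumptions made for illustration.

```python
import math

def cumulative_periodogram(a):
    """Return C(f_1), C(f_2), ... for frequencies f_i = i/n below 0.5."""
    n = len(a)
    mean = sum(a) / n
    s2 = sum((x - mean) ** 2 for x in a) / n  # variance estimate of the residuals
    values, total = [], 0.0
    for i in range(1, (n - 1) // 2 + 1):
        f = i / n
        c = sum(a[k - 1] * math.cos(2 * math.pi * f * k) for k in range(1, n + 1))
        s = sum(a[k - 1] * math.sin(2 * math.pi * f * k) for k in range(1, n + 1))
        total += (2.0 / n) * (c * c + s * s)
        values.append(total / (n * s2))
    return values
```

For white-noise residuals, the resulting curve stays close to the straight line from 0 to 1; for a series of odd length, it reaches 1 at the last frequency.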

The diagnostics was performed using the data that was not used for the model identification. The selected intervals for diagnostics are shown in Table 6 and were characterized by a relatively quiet geomagnetic field without strong seismic events.

Table 6 Intervals of the f 0 F2 data used in the model diagnostics

The tests based on the total goodness of fit (Eq. 7) showed that the resulting MCM adequately characterized the time evolution of the \( {f}_oF2 \) data. For example, for 9–22 August 2010, the values \( {Q}^1=n{\displaystyle \sum_{k=1}^{20}}{r}_k^2\left({a}_1\right)=16.15 \) (for \( f\left[{2}^{-3}t\right] \)) and \( {Q}^2=n{\displaystyle \sum_{k=1}^{20}}{r}_k^2\left({a}_2\right)=8.47 \) (for \( g\left[{2}^{-3}t\right] \)) did not exceed the critical value \( {\chi}_{0.05}^2\left(20-2\right)=28.9 \), which, in accordance with the total goodness-of-fit test, confirmed the adequacy of the constructed models.

The diagnostics based on the normalized cumulative periodogram for 19 December 2011–8 January 2012 (Fig. 2) also confirmed its adequacy.

Fig. 2

Diagnostics of the component models: a autocorrelation function (ACF) of the residual errors of the \( f\left[{2}^{-3}t\right] \) component model, b cumulative periodogram of the residual errors of the \( f\left[{2}^{-3}t\right] \) component model, c ACF of the residual errors of the \( g\left[{2}^{-3}t\right] \) component model, and d cumulative periodogram of the residual errors of the \( g\left[{2}^{-3}t\right] \) component model

Figure 3 shows an example of f o F2 data modeling for a relatively calm geomagnetic field, confirming the good approximation properties of the model and its convergence.

Fig. 3

Modeling results of the \( {f}_oF2 \) data (Paratunka station, Kamchatka) for 21–25 February 1999 (LT): a observed (black line) and modeled \( {f}_oF2 \) data by the \( f\left[{2}^{-3}t\right] \) component model (blue line), b observed (black line) and modeled \( {f}_oF2 \) data by the multicomponent model (MCM, blue line), c MCM errors, and d K-index for the Paratunka station. Graph c shows the standard deviations of the MCM errors (dashed lines)

The comparison of the MCM with the moving median and the empirical IRI model for different seasons and levels of solar activity (Figs. 4 and 5, Table 7) showed that the MCM gave a more accurate estimate, especially during the solar maximum. In summer, the IRI overestimated the \( {f}_oF2 \) during the solar maximum (Fig. 5a) and underestimated it during the solar minimum (Fig. 5d). During the solar maximum, a significant increase in the IRI model errors was observed from 09:00 to 00:00 LT (Fig. 4c) while, during the solar minimum, the errors increased between 21:00 and 03:00 LT (Fig. 5f), in agreement with Nakamura et al. (2009). The observed correlation of the IRI model errors casts doubt on the adequacy of the model. In contrast, the MCM errors corresponded to white noise, as confirmed by the diagnostics.

Fig. 4

Modeling results of the \( {f}_oF2 \) data for winter (LT): a and d observed (black line) and predicted \( {f}_oF2 \) by the multicomponent model (MCM, blue line) and the empirical International Reference Ionosphere (IRI) model (green dashed line), b and e MCM errors, and c and f IRI model errors; a–c solar maximum (1991), d–f solar minimum (2006)

Fig. 5

Modeling results of the \( {f}_oF2 \) data for summer (LT): a and d observed (black line) and predicted \( {f}_oF2 \) by the multicomponent model (MCM, blue line) and the empirical International Reference Ionosphere (IRI) model (green dashed line), b and e MCM errors, and c and f IRI model errors; a–c solar maximum (2002), d–f solar minimum (2004)

Table 7 Error estimation for the multicomponent model (MCM) and International Reference Ionosphere (IRI) model

With the MCM, we can obtain predicted data and estimate the confidence intervals during the prediction according to Eq. 5. When the component model errors are beyond the confidence interval, we can identify an anomaly in the ionosphere, which is difficult for the IRI model and the moving median method. The modeling results for a perturbed geomagnetic field are shown in Figs. 6 and 7. At increased geomagnetic activity, the errors of the component models increased beyond the standard deviation, with a confidence level >70 %, indicating anomalous changes in the f o F2 time series. The estimated median f o F2 time series (Fig. 6a, gray line) showed the greatest deviation during high geomagnetic activity (5 February and 15 February 2011) and for a calm geomagnetic field (12 February 2011). The IRI model did not allow the distinction of anomalous periods in the ionosphere and showed a slight error increase in the first analyzed period (Fig. 6f) on 5 February 2011, for a slightly perturbed geomagnetic field, and in the second analyzed period (Fig. 7d) on 19 January 2013, for a slightly perturbed geomagnetic field, and on 25 January 2013, for a calm geomagnetic field.

Fig. 6

Modeling results of the \( {f}_oF2 \) time series components (Paratunka station, Kamchatka) for 4–17 February 2011 (UT): a observed (black line) and median of the \( {f}_oF2 \) time series (gray line), b actual (black line) and modeled \( f\left[{2}^{-3}t\right] \) component values (blue line), c actual (black line) and modeled \( g\left[{2}^{-3}t\right] \) component values (blue line), d errors of the \( f\left[{2}^{-3}t\right] \) component model, e errors of the \( g\left[{2}^{-3}t\right] \) component model, f International Reference Ionosphere (IRI) model errors, and g K-index for the Paratunka station. Graphs d and e show the standard deviations of the component model errors (dashed lines)

Fig. 7

Modeling results of the \( {f}_oF2 \) time series components (Paratunka station, Kamchatka) for 16–28 January 2013 (LT): a observed (black line) and median of the \( {f}_oF2 \) time series (gray line), b errors of the \( f\left[{2}^{-3}t\right] \) component model, c errors of the \( g\left[{2}^{-3}t\right] \) component model, d International Reference Ionosphere (IRI) model errors, and e K-index for the Paratunka station. Graphs b and c show the standard deviations of the component model errors (dashed lines), the arrow indicates the beginning of the earthquake that occurred in Kamchatka on 26 January 2013, and Ks is the energy class of the earthquake

Ionospheric anomaly detection and estimation of their parameters based on the continuous wavelet transform and threshold functions

Regarding each basic wavelet Ψ, the continuous wavelet transform was given by the following formula (Chui 1992; Daubechies 1992):

$$ {W}_{\varPsi }{f}_{b,a}:={\left|a\right|}^{-1/2}\int_{-\infty }^{\infty }f(t)\,\varPsi \left(\frac{t-b}{a}\right)dt,\quad f\in {L}^2(R),\; a,b\in R,\; a\ne 0. $$
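
For sampled data, the integral above can be approximated by a Riemann sum. The sketch below is one possible discretization, not the authors' code. The Mexican hat wavelet is an assumed choice of basic wavelet Ψ, and the names `mexican_hat`, `cwt`, `scales`, and `dt` are hypothetical.

```python
import numpy as np

def mexican_hat(x):
    """Mexican hat wavelet, used here as an assumed basic wavelet."""
    return (1.0 - x**2) * np.exp(-0.5 * x**2)

def cwt(f, scales, dt=1.0, wavelet=mexican_hat):
    """Riemann-sum approximation of the continuous wavelet transform.

    W[i, j] approximates |a|^(-1/2) * integral f(t) psi((t - b) / a) dt
    for a = scales[i] and b = j * dt.
    """
    f = np.asarray(f, dtype=float)
    t = np.arange(f.size) * dt
    W = np.empty((len(scales), f.size))
    for i, a in enumerate(scales):
        # kernel[j, k] = psi((t_k - b_j) / a): rows index shifts b, columns index t
        kernel = wavelet((t[None, :] - t[:, None]) / a)
        W[i] = abs(a) ** -0.5 * (kernel @ f) * dt
    return W

# Usage: a smooth background with one sharp local feature
signal = np.sin(2 * np.pi * np.arange(240) / 24.0)
signal[120] += 2.0
coefficients = cwt(signal, scales=[1.0, 2.0, 4.0])
```

The dense kernel matrix keeps the sketch close to the formula; for long series a convolution-based implementation would be preferable.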

The decay of the coefficient amplitudes \( \left|{W}_{\varPsi }{f}_{b,a}\right| \) with scale a is related to the uniform and pointwise Lipschitz regularity of the function f (Daubechies 1992; Mallat 1999). According to Jaffard's theorem (Jaffard 1991; Mallat 1999), when a decreases, the amplitudes \( \left|{W}_{\varPsi }{f}_{b,a}\right| \) rapidly decrease to zero wherever the function f is smooth and has no local features. Based on this property of the wavelet transform, we used the following threshold function to detect local features in the time series of the f o F2 critical frequency and identify ionospheric anomalies:

$$ {P}_{T_a}\left({W}_{\varPsi }{f}_{b,a}\right)=\begin{cases}{W}_{\varPsi }{f}_{b,a}, & \text{if } \left|{W}_{\varPsi }{f}_{b,a}-{W}_{\varPsi }{f}_{b,a}^{\mathrm{med}}\right|\ge {T}_a\\ 0, & \text{if } \left|{W}_{\varPsi }{f}_{b,a}-{W}_{\varPsi }{f}_{b,a}^{\mathrm{med}}\right|<{T}_a\end{cases} $$
(8)

where the threshold \( {T}_a=U\cdot S{t}_a \) determines the presence of an anomaly at scale a near a point ξ contained in the support (carrier) of \( {\varPsi}_{b,a} \) (see below), U is a threshold coefficient, \( S{t}_a=\sqrt{\frac{1}{\varPhi -1}{\displaystyle \sum_{k=1}^{\varPhi }{\left({W}_{\varPsi }{f}_{b,a}-\overline{W_{\varPsi }{f}_{b,a}}\right)}^2}} \), and \( \overline{W_{\varPsi }{f}_{b,a}} \) and \( {W}_{\varPsi }{f}_{b,a}^{\mathrm{med}} \) are the average and median in a moving time window of length Φ. To account for the diurnal variation of the ionospheric data, the average \( \overline{W_{\varPsi }{f}_{b,a}} \) and median \( {W}_{\varPsi }{f}_{b,a}^{\mathrm{med}} \) were calculated separately for each hour.
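
The threshold rule of Eq. 8 can be sketched for one scale as follows. This is an illustration under stated assumptions rather than the authors' implementation: the data are taken to be hourly, the moving window is placed over the Φ hours preceding each point (the text does not specify its placement), and the per-hour statistics use the samples at the same hour of day inside that window. The function name `threshold_scale` and the parameter `period` are hypothetical.

```python
import numpy as np

def threshold_scale(w, U=2.3, window=336, period=24):
    """Apply the threshold rule to hourly coefficients w at one scale a.

    For each shift b, the median and standard deviation are taken over the
    samples at the same hour of day within the preceding `window` hours
    (an assumed, trailing placement of the moving window).
    Returns (P, sign): P keeps w[b] where |w[b] - median| >= U * std and is
    0 elsewhere; sign is +1 or -1 for positive or negative anomalies.
    """
    w = np.asarray(w, dtype=float)
    P = np.zeros_like(w)
    sign = np.zeros(w.size, dtype=int)
    lags = period * np.arange(1, window // period + 1)
    for b in range(window, w.size):      # the first `window` samples lack history
        ref = w[b - lags]                # same-hour samples of the previous days
        dev = w[b] - np.median(ref)
        T = U * ref.std(ddof=1)          # T_a = U * St_a
        if abs(dev) >= T:
            P[b] = w[b]
            sign[b] = int(np.sign(dev))
    return P, sign
```

With Φ = 336 h and hourly data, each threshold in this sketch rests on 14 same-hour samples. The sign of the deviation from the median separates increases from decreases relative to the background.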

Given the randomness of the data, any threshold T a used to decide on the presence or absence of an anomaly inevitably carries a risk of misidentification. To assess the quality of the decision, we used the lowest error rate, which gives the most complete characterization of it, i.e., the a posteriori risk (Levin 1963) was estimated and minimized. To estimate the a posteriori risk in determining the ionospheric conditions, we used ionogram data (Paratunka station, Kamchatka), which were compared with geomagnetic (K-index) and Kamchatka earthquake catalog data. The T a threshold was found to depend on solar activity, increasing during periods of high solar activity. Therefore, separate thresholds were estimated for years of high and low solar activity.

For a wavelet Ψ with a compact support (carrier) equal to [−Ω, Ω], the set of points (b, a) for which ξ is contained in the support of Ψ b,a defines the cone of influence of ξ (Mallat 1999). Since the support of Ψ b,a at scale a is [b − Ωa, b + Ωa], the cone of influence of ξ at scale a is defined by the following inequality:

$$ \left|b-\xi \right|\le \varOmega a $$

The anomaly duration at scale a was then determined by the cone of influence of ξ and is equal to:

$$ {\mathrm{H}}_a=2\varOmega a $$
(9)
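
As a small worked illustration of the cone of influence and Eq. 9, the sketch below checks whether a shift b lies in the cone of a point ξ and evaluates the duration. The function names and the numerical values (Ω = 5, a = 4) are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def in_cone_of_influence(b, a, xi, omega):
    """True when |b - xi| <= omega * a, i.e. xi lies in the support of psi_{b,a}."""
    return abs(b - xi) <= omega * a

def anomaly_duration(a, omega):
    """Duration H_a = 2 * omega * a, in the same units as the scale a."""
    return 2.0 * omega * a

# With an assumed half-width omega = 5 and scale a = 4 (in hours),
# a feature at xi influences the shifts b within 20 h on either side of it.
duration_hours = anomaly_duration(a=4, omega=5)
```

The duration grows linearly with scale, so in this reading large-scale anomalies are those detected at large a.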

The anomaly intensity for t = b was defined as:

$$ {Y}_b={\displaystyle \sum_a}\frac{\left|{P}_{T_a}\left({W}_{\varPsi }{f}_{b,a}\right)\right|}{{||{W}_{\varPsi }{f}_{b,a}||}_2} $$
(10)

where the norm \( {||{W}_{\varPsi }{f}_{b,a}||}_2=\sqrt{{\displaystyle \sum_{N_a}}{\left({P}_{T_a}\left({W}_{\varPsi }{f}_{b,a}\right)\right)}^2} \), N a is the series length for scale a.
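
Eq. 10 can be sketched as follows for thresholded coefficients stored as a two-dimensional array with one row per scale. This is an illustrative reading of the formula, not the authors' code. The function name `anomaly_intensity` is hypothetical, and scales with no detected anomaly (zero norm) are assumed to contribute nothing to the sum.

```python
import numpy as np

def anomaly_intensity(P):
    """Intensity Y_b from thresholded coefficients P of shape (n_scales, n_times).

    Each row is divided by its Euclidean norm over time before the absolute
    values are summed over scales.
    """
    P = np.asarray(P, dtype=float)
    norms = np.sqrt((P ** 2).sum(axis=1, keepdims=True))
    # Rows without detections have zero norm; avoid division by zero there
    safe_norms = np.where(norms > 0, norms, 1.0)
    return (np.abs(P) / safe_norms).sum(axis=0)

# Usage: two scales, four time points
Y = anomaly_intensity([[0.0, 3.0, 4.0, 0.0],
                       [0.0, 0.0, 2.0, 0.0]])
```

Normalizing each scale by its own norm keeps large-amplitude scales from dominating the sum.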

Figure 8 shows the results of the ionospheric anomaly detection based on Eq. 8 and intensity estimation based on Eq. 10, during the magnetic storm of 25–26 August 1987. If the wavelet coefficients W Ψ f b,a exceeded the corresponding median \( {W}_{\varPsi }{f}_{b,a}^{\mathrm{med}} \) by T a , we considered a positive anomaly, characterized by an increase in the ionospheric electron density compared to the background (Fig. 8, in red). If the median \( {W}_{\varPsi }{f}_{b,a}^{\mathrm{med}} \) exceeded the corresponding wavelet coefficients W Ψ f b,a by T a , we assumed a negative anomaly, characterized by a decrease in the electron density compared to the background (Fig. 8, in blue). A negative anomaly, lasting for more than 1 day, occurred in the ionosphere during a magnetic storm (Fig. 8). Its intensity increased from the beginning of the storm and was maximum during the main phase of the storm. After the magnetic storm, the electron density increased, as indicated by the positive anomalies from 28 August 1987. During the storm, small-scale anomalies, associated with local variations of the ionospheric electron density also occurred. In comparison with the proposed solutions, the calculation of the medians of the f o F2 series (Fig. 8a, gray line) did not allow a detailed analysis of the ionosphere during the storm, the acquisition of quantitative estimates of the disturbances, and the identification of the anomalous period. The largest median deviations in the f o F2 series were observed both during a magnetic storm and for quiet geomagnetic fields, mainly at night.

Fig. 8

Results of the ionospheric data processing (Paratunka station, Kamchatka) for 22–31 August 1987: a observed (black line) and median of the f o F2 time series (gray line), b detected anomalies for a threshold coefficient U of 2.3 and a moving time window length Φ of 336 h, c estimation of the anomaly intensity, and d K-indices above three (Paratunka station)

Data analysis during magnetic storms

Figures 9 and 10 show the joint analysis of ionospheric and geomagnetic data during the magnetic storms of 17 March and 2 October 2013. The magnetic storm of 17 March had a sudden commencement. The comparison of the solar wind parameters with the results of the geomagnetic and ionospheric data processing indicated a common nature of the processes. According to the geomagnetic disturbance intensity \( {E}_b={\displaystyle \sum_a}\left|{W}_{\varPsi }{f}_{b,a}\right| \) (Figs. 9b and 10b; Mandrikova et al. 2013b, 2014b), the perturbations in the geomagnetic field formed during the significant increase in the solar wind speed, from 410 to 705 km/s between 05:25 and 05:55 UT, and reached a maximum between 06:15 and 19:50 UT. At the same time, a large-scale negative anomaly that lasted for about a day and reached its maximum mainly in the daytime, between 06:00 and 18:00 LT on 18 March, was found in the ionosphere. Before the magnetic storm (15–16 March 2013), local increases in the solar wind speed were observed, accompanied by weak disturbances in the geomagnetic field. A large-scale positive ionospheric anomaly lasting for more than a day and small-scale anomalies associated with local fluctuations in the ionospheric electron density were also observed.

Fig. 9

Processing results of the geomagnetic and ionospheric data for 14–22 March 2013: a H-component of the Earth’s magnetic field, b assessment of the geomagnetic disturbance intensities, c identification of the periods of weak and strong geomagnetic disturbances, d identification of the periods of strong geomagnetic disturbances, e solar wind speed, f observed f o F2, g absolute smoothed component model errors, h absolute detailing component model errors, i estimation of the anomaly intensity, and j detected anomalies for a threshold coefficient U of 2.5 and a moving time window length Φ of 336 h. Graphs g and h show the standard deviations of the component model errors (dashed lines)

Fig. 10

Processing results of the geomagnetic and ionospheric data for 29 September–5 October 2013: a H-component of the Earth’s magnetic field, b assessment of the geomagnetic disturbance intensities, c identification of the periods of weak and strong geomagnetic disturbances, d identification of the periods of strong geomagnetic disturbances, e solar wind speed, f observed f o F2, g absolute smoothed component model errors, h absolute detailing component model errors, i estimation of the anomaly intensity, and j detected anomalies for a threshold coefficient U of 2.5 and a moving time window length Φ of 336 h. Graphs g and h show the standard deviations of the component model errors (dashed lines)

The analysis of the magnetic storm of 2 October 2013 showed a similar nature of the processes occurring in the magnetosphere and ionosphere. The perturbations in the geomagnetic field formed during the increase in solar wind speed and were largest between 03:30 and 06:25 UT. A positive anomaly, indicated by an increase in the electron density, was observed in the ionosphere before the magnetic storm (1 October 2013) and, at the beginning of the storm, it was replaced by a medium-scale negative anomaly, which reached its maximum at night between 01:00 and 06:00 LT on 3 October. Small-scale anomalies were also observed. After the end of the magnetic storm, at night on 4 October, the electron density in the ionosphere decreased significantly, as indicated by a negative anomaly.

A clear increase in the f o F2 (pre-storm enhancement) from ground measurements and total electron content (TEC) data has been observed by many authors (Danilov and Belik 1991; Danilov and Belik 1992; Danilov 2001; Burešová and Laštovička 2007; Mansilla 2007; Liu et al. 2008a, 2008b; Nogueira et al. 2011; Saranya et al. 2011; Adekoya and Chukwuma 2012). For the magnetic storms from 17 March and 2 October 2013, these effects were observed for a calm and weakly disturbed geomagnetic field, lasting from several hours to a day and a half (Figs. 9 and 10).

Conclusions

Using a newly suggested modeling method, we extracted the components that characterize the seasonal and diurnal fluctuations of the ionospheric parameters for calm conditions in the Kamchatka region and constructed the corresponding models. A comparison of the new model with the empirical IRI model and the moving median method showed promising results for the suggested method in the studied region, where it provided more reliable information about the ionospheric conditions. The computational solutions developed, based on the continuous wavelet transform, allowed the identification of anomalies of different scales during ionospheric disturbances and the estimation of their duration and intensity. Processing of the ionospheric data for 1969–2013 showed a dependence of the ionospheric anomaly intensity on the level of solar and geomagnetic activity. The largest and most intense ionospheric anomalies were observed during strong magnetic storms and were mostly characterized by a decrease in the electron density compared with the typical level.

A joint analysis of ionospheric and geomagnetic data from two strong magnetic storms that occurred on 17 March and 2 October 2013 helped to clarify the processes involved and their characteristics before and during the events. A comparison of the solar wind parameters with the results of the geomagnetic and ionospheric data processing showed a common nature of the analyzed processes. A significant increase in the solar wind speed before the main phase of the magnetic storms was accompanied by disturbances in the geomagnetic field and the emergence of large-scale negative ionospheric anomalies of high intensity. During small local increases in the solar wind speed, weak perturbations were found in the geomagnetic field, accompanied by multiscale anomalous changes in the ionospheric parameters. Before the magnetic storms, large-scale positive anomalies, indicated by an increased electron density, were observed in the ionosphere, together with small-scale anomalies associated with local variations in the ionospheric electron density.

Future research includes the testing and application of the developed MCM as well as obtaining computing solutions for different data registration stations, for a more detailed analysis of the ionospheric processes during disturbances and the study of their spatial and temporal distribution.