Introduction

Over more than a century, the temporal variations of the Earth’s magnetic field have been monitored mainly through continuous recordings at geomagnetic observatories. Although the number of observatories has steadily increased, their geographic coverage remains highly uneven. Satellite data help overcome this difficulty, albeit only for the past two decades. Recent magnetic field models have been constructed from measurements made by magnetometers on board low Earth orbiting satellites, mainly Ørsted, CHAMP, SAC-C and Swarm. Unfortunately, these satellite missions are short in comparison with the correlation times of the Earth’s core magnetic field (Hulot and Le Mouël 1994). Moreover, the satellite record may have gaps caused by instrumental failure or by non-consecutive missions. The lack of satellite vector data between September 2010 (de-orbiting of CHAMP) and November 2013 (launch of the Swarm mission) has contributed to the decrease in spatial resolution of main field models during that period (Beggan and Whaler 2018).

Most analyses of the variability of the main field have been based on time series of spherical harmonic coefficients. Lhuillier et al. (2011) estimated the correlation time of the main non-dipole field at \(415/\ell\) years, where \(\ell\) is the harmonic degree. In addition to its slow variation, the main magnetic field also has decadal and interannual fluctuations, and satellite data covering the last 20 years can be very useful to investigate this range of periods. At periods close to 1 year, the contribution of the external field is strong and a proper separation of internal and external sources is not yet possible. Finally, at the high-frequency end of the main field spectrum, the amplitude of the main field is too small to be detected at observatories and by satellites.

The time resolution of series of spherical harmonic coefficients from geomagnetic field models has naturally been investigated as a function of \(\ell\). Models based on observatory records had lower time resolution for higher degree \(\ell\): as an example, Langel et al. (1986) represented the secular variation from 1903 to 1982 using cubic B-splines with up to six nodes for degrees 1–4 but only one internal node for degrees 5 and 6. The situation has improved with the availability of satellite data, and yet, for degrees 5–8, Gillet (2018) remarks that only signals with periods above about 6 years can be retrieved from time-varying current satellite field models. The time resolution of satellite field models is better for lower degrees and worse for higher ones (compare the spectrum of the second and third time derivatives of the field in Figure 2 of Lesur et al. (2010)). This is a consequence of the regularisation techniques used to produce the models (Finlay et al. 2016). It is noteworthy that the lower variability of fine scale structures in models of the Earth’s magnetic field results from the nature of the inverse modelling and does not reflect the time variability of small scale features of the actual field. This observation has motivated the development of models of the time-varying Earth’s magnetic field that do not involve a projection of the geomagnetic potential on spherical harmonics (Olsen and Mandea 2007).

Principal component analysis (PCA), also known as proper orthogonal decomposition [see, e.g. Taira et al. (2017) for a recent overview], is a widely used statistical tool to extract a minimal number of energetic coherent structures from a time sequence of scalar maps. It has the advantage of describing the space-time evolution of a physical quantity as the superposition of a few decorrelated components (or modes), a property referred to as dimensionality reduction. It is one of several data-based modal techniques used for the exploratory analysis of real or simulated data. Moreover, in certain cases, physically meaningful structures displaying the symmetries of underlying source mechanisms may emerge from PCA modes. By directly using records of the geomagnetic field at fixed locations (observatories) to conduct a PCA analysis, it may be possible to extract features, with small length scales and/or short timescales, that are left aside when constructing spherical harmonic models.

As a matter of fact, PCA analysis has been employed to calculate spatio-temporal patterns of the external component of the field from geomagnetic observatory data (Balasis and Egbert 2006; Shore et al. 2016), giving only an ancillary role to spherical harmonic modelling. Furthermore, PCA may allow for the separation between the internal and the external field (which is key to detecting rapid variations of the internal field). The analysis of the geomagnetic field vector requires a slight modification of the standard PCA method in order to ensure that the spatial modes for the three components are each subject to the same time variation. We shall refer to it as vectorial PCA. Shore et al. (2016) performed such an analysis using ground-based geomagnetic data from 1997 to 2009, keeping only the five quietest days in each month, after removal of the internal main field and crustal contributions. The different modes that they identified have a clear physical significance. Their mode 1 describes the magnetospheric ring current and fluctuates strongly. Mode 2 represents an annual oscillation of the global current system and mode 3 describes ionospheric signals with a semi-annual period.

The two studies of Balasis and Egbert (2006) and Shore et al. (2016) were concerned with the determination of external magnetic field variations. In the context of main field studies, however, global coverage is crucial. This explains why the first attempts at applying the PCA method to the main field have relied on spherical harmonic coefficients as the quantity to be analysed, either directly (Chulliat and Maus 2014) or after projection to produce maps of the radial field at the core-mantle boundary (Pais et al. 2015a). Pais et al. (2015a) explicitly derived the mathematical conditions under which the two approaches are equivalent. From CHAMP models centred on epochs ranging from 2002 to 2009, Chulliat and Maus (2014) retrieved a standing wave with a period of about 6 years. Pais et al. (2015a) used the CM4 model for 1960–2002 (Sabaka et al. 2004), which is mainly based on observatory data, to obtain large-scale modes and to discuss the relationship between their time evolution and the occurrence of geomagnetic jerks. In these studies, the time variability of the modes increases with their rank, a result that is compatible with the \(\sim f^{-4}\) energy spectrum of the main field (De Santis et al. 2003; Lesur et al. 2018).

Using PCA in the context of dynamical systems theory to reconstruct the state space of a system from experimental time series of a univariate stochastic variable, Gibson et al. (1992) showed that discrete Legendre polynomials can very closely approximate the set of orthogonal basis functions that describe the time variation. In a recent study, McKinnon et al. (2016) explicitly used the first four Legendre polynomials as a time basis onto which they projected a data matrix of daily maximum and minimum summer temperatures over the Northern Hemisphere, because these polynomials are so similar in shape to the time basis resulting from PCA.

Mandea and Olsen (2006) introduced the concept of “Virtual Observatories” (VO) at satellite altitude in order to describe the core magnetic field and its secular variation. It consists of computing the secular variation directly from the satellite data, as done at ground observatories, but on a dense and regular grid, overcoming the uneven distribution of observatories. Each VO “measurement” averages satellite data inside a defined volume and over a specific time window, typically 30 days long. For each VO, Mandea and Olsen (2006) used CHAMP measurements, which they reduced to a common altitude. The correction rests on the hypothesis that the residuals of the measurements (obtained by subtracting an a priori spherical harmonic field model) are given by a potential field varying linearly in space. Assuming that external field contributions can be neglected over 1 month, they estimated the parameters of the residuals and used them to compute a mean magnetic field residual for each VO and each month. The resulting VO time series correlate well with the corresponding ground-based observatory time series (Olsen and Mandea 2007). In contrast with measurements at real ground observatories, VO series are free from local crustal effects. Beggan et al. (2009) followed up the approach developed by Mandea and Olsen (2006) and created VO datasets from both Ørsted and CHAMP satellite measurements. They found that external fields cannot be entirely removed and, as a result, produce an unrealistic secular acceleration of the main field. Saturnino et al. (2018) illustrated an alternative methodology with 2 years of Swarm data. They neither selected the data nor removed a model of the external field. Instead, they applied an equivalent source dipole technique: they accounted for the recorded magnetic field by an ensemble of time-varying dipoles located at the core-mantle boundary and reconstructed VO series at a uniform altitude from these time-varying dipoles.

We undertake here a PCA analysis of the VO datasets recently created by Barrois et al. (2018). They calculated separate datasets from CHAMP and Swarm measurements which are as free as possible from contributions of the external field. Both time series have a time resolution of 4 months with almost no gaps despite strict selection criteria of the magnetic measurements.

The next section (“Data” section) describes the VO datasets. It is followed, in the “Methods” section, by a summary of the PCA method that we employed. Our main results are presented in the “Results” section, where we compare the modes calculated from CHAMP and Swarm data, respectively, and a discussion follows in the “Discussion” section, focusing on the issue of time resolution. The paper ends, in the “Conclusions and perspectives” section, by pointing out the main conclusions and some interesting future developments.

Data

Virtual observatories

Here, we use two sets of recent VO series, similar to those presented by Barrois et al. (2018) but with some small changes, including the number of VOs: one was calculated from measurements provided by the CHAMP vector magnetometer (from July 2000 to September 2010) and the other was computed from the three Swarm vector magnetometers (from November 2013 to December 2017). For the self-consistency of this paper, let us summarise the applied procedure.

To calculate the magnetic field values at each VO location, a data selection first needs to be applied in order to reduce the noise and aliasing due to high-frequency currents in the magnetosphere and the ionosphere, and to charges circulating near and across the satellites’ orbits. The initial dataset consists of CHAMP MAG-L3 and Swarm Level 1b MAG-L (version 0501) data at a 15-second sampling rate. Large outliers deviating by more than 500 nT from the CHAOS-6-x3 model (Finlay et al. 2016) were removed. The next step in data selection consists of keeping only night-time data (Sun at most \(10^\circ\) above the horizon) acquired under geomagnetically quiet conditions (\(K_p < 3^0\), \(|\mathrm {d}RC/\mathrm {d}t| < 3\) nT/h, \(E_m \le 0.8\) mV/m, \(B_z > 0\) and \(|B_y| < 10\) nT, where \(K_p\) is the geomagnetic activity index, RC is the ring current disturbance index, \(E_m\) is the merging electric field at the magnetopause, and \(B_y\) and \(B_z\) are components of the interplanetary magnetic field; see Barrois et al. (2018) for a precise definition of these parameters). For the CHAMP dataset, along-track sums and differences were used, while the Swarm dataset benefits from both along-track and cross-track sums and differences, which provide a signal less perturbed by external sources.
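The quiet-time and night-time criteria listed above can be expressed as a boolean mask over the 15-second samples. The sketch below is illustrative only: the function and argument names are hypothetical, each argument is assumed to be an array aligned with the samples, and encoding the \(K_p\) threshold \(3^0\) (i.e. 3o) as the number 3.0 is an assumption about how the index is stored.

```python
import numpy as np


def quiet_night_mask(sun_elev_deg, kp, drc_dt, em, imf_bz, imf_by):
    """Boolean mask of samples passing the selection criteria of the text.

    All arguments are hypothetical arrays aligned with the satellite samples.
    """
    return (
        (sun_elev_deg <= 10.0)        # night-time: Sun at most 10 deg above horizon
        & (kp < 3.0)                  # Kp < 3o, assuming 3o is stored as 3.0
        & (np.abs(drc_dt) < 3.0)      # |dRC/dt| < 3 nT/h
        & (em <= 0.8)                 # merging electric field <= 0.8 mV/m
        & (imf_bz > 0.0)              # northward IMF Bz
        & (np.abs(imf_by) < 10.0)     # |IMF By| < 10 nT
    )
```

Applying the mask with NumPy boolean indexing then keeps only the selected samples before the local fits.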

To minimise contamination by other unmodelled sources, several processing steps were applied and the contributions of different sources were removed from the data: (1) the magnetospheric field (up to degree \(\ell =2\)) and its Earth-induced counterpart as given by the CHAOS-6-x3 model; (2) the ionospheric field and its induced counterpart as given by the CM4 model (Sabaka et al. 2004); (3) the time-dependent internal field for spherical harmonic degrees \(\ell \le 20\) and the static internal field for \(\ell > 20\) as given by the CHAOS-6-x3 model. Note that these corrections involve different coordinate systems. Both the core and ionospheric fields are estimated in the geographic coordinate system. The fields due to near-Earth magnetospheric sources, including the ring current, are calculated in the solar magnetic coordinate system (with the z axis along the dipole axis and the Sun-Earth line contained in the xz plane; see, e.g. Laundal and Richmond 2017). The fields due to remote magnetospheric sources are calculated in the geocentric solar magnetospheric coordinate system (x axis along the Earth-Sun line, dipole axis in the xz plane).

To divide the surface of a sphere into nearly equal-area regions, the Recursive Zonal Equal Area Sphere Partitioning Toolbox (Leopardi 2006) was used, giving 300 VOs for the dataset provided by Chris Finlay and Magnus Hammer (personal communication). Within a cylinder around each VO, the residual field is then represented separately as the gradient of a scalar potential V, determined as a linear combination of cubic polynomials in local Cartesian coordinates. For each VO position and at a given epoch, all data inside a cylinder of radius 700 km and within a 4-month window were used to build a local data vector. The three components of the residual field and the locations of the observation points are calculated in a local Cartesian coordinate system with its origin at the VO position.
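As an illustration of such a local fit, the sketch below represents the residual field in one cylinder as \(\mathbf{B}=-\nabla V\), with V a polynomial of degree at most 3 in the local Cartesian coordinates, solves for its coefficients by least squares, and returns \(-\nabla V\) at the origin as the VO value. It is a sketch under stated assumptions, not the procedure of Barrois et al. (2018): Laplace’s equation is not enforced on V, no data weighting is applied, the function names are hypothetical, and coordinates are assumed to be expressed in units of 100 km to keep the design matrix well conditioned.

```python
import itertools

import numpy as np


def monomial_exponents(max_degree=3):
    # Exponents (a, b, c) of x^a y^b z^c; the constant term is dropped
    # because it does not contribute to the gradient.
    return [e for e in itertools.product(range(max_degree + 1), repeat=3)
            if 0 < sum(e) <= max_degree]


def monomial_gradients(xyz, exps):
    # Array (n_points, 3, n_terms): d/dx, d/dy, d/dz of each monomial.
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    grad = np.zeros((xyz.shape[0], 3, len(exps)))
    for k, (a, b, c) in enumerate(exps):
        if a:
            grad[:, 0, k] = a * x ** (a - 1) * y ** b * z ** c
        if b:
            grad[:, 1, k] = b * x ** a * y ** (b - 1) * z ** c
        if c:
            grad[:, 2, k] = c * x ** a * y ** b * z ** (c - 1)
    return grad


def fit_local_potential(xyz, b_obs, max_degree=3):
    """Least-squares fit of B = -grad V for one VO (hypothetical helper).

    xyz: (n, 3) positions relative to the VO; b_obs: (n, 3) residual field.
    Returns the field at the VO position (the origin) and the coefficients.
    """
    exps = monomial_exponents(max_degree)
    design = -monomial_gradients(xyz, exps).reshape(-1, len(exps))
    coeffs, *_ = np.linalg.lstsq(design, b_obs.reshape(-1), rcond=None)
    origin = np.zeros((1, 3))
    b_vo = -monomial_gradients(origin, exps)[0] @ coeffs
    return b_vo, coeffs
```

Only the linear terms of V contribute at the origin, so the VO value is set by the first-order coefficients, while the higher-order terms absorb the spatial variation of the field across the cylinder.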

Data selection for PCA

The VO data from the CHAMP satellite and the three-satellite Swarm constellation were used to perform a PCA analysis of the magnetic field. VO data are provided for a fixed set of spatial coordinates, which is a requirement of PCA decomposition. In order to work with the longest possible simultaneous series at a fixed number of VOs, the periods under study are shorter than the satellite mission periods: from November 2001 to November 2009 for CHAMP and from March 2014 to November 2017 for Swarm. Both datasets consist of vector magnetic field values every 4 months, for a total of 25 epochs for CHAMP and 12 epochs for Swarm. The number of grid points for CHAMP is \(N_P=292\), since 8 of the original VOs presented recurrent gaps.
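One possible way to obtain a fixed VO set is to drop every VO that has a gap at any epoch of the chosen period. The sketch below assumes a hypothetical array layout (epochs, VOs, components) with NaN marking missing values, and also checks the epoch counts quoted above for a 4-month sampling.

```python
import numpy as np


def complete_vos(series):
    """Keep only VOs with no missing epoch (hypothetical helper).

    series: array (M epochs, N VOs, 3 components), NaN marking gaps.
    Returns the reduced array and the indices of the kept VOs.
    """
    keep = ~np.isnan(series).any(axis=(0, 2))
    return series[:, keep, :], np.flatnonzero(keep)


# Number of 4-month epochs, endpoints included
m_champ = (2009 - 2001) * 12 // 4 + 1                # Nov 2001 to Nov 2009
m_swarm = ((2017 - 2014) * 12 + (11 - 3)) // 4 + 1   # Mar 2014 to Nov 2017
```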

For Swarm, a different set was constructed, since field values representative of the core field were systematically missing at high-latitude VOs because of strong external perturbations. For this reason, the Swarm dataset consists of only \(N_P=258\) VOs between \(-60^{\circ }\) and \(60^{\circ }\) latitude (sub-auroral grid). As in Shore et al. (2016), PCA is applied to a data matrix with \(B_r\), \(B_\theta\) and \(B_\phi\) components (where \((r,\theta ,\phi )\) are spherical polar coordinates).

Although Swarm data are more precise, the longer CHAMP time series leads us to expect a better resolution of modes with annual to decadal characteristic timescales. The equal-area distribution of grid points makes it unnecessary to apply a latitudinal weighting factor, which would have been needed for a regular geographic grid or for the existing network of ground observatories (Shore et al. 2016). The VOs obtained from CHAMP and Swarm data are at altitudes of 300 km and 500 km, respectively, which slightly complicates the comparison between the modes obtained from the two datasets.

Methods

Implementation of principal component analysis

To apply PCA, geomagnetic data for the spherical field components at M distinct epochs and from a total number \(N_P\) of VOs are arranged in the \(M \times 3N_P\) data matrix \(\mathbf {X}\). Each row i, denoted \(\mathbf {x}_i^T\), is the set of \(3N_P\) data values at epoch i, once the time average has been removed. The set of all rows can be seen as M realisations of a \(3N_P\)-variate stochastic variable (e.g. Jolliffe 2002):

$$\begin{aligned} \mathbf {X} = \left[ \begin{array}{c} \mathbf {x}_1^T \\ \mathbf {x}_2^T \\ \vdots \\ \mathbf {x}_M^T \end{array} \right] . \end{aligned}$$
(1)

Notation and nomenclature follow closely Pais et al. (2015b). By applying PCA, it is expected that a large fraction of the variability of the geomagnetic field may be described with a k-variate variable \(\mathbf {y}_i^T\) where \(k \ll 3N_P\). This is achieved by projecting each \(\mathbf {x}_i^T\) onto the eigenvectors of the covariance matrix \(\mathbf {C}_X=\mathbf {X}^T \mathbf {X}/(M-1)\):

$$\begin{aligned} \mathbf {y}_i^T= \mathbf {x}_i^T \mathbf {P} , \end{aligned}$$

where the columns (\(\mathbf {p}^j\)) of \(\mathbf {P}\) are the normalised eigenvectors of \(\mathbf {C}_X\), i.e. \(\mathbf {p}^{j\, T} \mathbf {p}^j=1\). Then, \(\mathbf {Y}= \mathbf {X} \mathbf {P}\) is the \(M \times 3N_P\) matrix with rows \(\mathbf {y}_i^T\), and \(\mathbf {X}= \mathbf {Y} \mathbf {P}^T\) since \(\mathbf {P}\) is orthogonal. By arranging the columns of \(\mathbf {P}\) in decreasing order of corresponding eigenvalues, the first k elements of \(\mathbf {y}_i\) explain most of the variability in the data matrix \(\mathbf {X}\), meaning that the data matrix can be reconstructed from the first k columns of \(\mathbf {Y}\) and the first k columns of \(\mathbf {P}\):

$$\begin{aligned} \mathbf {X} \simeq \left[ \begin{array}{cccccc} y_1^1 &{} y_1^2 &{} \ldots &{} y_1^k &{} 0 &{} \ldots \\ y_2^1 &{} y_2^2 &{} \ldots &{} y_2^k &{} 0 &{} \ldots \\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ y_M^1 &{} y_M^2 &{} \ldots &{} y_M^k &{} 0 &{} \ldots \end{array} \right] \left[ \begin{array}{c} \mathbf {p}^{1 \,T} \\ \mathbf {p}^{2 \,T} \\ \vdots \\ \mathbf {p}^{k \,T} \\ 0 \\ \vdots \end{array} \right] = \sum _{j=1}^k \mathbf {y}^j \otimes \mathbf {p}^{j \,T} = \sum _{j=1}^k \alpha ^j \mathbf {y'}^j \otimes \mathbf {p}^{j \,T} , \end{aligned}$$
(2)

where \(\mathbf {y}^j\) is column-j of \(\mathbf {Y}\), \(\mathbf {y'}^j\) is the corresponding vector normalised to unity (\(\mathbf {y'}^{j\, T} \mathbf {y'}^j=1\)), \(\alpha ^j\) are non-negative scalars (the singular values of the data matrix \(\mathbf {X}\)) and \(\otimes\) denotes the dyadic (outer) product (see, e.g. Hannachi et al. 2007). In (2), the data matrix has been decomposed into principal components or modes, each mode-j consisting of the product of a time series called the principal component (PC) (the columns of \(\mathbf {Y}\)) and a spatial structure called the empirical orthogonal function (EOF) (the columns of \(\mathbf {P}\)). The indices for mode rank and epoch appear as superscript and subscript, respectively.

How closely (2) approximates \(\mathbf {X}\) depends on how many terms are kept in the expansion, i.e. on k, but also on the intrinsic nature of the signal. As an example, when PCA is applied separately to the main field (MF), its secular variation (SV) and its secular acceleration (SA) over the period 1960–2002, the first mode of each analysis explains over 90% of the MF variance, about 70% of the SV variance and only 30% of the SA variance (Pais et al. 2015a). The variance explained by mode-j, which measures how much the zero-mean parameter \(y^j\) varies among realisations, is \(\sum _{i=1}^M (y_i^j)^2/(M-1)\). This variance, \(\lambda ^j\), is the jth diagonal element of the diagonal matrix \(\mathbf {C}_Y=\mathbf {Y}^T \mathbf {Y}/(M-1)\); it is also the jth non-negative real eigenvalue of the covariance matrix \(\mathbf {C}_X\), once the eigenvalues have been sorted in decreasing order. Note that \(\lambda ^j\) and \(\alpha ^j\) in (2) are related through \(\lambda ^j=(\alpha ^j)^2/ (M-1)\).
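Numerically, the decomposition (2) can be obtained from the singular value decomposition of the centred data matrix, \(\mathbf{X}=\mathbf{U}\,\mathrm{diag}(\alpha^j)\,\mathbf{P}^T\), whose left singular vectors are the normalised PCs \(\mathbf{y'}^j\). The NumPy sketch below (function names hypothetical) returns the eigenvalues through \(\lambda^j=(\alpha^j)^2/(M-1)\); each pair \((\mathbf{y'}^j, \mathbf{p}^j)\) is only defined up to a common sign.

```python
import numpy as np


def pca(X):
    """PCA of an (M, 3*N_P) data matrix through the SVD.

    Returns the eigenvalues lambda^j of C_X, the normalised PCs y'^j (columns
    of U), the EOFs p^j (columns of P) and the singular values alpha^j,
    all ordered by decreasing variance.
    """
    Xc = X - X.mean(axis=0)                  # remove the time average of each column
    M = Xc.shape[0]
    U, alpha, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = alpha ** 2 / (M - 1)               # lambda^j = (alpha^j)^2 / (M - 1)
    return lam, U, Vt.T, alpha


def reconstruct(U, alpha, P, k):
    # Truncated sum of Eq. (2): sum over j <= k of alpha^j y'^j (outer) p^j
    return (U[:, :k] * alpha[:k]) @ P[:, :k].T
```

Because \(M \ll 3N_P\) here, this route avoids forming the \(3N_P \times 3N_P\) covariance matrix; at most \(M-1\) eigenvalues are non-zero once the time average has been removed.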

The eigenvectors and eigenvalues of the covariance matrix \(\mathbf {C}_X\), on which the principal modes rely, are estimated from a finite sample and are therefore subject to sampling errors. North et al. (1982) give first-order estimates of these errors:

$$\begin{aligned} \delta \lambda ^j\sim & {} \sqrt{\dfrac{2}{n^*}} \lambda ^j , \nonumber \\ \delta \mathbf {p}^j \sim \dfrac{\delta \lambda ^j}{\lambda ^j - \lambda ^l} \mathbf {p}^l= & {} \sqrt{\dfrac{2}{n^*}}\dfrac{\lambda ^j}{\lambda ^j - \lambda ^l} \mathbf {p}^l , \end{aligned}$$
(3)

where j and l are consecutive modes and \(n^*\) is the number of degrees of freedom, which takes into account the correlation between consecutive samples. In the absence of a priori information on the autocorrelation structure of the time series, the sample size M will be used instead [but see Hannachi et al. (2007) for other possibilities]. According to (3), plotting \(\delta \lambda ^j\) as error bars on each \(\lambda ^j\) indicates whether consecutive eigenvalues, and hence the corresponding eigenvectors, are well separated. When consecutive \(\lambda ^j\) and \(\lambda ^l\) are closer than \(\delta \lambda ^j\), the modes are said to be degenerate and any linear combination of those modes is as significant as each one of them (Hannachi et al. 2007).
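North’s rule of thumb can be applied mechanically to an ordered singular spectrum. The sketch below, with hypothetical function names, uses \(n^*=M\) as in the text and returns the first consecutive pair whose separation is smaller than \(\delta\lambda^j\); comparing the gap with the error bar of the upper eigenvalue only is a simplifying choice of this sketch.

```python
import numpy as np


def north_errors(lam, n_eff):
    # First-order sampling error of each eigenvalue, Eq. (3): sqrt(2/n*) lambda^j
    return np.sqrt(2.0 / n_eff) * np.asarray(lam, dtype=float)


def first_degenerate_pair(lam, n_eff):
    """0-based index j of the first mode with lambda^j - lambda^(j+1) < delta lambda^j.

    Returns None when all consecutive eigenvalues are separated.
    """
    lam = np.asarray(lam, dtype=float)
    err = north_errors(lam, n_eff)
    gaps = lam[:-1] - lam[1:]
    close = np.flatnonzero(gaps < err[:-1])
    return int(close[0]) if close.size else None
```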

Gibson et al. (1992) found that PCs can be identified with discrete Legendre polynomials when some conditions are met. They showed that, in the so-called small-window regime, the temporal orthogonal basis vector \(\mathbf {y}^j\) [see Eq. (2)] is very closely given by \(L_j\), the Legendre polynomial of degree j. Furthermore, the order-j time derivative of the field, taken at the mid-point of the time window, can be approximated by the inner product between \(L_j\) and the data series at each VO. In other words, the consecutive EOFs give a good approximation of the successive time derivatives of the field at the mean epoch. The condition to be met is \(\tau _W \ll \tau _W^c\), where \(\tau _W\) is the dataset window width and \(\tau _W^c\) is a critical width that depends on the relative variance of the two lowest-order time derivatives of the signal. In certain cases, but not always, \(\tau _W^c\) may be used as an estimate of the correlation time of the signal [see Gibson et al. (1992)]. The degree-0 Legendre polynomial does not arise here as a principal component because the time average of the field components has been subtracted before the PCA analysis. The two lowest-order derivatives in the remaining signal are therefore of order 1 and 2; together they define the critical time width, their variances being approximated by the first two eigenvalues of \(\mathbf {C}_X\). From equation (117) in Gibson et al. (1992):

$$\begin{aligned} \tau _W^c = 2 \sqrt{\dfrac{15 \lambda ^1}{\lambda ^2}}, \end{aligned}$$
(4)

in the same unit of time as used in the PCA computations, which in our case is year/3 (4 months).
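Equation (4) and the small-window parameter \(\varepsilon=(\tau_W/\tau_W^c)^2\) used in the Results section can be evaluated directly from the first two eigenvalues. In this sketch (function names hypothetical), \(\tau_W^c\) is first obtained in units of year/3 and then converted to years before being compared with a window width given in years.

```python
import math


def critical_width(lam1, lam2):
    # Eq. (4): tau_W^c = 2 sqrt(15 lambda^1 / lambda^2), in PCA time units (year/3)
    return 2.0 * math.sqrt(15.0 * lam1 / lam2)


def small_window_parameter(tau_w_years, lam1, lam2, unit_years=1.0 / 3.0):
    """epsilon = (tau_W / tau_W^c)^2, with tau_W^c converted to years."""
    tau_c_years = critical_width(lam1, lam2) * unit_years
    return (tau_w_years / tau_c_years) ** 2
```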

Denoting by \(\tau _W\) the length of the time series (8 years for CHAMP and 3.7 years for Swarm), we can associate a time resolution

$$\begin{aligned} T_B= \tau _W/j \end{aligned}$$
(5)

with each principal component that is close to a Legendre polynomial of degree j (i.e. all modes shown in Figs. 1 and 2, which exclude the annual mode from Swarm data).
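To compare PCs with Legendre polynomials on the M sampling epochs, one option is to use discrete orthonormal (Gram) polynomials, obtained here by QR-orthogonalising the monomials \(1, t, t^2, \dots\) on equally spaced points; this discrete counterpart of \(L_j\) is a choice of the sketch, and the function names are hypothetical. The degree-0 column is dropped because the time average is removed before PCA, and Eq. (5) is included for completeness.

```python
import numpy as np


def discrete_legendre(M, max_degree):
    """Orthonormal discrete (Gram) polynomials of degree 1..max_degree on M epochs.

    Signs are fixed so that the last sample is positive, as L_j(1) = 1.
    """
    t = np.linspace(-1.0, 1.0, M)
    V = np.vander(t, max_degree + 1, increasing=True)  # columns 1, t, t^2, ...
    Q, _ = np.linalg.qr(V)                             # Gram-Schmidt, column by column
    Q = Q * np.sign(Q[-1, :])
    return Q[:, 1:]                                    # drop the degree-0 column


def time_resolution(tau_w_years, j):
    # Eq. (5): T_B = tau_W / j
    return tau_w_years / j
```

A normalised PC \(\mathbf{y'}^j\) can then be compared with the degree-j column through their inner product, which is close to \(\pm 1\) when the two curves nearly coincide.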

Fig. 1

Normalised time series \(\mathbf {y'}^j\) [see Eq. (2)] corresponding to the first four modes from CHAMP, in blue, and corresponding normalised Legendre polynomials, in grey

Fig. 2

As in Fig. 1 but from Swarm

Spherical harmonic models

The standard decomposition of the Earth’s irrotational field into spherical harmonic functions allows the internal and external contributions to be parameterised separately. Note that the construction of VO series does not account for the effects of poloidal electric currents crossing the cylindrical cells, such as those coupling the magnetosphere and the ionosphere (see, e.g. Sabaka et al. 2010). Within this approximation, only potential fields can be modelled, with sources defined as internal or external with respect to the VOs (internal if below the VO altitude and external if above), i.e. \(\mathbf {B}=-\nabla V\) with

$$\begin{aligned} \begin{aligned} V&=R_{VO} \sum _{\ell =1}^{\infty } \left( \dfrac{R_{VO}}{r}\right) ^{\ell +1} \sum _{m=0}^\ell (g_\ell ^m \cos m \phi + h_\ell ^m \sin m \phi ) P_\ell ^m (\cos \theta ) \\&\quad + R_{VO} \sum _{\ell =1}^{\infty } \left( \dfrac{r}{R_{VO}}\right) ^{\ell } \sum _{m=0}^\ell (q_\ell ^m \cos m \phi + s_\ell ^m \sin m \phi ) P_\ell ^m (\cos \theta ) , \end{aligned} \end{aligned}$$
(6)

where \(R_{VO}\) is the spherical shell radius at the VOs altitude, \(P_\ell ^m\) are the associated Legendre functions having Schmidt semi-normalisation and \((g_\ell ^m, h_\ell ^m)\) and \((q_\ell ^m, s_\ell ^m)\) are the internal and external spherical harmonic (SH) coefficients at the VOs altitude.

We can also define an average spherical harmonic degree \(\ell _B\), weighted by the internal magnetic field energy, for each mode of CHAMP and Swarm as

$$\begin{aligned} \ell _B = \frac{\sum _\ell \ell B_\ell ^2}{\sum _\ell B_\ell ^2} \end{aligned}$$
(7)

where

$$\begin{aligned} B_\ell ^2 = (\ell +1) \sum _{m=0}^\ell \left[ (g_\ell ^m)^2 + (h_\ell ^m)^2 \right] . \end{aligned}$$

\(B_\ell ^2\) is the contribution of the internal SH degree-\(\ell\) terms to the mean energy of the field at the VO altitude [see Christensen and Aubert (2006) for the analogous definition of the average spherical harmonic degree of the flow in the Earth’s core].
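Equation (7) is an energy-weighted average over degrees. A minimal sketch, assuming the Gauss coefficients of a mode are stored in hypothetical per-degree sequences indexed by the order m (with \(h_\ell^0 = 0\)):

```python
import numpy as np


def mean_sh_degree(g, h):
    """Average SH degree l_B of Eq. (7).

    g, h: dicts {l: sequence of length l + 1} holding g_l^m and h_l^m for
    m = 0..l (a hypothetical container; h_l^0 should be zero).
    """
    degrees = sorted(g)
    b2 = np.array([(l + 1) * (np.sum(np.square(g[l])) + np.sum(np.square(h[l])))
                   for l in degrees])          # B_l^2 for each degree
    return float(np.dot(degrees, b2) / b2.sum())
```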

The eccentric dipole approximation

The field can be approximated by a point dipole moving with respect to the Earth’s centre, with its magnetic moment rotating relative to the geographic frame. The field variation is then simulated by considering only the eccentric dipole component, as defined by Schmidt (1934) and Bartels (1936), who imposed the invariance of the dipole moment: the eccentric dipole so defined has the same moment and the same orientation as the centred dipole defined from the three Gauss coefficients of degree 1. The position \(\mathbf {r}_{ED}\) of the eccentric dipole (i.e. the offset between the dipole centre and the Earth’s centre) is then chosen to minimise the squared quadrupole moment as seen from \(\mathbf {r}_{ED}\) [see, e.g. Lowes (1994) and references therein]. Bartels (1936) gave the equations used to determine \(\mathbf {r}_{ED}\) [see his equations (14) and (15), reproduced in Fraser-Smith (1987)], which depend only on Gauss coefficients of degrees 1 and 2.
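For illustration, the sketch below computes the eccentric dipole offset from Gauss coefficients of degrees 1 and 2, using the expressions as they are commonly reproduced from Bartels (1936) in the literature; they should be checked against Fraser-Smith (1987) before use, and the function name and the reference radius default are assumptions. For an axial dipole plus a zonal quadrupole the offset reduces to \(z = a\,g_2^0/(2g_1^0)\), a useful sanity check.

```python
import math


def eccentric_dipole_offset(g10, g11, h11, g20, g21, h21, g22, h22, a=6371.2):
    """Cartesian offset (x, y, z) of the eccentric dipole centre, in units of a.

    Schmidt semi-normalised Gauss coefficients of degrees 1 and 2 at the
    reference radius a (km, assumed here).
    """
    s3 = math.sqrt(3.0)
    m2 = g10 ** 2 + g11 ** 2 + h11 ** 2                  # squared dipole strength
    L0 = 2.0 * g10 * g20 + s3 * (g11 * g21 + h11 * h21)
    L1 = -g11 * g20 + s3 * (g10 * g21 + g11 * g22 + h11 * h22)
    L2 = -h11 * g20 + s3 * (g10 * h21 - h11 * g22 + g11 * h22)
    E = (L0 * g10 + L1 * g11 + L2 * h11) / (4.0 * m2)
    x = a * (L1 - g11 * E) / (3.0 * m2)
    y = a * (L2 - h11 * E) / (3.0 * m2)
    z = a * (L0 - g10 * E) / (3.0 * m2)
    return x, y, z
```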

On spheres \(r=R\) outside the Earth’s surface and centred at the Earth’s centre, an eccentric dipole located at \(\mathbf {r}_{ED}\), with \(s=r_{ED}/R\ll 1\) produces a field dominated by its components of degree 1 and 2. James and Winch (1967) showed that a dipole of moment \(\mathcal {M} \hat{m}\) located at \(\mathbf {r}_{ED}\) is equivalent to an infinite sum of centred multipoles, all with axes along \(\mathbf {r}_{ED}\) except for the first term (aligned along \(\hat{m}\)). The strength of different multipoles scales with degree \(\ell\) as \(\mathcal {M} s^{\ell -1}/{(\ell -1)}!\), where \(s\sim 0.08\) for the present field and at the satellite altitude.
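The scaling quoted above shows how quickly the equivalent centred multipoles decay. The sketch below (hypothetical function name) tabulates \(s^{\ell-1}/(\ell-1)!\), i.e. the multipole strengths relative to \(\mathcal{M}\); for \(s=0.08\) the degree-3 term is about 0.3% of the dipole, which is why degrees 1 and 2 dominate.

```python
import math


def multipole_strengths(s, l_max):
    # Strength of the degree-l centred multipole relative to M, s^(l-1) / (l-1)!,
    # for a dipole offset by s = r_ED / R (scaling quoted from James and Winch 1967)
    return [s ** (l - 1) / math.factorial(l - 1) for l in range(1, l_max + 1)]
```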

Application of PCA to VO series

Following Shore et al. (2016), who used the three magnetic components at each observatory as additional parameters of the multivariate variable, we write

$$\begin{aligned} \mathbf {x}_i^T=\left[ \mathbf {x}_{i,r}^{T}, \mathbf {x}_{i,\theta }^{T}, \mathbf {x}_{i,\phi }^{T}\right] \end{aligned}$$

with the \(N_P\) values of the radial component at epoch i concatenated with the \(N_P\) values of each of the other two components. The extracted EOFs have the same structure, i.e. \(\mathbf {p}^{j\,T}=\left[ \mathbf {p}_r^{j\,T}, \mathbf {p}_\theta ^{j\,T}, \mathbf {p}_\phi ^{j\,T}\right]\), so that the three components evolve coherently in time, following the time series given by a single PC. Moreover, if the \(\mathbf {x}_i\) parameters satisfy linear constraints such as \(\nabla \cdot \mathbf {B}=0\) and \(\mathbf {B}=-\nabla V\), each EOF will also satisfy the same constraints. Barrois et al. (2018) used these constraints to construct the VO series, but only locally, in cylindrical volumes around each VO location. We can thus assess the physical significance of the different modes by estimating to what extent they satisfy these constraints not only locally but also over the whole spherical surface on which the VOs are distributed. Potential theory also allows us to discriminate between internal and external sources.
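In practice, the vectorial arrangement amounts to a horizontal concatenation of the three component matrices, and each EOF is split back into its three parts in the same order. A sketch with hypothetical names:

```python
import numpy as np


def stack_components(br, btheta, bphi):
    """Data matrix with rows [x_r^T, x_theta^T, x_phi^T].

    br, btheta, bphi: arrays of shape (M epochs, N_P VOs).
    """
    return np.hstack([br, btheta, bphi])


def split_eof(p, n_vo):
    # Split an EOF of length 3 N_P into its r, theta and phi parts
    return p[:n_vo], p[n_vo:2 * n_vo], p[2 * n_vo:]
```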

The vectors \(\mathbf {y'}^j\) and \(\mathbf {p}^j\) are dimensionless, both normalised to unity. Their coefficients are therefore of typical magnitude \(1/\sqrt{M}\) and \(1/\sqrt{3N_P}\), respectively. The geomagnetic field at each grid point at the satellite altitude is recovered by combining the normalised time and space vectors with the scaling factor \(\alpha ^j\) (see Eq. 2), in nT, the singular values \(\alpha ^j\) of the data matrix \(\mathbf {X}\) being related to the eigenvalues \(\lambda ^j\) of \(\mathbf {C}_X\) through \(\alpha ^j=\sqrt{\lambda ^j}\sqrt{M-1}\). In the end, the typical (root-mean-square) value of each field component at each grid point due to mode-j, in nT, is given by

$$\begin{aligned} \sqrt{\lambda ^j} \sqrt{\frac{M-1}{M}} \frac{1}{\sqrt{3N_P}} . \end{aligned}$$
(8)
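Equation (8) is the root-mean-square value, over all epochs and all \(3N_P\) entries, of the mode-j contribution \(\alpha^j\,\mathbf{y'}^j \otimes \mathbf{p}^{j\,T}\), because the Frobenius norm of that rank-one matrix is \(\alpha^j\). A one-line sketch (hypothetical name):

```python
import numpy as np


def mode_rms_amplitude(lam_j, M, n_vo):
    # Eq. (8): RMS value (nT) of the mode-j contribution over epochs and components
    return np.sqrt(lam_j) * np.sqrt((M - 1) / M) / np.sqrt(3 * n_vo)
```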

The PCA method is nonparametric and decomposes the entire magnetic field into separate modes only on the basis of existing correlations in the data. To separate internal from external sources, standard spherical harmonic analysis was applied to the individual modes produced by the PCA method. We start by considering an internal field model. If such a model closely fits a given mode, this mode is identified as internal. Conversely, high misfits are interpreted as indicating that the mode is due, at least partly, to external sources. To clarify the geometry of external sources, we also fitted an external model to each main PCA mode. The normalised misfit is computed from the residuals between the fitted SH model and the PCA mode, as in Eq. (9)

$$\begin{aligned} \Phi = \sqrt{(\mathbf {p}^M - \mathbf {p}^j)^T (\mathbf {p}^M - \mathbf {p}^j)} , \end{aligned}$$
(9)

where \(\mathbf {p}^M\) is the fitted model and \(\mathbf {p}^j\) is the EOF vector of mode-j.
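A minimal sketch of the internal fit and of the misfit (9) follows. It assumes that all VOs lie on the sphere \(r=R_{VO}\), so that the radial factors of the internal part of (6) equal 1, that no VO lies at a pole, and that the model is truncated at a chosen degree l_max; the function names are hypothetical. Schmidt semi-normalised functions are built from SciPy’s `lpmv`, whose Condon-Shortley phase is removed, and the external fit mentioned above is not shown.

```python
from math import factorial

import numpy as np
from scipy.special import lpmv


def schmidt_plm(l, m, theta):
    """Schmidt semi-normalised P_l^m(cos theta) and its derivative d/dtheta."""
    x = np.cos(theta)
    norm = 1.0 if m == 0 else np.sqrt(2.0 * factorial(l - m) / factorial(l + m))
    phase = (-1.0) ** m  # remove the Condon-Shortley phase included by lpmv
    p = lpmv(m, l, x)
    p_prev = lpmv(m, l - 1, x) if m <= l - 1 else np.zeros_like(x)
    # dP_l^m/dtheta = (l cos(theta) P_l^m - (l + m) P_{l-1}^m) / sin(theta)
    dp = (l * x * p - (l + m) * p_prev) / np.sin(theta)
    return norm * phase * p, norm * phase * dp


def internal_design_matrix(theta, phi, l_max):
    """Columns map each g_l^m (and h_l^m, m > 0) to stacked [B_r, B_theta, B_phi].

    theta, phi: colatitude and longitude (radians) of the VOs on r = R_VO.
    """
    cols = []
    for l in range(1, l_max + 1):
        for m in range(l + 1):
            p, dp = schmidt_plm(l, m, theta)
            # (angular factor, minus its phi-derivative)
            terms = [(np.cos(m * phi), m * np.sin(m * phi))]           # g_l^m
            if m > 0:
                terms.append((np.sin(m * phi), -m * np.cos(m * phi)))  # h_l^m
            for ang, dang in terms:
                b_r = (l + 1) * ang * p              # -dV/dr
                b_t = -ang * dp                      # -(1/r) dV/dtheta
                b_p = dang * p / np.sin(theta)       # -(1/(r sin theta)) dV/dphi
                cols.append(np.concatenate([b_r, b_t, b_p]))
    return np.column_stack(cols)


def internal_misfit(p_j, theta, phi, l_max):
    """Least-squares internal SH fit to EOF p_j; returns Phi of Eq. (9) and coefficients."""
    A = internal_design_matrix(theta, phi, l_max)
    coeffs, *_ = np.linalg.lstsq(A, p_j, rcond=None)
    resid = A @ coeffs - p_j
    return float(np.sqrt(resid @ resid)), coeffs
```

Since the EOF \(\mathbf{p}^j\) has unit norm, \(\Phi\) lies between 0 (mode fully explained by internal sources up to l_max) and 1 (no internal component at all).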

Fig. 3

Singular spectra for CHAMP full (\(\tau _W= 8\) year) dataset (thick blue), CHAMP shortened (\(\tau _W= 3.7\) year) dataset (thin blue) and Swarm (thick red), in logarithmic scale. The standard error associated with each eigenvalue is given by North’s rule of thumb to help identify modes that can be separated. In grey, the straight lines fitted to the noise “plateau”

Fig. 4

Charts representing the scaled spatial features (\(\alpha ^j \mathbf {p}^j\), see Eq. (2)) on the global grid for the first four PCA modes, using data from CHAMP. Each vectorial mode is characterised by the three charts \(B_r\), \(B_\theta\) and \(B_\phi\)

Results

The PCA is applied to 8 years of CHAMP data, from November 2001 to November 2009, and 44 months of Swarm data, from March 2014 to November 2017, with a sampling interval of 4 months. We have chosen to remove the average field (including the crustal field) prior to calculating the PCA. Keeping it would have introduced the mean field, which is approximately a zonal dipole, as another spatial mode.

Identification of the principal components as Legendre polynomials

The (normalised) time series corresponding to the first four modes from CHAMP are represented in Fig. 1. We discuss them together with the time series for modes 1, 2 and 4 from Swarm (see Fig. 2). It appears that these principal components are well approximated by polynomials of degree increasing with the mode order.

Using eigenvalues of the covariance matrices from the CHAMP and Swarm VO series, we obtain \(\tau _W^c \sim\) 30–70 years from Eq. (4), much larger than the corresponding dataset windows \(\tau _W\) of 8 and 3.7 years. A more precise gauge of the small-window regime is \(\varepsilon =(\tau _W/\tau _W^c)^2\), which should satisfy \(\varepsilon \ll 1\) for Legendre polynomials to be usable as temporal basis functions (Gibson et al. 1992). This condition is verified in our study, since \(\varepsilon \sim 0.01\).

We superpose in Fig. 1 the Legendre polynomials of degree 1 to 4 on the principal components of rank 1 to 4 calculated from CHAMP data, all time series being normalised to unity. Similarly, in Fig. 2, we show the Legendre polynomials of degree 1, 2 and 3 together with the principal components of rank 1, 2 and 4 obtained from Swarm data. The agreement is remarkable, even for the components of rank 4, which we deem not well resolved according to North’s rule of thumb.
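The discrete analogue of the Legendre polynomials on M equally spaced epochs can be obtained by orthonormalising the monomials \(1, t, t^2, \dots\), i.e. by a QR factorisation of the Vandermonde matrix. The sketch below builds these polynomials and measures how closely the leading PCs of a synthetic signal follow them. The synthetic field (a linear trend, a weaker quadratic term and an even weaker cubic term, each with a random spatial pattern) is an assumption chosen to mimic the small-window regime; the function names are hypothetical.

```python
import numpy as np


def discrete_legendre(M, degree):
    """Orthonormal polynomials of degree 0..degree on M equally spaced epochs.

    Columns come from a QR factorisation of the Vandermonde matrix of times
    scaled to [-1, 1]: the discrete analogue of Legendre polynomials.
    """
    t = np.linspace(-1.0, 1.0, M)
    V = np.vander(t, degree + 1, increasing=True)  # columns 1, t, t^2, ...
    Q, _ = np.linalg.qr(V)
    # Sign convention: positive at the last epoch, like P_n(1) = 1
    return Q * np.sign(Q[-1, :])


def pc_legendre_match(X, n_modes):
    """|correlation| between the leading PCs of X and P_1..P_n_modes."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    P = discrete_legendre(X.shape[0], n_modes)
    # U columns and P[:, k >= 1] are zero-mean and unit-norm, so the dot
    # product is the correlation coefficient (abs removes the sign ambiguity).
    return np.abs(np.sum(U[:, :n_modes] * P[:, 1:n_modes + 1], axis=0))


# Hypothetical small-window signal: trend + weaker curvature + weaker cubic term,
# each with its own random spatial pattern, over 25 epochs and 300 components.
rng = np.random.default_rng(1)
M, N, eps = 25, 300, 0.05
tt = np.linspace(-1.0, 1.0, M)
G = rng.normal(size=(3, N))
X = (3e4 + np.outer(tt, G[0]) + eps * np.outer(tt**2, G[1])
     + eps**2 * np.outer(tt**3, G[2]))
match = pc_legendre_match(X, 3)
print(np.round(match, 4))
```

When the successive time-derivative terms decrease quickly, as assumed here, the PCA effectively performs a Gram–Schmidt orthogonalisation in time, which is why the PCs line up with the discrete Legendre polynomials.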

Fig. 5

As in Fig. 4 but from Swarm

Fig. 6

Average SH degree (\(\ell _B\)) of each mode from CHAMP and Swarm as a function of the time resolution (\(T_B\)). Size of symbols is proportional to the misfit to an internal field model (Table 1) and numbers next to symbols represent the mode ordering

Fig. 7

(Top) Secular variation due to translation and rotation of the eccentric dipole, for the first half of 2014 at 500 km altitude, using CHAOS-6 to compute the ED parameters. (Bottom) Secular variation computed from the IGRF-12 model (2015–2020). Both represented as \(\dot{B}_r\), \(\dot{B}_\theta\) and \(\dot{B}_\phi\) in nT/year

The singular spectrum and the level of noise variance

The singular spectrum (the set of \({\lambda ^j}\) values) is shown for each dataset in Fig. 3. A rapid decrease, represented properly by neither an exponential nor a power law, is followed by a knee and then by an asymptotic, slowly decreasing trend. This trend is often referred to as a “plateau”; it is reached after only a few modes, at a level related to the amplitude of the Gaussian noise in the data (see, e.g. Gibson et al. 1992). For eigenvalues above the plateau, North’s rule of thumb (3) is used to locate the order in the PCA expansion beyond which modes are no longer independently resolved and become degenerate. This occurs after order 4 for CHAMP and after order 3 for Swarm.
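Equation (3) is not reproduced in this section, so the sketch below assumes the common form of North’s rule of thumb, \(\delta\lambda^j \approx \lambda^j \sqrt{2/M}\), with M the (assumed) number of independent samples, and treats a mode as separable when the gaps to both neighbouring eigenvalues exceed \(\delta\lambda^j\). This may differ in detail from the criterion actually used; the function names and the example spectrum are hypothetical.

```python
import numpy as np


def north_separable(eigvals, n_samples):
    """Flag modes whose sampling error is smaller than the gaps to both neighbours.

    Uses the common form of North's rule, delta_lambda ~ lambda * sqrt(2 / n),
    with n the (assumed) number of independent samples.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    err = lam * np.sqrt(2.0 / n_samples)
    gaps = -np.diff(lam)                          # lam[j] - lam[j + 1] >= 0
    gap_prev = np.concatenate(([np.inf], gaps))   # gap to the previous mode
    gap_next = np.concatenate((gaps, [np.inf]))   # gap to the next mode
    return (gap_prev > err) & (gap_next > err), err


def n_resolved(eigvals, n_samples):
    """Number of leading modes before the first degenerate one."""
    sep, _ = north_separable(eigvals, n_samples)
    return int(sep.size if sep.all() else np.argmin(sep))


# Hypothetical singular spectrum (arbitrary units) with 25 samples
spectrum = [1000.0, 100.0, 30.0, 10.0, 8.0, 7.0]
sep, err = north_separable(spectrum, 25)
print(sep)
print(n_resolved(spectrum, 25))
```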

Differences between the two spectra in Fig. 3 might be due to a different behaviour of the field in the two periods 2001–2009 and 2014–2017, to the different quality of the two datasets or to the different sample sizes, M. From the results shown in Figs. 1 and 2, the order-1 PC is a straight line, i.e. \(\mathbf {y}^1= \alpha ^1 \mathbf {y'}^1 = a (t-t_0)\), where \(t_0\) is the mean epoch of each dataset. Note that, in those two figures, time t is in units of year/3 (4 months) and the slope is in units of (year/3)\(^{-1}\). The parameter a is quite similar for the two datasets: from \(\mathbf {y'}^1\) plotted in Figs. 1 and 2 and the values of \(\alpha ^1\), we obtain \(a_{\mathrm {Swarm}}=1063\) nT/year and \(a_{\mathrm {CHAMP}}=1132\) nT/year, a difference of only 6%. We shall assume that, for epochs that are not too far apart, the rate of change of the field has not varied significantly and that estimates of this rate do not depend on M. Since all \(\mathbf {y}^j\) have zero mean, the variance represented by the first mode, which is linear in time, scales as \(\tau _W^2\) (i.e. \(M^2\)), where \(\tau _W\) is the width of the time window. This approximate relationship explains well the relative amplitude of this mode calculated from CHAMP and from Swarm data (see Fig. 3). Conversely, eigenvalues (mode variances) in the noise plateau (\(\lambda ^j\) for \(j \gtrsim 6\)) are expected to be independent of M if they represent a white noise stochastic process. Therefore, the noise plateau does not depend much on the width of the time window. We can directly compare the noise levels for CHAMP and for Swarm data (see the grey lines in Fig. 3) as the numbers of CHAMP and Swarm VOs are almost the same. Expressed in nT, the noise level for Swarm data is about 0.4 times that for CHAMP data.
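The arithmetic behind these statements can be checked with the values quoted in the text. For a zero-mean linear trend \(a(t-t_0)\) over a window \(\tau_W\), the variance is \(a^2\tau_W^2/12\) in the continuous limit, so equal slopes imply a ratio \((\tau_W^{\mathrm{CHAMP}}/\tau_W^{\mathrm{Swarm}})^2\) between the first eigenvalues. The plateau heights in nT\(^2\) are those quoted for Fig. 3 in the discussion of the annual mode, and the noise amplitude ratio in nT is the square root of their ratio. Variable names are illustrative.

```python
import math

import numpy as np

tau_champ, tau_swarm = 8.0, 3.7               # window lengths in years (from the text)
a_champ, a_swarm = 1132.0, 1063.0             # mode-1 slopes in nT/year (from the text)
plateau_champ, plateau_swarm = 8.9e2, 1.3e2   # noise plateau heights, nT^2


def linear_trend_variance(a, tau_w, n=100_001):
    """Variance of a*(t - t0) over a window tau_w, densely sampled."""
    t = np.linspace(-tau_w / 2, tau_w / 2, n)
    return np.var(a * t)


# Relative slope difference between the two datasets (about 6 %)
slope_diff = (a_champ - a_swarm) / a_champ

# Variance of a linear trend is a**2 * tau_W**2 / 12, so with equal slopes
# the first eigenvalues should differ by the squared ratio of window lengths.
lambda1_ratio = (tau_champ / tau_swarm) ** 2

# Noise amplitude (nT) ratio is the square root of the variance (nT^2) ratio
noise_amp_ratio = math.sqrt(plateau_swarm / plateau_champ)

print(f"slope difference: {slope_diff:.3f}")
print(f"lambda1 ratio:    {lambda1_ratio:.2f}")
print(f"noise amp. ratio: {noise_amp_ratio:.2f}")
print(linear_trend_variance(1.0, 8.0), 8.0**2 / 12)
```

This gives a slope difference of about 6%, an expected eigenvalue ratio of about 4.7 and a noise amplitude ratio of about 0.38, consistent with the factor of 0.4 stated above.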

Figure 3 also shows the results of a test in which we compare the PCA spectrum of the full CHAMP dataset (8 years) with that of only its first 3.7 years, the same length as the Swarm dataset (thin solid line). This supports the assumption that the plateau level is independent of M, and it also reproduces the predicted variation in \(\lambda ^1\). Finally, this test illustrates that more modes are resolved as the time window is extended: the amplitude of the resolved polynomial modes increases, while the noise level remains constant.

Assuming that the time series sample analytic functions, Gibson et al. (1992) found that the variance of the modes decreases exponentially with their order in the small-window regime. For CHAMP data, Fig. 3 shows a different law for the decrease of the eigenvalues of the first modes. It should be noted, however, that geomagnetic field measurements are arguably better described as realisations of a stochastic process (e.g. Hellio et al. 2014) than as values of an infinitely differentiable function. It would thus be interesting to learn how the eigenvalues in the singular spectrum are expected to decrease when the signal samples the solution of a stochastic differential equation.

Internal and external modes from resolved empirical orthogonal functions

Figure 4 shows the spatial features of the first four modes of the PCA decomposition of CHAMP VOs. Because the PCA decomposition is vectorial, each mode is characterised by three charts, one for each of the \(B_r\), \(B_\theta\) and \(B_\phi\) components. To recover the time-varying spatial distribution of a given mode, the values in the charts of Figs. 4 and 5 simply have to be multiplied by the corresponding time series in Figs. 1 and 2 [see Eq. (2)]. In Fig. 5, we show four distinct modes for Swarm: three of them have temporal variations very similar to those from CHAMP, while mode 3 has a much shorter period of 1 year. The Swarm VO grid is sparser at auroral latitudes because of the very intense external fields there, so we do not show the EOFs at these latitudes.

Table 1 shows the results of comparing each mode with internal and external SH models. The first three CHAMP modes and the first two Swarm modes are well explained as internal potential fields. Swarm mode 3, which is dominantly axial quadrupolar, shows its lowest misfit when compared with an external model, but it contains contributions from both internal and external fields. The misfits of the fourth modes of CHAMP and Swarm to an internal field model are half their misfits to an external field model, which supports the assessment that these modes are mainly of internal origin.

Table 1 Normalised misfit values comparing PCA modes with internal and external SH models, for CHAMP and Swarm

Spatial resolution of the successive modes

We represent in Fig. 6, for all modes, the average degree \(\ell _B\), which is directly related to a length scale, as a function of the time resolution, estimated from the degree of the corresponding Legendre polynomial. The time resolution \(T_B\) cannot be directly identified with a characteristic timescale, as it clearly depends on the length \(\tau _W\) of the time series: \(T_B\) cannot exceed \(\tau _W\). For example, modes 1 from both CHAMP and Swarm correspond to the secular variation of the field, which has a typical timescale much longer than the duration of the satellite records. We nevertheless observe that, overall, \(\ell _B\) increases as \(T_B\) decreases: the smallest features of the field have the shortest timescales, as expected from a physical standpoint (Nataf and Schaeffer 2015). This conclusion rests heavily on modes that we trust less, as indicated by the size of the symbols (the larger the symbol, the poorer the agreement with an internal field model). The strongest evidence comes from the comparison between the first two Swarm modes.

Discussion

Main PCA mode well explained as the variation of the eccentric dipole

The first mode of the PCA decomposition from either CHAMP or Swarm data closely follows a straight line that corresponds exactly to the Legendre polynomial of degree 1 (Figs. 1a, 2a). Furthermore, Fig. 6 indicates that the average harmonic degree of its spatial structure is very small, close to 2. It therefore comes as no surprise that the spatial structure associated with mode 1 (Figs. 4, 5) can be explained to a large extent by a simple model of a point dipole that moves with respect to the Earth’s surface while its magnetic moment rotates.

Denoting \(\mathbf {r}_{i}\) the Cartesian coordinates of each VO in the geographic frame, we calculate the Cartesian coordinates in the eccentric dipole frame (see “The eccentric dipole approximation” section), \(\mathbf {r}_{f}\), as

$$\begin{aligned} \mathbf {r}_{f} =\mathbf {R} (\mathbf {r}_{i} -\mathbf {r}_{ED}) \end{aligned} \tag{10}$$

where \(\mathbf {R}\) is the rotation matrix that transforms the Earth’s rotation axis into the magnetic dipole axis. The degree-1 and degree-2 coefficients of the CHAOS-6 model (Finlay et al. 2016) suffice to obtain the dipole moment, the offset \(\mathbf {r}_{ED}\) and the matrix \(\mathbf {R}\) at all times. We use them to transform the coordinates of the VO grid points into the reference frame of the eccentric dipole and then compute the vector components of an axial dipole field at each grid point. Finally, these components are rotated back to the geocentric frame and represented at the VO locations \(\mathbf {r}_{i}\).
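The frame change of Eq. (10) can be sketched as follows, assuming the offset \(\mathbf{r}_{ED}\), the rotation \(\mathbf{R}\) and the dipole moment are already known at each epoch (deriving them from the degree-1 and degree-2 Gauss coefficients is not shown). The helper `axis_rotation`, which builds \(\mathbf{R}\) from a hypothetical dipole-axis colatitude and longitude, and all numerical values are illustrative, not CHAOS-6 parameters. As in Fig. 7, the secular variation is approximated by differencing the fields at two epochs; components here are Cartesian, not \(B_r\), \(B_\theta\), \(B_\phi\).

```python
import numpy as np

MU0_OVER_4PI = 1e-7  # T m / A


def dipole_field(r, moment):
    """Field (T) of a point dipole at the origin, at positions r (n, 3) in m."""
    rn = np.linalg.norm(r, axis=-1, keepdims=True)
    rhat = r / rn
    m_dot_rhat = np.sum(rhat * moment, axis=-1, keepdims=True)
    return MU0_OVER_4PI * (3.0 * m_dot_rhat * rhat - moment) / rn**3


def axis_rotation(colat, lon):
    """Hypothetical helper: rotation R sending the unit vector at (colat, lon) to +z."""
    ct, st, cp, sp = np.cos(colat), np.sin(colat), np.cos(lon), np.sin(lon)
    rz = np.array([[cp, sp, 0.0], [-sp, cp, 0.0], [0.0, 0.0, 1.0]])  # R_z(-lon)
    ry = np.array([[ct, 0.0, -st], [0.0, 1.0, 0.0], [st, 0.0, ct]])  # R_y(-colat)
    return ry @ rz


def eccentric_dipole_field(r_i, r_ed, R, m):
    """Eq. (10) recipe: go to the ED frame, evaluate an axial dipole, rotate back.

    r_i: (n, 3) VO positions (geographic Cartesian, m); r_ed: (3,) offset (m);
    R: rotation into the dipole frame; m: signed moment (A m^2) along its z axis.
    """
    r_f = (r_i - r_ed) @ R.T                          # r_f = R (r_i - r_ed), row-wise
    b_f = dipole_field(r_f, np.array([0.0, 0.0, m]))  # axial dipole in the ED frame
    return b_f @ R                                    # b_i = R^T b_f, row-wise


# Illustrative values only (not CHAOS-6 parameters): two epochs half a year apart
R_E = 6371.2e3
vo = (R_E + 300e3) * np.eye(3)   # three hypothetical VO positions
b1 = eccentric_dipole_field(vo, np.array([-400e3, 200e3, 200e3]),
                            axis_rotation(np.radians(9.7), np.radians(-72.0)), -7.8e22)
b2 = eccentric_dipole_field(vo, np.array([-399e3, 201e3, 201e3]),
                            axis_rotation(np.radians(9.69), np.radians(-72.1)), -7.79e22)
sv_nT_per_year = (b2 - b1) / 0.5 * 1e9   # finite-difference secular variation
print(np.round(sv_nT_per_year, 1))
```

Because the dipole formula is unchanged by rotations, the result must equal the field of a dipole located at \(\mathbf{r}_{ED}\) with moment \(\mathbf{R}^T (0,0,m)^T\), which provides a simple consistency check.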

Figure 7 (top) shows the secular variation due to the evolution of the eccentric dipole, calculated from the CHAOS-6 model for epoch 2014 as the difference between the fields obtained at consecutive epochs; Fig. 7 (bottom) shows the secular variation computed directly from the IGRF-12 model for 2015–2020. The spatial signature of the secular variation associated with the moving eccentric dipole is clearly similar both to the spatial pattern of mode 1 and to the IGRF-12 model.

The spatial structures of the first mode recovered from the CHAMP and Swarm datasets are very similar. In fact, using the trend and spatial structure of PCA mode 1 computed from CHAMP dataset, we can explain 60% of the magnetic field signal in Swarm dataset. That is,

$$\begin{aligned} \dfrac{\Vert \mathbf {X}_{\mathrm {Swarm}} - \mathbf {y}_{\mathrm {CHAMP}}^1 \otimes \mathbf {p}_{\mathrm {CHAMP}}^{1\,T} \Vert _F}{\Vert \mathbf {X}_{\mathrm {Swarm}} \Vert _F} \sim 0.4 \end{aligned}$$

where \(\mathbf {X}_{\mathrm {Swarm}}\) is the data matrix from Swarm, \((\mathbf {y}_{\mathrm {CHAMP}}^1\), \(\mathbf {p}_{\mathrm {CHAMP}}^{1})\) are the PC and EOF of PCA mode 1 computed from the CHAMP dataset and \(\Vert \mathbf {A} \Vert _F\) denotes the Frobenius norm of matrix \(\mathbf {A}\). Using PCA mode 1 from the Swarm dataset instead would increase the explained signal in \(\mathbf {X}_{\mathrm {Swarm}}\) to 70%. This level of predictability can meet the needs of fields of study in which only the large-scale (or long-distance) component of the main field matters, such as studies of the vulnerability of low Earth orbit satellites to energetic particle flux (Domingos et al. 2017) or magnetospheric studies in which models of electric current systems take into account the large-scale main field (e.g. Tsyganenko and Sitnov 2005).
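The explained fraction \(1 - \Vert \mathbf{X} - \mathbf{y}\otimes\mathbf{p}^T\Vert_F / \Vert \mathbf{X}\Vert_F\) is straightforward to evaluate once the mode-1 PC is available at the epochs of the target dataset and the EOF is defined on the same VO grid. In this sketch both are assumed; for CHAMP mode 1 applied to Swarm, this would mean, for instance, evaluating the linear trend at the Swarm epochs and restricting both grids to common VOs. The data and the function name are hypothetical.

```python
import numpy as np


def explained_fraction(X, pc, eof):
    """1 - ||X - pc eof^T||_F / ||X||_F for a single mode (pc, eof)."""
    residual = X - np.outer(pc, eof)
    return 1.0 - np.linalg.norm(residual, "fro") / np.linalg.norm(X, "fro")


# Synthetic stand-in for a mean-removed data matrix: 12 epochs x 200 components
rng = np.random.default_rng(2)
t_epochs = np.arange(12.0) - 5.5            # zero-mean epochs, 4-month units
p1 = rng.normal(size=200)
p1 /= np.linalg.norm(p1)                    # unit-norm EOF
X_target = 80.0 * np.outer(t_epochs, p1) + rng.normal(scale=20.0, size=(12, 200))
frac = explained_fraction(X_target, 80.0 * t_epochs, p1)
print(f"explained fraction: {frac:.2f}")
```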

PCA mode 1 gives the average secular variation during the time interval of CHAMP and Swarm datasets. As a result, it compares well with the IGRF models for these time periods (Thébault et al. 2015).

Higher order time derivatives

The direct reading of computed PCs as Legendre polynomials of increasing degree is valid only in the small-window regime, which applies to our datasets, for which \(\tau _W \ll \tau _W^c\). If \(\tau _W\) increases to the point that it approaches or even exceeds \(\tau _W^c\), the PCs must be interpreted differently. Often, when PCA is applied to data matrices such as (1), the computed PCs are expected to reveal characteristic timescales of the physical mechanisms contributing to the data. For example, Domingos et al. (2017) retrieved a quasi-periodicity of \(\sim\) 11 years when applying PCA to the flux of high-energy particles over the South Atlantic Anomaly, owing to the solar forcing of the neutral atmosphere density. Likewise, if the global VO signal includes a component from a distinct source with a correlation time that is short with respect to \(\tau _W\), we expect PCA to extract it, provided its energy rises above the noise level. This is indeed what occurs with mode 3 from the Swarm dataset.

The second modes resolved in both the CHAMP and Swarm datasets may be regarded as secular acceleration (SA) modes. Their importance comes from the fact that SA is seen as a window on the Earth’s core dynamics, since it mainly results from the interaction between the flow acceleration and the field, and the flow acceleration can be read in terms of contributing force terms (e.g. Aubert 2018). The corresponding PCs are very close to the discrete Legendre polynomial of degree 2 (see Figs. 1b, 2b), which is precisely the filter required to calculate the second time derivative of the geomagnetic field. The spatio-temporal information retrieved through these modes thus characterises the components of the SA field at the mean epoch of each dataset. In agreement with this interpretation, the EOF2 chart in Fig. 4d–f at 300 km altitude (mean epoch 2005.8) is very similar to the chart of SA at the Earth’s surface for 2006 given by Chulliat and Maus (2014) (see their Figure 7, also built from CHAMP data). It is dominated by spatial structures of SH degree 3 (as expected from our Fig. 6). The secular acceleration at the Earth’s surface evolves rapidly, and the whole process in which the Atlantic equatorial patches of radial SA change sign takes about 4 years [see again Figure 7 of Chulliat and Maus (2014)]. This suggests that \(\tau _W\) for Swarm may be too short to properly resolve this mode, which could explain why EOF2 differs markedly between the two datasets (compare Figs. 4d–f, 5d–f), with a correlation coefficient of only 0.47. Almost the same correlation (0.46) is found between mode 2 from Swarm and mode 3 from CHAMP, although their time series correspond to time derivatives of different orders.
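The statement that the degree-2 discrete Legendre polynomial acts as a second-derivative filter can be made concrete: projecting a series onto this polynomial isolates its quadratic part, and rescaling gives \(f''\) at the mean epoch. The sketch assumes equally spaced epochs, symmetric about the mean epoch, so that the estimate is exact for any cubic; the function name and the test signal are hypothetical.

```python
import numpy as np


def second_derivative_filter(t):
    """Weights w such that w @ f estimates f'' at the mean epoch of t.

    Built from the discrete degree-2 orthogonal polynomial; exact for any
    cubic f when the epochs are equally spaced and symmetric about the mean.
    """
    tc = t - t.mean()
    Q, _ = np.linalg.qr(np.vander(tc, 3, increasing=True))  # columns: P0, P1, P2
    p2 = Q[:, 2]
    # For f = c2 * tc**2 (plus lower-degree terms), f @ p2 = c2 * (tc**2 @ p2),
    # and the second derivative is 2 * c2.
    return 2.0 * p2 / (tc**2 @ p2)


# Hypothetical series: 25 epochs at 4-month spacing, in years
epochs = 2001.8 + np.arange(25) / 3.0
tc = epochs - epochs.mean()
f = 30.0 + 20.0 * tc + 1.5 * tc**2 + 0.2 * tc**3   # true f'' at the mean epoch is 3.0
w = second_derivative_filter(epochs)
print(w @ f)
```

The weights sum to zero and are orthogonal to the linear term, so neither the mean field nor the secular variation leaks into the SA estimate.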

Modes 2 and 3 from CHAMP have similar average SH degrees \(\ell _B\) (in contrast with the typical increase of \(\ell _B\) with the order of the mode), and they are quite similar, up to a longitude shift. The amplitude of mode 3 is higher than predicted by the scaling put forward by Gibson et al. (1992) (see “The singular spectrum and the level of noise variance” section); yet it is smaller than the amplitude of mode 2 (see Fig. 3), and the two modes cannot simply be combined to form a propagating wave. Their similarity is nevertheless intriguing, and we may wonder whether it hints at a non-stochastic time pattern, which might be compared, for example, with the geomagnetic oscillation (of period about 6 years) reported by Silva et al. (2012) in the SA at the Earth’s surface, mostly from the CM4 model (from 1960 to 2002.5). Others have also searched for periodic signals in the SA, but at the core–mantle boundary (CMB) rather than at the Earth’s surface. Chulliat and Maus (2014) reported a standing wave of dominant degree \(\ell =6\) at the CMB, with a period of about 6 years, in part of the equatorial belt, from a principal component analysis of CHAMP data projected onto the core surface [see also Chulliat et al. (2015)]. It is noteworthy, however, that continuing the SA downward from the Earth’s surface \(r=R_E\) to the CMB \(r=R_C\) is fraught with difficulties: it entails amplification by a factor of \((R_E/R_C)^{\ell +2}\) and gives prominence to higher harmonic degrees (small-scale structures), which have to be filtered out, typically above \(\ell =6\). All these studies argue for SA variability and for equatorial symmetry of the radial SA in the equatorial belt of the Atlantic region. This agrees quite well with the geometry of modes 2 and 3 from CHAMP over West Africa (Fig. 4d, g).
We now need longer and/or more accurate VO series to decide whether these two modes are really interconnected in the description of the same wave-like structure of the field, or if otherwise they do not differ significantly from the expectation of the stochastic null hypothesis producing the \(f^{-4}\) spectrum of the main field [see Dommenget (2007) for the discussion of such a null hypothesis and the perspectives for future work outlined in “Conclusions and perspectives” section].
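The size of the downward-continuation factor \((R_E/R_C)^{\ell+2}\) mentioned above is easy to tabulate. The radii below are commonly used values (6371.2 km for the geomagnetic reference radius and 3485 km for the core radius), assumed here since they are not given in this section.

```python
R_E = 6371.2  # km, geomagnetic reference radius (assumed value)
R_C = 3485.0  # km, core radius (assumed value)


def cmb_amplification(ell, r_e=R_E, r_c=R_C):
    """Factor (R_E / R_C)**(ell + 2) applied to degree-ell radial field (or SA) terms."""
    return (r_e / r_c) ** (ell + 2)


for ell in range(1, 9):
    print(f"ell = {ell}: x{cmb_amplification(ell):8.1f}")
```

The factor is about 6 for \(\ell = 1\) but exceeds 120 for \(\ell = 6\), which illustrates why small-scale noise must be filtered before continuation.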

An \(f^{-4}\) spectrum for the main field corresponds to an \(f^0\) spectrum for the secular acceleration, i.e. to a vanishing SA correlation time \(\tau _{SA}\). This time has, however, also been estimated from the ratio between SV and SA energy for each degree \(\ell\) separately (Lesur et al. 2008). Values computed from geomagnetic field models give \(\tau _{SA}\sim\) 10–20 years, independently of the SH degree (up to \(\ell \sim 10\), above which \(\tau _{SA}\) increases steeply). A similar value is obtained from the output of geodynamo simulations up to \(\ell \sim 10\), above which \(\tau _{SA}\) decreases rapidly (Christensen et al. 2012; Aubert 2018). From this perspective too, the window length \(\tau _W\) of the series that we have analysed is too short.
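The link between an \(f^{-4}\) main-field spectrum and a flat SA spectrum follows from the fact that each time derivative multiplies the power spectral density by \((2\pi f)^2\). Writing \(S_B(f) \propto f^{-4}\) for the main field, and ignoring the low-frequency end of the spectrum:

$$\begin{aligned} S_{\dot B}(f) &= (2\pi f)^2\, S_B(f) \propto f^{-2}, \\ S_{\ddot B}(f) &= (2\pi f)^4\, S_B(f) \propto f^{0}. \end{aligned}$$

By the Wiener–Khinchin theorem, a flat spectrum corresponds to an autocorrelation function proportional to a Dirac delta, which is why \(\tau _{SA}\) vanishes in this idealised limit.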

If modes 2 and 3 from CHAMP did not together form a deterministic structure, the resolved mode 3 could still be read as the third-order time derivative of the geomagnetic field around the mean epoch 2005. It has a relatively large characteristic length scale (\(\ell _B \sim 3\)), with two main radial flux patches over Brazil and West Africa, at southern low latitudes. It imparts particularly high time variability to the SA in the near-equatorial region, in particular to the radial SA flux patch over West Africa, which remains equatorially symmetric over time.

The fourth mode from Swarm in Fig. 5 shares the same cubic behaviour as the third mode from CHAMP (Figs. 1c, 2d), and some of the same symmetries can be seen in the EOF charts. In particular, the anti-symmetry over the Americas in \(B_r\) (Figs. 4g, 5j) and the diagonally symmetric patches over Central America and South Africa in \(B_\theta\) (Figs. 4h, 5k), both stand out. In the Swarm dataset, this mode is not so well resolved and shows some degree of degeneracy (see Fig. 3).

Among all modes resolved by our PCA analysis, it remains to discuss mode 4 from CHAMP. It shows similarities to mode 2 (from CHAMP) but displays higher variability close to geomagnetic polar regions, possibly due to the superposed effect of external currents.

Modes of order higher than four remain unresolved below the noise level (see Fig. 3). We can expect that if we had access to longer series, we would retrieve from the PCA decomposition extra terms for the series of Legendre polynomials, with better definition.

Annual oscillation mode from Swarm

Mode 3 from Swarm shows a spatial structure pulsating with an approximately annual period, clearly pointing to a magnetic source influenced by the Sun. Although the VO series have been cleaned as far as possible of magnetic fields with sources in the ionosphere and the magnetosphere, some residual external signal thus remains in the Swarm VO series. The spatial structure of mode 3, notably its radial component (Fig. 5g), is clearly distinct from the geometry of the other EOFs, since it is mostly symmetric about the rotation axis (more precisely, the geomagnetic dipole axis). It shows a band around the geomagnetic equator, characteristic of a zonal quadrupolar geometry (resulting from a \(P_2^0 (\cos \theta )\) term in the magnetic potential, see Eq. 6, in the geomagnetic dipole reference frame). Mode 3 from Swarm cannot be due to a solenoidal field of internal origin (relative to the VO altitude), since its misfit to an internal potential field reported in Table 1 is high. Neither is it a solenoidal field completely external to the VO locations. Thus, contributions from both internal (ionosphere, induced fields in the crust and mantle) and external (magnetosphere) sources are present. Outside the auroral regions, which are not represented in our study, we find deviations reaching \(\pm 2.6\) nT (obtained by multiplying the time series in Fig. 2c by the chart in Fig. 5g, see Eq. (2)). This value is much smaller than the amplitudes of 10 to 20 nT reported by Malin and Isikara (1976) and Shore et al. (2016). It thus appears that much of the annual oscillation has been removed from the VO series.

The annual oscillation observed at the Earth’s surface (Malin and Isikara 1976), before any treatment to minimise it, also presents a dominant quadrupolar structure. However, the geographic coordinate system used in Malin and Isikara (1976) and in the present study is arguably not the most appropriate system to represent magnetic fields with sources in the magnetosphere. As a matter of fact, Shore et al. (2016) calculated EOFs in the solar magnetic system to make the external field as static as possible in their chosen reference frame. They nevertheless also detected an annual oscillation and interpreted it as the result of a North-South oscillation of the system of external currents arising from the seasonal motion of the Sun in the solar magnetic coordinate system. Shore et al. (2016) showed the annual oscillation (see their Figure 6b) as a function of the magnetic local time defined in this coordinate system. They did not state whether the signal would be static in the geocentric solar magnetic system used by Barrois et al. (2018) to calculate, in addition to the solar magnetic system, some of the external field subtracted from the VO series. Transformation between the solar magnetic system and the geographic coordinate system involves longitudinal averages that explain well the dominantly axial structure of the mode 3 that we obtained from Swarm data.

Even though the CHAMP dataset covers a longer time window, we have found a well-resolved annual mode in the Swarm series only. In Fig. 3, the plateau height associated with the noise level decreases from about \(8.9\times 10^{2}\) nT\(^2\) for CHAMP to \(1.3\times 10^{2}\) nT\(^2\) for Swarm. The noise level in Swarm has dropped below the energy level of the annual mode, allowing it to appear as a resolved mode. In fact, the annual mode can also be found in CHAMP, but at ranks 6–7, with an amplitude well below the noise level. These two modes (6 and 7) have correlation coefficients of 0.549 and 0.528, respectively, with the annual mode from Swarm (mode 3).

Finally, the 4-month resolution is rather long compared to the 1-year period of the mode discussed in this section. At an intermediate stage of this study, we examined VO series with monthly resolution. We found again a mode with annual periodicity and similar geometry as the mode investigated here.

Conclusions and perspectives

This study is an effort to retrieve new information on the Earth’s main magnetic field from recent geomagnetic data. We showed that PCA, which decomposes the signal into decorrelated modes, is an interesting alternative to the widely used SH representation. PCA seems to retrieve more easily a behaviour in which the most rapid oscillations come from the smallest-scale features, as expected on physical grounds.

The PCA decomposition is particularly useful when it allows one to distinguish between contributions from different physical sources (Shore et al. 2016). Fitting internal and external potential field models to the PCA modes guided us towards a plausible separation between internal and mixed modes. We identified three clearly non-degenerate modes of internal origin from CHAMP data and two from Swarm data. The time series of these modes (PCs) can be represented by Legendre polynomials of increasing degree. The correspondence between the principal components of rank 4 and Legendre polynomials, together with the partial success of modelling the associated EOFs as internal potential fields, also gives us some confidence in the interpretation of the modes of rank 4 obtained from CHAMP and Swarm. Finally, from Swarm data only, we observe a mode with an annual period and a quadrupolar geometry of the radial component (mode 3).

We found some interesting features in our results that can be further exploited. We also identified new problems to be tackled.

  • The spatial structure of an eccentric dipole moving towards Asia with amplitude increasing linearly in time is a good approximation to the SV of the main field at first order. After an interval of about 10 years between \(\sim\) 2005 and \(\sim\) 2015 (from mean epoch of CHAMP to mean epoch of Swarm), mode 1 from CHAMP still explains 60% of the time variability of the field during the Swarm period.

  • We found that mode 3 obtained from CHAMP data is well resolved. It corresponds to the third-order time derivative at the middle epoch of the CHAMP series (November 2005). Once fitted with a spherical harmonic model, the mode can be downward continued to the CMB, taking care to minimise the amplification of small-scale noise. Interpreting the magnetic field changes with the radial induction equation at the core surface, we can then attempt to estimate not only the surface flow velocity and the flow acceleration but also the second-order time derivative of the velocity. This is all the more interesting as recent studies differ in their estimates of the core flow acceleration over the period 2000–2016: Livermore et al. (2017) found that a high-latitude non-axisymmetric jet increased dramatically over this period, by a factor of 3, whereas Barrois et al. (2018) found a much more modest intensification.

  • Dommenget (2007) suggested evaluating data-derived EOF modes against modes synthesised from a stochastic null hypothesis. He argued that a spatial first-order autoregressive process (AR1) can be used as the null information for the spatial structure of the climate variability. Similarly, we may use a temporal second-order autoregressive process (AR2) as the null hypothesis for the magnetic field variability (based on the \(f^{-4}\) magnetic energy spectrum). Dommenget (2007) showed EOF modes for realisations of the two-dimensional AR1 stochastic process and introduced the idea of distinct EOFs, i.e. the modes that are the most distinguished from the modes of the null hypothesis. We may use the same approach, now in the time domain, to search for principal components associated with waves in the core, especially the torsional oscillations with about 6 years period (Gillet et al. 2010).
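A minimal sketch of how such a null hypothesis could be set up in the time domain, assuming a continuous second-order autoregressive process \(\ddot x + (2/\tau)\dot x + x/\tau^2 = \sigma\,\xi(t)\), whose spectrum \(\propto [1+(2\pi f\tau)^2]^{-2}\) tends to \(f^{-4}\) at high frequency. Independent series are generated for every VO component (so the spatial structure of this null is deliberately trivial, unlike the spatial AR1 null of Dommenget 2007), sampled every 4 months, and their PCA eigenvalues give an envelope against which data eigenvalues could be compared. The correlation time \(\tau\), the number of series and all function names are assumptions.

```python
import numpy as np


def simulate_ar2(n_series, n_epochs, dt_obs, tau, sigma=1.0, n_sub=10,
                 spinup=5.0, seed=0):
    """Independent realisations of x'' + (2/tau) x' + x/tau**2 = sigma * xi(t).

    Euler-Maruyama integration with step dt_obs / n_sub, after a spin-up of
    `spinup` correlation times; returns an (n_epochs, n_series) array
    sampled every dt_obs.
    """
    rng = np.random.default_rng(seed)
    dt = dt_obs / n_sub
    x = np.zeros(n_series)
    v = np.zeros(n_series)

    def step(x, v):
        dw = rng.normal(scale=np.sqrt(dt), size=n_series)  # Wiener increments
        v_new = v + (-2.0 * v / tau - x / tau**2) * dt + sigma * dw
        return x + v * dt, v_new

    for _ in range(int(spinup * tau / dt)):   # forget the zero initial state
        x, v = step(x, v)
    out = np.empty((n_epochs, n_series))
    for k in range(n_epochs):
        out[k] = x
        for _ in range(n_sub):
            x, v = step(x, v)
    return out


def pca_eigenvalues(X):
    """Eigenvalues of the covariance of the mean-removed matrix X (epochs x series)."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return s**2 / (X.shape[0] - 1)


def null_envelope(n_real, n_series, n_epochs, dt_obs, tau, q=95.0, seed=0):
    """q-th percentile, mode by mode, of the null-hypothesis singular spectra."""
    spectra = [pca_eigenvalues(simulate_ar2(n_series, n_epochs, dt_obs, tau,
                                            seed=seed + i))
               for i in range(n_real)]
    return np.percentile(np.array(spectra), q, axis=0)


# Illustrative null: 300 series, 25 epochs every 4 months, tau = 50 years
envelope = null_envelope(5, 300, 25, 1.0 / 3.0, 50.0)
print(envelope[:5])
```

Before any comparison with data, \(\sigma\) and \(\tau\) would have to be matched to the observed series (for example through their SV variance); more realisations would also be needed for a stable percentile.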

Let us focus on the changes brought by the Swarm mission and on future work related to them:

  • In our tests, we found that a VO series at least 1.67 years long is needed to retrieve the annual mode from Swarm [see also Domingos et al. (2017) for an example where the 11-year solar cycle is extracted from a 16.5-year-long series]. Using the same ratio between the length of the time series and the period of the mode, we tentatively suggest that at least 10 years of high-quality data are needed to retrieve the mode associated with torsional oscillations in the core, which is expected to reach an amplitude of \(\sim 2\) nT/year at the Earth’s surface (Cox et al. 2016).

  • VOs were devised to monitor the internal field, so the emergence among the EOFs of an annual mode originating from external sources was unexpected. A signal arising from external sources, with a significant amplitude of about 2 nT, remains in the VO series. Its presence points to the need for improvements in the calculation of VO series, an effort that could be guided by mode 3 from Swarm as computed here.
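Written out, the extrapolation in the first of these two items assumes that the ratio between series length and mode period found for the annual mode carries over to torsional oscillations, whose period is about 6 years (Gillet et al. 2010):

$$\begin{aligned} \frac{T_{\min }}{P} \approx \frac{1.67\ \text {yr}}{1\ \text {yr}} = 1.67 \quad \Rightarrow \quad T_{\min }^{\mathrm {TO}} \approx 1.67 \times 6\ \text {yr} \approx 10\ \text {yr}. \end{aligned}$$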

Although the short lifetime of the Swarm mission does not yet allow SA features to be fully resolved if their characteristic times are indeed \(\sim\) 10 years, the noise variance level has already been reduced, possibly owing to the cross-track sums provided by the constellation of three satellites. This raises expectations for the retrieval of main field modes of smaller variance. But it also calls attention to the need to improve external field models in order to better clean VO series of external contributions.