# Automated recognition of spikes in 1 Hz data recorded at the Easter Island magnetic observatory

DOI: 10.5047/eps.2012.03.004

- Cite this article as:
- Soloviev, A., Chulliat, A., Bogoutdinov, S. et al. Earth Planet Sp (2012) 64: 743. doi:10.5047/eps.2012.03.004

## Abstract

In the present paper we apply a recently developed pattern recognition algorithm *SPs* to the problem of automated detection of artificial disturbances in one-second magnetic observatory data. The *SPs* algorithm relies on the theory of discrete mathematical analysis, which has been developed by some of the authors for more than 10 years. It continues the authors’ research in the morphological analysis of time series using fuzzy logic techniques. We show that, after a learning phase, this algorithm is able to recognize artificial spikes uniformly with low probabilities of target miss and false alarm. In particular, a 94% spike recognition rate and a 6% false alarm rate were achieved as a result of the algorithm application to raw one-second data acquired at the Easter Island magnetic observatory. This capability is critical and opens the possibility to use the *SPs* algorithm in an operational environment.

### Key words

Magnetic observatory, magnetogram, spikes, pattern recognition, fuzzy logic

## 1. Introduction

The global network of magnetic observatories is one of the main observation infrastructures for geomagnetic research. Magnetic observatory data are used for investigating the geomagnetic secular variation originating in the Earth’s outer core, as well as the rapid variations generated by electric currents in the ionosphere, the magnetosphere and the oceans (e.g., recent papers such as Love, 2008; Matzka *et al.*, 2010). They are also used by a variety of governmental and industrial customers for applications such as directional drilling, reduction of magnetic survey data and space weather monitoring and forecasting (e.g., Reay *et al.*, 2005; Marshall *et al.*, 2011). Unlike other magnetometer networks, observatories are aimed at operating for several decades using internationally agreed standards of operations. About 120 observatories currently cooperate toward this goal within the INTERMAGNET program (www.intermagnet.org).

*et al.*, 2009b; Worthington *et al.*, 2009). As expected, the faster measurement sampling rate uncovered various signals that were previously filtered out in one-minute data, including some artificial disturbances that have to be removed from the final observatory data products. While at many observatories the one-second data cleaning represents a reasonable amount of work, it becomes a daunting task at some observatories, particularly those installed in remote but important locations where no optimal observatory site could be found. This is the case, for example, at the recently installed magnetic observatory on Easter Island (Isla de Pascua Mataveri, IAGA code IPM; see Chulliat *et al.*, 2009a and Fig. 1), where the nearby traffic of trucks and planes may generate more than a hundred artificial disturbances every day.

In the present paper we apply a recently developed pattern recognition algorithm *SPs* (from SPIKEsecond) to the problem of automatically detecting artificial disturbances in one-second magnetic observatory data. The first important step towards automated magnetogram filtering was undertaken by Soloviev *et al.* (2009) and Bogoutdinov *et al.* (2010). The *SPs* algorithm relies on the theory of discrete mathematical analysis (Gvishiani *et al.*, 2008a, 2010), which has been developed by some of the authors for more than 10 years. It continues the authors’ research in the morphological analysis of time series using fuzzy logic techniques (see e.g., Agayan *et al.*, 2005; Gvishiani *et al.*, 2008a, b). We show that, after a learning phase, this algorithm is able to distinguish artificial disturbances from natural ones, such as short-period geomagnetic pulsations in the 1 s–1 min period range (e.g., Samson, 1991). This capability is critical and opens the possibility to use the *SPs* algorithm in an operational environment.

## 2. Description of the *SPs* Algorithm

The *SPs* algorithm is a tool applicable to any time series containing specific time anomalies (disturbances) that have to be identified. The algorithm is aimed at the recognition of singular spikes *S* of any nature with a simple morphology on a record *y*. (Note that *SPs* is not able to recognize jumps; this is done by another algorithm, JM, currently being developed by some of us.) An example of such a spike, generated by a nearby running truck, is given in Fig. 2. The logic underlying the algorithm is based on the following model of a spike. A **spike** *S* is defined as a record fragment having a **tip** *t*(*S*), where two opposite **sharp slopes** *S*^{l} and *S*^{r} meet, surrounded by **quiet spike wings** *W*^{l}(*S*) and *W*^{r}(*S*) (Fig. 2). In order to formalize the logic of the algorithm, we use the concepts of fuzzy comparison and fuzzy extremality (Zadeh, 1965; Gvishiani *et al.*, 2008a, b). The detailed mathematical description of the algorithm is given in Soloviev *et al.* (2012). In what follows, we provide a brief summary of *SPs*.

The *SPs* algorithm consists of three blocks: “Λ-analysis”, “Search for quasi-spikes” and “Selection of spikes” (Fig. 3). The starting record *y* is a time series *y* = *y*(*t*) given on an interval of the discrete positive semiaxis $\mathbb{R}_h^+ = \{kh : k = 1, 2, 3, \ldots\}$, where *h* is the discretization step and *k* is the observation node.

The *SPs* algorithm begins its search by considering a local extremum of *y* as a possible tip *t = t*(*S*) of a spike *S*. The algorithm evaluates the slopes *S*^{l} and *S*^{r} on each side of *t.* If they turn out to be sharp enough, the triplet *S* = (*S*^{l}, *t*, *S*^{r}), referred to as a quasi-spike, is further examined. Next, the algorithm searches for quiet wings *W*^{l}(*S*) and *W*^{r}(*S*) to the left and to the right of *S*^{l} and *S*^{r}, respectively. If quiet wings are detected, the quasi-spike is recognized as a spike, as defined above. The algorithm is aimed specifically at recognizing such spikes on a record *y*.

In the “Λ-analysis” block, for each fragment {*y*_{k}, …, *y*_{k+Δ}} of the record *y*, a linear regression (Draper and Smith, 1966) is calculated by the least-squares technique. The regression coefficients are then used to determine whether the fragment is ascending or descending, and to derive an indicator of activity within the fragment. Determining whether this activity is large (“sharp” fragment) or small (“quiet” fragment) is performed by using fuzzy comparisons (Gvishiani *et al.*, 2008a, b) between a large number of fragments of varying lengths Λ = {Δ_{1}, …, Δ_{m}}. The fuzzy comparison function used in *SPs* takes two numbers *A* and *B* and depends on a fixed parameter *ν*. It yields a number between −1 and 1 which quantifies how much *B* is larger than *A*.
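The regression step can be illustrated with a minimal sketch, assuming a unit sampling step and NumPy. The function names `fragment_slopes` and `slopes_for_lengths` are hypothetical and are not part of *SPs*. The sketch only computes the least-squares slope of every fragment for several lengths Δ; the activity indicator and the fuzzy comparison of the actual algorithm are defined in Soloviev *et al.* (2012) and are not reproduced here.

```python
import numpy as np


def fragment_slopes(y, delta):
    """Least-squares slope of every fragment y[k], ..., y[k + delta] (delta >= 1)."""
    y = np.asarray(y, dtype=float)
    # One row per fragment; each row holds delta + 1 consecutive samples.
    windows = np.lib.stride_tricks.sliding_window_view(y, delta + 1)
    t = np.arange(delta + 1, dtype=float)
    t_centered = t - t.mean()
    # slope = sum((t - t_mean) * y) / sum((t - t_mean)^2); the y-mean term
    # drops out because t_centered sums to zero.
    return windows @ t_centered / (t_centered @ t_centered)


def slopes_for_lengths(y, lengths):
    """Slopes for a set of fragment lengths (a stand-in for the set of lengths Lambda)."""
    return {delta: fragment_slopes(y, delta) for delta in lengths}


# Synthetic record: flat wings around a triangular spike (illustrative data only).
record = np.array([0, 0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0], dtype=float)
slopes = slopes_for_lengths(record, lengths=(2, 4))
# The sign of a slope tells whether a fragment ascends or descends; using its
# magnitude as a crude activity measure would be an assumption of this sketch.
print(slopes[2])
```

On this synthetic record the slopes are zero on the wings, positive on the left flank of the spike and negative on the right flank, which is the qualitative pattern that the later blocks look for.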

The other blocks of the *SPs* algorithm, “Search for quasi-spikes” and “Selection of spikes” (Fig. 3), use the described classifications to identify quasi-spikes and to choose genuine spikes among them, respectively.

The algorithm has three free parameters, *SPs* = *SPs*(*ν*, *ρ*_{1}, *ρ*_{2}) (Fig. 3):

- *ν*: parameter of fuzzy comparison,
- *ρ*_{1}: level of sharpness of the slopes *S*^{l} and *S*^{r},
- *ρ*_{2}: level of quietness of the wings *W*^{l}(*S*) and *W*^{r}(*S*).

A given set of free parameters is denoted by *π* = (*ν*, *ρ*_{1}, *ρ*_{2}).

## 3. Testing Dataset and Methodology

We tested the *SPs* algorithm on raw one-second data acquired at the Easter Island magnetic observatory in July and August 2009 (IPM, Fig. 1). The data include measurement values of the three components of the geomagnetic field vector along the North (*X*), East (*Y*) and downward vertical (*Z*) directions before baseline correction, and total intensity *F* of the geomagnetic field. Each 1-day 1-channel record registered with 1 Hz frequency consists of 86,400 data points.

Table 1. Statistical information on spikes from 01/07/2009 to 20/07/2009 recognized by eye on magnetograms.

| Channel | Number of spikes | Min amplitude, nT | Max amplitude, nT | Mean amplitude, nT | Min duration, s | Max duration, s | Mean duration, s |
|---|---|---|---|---|---|---|---|
| *X* | 1119 | 0.100 | 82.280 | 1.298 | 9 | 190 | 27.330 |
| *Y* | 1122 | 0.080 | 100.340 | 1.093 | 4 | 190 | 27.193 |
| *Z* | 996 | 0.100 | 20.640 | 0.371 | 6 | 470 | 28.861 |
| *F* | 1135 | 0.102 | 61.770 | 0.918 | 9 | 439 | 31.719 |

Table 2. Statistical information on spikes from 21/07/2009 to 31/07/2009 recognized by eye on magnetograms.

| Channel | Number of spikes | Min amplitude, nT | Max amplitude, nT | Mean amplitude, nT | Min duration, s | Max duration, s | Mean duration, s |
|---|---|---|---|---|---|---|---|
| *X* | 853 | 0.100 | 12.630 | 1.200 | 7 | 449 | 26.917 |
| *Y* | 844 | 0.140 | 12.430 | 0.972 | 9 | 449 | 27.096 |
| *Z* | 774 | 0.090 | 106.510 | 0.570 | 7 | 449 | 27.722 |
| *F* | 846 | 0.088 | 61.130 | 0.932 | 6 | 449 | 31.072 |

The learning was carried out on the records from 1 to 20 July 2009 for each of the channels *X*, *Y*, *Z*, *F* separately. As a result, we were able to obtain the optimal free parameter values of the algorithm for each channel independently. In order to select optimal values of the free parameters, we implemented a brute-force search, i.e., we systematically tried a large number of values (Knuth, 1968). First, each 1-day 1-channel data series was processed by the algorithm using a set Π of free parameter values. These values were pre-selected based upon the known behavior of the fuzzy comparison function and some preliminary tests. In total, |Π| = 100 combinations of free parameters were tested.

To assess recognition quality we introduce the following function to be minimized:

$$K_\lambda(SPs(\pi)) = \lambda P_1 + (1 - \lambda) P_2 \to \min_{\pi \in \Pi},$$

where *SPs*(*π*) is the result of the algorithm operation with some combination of free parameter values *π*, expressed as a set of intervals on the time axis which define the recognized events; *P*_{1} is the probability of an error of the first kind (target miss), defined as $P_1 = N_{\mathrm{missed}} / N$ (where $N_{\mathrm{missed}}$ is the number of missed spikes and *N* is the number of spikes recognized by eye); *P*_{2} is the probability of an error of the second kind (false alarm), defined as $P_2 = N_{\mathrm{extra}} / N_{\mathrm{alg}}$, the number of extra events divided by the number of events recognized by the algorithm (Bogoutdinov *et al.*, 2010). These definitions reproduce the values listed in Tables 3–5. In the criterion *K*_{λ} we set λ = 0.8, thus giving more weight to not missing spikes than to avoiding false alarms. The value of the parameter λ was obtained by testing the algorithm for λ = 0.1, 0.2, …, 0.9 on an arbitrary set of free parameters and selecting the value for which the best recognition was achieved.
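As a worked check of the quality criterion, the sketch below assumes the weighted form *K*_{λ} = λ*P*_{1} + (1 − λ)*P*_{2}, with *P*_{1} the missed spikes divided by the spikes recognized by eye and *P*_{2} the extra events divided by the events recognized by the algorithm; these forms are consistent with the tabulated learning-phase statistics. The function names are hypothetical. The counts are those of the first row of the 1–20 July statistics (1119 spikes recognized by eye, 1168 events recognized by the algorithm, 53 missed spikes, 102 extra events).

```python
def error_probabilities(n_eye, n_algo, missed, extra):
    """Return (P1, P2): target-miss and false-alarm probabilities."""
    p1 = missed / n_eye    # missed spikes relative to spikes recognized by eye
    p2 = extra / n_algo    # extra events relative to events found by the algorithm
    return p1, p2


def quality_criterion(p1, p2, lam=0.8):
    """Weighted criterion K_lambda = lam * P1 + (1 - lam) * P2 (assumed form)."""
    return lam * p1 + (1.0 - lam) * p2


p1, p2 = error_probabilities(n_eye=1119, n_algo=1168, missed=53, extra=102)
k = quality_criterion(p1, p2)
print(round(p1, 3), round(p2, 3), round(k, 3))  # 0.047 0.087 0.055
```

With λ = 0.8, a missed spike costs four times as much as a false alarm, which is why a parameter set with slightly more extra events but fewer misses can still score better.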

One should note that the range of free parameter values given above is quite wide. In order to refine the free parameter values, we examined a small neighborhood around the optimal solution already found. This entailed the examination of an additional 125 combinations of free parameters. Assessing the recognition quality in the same way as in the first stage of learning, we obtained the optimal free parameter values for each channel. In Bogoutdinov *et al.* (2010), it was shown that different optimal combinations of the free parameters were found for different observatories recording one-minute data. A similar situation is expected to arise for other observatories recording one-second data.
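The two-stage search can be sketched as follows. Everything numerical here is hypothetical: the coarse grid values, the refinement steps and the `toy_criterion` function (a stand-in for running the algorithm and computing the quality criterion) are assumptions made for illustration, and reading the 125 refinement combinations as a 5 × 5 × 5 neighborhood is likewise an assumption. Only the parameter triple (*ν*, *ρ*_{1}, *ρ*_{2}) and the counts of 100 and 125 combinations come from the text.

```python
from itertools import product


def brute_force(evaluate, grid):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    best = None
    for params in product(*grid):
        score = evaluate(params)
        if best is None or score < best[1]:
            best = (params, score)
    return best


def neighborhood(center, steps, points=5):
    """Per-axis lists of `points` values centred on `center` (5**3 = 125 combos)."""
    half = points // 2
    return [[c + (i - half) * s for i in range(points)]
            for c, s in zip(center, steps)]


def toy_criterion(params):
    """Hypothetical smooth stand-in for the recognition-quality criterion."""
    nu, rho1, rho2 = params
    return (nu - 2.3) ** 2 + (rho1 - 0.62) ** 2 + (rho2 - 0.41) ** 2


# Stage 1: coarse grid of 4 * 5 * 5 = 100 combinations (values are made up).
coarse_grid = [[1.0, 2.0, 3.0, 4.0],
               [0.2, 0.4, 0.6, 0.8, 1.0],
               [0.2, 0.4, 0.6, 0.8, 1.0]]
coarse_best, coarse_score = brute_force(toy_criterion, coarse_grid)

# Stage 2: 125 combinations in a small neighborhood of the coarse optimum.
fine_grid = neighborhood(coarse_best, steps=(0.1, 0.05, 0.05))
fine_best, fine_score = brute_force(toy_criterion, fine_grid)
print(coarse_best, fine_best)
```

Because the refined grid contains the coarse optimum itself, the second stage can never return a worse score than the first.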

Once the free parameters were fixed, we first tested the algorithm by applying it to the time interval from 21 to 31 July 2009 and comparing the output with the results of manual data cleaning. By separating the dataset into two parts, we made sure that the testing was performed on an independent dataset. Next, we applied the algorithm to the time interval from 1 to 31 August 2009 and then performed recognition of spikes by eye in order to check the results.

## 4. Results

### 4.1 Results of the learning phase

The learning phase yielded, for each channel, the set of free parameter values *π*\* minimizing the criterion *K*_{0.8} (Table 3). The best results were obtained for the components *X* and *Y*, where the error probabilities varied between 3.5% and 11.5%. Poorer results were obtained for the vertical component *Z*, where the error probabilities of the first and the second kinds were 17.1% and 18.0%, respectively. This difference is attributed to the smaller average amplitude of the spikes on the *Z* component during the learning phase time interval, which made them more difficult to detect.

Table 3. Statistics on the events recognized by the algorithm *SPs* = *SPs*(*π*\*) from 1/07/2009 to 20/07/2009.

| Channel | Spikes recognized by eye | Events recognized by the algorithm | Missed spikes | Extra events | Probability of an error of the 1st kind | Probability of an error of the 2nd kind | Quality criterion |
|---|---|---|---|---|---|---|---|
| *X* | 1119 | 1168 | 53 | 102 | 0.047 | 0.087 | 0.055 |
| *Y* | 1122 | 1224 | 39 | 141 | 0.035 | 0.115 | 0.051 |
| *Z* | 996 | 1007 | 170 | 181 | 0.171 | 0.180 | 0.172 |
| *F* | 1135 | 1146 | 134 | 145 | 0.118 | 0.127 | 0.120 |

### 4.2 Results of the testing phase

Table 4. Statistics on the events recognized by the algorithm *SPs* = *SPs*(*π*\*) from 21/07/2009 to 31/07/2009.

| Channel | Spikes recognized by eye | Events recognized by the algorithm | Missed spikes | Extra events | Probability of an error of the 1st kind | Probability of an error of the 2nd kind | Quality criterion |
|---|---|---|---|---|---|---|---|
| *X* | 853 | 854 | 50 | 51 | 0.059 | 0.060 | 0.059 |
| *Y* | 844 | 884 | 36 | 76 | 0.043 | 0.086 | 0.051 |
| *Z* | 774 | 731 | 108 | 65 | 0.140 | 0.089 | 0.129 |
| *F* | 846 | 789 | 124 | 67 | 0.147 | 0.085 | 0.134 |

### 4.3 Results of the blind test

In the blind test, the algorithm was applied to the records from 1 to 31 August 2009 without any *a priori* expert opinion. The results of the recognition by the algorithm *SPs* = *SPs*(*π*\*) were subsequently evaluated by eye. The overall recognition statistics for the whole set of data are provided in Table 5.

Table 5. Results of application of the algorithm *SPs*, with the optimal free parameters found for each of the channels *X*, *Y*, *Z* and *F*, to the records obtained from 1 to 31 August 2009 and their assessment by experts.

| Channel | Spikes recognized by eye | Events recognized by the algorithm | Missed spikes | Extra events | Probability of an error of the 1st kind | Probability of an error of the 2nd kind | Quality criterion |
|---|---|---|---|---|---|---|---|
| *X* | 2122 | 2057 | 79 | 14 | 0.0372 | 0.0068 | 0.031 |
| *Y* | 2143 | 2150 | 23 | 30 | 0.0107 | 0.0140 | 0.011 |
| *Z* | 1786 | 1780 | 104 | 98 | 0.0582 | 0.0551 | 0.058 |
| *F* | 1963 | 1996 | 39 | 72 | 0.0199 | 0.0361 | 0.023 |

The probability of missed spikes for the *X* component is 3.72% and that of false alarms is 0.68%, compared with 4.7% of missed spikes and 8.7% of false alarms for the 1/07–20/07 time interval (Table 3) and 5.9% of missed spikes and 6.0% of false alarms for the 21/07–31/07 time interval (Table 4). For the other components *Y* and *Z* and the total intensity *F*, the blind test also demonstrated a higher efficiency of the algorithm compared with the results of the learning and testing phases, which is well reflected in the corresponding values of the *K*_{0.8} quality criterion (Tables 3–5).

The difference in the recognition quality *K*_{0.8} obtained for the August and July 2009 records is likely due to the fact that manual data processing by eye was easier with the results of the algorithm recognition at hand (August data) than when analyzing raw magnetograms “from scratch” (July data). Thus, for the July data the quality of manual recognition of spikes turned out to be worse. This shows that the algorithm significantly helped the recognition by eye. It also provides some estimate of the number of errors made when relying on manual spike detection.

Missed spikes and extra events recognized by the algorithm in August data were separately examined and the following conclusions were made: usually extra events represent either geomagnetic pulsations or other natural geomagnetic signals occurring in a narrow frequency band, whereas missed spikes in some cases represent long anomalous intervals not caused by trucks or airplanes.

The results of the blind test confirm that the trained algorithm is able to detect most of the spikes, and show that there is some variability from one day, week or month to the next.

## 5. Discussion

In the present paper we introduced the algorithm *SPs*, which is able to automatically recognize spikes caused by artificial disturbances in magnetic observatory data sampled every second. We applied this algorithm to the recently installed observatory on Easter Island, where nearby trucks and planes cause several tens of such spikes every day. We showed that, after a 20-day learning phase in July 2009, the algorithm is able to recognize more than 94% of the spikes on the three components and the intensity recordings in August 2009, while the percentage of false alarms is less than 6%. At all stages the algorithm showed its worst results when processing the vertical component *Z*.

A detailed examination of the false alarms reveals that most of them are due to geomagnetic pulsations. It is indeed sometimes very difficult, even for a trained data expert, to distinguish a pulsation from an artificial spike. The occurrence of a pulsation can generally be inferred from the simultaneous occurrence of a pulsation-like signal at a nearby observatory. This functionality is not included in the present version of the algorithm. In some rare cases, false alarms are due to a temporary increase of the background noise, whose origin is unknown.

An alternative approach, the *dF* method, relies on the difference *dF* = *F*_{s} − *F*_{v} between the field modulus *F*_{s} = *F* directly measured by the scalar magnetometer and the modulus *F*_{v} calculated from the three components measured by the vector magnetometer. Normally, *dF* should vary by up to a few tenths of nT around a constant non-zero value due to the differences in transfer functions and locations of the instruments. Instrumental spikes and other anomalies generally lead to a larger than normal value of *dF*, which can easily be detected. Typical IPM disturbances caused by nearby trucks and planes also cause an increase of the absolute value of *dF*, due to the distance between the two magnetometers (about two meters) and their different transfer functions. However, in some cases, the resulting *dF* spike is not easily distinguishable from the instrumental noise, as can be seen in the example shown in Fig. 8. Moreover, quite often the *dF* record does not reflect spikes that are present in the initial geomagnetic records. A corresponding example is given in Fig. 9. It should be noted that both examples lie within a one-hour period of the same day.
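The *dF* check can be sketched in a few lines, assuming NumPy arrays of simultaneous one-second samples in nT. The vector modulus is the Euclidean norm of the three components; the function names, the median baseline and the 0.5 nT threshold are assumptions chosen for illustration and do not describe an observatory's operational procedure.

```python
import numpy as np


def delta_f(x, y, z, f_scalar):
    """dF = F_s - F_v, with F_v the modulus computed from the X, Y, Z components."""
    f_vector = np.sqrt(np.square(x) + np.square(y) + np.square(z))
    return np.asarray(f_scalar, dtype=float) - f_vector


def flag_df_anomalies(df, threshold_nt=0.5):
    """Flag samples whose dF departs from its median by more than the threshold."""
    baseline = np.median(df)  # dF is expected to sit near a constant non-zero value
    return np.abs(df - baseline) > threshold_nt


# Illustrative numbers only: a constant 0.2 nT offset and one disturbed sample.
x = np.array([3.0, 3.0, 3.0])
y = np.array([4.0, 4.0, 4.0])
z = np.array([12.0, 12.0, 12.0])
f_s = np.array([13.2, 13.2, 15.2])
df = delta_f(x, y, z, f_s)
print(flag_df_anomalies(df))
```

A fixed threshold of this kind illustrates the limitation noted above: a disturbance whose *dF* signature stays within the normal few tenths of nT goes unflagged.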

Another disadvantage of the *dF* method is that it requires both vector data on the three components and scalar data on the total field intensity, and consequently the correct operation of both devices. The method becomes invalid if one of the devices does not work properly or if the registration of one of the three vector components fails. In contrast, data filtering using the *SPs* algorithm can be applied to any individual record regardless of the presence of other records. This makes the algorithm applicable not only at magnetic observatories but also at magnetic stations where only variational data are recorded.

We plan to carry out further studies on seasonal and activity level dependence of the recognition results. The described algorithm is currently being implemented in the operation of the Russian-Ukrainian geomagnetic data center hosted by the Geophysical Center of the Russian Academy of Sciences. The development of a web application based upon the *SPs* algorithm is also being considered, in order to make it available to the wider magnetic observatory community.

## Acknowledgements

We thank Alan W. P. Thomson and Hans-Joachim Linthe for constructive reviews. The research has been carried out in the framework of collaboration program between Institut de Physique du Globe de Paris (IPGP), Moscow Institute of Physics of the Earth of Russian Academy of Sciences (IPE RAS) and Geophysical Center of RAS (GC RAS). The development of the algorithm *SPs* has been carried out in the framework of the project number 12-05-90428 supported by Russian Foundation for Basic Research. The Easter Island (IPM) magnetic observatory is jointly operated by Dirección Meteorológica de Chile (DMC) and IPGP. We thank INTERMAGNET for promoting high standards of magnetic observatory practice (www.intermagnet.org). This is IPGP contribution number 3288.