Analysis of Geoeffective Impulsive Events on the Sun During the First Half of Solar Cycle 24

A coronal mass ejection (CME) is an impulsive event that emerges rapidly from the Sun. During the last decade, we observed a quiet Sun without many spectacular episodes, although some fast halo and partial halo CMEs took place, among them the backside CME on 23 July 2012. In this work, we verify the link between the variability of solar-wind, heliospheric, and geomagnetic parameters and the transmission grid failures registered in southern Poland during 2010 – 2014, when many geomagnetic storms caused by halo and partial halo CMEs occurred. We apply three machine learning methods, Principal Component Analysis, Self-Organizing Maps, and Hierarchical Agglomerative Clustering, to analyze the solar sources and the impacts of the intense geomagnetic storms in the first half of Solar Cycle 24. The analyses underline the importance of solar-wind proton temperature and point out other solar-wind and geomagnetic parameters independently indicated by all the methods used in this study.


Introduction
The Sun's magnetic field leads to many effects that are collectively called solar activity, such as solar flares, solar energetic particle (SEP) events, coronal mass ejections (CMEs), coronal holes, etc. (e.g., Priest, 2001; Nitta et al., 2021). A CME is a significant release of plasma and accompanying magnetic field from the Sun's corona into the solar wind. CMEs are often associated with solar flares and other forms of solar activity and are the subject of ongoing extensive studies (e.g., Papaioannou et al., 2016; Kilpua et al., 2021; Nitta et al., 2021; Asvestari et al., 2022; Rodriguez et al., 2022). Hess and Zhang (2017) identified 70 Earth-affecting interplanetary CMEs during Solar Cycle 24 using in situ observations from NASA's Advanced Composition Explorer (ACE). They presented a statistical study of the properties of these events, including the source regions. In addition, the authors examined the characteristics of CMEs that are more likely to be highly geoeffective, as well as the effect of the flare strength. They found that Earth-affecting CMEs in the first half of Cycle 24 were more likely to come from the northern hemisphere. After April 2012, this reversed, and these events were more likely to originate in the southern hemisphere, following the observed magnetic asymmetry in the two hemispheres.
A. Gil, gila@uph.edu.pl
Fortunately, the most intense CME of Solar Cycle 24, on 23 July 2012, with a Sun-to-Earth shock transit time of only 18.5 h, was a backside event. It originated in NOAA active region 11520 (S17W141). The shock speed exceeded 2000 km s⁻¹, with the highest peak speed around 2600 km s⁻¹ (e.g., Gopalswamy et al., 2016; Desai et al., 2020). Based on the plasma properties and the measured magnetic field, it was assessed that this eruption would have caused a geomagnetic storm analogous to the Carrington storm had it been directed at the Earth. A couple of days earlier, the same active region was also the source of a relatively powerful halo CME that caused an intense geomagnetic storm (Gopalswamy et al., 2014, 2016; Gil et al., 2020a).
The CME link to geomagnetic storms stems from the southward component of the heliospheric magnetic field (HMF) contained in the CME flux ropes and in the sheath between the flux rope and the CME-driven shock. A typical storm-producing CME is characterized by high speed, sizeable angular width, and a solar source location close to the central meridian. For CMEs originating at larger central meridian distances, the storms are mainly caused by the sheath field. Both the magnetic and energy contents of storm-producing CMEs can be traced to the magnetic structure of active regions and the free energy stored in them (Gopalswamy, 2009; Zhang et al., 2021; Palmerio et al., 2022).
Geomagnetically induced currents (GICs), which affect the regular operation of electrical systems, are one of the space-weather manifestations observed at the Earth's surface. During space-weather events, strong electric currents appear in the ionosphere and perturb the Earth's background magnetic field. In addition, the triggered currents can emerge in natural and technical conductors at the Earth's surface (e.g., Trichtchenko and Boteler, 2002; Viljanen et al., 2014). Electric transmission lines and buried pipelines are typical examples of such conductor systems. As a result, GICs can cause problems such as increased corrosion of pipeline steel and damage to high-voltage power transformers. GICs can also affect geophysical exploration surveys, as well as oil and gas drilling operations (Oliveira and Ngwira, 2017). Substantial distortions in transmission line operation can be produced by GICs caused by rapid variations of the geomagnetic field, which are reflected by high values of its first derivative. The detection of transmission line emission at large distances from a three-phase power line indicates its unbalanced operation (Veeramany et al., 2016; Fiedorov, Mazur, and Pilipenko, 2021; Pilipenko, 2021).
Eroshenko et al. (2010) considered seventeen severe magnetic storms that occurred in 2000 – 2005 and showed that anomalies appeared in the operation of the Russian railway system during each of these storms. The anomalies were linked to the main phase of the most substantial part of the geomagnetic storm. Schrijver and Mitchell (2013), quantifying the impacts of geomagnetic storms on US electric transmission lines in 1992 – 2010, found that ∼ 4% of the disturbances in this power grid reported to the US Department of Energy were due to strong geomagnetic activity.
Our work primarily focuses on the search for statistical relationships between solar and heliospheric parameters, illustrating short-term, rapid changes on the Sun, and their effects on ground-based technology in Poland during the period 01.01.2010 – 11.07.2014, when many intense geomagnetic storms appeared (Gil et al., 2021).
The article is organized as follows: In Section 1, we present the background of our studies. In Section 2, we present the data used in the analysis, describing solar, interplanetary, and geomagnetic conditions. Sections 3 and 4 show the methodology and the results obtained, respectively. In Section 5, we summarize our investigations.

Analyzed Parameters
In this work, we analyzed solar, heliospheric, and geomagnetic parameters during the first half of Solar Cycle 24, specifically in the period 01.01.2010 – 11.07.2014. These time series illustrate the state of the Sun, its activity level, and its geoeffectiveness (e.g., Temmer, 2021; Zhang et al., 2021). Our attention was especially drawn to the times of the fast halo and partial-halo CMEs (which are quite often drivers of large solar particle events, e.g., Gopalswamy et al., 2018).
We started with the calculation of the first derivative of the geomagnetic field component, $dB_X/dt$. Then, for each consecutive three-hour period, we found the maximal value of this derivative and called this parameter $\max(dB_X/dt)$. Values of $\max(dB_X/dt)$ allowed us to characterize the most decisive changes in the local magnetic field during each considered storm.
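The three-hour maximum of the derivative can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `max_dbx_dt`, the one-minute sampling step, and the use of a simple finite difference are assumptions made here.

```python
import numpy as np

def max_dbx_dt(bx_nT, dt_min=1.0, window_h=3.0):
    """Maximum of dB_X/dt [nT/min] in consecutive windows of `window_h` hours.

    bx_nT  : B_X samples in nT (assumed evenly sampled, hypothetical input)
    dt_min : sampling step in minutes (assumed value)
    """
    bx = np.asarray(bx_nT, dtype=float)
    dbx_dt = np.diff(bx) / dt_min                       # finite-difference derivative
    per_window = int(round(window_h * 60.0 / dt_min))   # samples per window
    n_windows = dbx_dt.size // per_window               # incomplete last window is dropped
    blocks = dbx_dt[: n_windows * per_window].reshape(n_windows, per_window)
    return blocks.max(axis=1)
```

If the largest change of either sign is of interest, `np.abs(blocks).max(axis=1)` could be used instead; the text above does not say which variant was applied.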
Additionally, we considered values of the geoelectric field at the Earth's surface (Ec [mV m⁻¹]). The estimation of Ec was performed by building a 1D conductivity model (Ádám, Prácser, and Wesztergom, 2012; Gil et al., 2021), which was then combined with the values of the geomagnetic field ($B_X$ and $B_Y$). The Fourier transform of the magnetic field was multiplied by the 1D transfer function in the frequency domain, and the inverse Fourier transform of the result then allowed a return to the time domain (e.g., Boteler, Pirjola, and Marti, 2019).
Finally, we considered detailed data on electrical grid failures/anomalies (EGFs) in South Poland from 01.01.2010 to 11.07.2014 (details in Gil et al., 2020b). Based on transmission line operator logs, we aggregated the particular causes into six general clusters: (A) meteorological effects, (B) operational shutdowns, (C) vandalism, (D) the aging of infrastructure elements (Ag), (E) the unreliability of electronic devices (Ed), and (F) unknown reasons (Un). A similar categorization was defined by Zois (2013). The first three clusters (A – C), ∼ 73.7% of all registered failures, might be considered objective reasons. The latter three (D – F) may be treated as failures having, to some extent, a solar origin, and only these were further considered in the analysis of the influence of heliospheric parameters and geomagnetic indices, separately for each type of failure/anomaly during each of the analyzed geomagnetic storms. The number and percentage of particular clusters of EGFs during the whole studied period are shown in Table 1.
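The frequency-domain step described above can be illustrated with a short sketch. It is only a simplified stand-in: instead of the layered 1D conductivity model used in the paper, it assumes a uniform half-space with a placeholder conductivity `sigma`, for which the plane-wave surface impedance is $Z(\omega) = \sqrt{i\omega\mu_0/\sigma}$. The function name and all parameter values are hypothetical.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability [H/m]

def geoelectric_field_halfspace(bx_nT, by_nT, dt_s, sigma=1e-3):
    """Surface geoelectric field (Ex, Ey) in mV/m from B_X, B_Y in nT.

    Assumes a uniform half-space of conductivity `sigma` [S/m] (placeholder
    value) in place of a layered 1D model, and the plane-wave relations
    Ex = Z*By/mu0 and Ey = -Z*Bx/mu0.
    """
    bx = np.asarray(bx_nT, dtype=float) * 1e-9   # nT -> T
    by = np.asarray(by_nT, dtype=float) * 1e-9
    n = bx.size
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt_s)   # angular frequencies
    z = np.sqrt(1j * omega * MU0 / sigma)              # surface impedance [ohm]
    k = z / MU0                                        # transfer function E/B
    # Multiply in the frequency domain, then return to the time domain.
    ex = np.fft.irfft(k * np.fft.rfft(by), n=n)
    ey = np.fft.irfft(-k * np.fft.rfft(bx), n=n)
    return ex * 1e3, ey * 1e3                          # V/m -> mV/m
```

For a layered Earth, only the line computing `z` would change: the impedance would be obtained recursively from the layer conductivities and thicknesses.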

Data Processing
For data preparation, we used cross-correlation to measure similarity (e.g., Box, Jenkins, and Reinsel, 2008) between vectors representing solar-wind, heliospheric, and geomagnetic parameters, and vectors of transmission line failures in the three clusters of failures that could be linked to space-weather effects, i.e., failures caused by the aging of infrastructure elements, connected to the unreliability of electronic devices, and having unknown reasons.
Following Gil et al. (2021), we considered all intense geomagnetic storms during the studied period, that is, those satisfying the conditions $B_z$ < −10 nT and Dst < −100 nT for more than three hours (Gonzalez and Tsurutani, 1987). In the period considered, 01.2010 – 07.2014, the following were registered: 8 events with Dst < −100 nT lasting longer than three hours, 16 events with $B_z$ < −10 nT lasting longer than three hours, and 5 events that met both conditions, that is, $B_z$ < −10 nT and Dst < −100 nT for more than three hours (compare with Table 1 in Gil et al., 2021). It is worth underlining that all these storms were associated with a halo or partial halo CME (from cdaw.gsfc.nasa.gov/CME_list/).
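A threshold criterion of this kind can be sketched as a search for sufficiently long runs below a limit. The helper below is hypothetical, and it assumes hourly samples, so that "more than three hours" becomes "more than three consecutive samples".

```python
import numpy as np

def intervals_below(values, threshold, min_samples):
    """Return (start, stop) index pairs of runs with values < threshold
    lasting longer than `min_samples` consecutive samples (stop is exclusive)."""
    below = np.asarray(values) < threshold
    padded = np.concatenate(([False], below, [False]))
    # Positions where the mask switches between False and True.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[::2], edges[1::2]
    return [(int(a), int(b)) for a, b in zip(starts, stops) if b - a > min_samples]
```

For example, `intervals_below(dst_hourly, -100.0, 3)` would list candidate Dst events, and the same call with `bz_hourly` and `-10.0` would list $B_z$ events; `dst_hourly` and `bz_hourly` are placeholder names for hourly series.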
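We performed our calculations for each intense geomagnetic storm, using data from 3 days before to 5 days after the storm, so that each analyzed set always contained 64 three-hour values. For the cross-correlation of the vectors representing solar-wind, heliospheric, and geomagnetic parameters (denoted in the formulas by $Y_j$, $j \in \{1, \ldots, 13\}$) and the vectors of transmission line failures (denoted by $F_i$, $i \in \{1, 2, 3\}$), we evaluate the lag $k$ cross-covariance as follows (Box, Jenkins, and Reinsel, 2008):

$$
c_{F_i Y_j}(k) =
\begin{cases}
\dfrac{1}{N} \sum\limits_{t=1}^{N-k} \left(F_{i,t} - \bar{F}_i\right)\left(Y_{j,t+k} - \bar{Y}_j\right), & k = 0, 1, 2, \ldots \\[2ex]
\dfrac{1}{N} \sum\limits_{t=1}^{N+k} \left(F_{i,t-k} - \bar{F}_i\right)\left(Y_{j,t} - \bar{Y}_j\right), & k = -1, -2, \ldots
\end{cases}
$$

where $N$ is the length of the series and $\bar{F}_i$, $\bar{Y}_j$ indicate the sample means of the considered time series. The sample standard deviations are $s_{F_i} = \sqrt{c_{F_i F_i}(0)}$ and $s_{Y_j} = \sqrt{c_{Y_j Y_j}(0)}$. Hence, the estimate of the cross-correlation $r$ can be expressed in the following way (Box, Jenkins, and Reinsel, 2008):

$$
r_{F_i Y_j}(k) = \frac{c_{F_i Y_j}(k)}{s_{F_i}\, s_{Y_j}},
$$

with $k = 0, \pm 1, \pm 2, \ldots$. Thus, for each storm, 39 plots were generated that present the cross-correlation values, $r$, and lags, $k$ (in units of three hours). The sign of the lag $k$ is negative if the variability of failures/anomalies is preceded by changes in heliospheric and geomagnetic parameters, and positive in the opposite situation. Here, as an example, we present the results obtained for the intense geomagnetic storm of 26 – 27.09.2011 in Figure 1.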
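The sample cross-correlation in the sense of Box, Jenkins, and Reinsel (2008) can be sketched as below. The function name and the explicit loop are choices made for this illustration; with this convention, a peak at a negative lag means that changes in the parameter series precede changes in the failure series.

```python
import numpy as np

def cross_correlation(f, y, max_lag):
    """Sample cross-correlation r(k) between a failure series f and a
    parameter series y, for lags k = -max_lag, ..., max_lag."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    n = f.size
    fd, yd = f - f.mean(), y - y.mean()
    s_f = np.sqrt(np.sum(fd ** 2) / n)   # sample standard deviations
    s_y = np.sqrt(np.sum(yd ** 2) / n)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(lags.size)
    for idx, k in enumerate(lags):
        if k >= 0:
            c = np.sum(fd[: n - k] * yd[k:]) / n    # pairs (f_t, y_{t+k})
        else:
            c = np.sum(fd[-k:] * yd[: n + k]) / n   # same pairs for k < 0
        r[idx] = c / (s_f * s_y)
    return lags, r
```

With 64 three-hour values per storm, `cross_correlation(f, y, 20)` would cover lags of up to 60 hours in either direction; the maximum lag of 20 is only an example value.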
The computed cross-correlations for the 19 geomagnetic storms vary between −0.63, during the storm on 17.06.2012 for the parameter $E_y$ and EGF of Ed type, and 0.88, during the geomagnetic storm on 25.10.2011 for the parameter $\max(dB_X/dt)$ and EGF of Ed type.
Our analysis revealed that the cross-correlations did not reach the 95% confidence bounds for all storms and all considered time series. Furthermore, for some geomagnetic storms (Section 2), failures/anomalies sometimes occurred ahead of the intense geomagnetic storm (the lag $k$ values were positive). Thus, for a more detailed analysis, we took into account only those storms with cross-correlations $r$ that were mostly above the 95% confidence bound: (I) 09.03.2012, (II) 15 – 16.07.2012, (III) 14.11.2012, (IV) 01.06.2013, and (V) 19.02.2014. For each of these storms, most of the lags $k$ between electrical grid failures and solar, heliospheric, and geomagnetic parameters were negative. Detailed results of the cross-correlations for these five geomagnetic storms are presented in Tables 2 and 3. The lag $k$ was selected as the one for which the extreme cross-correlation value appeared, regardless of whether its sign was positive or negative. Hence, it is worth underlining that all the computed lags were taken into further calculations exactly as they were determined. As an illustration, we show the results for the geomagnetic storm on 19.02.2014 in Figures 2 – 4. Tables 2 and 3 and Figures 2 – 4 show that there are cases when the correlation between EGFs and solar-wind, heliospheric, and geomagnetic parameters is rather small, although it mostly reaches the 95% confidence bounds. However, quite often, the absolute value of the correlation is above 0.50. For this particular storm, the highest $r$, at the level of 0.63, is between SW T and EGFs of Ag and Ed types. The lag length varies both from storm to storm and from parameter to parameter, and there is no regularity in the delay length. However, the lags for the geomagnetic indices are quite similar to one another, and the same holds for the solar-wind parameters.
Based on the lags listed in Tables 2 and 3, each of the solar, heliospheric, and geomagnetic parameters was shifted accordingly. The data processed in this way were then used as input for each of the methods described in the following sections.
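The lag selection can be sketched as picking the lag with the largest absolute cross-correlation and comparing it with a confidence bound. Two assumptions are made here and are not taken from the paper: "extreme" is read as the largest $|r|$, and the 95% bound is approximated by the large-sample value $1.96/\sqrt{N}$ for uncorrelated series.

```python
import numpy as np

def extreme_lag(lags, r, n_samples, z=1.96):
    """Return (lag, r at that lag, significant?) for the largest |r|.

    The bound z/sqrt(n_samples) is an assumed large-sample approximation
    of the 95% confidence limit, not necessarily the one used in the study.
    """
    lags = np.asarray(lags)
    r = np.asarray(r, dtype=float)
    i = int(np.argmax(np.abs(r)))
    bound = z / np.sqrt(n_samples)
    return int(lags[i]), float(r[i]), bool(abs(r[i]) > bound)
```

For series of 64 values, this bound is about 0.245, so a correlation of 0.63 would be counted as significant under this approximation.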
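Shifting a parameter series by its lag can be sketched as follows. The alignment rule $Y'_t = Y_{t+k}$ follows the sign convention of the cross-correlation above; padding the vacated positions with NaN is an assumption of this sketch, since the text does not say how the series edges were handled.

```python
import numpy as np

def shift_by_lag(y, k):
    """Align a parameter series with the failure series: out[t] = y[t + k].

    Positions without a counterpart are filled with NaN (assumed edge rule).
    """
    y = np.asarray(y, dtype=float)
    out = np.full(y.shape, np.nan)
    if k > 0:
        out[:-k] = y[k:]
    elif k < 0:
        out[-k:] = y[:k]
    else:
        out[:] = y
    return out
```

For a negative lag such as `k = -2`, the parameter values are moved two steps later in time, so that they line up with the failures they precede.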

Principal Component Analysis
Table 2 Results of cross-correlations: lag k (the unit of lag equals three hours) and correlation r for five geomagnetic storms.
The essence of Principal Component Analysis (PCA) is a reduction of the $n$-dimensional data space of the variables $X \subseteq \mathbb{R}^n$ to an $m$-dimensional one with uncorrelated factors $Z \subseteq \mathbb{R}^m$, where $m < n$. Moreover, using PCA, one can search for possible regularities and dependencies between the investigated variables without prior knowledge of the data. Let $X = (X_1, \ldots, X_n)$ be a vector with known variances for which the correlation or covariance matrix is also known. Using the properties of eigenvectors and eigenvalues, we want to find new variables $Z_i$ ($i = 1, \ldots, m$) that are linear combinations of the old variables $X_p$ ($p = 1, \ldots, n$) and transfer the information from these old variables as well as possible (Jolliffe, 2002). In what follows, $\alpha_i \cdot X^T = \sum_{p=1}^{n} a_{ip} X_p$, where $\alpha_i = (a_{i1}, a_{i2}, \ldots, a_{in})$ and $i = 1, \ldots, m$; thus $Z_i = \alpha_i \cdot X^T$ denotes the $i$th principal component. In this way, the above problem comes down to the determination of the values $a_{ip}$ of the vector $\alpha_i$ such that the variance $\mathrm{Var}(Z_i)$ is as large as possible. The coefficients $\alpha_i$ can be obtained from the matrix equation $(S - \lambda I)\,\alpha_i = 0$, where $S$ and $I$ are the covariance matrix and the identity matrix, respectively. Therefore, the solution $\alpha_i \neq 0$ is the eigenvector corresponding to the eigenvalue $\lambda_i$. Moreover, $\lambda_i = \mathrm{Var}(Z_i)$. When the analysis is based on the correlation matrix, these values are interpreted as correlation coefficients between the original variables and the given principal component. The principle of creating components is that each successive component is uncorrelated with the previous ones and maximizes the variability not explained by them (Jolliffe, 2002). For this purpose, the PCA method orders the vectors according to the eigenvalues from the largest to the smallest.
The PCA method thus eliminates the variables with the smallest eigenvalues. This is known as the scree test (Cattell criterion), which has a geometric interpretation for the eigenvalues and the percentage of explained variance (Jolliffe, 2002). If the eigenvalue is very small, the variance is also very small, so the data are concentrated around the straight line that contains the eigenvector. In conclusion, such a feature adds little information to the new set of components. In other words, the eigenvalues, ordered from maximal to minimal, give us the order of the principal components, while the sum of the variances of the variables $Z_i$ is equal to the sum of the variances of the original variables $X_p$. Therefore, the transformation of the variables does not lead to a loss of information about the processes studied. Before using the PCA method, we check two assumptions: Bartlett's test and the Kaiser-Meyer-Olkin (KMO) coefficient (Jolliffe, 2002). Bartlett's test is applied to test the hypothesis $H_0: R = I$, which means that the correlation coefficients $r_{ps} = 0$ for $p \neq s$, whereas $r_{ps} = 1$ for $p = s$, $p, s = 1, \ldots, n$. The KMO coefficient is defined as

$$
\mathrm{KMO} = \frac{\sum_{p \neq s} r_{ps}^2}{\sum_{p \neq s} r_{ps}^2 + \sum_{p \neq s} \tilde{r}_{ps}^2},
$$

where $\tilde{r}_{ps}$ means the partial coefficient of correlation and has the form

$$
\tilde{r}_{ps} = \frac{-c_{ps}}{\sqrt{c_{pp}\, c_{ss}}}.
$$

Here $c_{ps}$, $c_{pp}$, and $c_{ss}$ are the algebraic complements of the elements $r_{ps}$, $r_{pp}$, and $r_{ss}$, respectively. The condition is $0.5 < \mathrm{KMO} < 1$, that is, $\sum_{p \neq s} \tilde{r}_{ps}^2 < \sum_{p \neq s} r_{ps}^2$.
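The steps of this section can be sketched in a few functions. This is an illustration under stated assumptions rather than the authors' implementation: PCA is computed from the correlation matrix, the partial correlations are taken from the inverse correlation matrix (equivalent to the cofactor form), and Bartlett's statistic uses its standard form $\chi^2 = -\left(N - 1 - \tfrac{2n+5}{6}\right)\ln|R|$ with $n(n-1)/2$ degrees of freedom, which is not written out in the text. All function names are hypothetical.

```python
import numpy as np

def pca_correlation(data):
    """PCA from the correlation matrix; data has shape (n_obs, n_vars)."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigval, eigvec = np.linalg.eigh(r)            # symmetric eigenproblem
    order = np.argsort(eigval)[::-1]              # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = eigval / eigval.sum()             # share of total variance
    # Loadings: correlations between variables and principal components.
    loadings = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    return eigval, eigvec, explained, loadings

def kmo(data):
    """Kaiser-Meyer-Olkin coefficient from ordinary and partial correlations."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    p = np.linalg.inv(r)
    d = np.sqrt(np.diag(p))
    partial = -p / np.outer(d, d)                 # partial correlations
    off = ~np.eye(r.shape[0], dtype=bool)         # off-diagonal mask
    r2, q2 = np.sum(r[off] ** 2), np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

def bartlett_sphericity(data):
    """Bartlett's chi-square statistic and degrees of freedom for H0: R = I."""
    data = np.asarray(data, dtype=float)
    n_obs, n_vars = data.shape
    r = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(r)
    chi2 = -(n_obs - 1 - (2 * n_vars + 5) / 6.0) * logdet
    return chi2, n_vars * (n_vars - 1) // 2
```

The statistic returned by `bartlett_sphericity` would be compared with the chi-square critical value for the given degrees of freedom, and PCA would proceed only if the hypothesis $R = I$ is rejected and `kmo(data)` exceeds 0.5.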

Self-Organizing Maps
Self-Organizing Maps (SOM) are a type of artificial neural network. In a SOM, each input layer neuron is connected to each output layer neuron. This structure allows one to project a multidimensional space of input data (a single vector of input data is treated as one dimension) onto a matrix of neurons, most often two-dimensional, the so-called map of neurons (Kohonen, 1990). Neuronal maps are presented in the form of a hexagonal or rectangular grid, depending on the selected network topology. A hexagonal grid is most often used because it does not favor any direction, and the distance between neighboring neurons is equal. The connections between adjacent neurons are assigned a certain weight. In a single SOM training iteration, not only the weights of the winning neuron but also the weights of its closest neighborhood are updated, based on the selected neighborhood function and the size of that neighborhood (Hu et al., 2019). Weights are updated based on Kohonen's rule:

$$
w_{ij}(t+1) = w_{ij}(t) + \beta(t)\, h_{ci}(t)\, \left[x_j(t) - w_{ij}(t)\right],
$$

where $w_{ij}$ is the weight $j$ of the neuron $i$, $\beta(t)$ is the learning rate, $h_{ci}(t)$ is the neighborhood function (Caldas et al., 2017) that determines the neighborhood of the winner neuron $c$, and $x(t)$ is the input vector with components $x_j(t)$. The distance between the given neurons on the map reflects the similarity between the input elements. This allows us to find correlations between the input data. SOMs are most commonly used for data clustering and classification problems since they aggregate similar data into clusters (Caldas et al., 2017; Lampinen and Oja, 1992). In our case, SOM was used to group selected heliospheric and geomagnetic parameters with electric grid failures.
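Kohonen's update rule can be sketched with a very small SOM. Several choices below are assumptions made for illustration and differ from the configuration used in the study: a rectangular grid instead of a hexagonal one, a Gaussian neighborhood function, and linearly decaying learning rate and neighborhood width. Each row of `data` plays the role of one input vector (for example, one standardized parameter series).

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=100, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM; returns (weights, grid coordinates).

    Rectangular grid, Gaussian neighborhood, and linear decay are assumed
    here for simplicity.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    # Initialize the weights from randomly chosen input vectors.
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                    # learning rate beta(t)
        sigma = sigma0 * (1.0 - frac) + 0.1        # neighborhood width
        for x in data[rng.permutation(len(data))]:
            c = np.argmin(np.sum((weights - x) ** 2, axis=1))   # winner neuron
            d2 = np.sum((grid - grid[c]) ** 2, axis=1)          # grid distances
            h = np.exp(-d2 / (2.0 * sigma ** 2))                # h_ci(t)
            weights += lr * h[:, None] * (x - weights)          # Kohonen's rule
    return weights, grid

def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return int(np.argmin(np.sum((weights - np.asarray(x, dtype=float)) ** 2, axis=1)))
```

Input vectors that end up with the same best matching unit are treated as grouped together, which is the sense in which a parameter is "in the same cluster" as a failure series.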

Hierarchical Agglomerative Clustering
Cluster analysis is a broad class of data-mining methods in which objects placed in the same group (or cluster) are more similar to one another than to objects placed in other groups. Classification algorithms can be divided into two central models: (1) supervised and (2) unsupervised learning. Hierarchical agglomerative clustering (HAC) (e.g., Müllner, 2013) is, like SOM, an unsupervised learning method. This type of method is less biased and can adapt to unique situations because the resultant clusters are based on models that have not been previously trained. HAC methods work by grouping objects from the bottom up. Each data entity starts as its own 'cluster', and clusters are merged based on similarities until a significantly reduced number of groups is presented as a final solution. Ward's method (Ward, 1963) is among the most popular approaches for HAC. It is the only method among the agglomerative clustering methods that is based on a classical sum-of-squares criterion, producing groups that minimize within-group dispersion at each binary fusion. In addition, Ward's approach is useful because it looks for clusters in multivariate Euclidean space. That is also the reference space in multivariate ordination methods, particularly in PCA. Ward's linkage uses the incremental sum of squares, that is, the increase in the total within-cluster sum of squares due to joining two clusters. The within-cluster sum of squares is defined as the sum of the squares of the distances between all objects in the cluster and the cluster's centroid. The sum of squares metric is equivalent to the following distance function:

$$
d(r, s) = \sqrt{\frac{2\, n_r n_s}{n_r + n_s}}\; \lVert \bar{x}_r - \bar{x}_s \rVert_2,
$$

where
• $\lVert \cdot \rVert_2$ is the Euclidean distance,
• $\bar{x}_r$ and $\bar{x}_s$ are the centroids of clusters $r$ and $s$,
• $n_r$ and $n_s$ are the numbers of elements in clusters $r$ and $s$.
A HAC result is typically visualized as a dendrogram. A horizontal line represents each merge. The y-coordinate of the horizontal line is the similarity of the merged clusters, where data series are viewed as singleton clusters. This similarity is called the combination similarity of the merged cluster. A dendrogram allows us to reconstruct the history of merges that resulted in the depicted clustering by moving up from the bottom layer to the top node. A fundamental assumption in HAC is that the merging operation is monotonic. Monotonic means that if $s_1, s_2, \ldots, s_{K-1}$ are the combination similarities of the successive merges of a HAC, then $s_1 \geq s_2 \geq \cdots \geq s_{K-1}$ holds.
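The equivalence between Ward's distance and the incremental sum of squares can be checked numerically. In the sketch below (function names are hypothetical), the squared Ward distance between two clusters equals twice the increase in the total within-cluster sum of squares caused by merging them.

```python
import numpy as np

def within_ss(cluster):
    """Sum of squared distances of the objects to the cluster centroid."""
    c = np.asarray(cluster, dtype=float)
    return float(np.sum((c - c.mean(axis=0)) ** 2))

def ward_distance(cluster_r, cluster_s):
    """Ward distance d(r, s) between two clusters of row vectors."""
    r = np.asarray(cluster_r, dtype=float)
    s = np.asarray(cluster_s, dtype=float)
    n_r, n_s = len(r), len(s)
    gap = np.linalg.norm(r.mean(axis=0) - s.mean(axis=0))   # centroid distance
    return float(np.sqrt(2.0 * n_r * n_s / (n_r + n_s)) * gap)
```

For two singleton clusters, the formula reduces to the ordinary Euclidean distance between the two objects.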

Principal Component Analysis
We applied the PCA technique described in Section 3.1. We analyzed each type of failure (Ag, Ed, or Un) together with the 13 parameters describing the state of the heliosphere in the Earth's vicinity for the 5 selected geomagnetic storms (Section 2.2), trying to indicate the principal components for each type of failure. Under the required assumptions, that is, Bartlett's test and the KMO coefficient (see Section 3.1), we present the results of the PCA for failures caused by the aging of infrastructure elements (Ag) (Table 4). In our calculations, we retained from 3 to 6 eigenvalues, keeping those for which the eigenvalue was greater than 1 or the explained variance was more than 5%. In the case of EGF of Ag type, we get the minimal cumulative percentage of variance, 80.91%, for 3 eigenvalues obtained for the 14.11.2012 storm, and the maximal cumulative percentage of variance, 86.14%, for 6 eigenvalues obtained for the 19.02.2014 storm (not shown here). Calculations for EGF of Ed type gave a minimum of 76.44% of cumulative variance for 4 eigenvalues for the 19.02.2014 storm and a maximum of 87.57% for 5 eigenvalues for the 15 – 16.07.2012 storm. For EGF of Un type, our results are the following: 84.73% cumulative variance for 5 eigenvalues on 01.06.2013 and 85.08% cumulative variance for 5 eigenvalues obtained
To be more precise, we need to take the 3rd, 4th, and 5th principal components so that the contribution in the transfer of information from each parameter is at least 80%. We obtained similar results for the other geomagnetic storms except for 15 – 16.07.2012, when we got the set of Dst-index, B, $B_z$, and $E_y$ (see Table 4). On the other hand, the PCA revealed that ap, AE, and Ec had the largest values of the factor coordinates of the variables, which means that ap, AE, and Ec are the parameters grouped together in the first principal components of the considered geomagnetic storms, except 15 – 16.07.2012.
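The retention rule quoted above (eigenvalue greater than 1, or explained variance above 5%) can be sketched as a small helper. The function name and the example eigenvalues are illustrative only and are not results from the study.

```python
import numpy as np

def retained_components(eigenvalues, min_eigenvalue=1.0, min_share=0.05):
    """Count the components kept by the rule 'eigenvalue > 1 or share > 5%'
    and return that count with their cumulative percentage of variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    share = lam / lam.sum()
    keep = (lam > min_eigenvalue) | (share > min_share)
    return int(keep.sum()), float(100.0 * share[keep].sum())
```

For a correlation-based PCA of 14 variables (13 parameters plus one failure series), the eigenvalues sum to 14, so a 5% share corresponds to an eigenvalue of 0.7.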
We present the arrangement of the considered parameters in the PCA diagrams for failures caused by aging (Ag), connected to the unreliability of electronic devices (Ed), and having unknown reasons (Un) (Figures 5 – 7). Figures 5 – 7 present only the two principal components that have the largest cumulative variance (the projection on the $OZ_1Z_2$ plane). However, it is worth underlining that two principal components are usually enough to determine the space described by the new variables (more than 50%); this does not disqualify the other components. For Ag and the other heliospheric and geomagnetic parameters, the analysis of variance gave 5 principal components with a cumulative variance of 83% for the storm on 09.03.2012, 5 eigenvalues and 85.4% for 15 – 16.07.2012, 3 eigenvalues and 80.9% for 14.11.2012, 4 eigenvalues and 82.6% for 01.06.2013, and up to 6 principal components and 81.1% of the total variance for 19.02.2014. The application of PCA to the other geomagnetic storms revealed a change of the basic set of components. In the projection of Ag, Ed, Un, and the heliospheric and geomagnetic parameters onto the plane of the two factors with the highest variance (Figures 5 – 7), the observations concern the factor coordinates of the variables, which correspond to the correlations between these parameters and the principal components. The longer the eigenvector of a particular variable, the higher the correlation of that variable with a given principal component. Taking into account the first component and EGF of Ag type, the eigenvector of Ag is long, medium, medium, long, and medium for the geomagnetic storms on 09.03.2012, 15 – 16.07.2012, 14.11.2012, 01.06.2013, and 19.02.2014, respectively (see Figure 5). The eigenvectors for Ed and Un are not so long for the storm on 14.11.2012 (see Figures 5 – 7), nor do they satisfy the KMO condition for the storms on 09.03.2012 (Ed), 15 – 16.07.2012 (Un), 01.06.2013 (Ed), and 19.02.2014 (Un).
In Figures 5 – 7, quite compact clusters of geomagnetic storm parameters can be distinguished. We also note that both Ag and the eigenvectors of the other parameters are situated in the second or third quadrant of the coordinate system, contrary to $B_z$, the Dst-index, and SW d. For Ed, we see a similar arrangement as for Ag. Figures 5 – 7 show that on the PCA map all types of failures, Ag (Figure 5), Ed (Figure 6), and Un (Figure 7), are closely connected to the solar-wind proton temperature.

Self-Organizing Maps Analysis
Models built on neural networks are statistical and nondeterministic. They approximate the conditional mean value of the modeled input variables on the outputs. Moreover, the approximation process depends on several random elements, e.g., the selection of the weights of connections between neurons. So, in Self-Organizing Maps, maps with the same topology, the same training parameters (such as the neighborhood function or the number of iterations), and exactly the same input data representation may show different results. However, in many cases, the conclusions drawn from these seemingly different maps may still be similar (Lippmann, 1987; Wehrens, 2009). For this reason, the SOM analysis was performed 50 times for each combination of a given geomagnetic storm and a given type of electric grid failure, to check the percentage of results in which the heliospheric and geomagnetic parameters are grouped with a given failure in a given storm. The data sets, with a 3-hour resolution and covering 01.01.2010 to 11.07.2014, were standardized. The model parameters were a 4 × 4 neuron grid, 100 training iterations, and a link distance function as the neighborhood distance function.
The SOM analysis showed that different heliospheric and geomagnetic data are grouped into the same cluster with a given type of failure in each considered storm. However, despite the different results, it is possible to notice a group of parameters that correlate with a given type of failure in most of the analyzed cases.
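The percentage of runs in which a parameter shares a neuron with a failure series can be counted with a few lines. The data layout assumed here, one dictionary per SOM run mapping each variable name to the index of its neuron, is hypothetical and chosen only for this sketch.

```python
def grouping_percentage(runs, failure):
    """Percentage of runs in which each variable falls into the same
    neuron as the given failure series.

    runs    : list of dicts {variable name: neuron index}, one per SOM run
    failure : name of the failure series, e.g. "Ag"
    """
    counts = {}
    for assignment in runs:
        target = assignment[failure]
        for name, neuron in assignment.items():
            if name == failure:
                continue
            counts.setdefault(name, 0)
            if neuron == target:
                counts[name] += 1
    return {name: 100.0 * c / len(runs) for name, c in counts.items()}
```

With 50 runs per storm and failure type, a parameter grouped with the failure in 31 runs would score 62%, which is above the 60% level highlighted in the results.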
Example results of our analysis are presented in Figures 8 – 10 for the geomagnetic storm on 19.02.2014, for the three groups of electric grid failures: caused by the aging of infrastructure elements (Ag), connected to the unreliability of electronic devices (Ed), and having unknown reasons (Un). Figures 8 – 10 show the SOM neighbor weight distances, which describe the weight assigned to the connections between neighboring neurons; the neurons are shown as blue hexagons. The red lines show how the neurons are connected. The colors in the areas with the red lines represent the weight of the connection between neurons. The brighter the color, the shorter the distance between neurons. Yellow represents a weight equal to 1, which means the shortest distance between neighboring neurons and a grouping into one cluster of the parameters from the studied space. The darker the color (black means a weight equal to 0), the longer the distance between neurons. This means that the parameters grouped in such neurons are not correlated. Our analysis showed that the Ag type of failure in all analyzed geomagnetic storms is often grouped (in over 60% of the results) with the proton density SW d. For example, in the storms on 09.03.2012 and 15 – 16.07.2012, SW d is grouped with an Ag type of failure in more than 95% of the obtained results. Also, other solar-wind parameters, such as the speed SW s and the temperature SW T of protons, are grouped in more than 80% of the results with failures related to the aging of infrastructure elements in the storms on 15 – 16.07.2012, 19.02.2014, and 14.11.2012. A similar tendency is visible for the $B_y$ component of the HMF in all storms except the storm on 14.11.2012 (in a cluster with an Ag type of failure in more than 80% of the obtained results). Furthermore, the following parameters correlate with the Ag-type failure (often in more than 70% of the obtained results): K, Ec, ap, and AE.
The same situation can be observed for Un failures: the solar-wind parameters SW d, SW s, and SW T are grouped with this type of failure. Also, in more than 60% of the obtained results, the HMF component $B_y$ is grouped with this failure, except for the storms on 19.02.2014 and 09.03.2012. In at least four analyzed storms, the computed geoelectric field Ec and the ap and AE indices are also grouped in the same cluster with a failure having an unknown reason in more than 60% of the results.
Slightly different dependencies can be observed in the analyzed storms between the solar-wind parameters and EGFs connected to the unreliability of electronic devices (Ed). However, these parameters are grouped into the same cluster in more than 65% of the results for the storms on 15 – 16.07.2012, 19.02.2014, and 14.11.2012. The proton density SW d and the $B_y$ component of the HMF are also grouped with this failure in more than 60% of the results in all considered storms.
The results of the SOM analysis revealed that the highest impact on each of the considered electric grid failures (Ag, Ed, Un), regardless of the geomagnetic storm, appears to come from one of the solar-wind parameters, the proton density SW d. In the case of the storms on 15 – 16.07.2012, 14.11.2012, and 19.02.2014, it is also possible to indicate a correlation between all analyzed failures and the other SW parameters, the temperature SW T and the speed SW s. Moreover, depending on the storm and the type of failure, different heliospheric parameters and geomagnetic indices may group in the same cluster. For example, in the storm on 19.02.2014, the parameters K, Ec, B, $B_y$, ap, and AE are grouped together with the Ag type of failure in more than 70% of the results, while in the storm on 09.03.2012, only the parameter $B_y$ correlates with the failure of the Ag type (in more than 70% of the results). All obtained results are given in Table 5. Hence, it is difficult to generalize and clearly list a specific set of parameters that could affect the occurrence of a given type of electric grid failure during more than one storm.
Table 5 Results in percentages obtained from all the SOM analyses for all considered geomagnetic storms and failures caused by the aging of infrastructure elements (Ag), connected to the unreliability of electronic devices (Ed), and having unknown reasons (Un). Green denotes combinations in which the given heliospheric or geomagnetic parameter was grouped with the given electric grid failures in more than 60% of conducted experiments.

HAC with Ward's Linkage Results
We performed the HAC with Ward's linkage on the solar, heliospheric, and geomagnetic parameters described in detail in Section 2.1, merged with the three types of failure (Ag, Ed, Un), during five geomagnetic storms. The HAC performs optimally if all variables are independent and can be well described by a normal distribution (Norusis, 2011). Thus, we used the standardized data shifted by the cross-correlation lags, as described in Section 2.2. The results are presented in the form of a hierarchical binary cluster tree, called a dendrogram, in Figure 11.
One way to determine the natural cluster divisions in a data set is to compare the height of each link in a cluster tree with the heights of the neighboring links below it in the tree. A link that is approximately the same height as the links below indicates that there are no distinct divisions between the objects that are joined at this hierarchy level. These links exhibit a high level of consistency because the distance between the data being bound is approximately the same as the distances between the objects they contain. On the other hand, a link whose height differs noticeably from the height of the links below indicates that the data bound at this level in the cluster tree are much farther apart from each other than their components were when they were joined. Taking the above into account and analyzing the dendrogram presented in Figure 11, we see that the considered data were divided into two main clusters. Unique colors are assigned to each group of nodes in the dendrogram whose linkage is less than 0.7 of the maximum link.
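The comparison of link heights described above corresponds to what SciPy calls the inconsistency coefficient, and the coloring rule corresponds to a threshold at 0.7 of the maximum link height. The sketch below shows both on synthetic data with two well-separated groups. The data are hypothetical, and the use of SciPy is an assumption, since the software used for Figure 11 is not stated here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, inconsistent, fcluster

# Hypothetical data: two well-separated groups of four objects each.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(4, 5)),
    rng.normal(5.0, 0.3, size=(4, 5)),
])
Z = linkage(X, method="ward")

# Groups of nodes linked below 0.7 of the maximum link height get their own color.
threshold = 0.7 * Z[:, 2].max()
tree = dendrogram(Z, color_threshold=threshold, no_plot=True)

# Flat clusters obtained by cutting the tree at the same height.
groups = fcluster(Z, t=threshold, criterion="distance")

# Inconsistency of each link relative to the links one level below it (depth 2).
# Columns: mean height, standard deviation, number of links, inconsistency coefficient.
R = inconsistent(Z, d=2)
top_link_coefficient = R[-1, 3]
```

A link whose height stands well above the links beneath it, such as the final merge here, marks a natural division of the data; links with heights similar to those below them do not.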
We are most interested in which heliospheric and geomagnetic parameters lie within a small distance of the network failures (Ag, Ed, Un) during geomagnetic storms. The smaller the link height between a transmission line failure type (Ag, Ed, Un) and a considered solar activity parameter, the stronger their relationship. Figure 11 presents the results for the storm that took place on 19.02.2014. One can see that the parameters SW T and SW s are closest to all types of failures/anomalies. Grouped in the same central cluster are the parameters B y, SW d, and max(dB X /dt) for the Ag failures; SW d and max(dB X /dt) for the Ed failures; and AE and max(dB X /dt) for the Un type of failures. Very similar clusters were recorded for the storm on 15 – 16.07.2012. During this storm, the following parameters are grouped in the same cluster at a distance below 20: SW d, SW s, SW T, and B y for both the Ag and Ed types of failures, and SW d, B y, K, SW T, and SW s for the Un failures. The HAC analysis for the storm on 14.11.2012 revealed a cluster at a distance below 20 containing SW s and SW d for the Ag failures; max(dB X /dt), SW s, SW d, and E y for the Ed failures; and SW s, max(dB X /dt), and K for the Un failures. During the storm on 09.03.2012, the list of parameters linked at a distance below 20 is shorter: SW d and B y for the Ag failures, and only SW d for the Ed and Un failures. Slightly different results were obtained for the storm that occurred on 01.06.2013. The parameters closest to the Ag and Ed failures are B y and E y, and those closest to the Un failures are B y, max(dB X /dt), and B.
The results obtained by the HAC support the results of the SOM analysis. A careful examination of the dendrograms revealed no one-to-one correspondence in the cluster structure between the storms. Nevertheless, in most of the considered storms, the solar-wind parameters (SW T, SW d, SW s) are at a small distance from the failures. Furthermore, the HMF component B y and the variation max(dB X /dt) are also grouped, interchangeably, with the failures for all storms, while the electric field component E y is clustered with the failures during the storm on 01.06.2013.
Consequently, the HAC method suggests that the solar-wind parameters, the HMF B y component or max(dB X /dt), and the electric field component E y can be considered as proxies for transmission line failures.
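The link height at which a failure type and a parameter are first joined in the tree is their cophenetic distance, so a list such as "parameters linked with a failure at a distance below 20" can be read off programmatically. The sketch below uses SciPy's `cophenet`. The helper name `neighbours_below` and the four toy points are hypothetical; the threshold of 20 is reused only for illustration, since the height scale depends on the data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

def neighbours_below(Z, names, target, max_height):
    """Variables first joined with `target` at a link height below `max_height`."""
    coph = squareform(cophenet(Z))             # square matrix of cophenetic distances
    t = names.index(target)
    return sorted(n for i, n in enumerate(names)
                  if i != t and coph[t, i] < max_height)

# Hypothetical objects: three close to each other and one far away.
names = ["Ag", "SW T", "SW s", "Dst"]
X = np.array([
    [0.0, 0.0],    # Ag
    [1.0, 0.0],    # SW T
    [0.0, 2.0],    # SW s
    [50.0, 50.0],  # Dst
])
Z = linkage(X, method="ward")

close_to_ag = neighbours_below(Z, names, "Ag", 20.0)
close_to_dst = neighbours_below(Z, names, "Dst", 20.0)
```

Here "SW T" and "SW s" join "Ag" at low link heights, while "Dst" is attached only by the final, much higher link and is therefore excluded.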

Summary
In this work, we have considered sources on the Sun (halo and partial halo CMEs) and the impacts of the five intense geomagnetic storms that occurred during the first half of Solar Cycle 24. Applying three machine learning methods, Principal Components Analysis (PCA), Self-Organizing Maps (SOM), and Hierarchical Agglomerative Clustering (HAC), we have tried to discover the hidden dependencies of three types of transmission grid failures/anomalies registered in southern Poland on the solar-wind and heliospheric parameters (B, B y, B z, SW s, SW d, SW T, E y), the geomagnetic characteristics (Dst, ap, AE, K, max(dB X /dt)), and the computed geoelectric field Ec. The main findings of this work can be summarized in the following way:
i) Cross-correlation analysis showed that for some geomagnetic storms, the correlation between EGFs and solar-wind, heliospheric and geomagnetic parameters is relatively small, although statistically significant (fairly often, the absolute value of the cross-correlation coefficient is above 0.50). Moreover, the lag length varies, without apparent regularity, between consecutive geomagnetic storms and between parameters. However, the lags for the geomagnetic indices are fairly comparable with one another, and the same holds for the solar-wind parameters.
ii) The PCA revealed that in the first principal component, the parameters ap, AE, and Ec occur together. The PCA map has shown that in the first principal component, SW T, E y, and the ap index accompany failures caused by the aging of infrastructure elements, Ag, for all considered geomagnetic storms.
iii) The SOM analysis showed that the solar-wind parameters have the most substantial impact on all the considered types of electric-grid failures in southern Poland in all the storms under study. These parameters are usually grouped in the same cluster with failures in more than 80% of the results.
iv) The HAC analysis pointed out that the predictive factor for the transmission line failures should be sought among the solar-wind parameters, the HMF B y component, and the electric field component E y, as well as max(dB X /dt).
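The lagged cross-correlation referred to in item i) can be sketched as a search over non-negative delays for the one that maximizes the absolute Pearson coefficient between a parameter and a failure series. This is a generic illustration under stated assumptions, not the exact procedure of Section 2.2. The function name `best_lag`, the synthetic series, and the delay of 4 samples are hypothetical.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Return (lag, r): the delay of y behind x, in samples, that maximizes |Pearson r|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]                  # driver, truncated at the end
        b = y[lag:]                            # response, delayed by `lag` samples
        r = float(np.corrcoef(a, b)[0, 1])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Synthetic example: y repeats x four samples later, with a little added noise.
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = np.roll(x, 4) + 0.1 * rng.normal(size=300)
lag, r = best_lag(x, y, max_lag=12)
```

The returned lag is the shift one would apply to the parameter series before clustering, and `r` is the coefficient whose absolute value is compared against a level such as 0.50.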
In summary, the analyses conducted for the five chosen geomagnetic storms preceded by halo and partial halo CMEs showed that selected solar-wind and geomagnetic parameters can be treated as primary in analyzing, and attempting to explain, the occurrence of grid failures. In the future, we plan to extend our analyses to a further period and attempt to establish common features of rapid changes on the Sun and link them to the geomagnetic storms during which energy infrastructure elements malfunction more often. We will also include in our studies records of the increased occurrence of higher harmonics of the alternating current in the network as a possible repercussion of forceful solar variability.
Acknowledgments Sz. Moskwa's help with the EGF data is gratefully acknowledged.
Author contributions A.G. planned the scientific content. All authors wrote and reviewed the manuscript, A.G. prepared Figures 1 -4

Competing interests
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.