Abstract
A consistent approach to the inherently imbalanced problem of binary prediction of solar energetic particle (SEP) events is presented, based on solar flare and coronal mass ejection (CME) data and combinations thereof. We exploit several machine learning (ML) and conventional statistics techniques to predict SEPs. The methods used are logistic regression (LR), support vector machines (SVM), neural networks (NN) in the fully connected multilayer perceptron (MLP) implementation, random forests (RF), decision trees (DTs), extremely randomized trees (XT) and extreme gradient boosting (XGB). We provide an assessment of the methods employed and conclude that RF could be the prediction technique of choice for an optimal sample comprising both flares and CMEs. The best-performing method gives a Probability of Detection (POD) of 0.76(±0.06), a False Alarm Rate (FAR) of 0.34(±0.10), a true skill statistic (TSS) of 0.75(±0.05), and a Heidke skill score (HSS) of 0.69(±0.04). We further show that the most important features for the identification of SEPs, in our sample, are the CME speed, the CME width and the flare soft X-ray (SXR) fluence.
Notes
E.g. class weight \(\{c_{-}:1, c_{+}:500\}\).
Class weight \(\{c_{-}:1, c_{+}:1\}\) in the hyperparametrizations of all five folds.
We note that the aforementioned process does not lead to overfitting, since trial and error iterations are performed strictly on the training set, while the selected models are finally assessed on unseen test sets.
References
Alberti, T., Laurenza, M., Cliver, E.W., Storini, M., Consolini, G., Lepreti, F.: 2017, Solar activity from 2006 to 2014 and short-term forecasts of solar proton events using the ESPERTA model. Astrophys. J. 838(1), 59. DOI. ADS.
Anastasiadis, A., Papaioannou, A., Sandberg, I., Georgoulis, M., Tziotziou, K., Kouloumvakos, A., Jiggens, P.: 2017, Predicting flares and solar energetic particle events: The FORSPEF tool. Solar Phys. 292(9), 134. DOI. ADS.
Anastasiadis, A., Lario, D., Papaioannou, A., Kouloumvakos, A., Vourlidas, A.: 2019, Solar energetic particles in the inner heliosphere: Status and open questions. Phil. Trans. Roy. Soc. London Ser. A 377(2148), 20180100. DOI. ADS.
Aran, A., Sanahuja, B., Lario, D.: 2005, Fluxes and fluences of SEP events derived from SOLPENCO. Ann. Geophys. 23(9), 3047. DOI. ADS.
Balch, C.C.: 2008, Updated verification of the Space Weather Prediction Center’s solar energetic particle prediction model. Space Weather 6(1), S01001. DOI.
Barth, J.L.: 2004, Prevention of spacecraft anomalies—The role of space climate and space weather models. In: Daglis, I.A. (ed.) Effects of Space Weather on Technology Infrastructure, Springer, Dordrecht, 123. DOI.
Bein, B.M., Berkebile-Stoiser, S., Veronig, A.M., Temmer, M., Vršnak, B.: 2012, Impulsive acceleration of coronal mass ejections. II. Relation to soft X-ray flares and filament eruptions. Astrophys. J. 755(1), 44. DOI. ADS.
Belov, A., Garcia, H., Kurt, V., Mavromichalaki, H., Gerontidou, M.: 2005, Proton enhancements and their relation to the Xray flares during the three last solar cycles. Solar Phys. 229(1), 135. DOI. ADS.
Bishop, C.M.: 2006, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, Berlin. ISBN 0387310738.
Breiman, L.: 2001, Random forests. Mach. Learn. 45, 5.
Brueckner, G.E., Howard, R.A., Koomen, M.J., Korendyke, C.M., Michels, D.J., Moses, J.D., Socker, D.G., Dere, K.P., Lamy, P.L., Llebaria, A., Bout, M.V., Schwenn, R., Simnett, G.M., Bedford, D.K., Eyles, C.J.: 1995, The Large Angle Spectroscopic Coronagraph (LASCO). Solar Phys. 162(1–2), 357. DOI. ADS.
Camporeale, E.: 2019, The challenge of machine learning in space weather: Nowcasting and forecasting. Space Weather 17(8), 1166. DOI. ADS.
Cane, H., Richardson, I., Von Rosenvinge, T.: 2010, A study of solar energetic particle events of 1997–2006: Their composition and associations. J. Geophys. Res. Space Phys. 115(A8), A08101. DOI.
Carrasco Kind, M., Brunner, R.J.: 2013, TPZ: Photometric redshift PDFs and ancillary information by using prediction trees and random forests. Mon. Not. Roy. Astron. Soc. 432, 1483. DOI.
Chen, T., Guestrin, C.: 2016, XGBoost: A scalable tree boosting system. In: KDD’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785. ISBN 9781450342322. DOI.
Cliver, E.W.: 2016, Flare vs. shock acceleration of high-energy protons in solar energetic particle events. Astrophys. J. 832(2), 128. DOI. ADS.
Crosby, N.B.: 2007, Major radiation environments in the heliosphere and their implications for interplanetary travel. In: Bothmer, V., Daglis, I.A. (eds.) Space Weather Physics and Effects, 131. DOI. ADS.
Defazio, A., Bach, F., Lacoste-Julien, S.: 2014, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems 27.
Desai, M., Giacalone, J.: 2016, Large gradual solar energetic particle events. Living Rev. Solar Phys. 13(1), 3. DOI. ADS.
Domingos, P.: 2000, A unified bias-variance decomposition and its applications. In: Langley, P. (ed.) Proceedings of the Seventeenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, 231.
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: 2008, Liblinear: A library for large linear classification. J. Mach. Learn. Res. 9, 1871. DOI.
Filali Boubrahimi, S., Aydin, B., Martens, P., Angryk, R.: 2017, On the prediction of >100 MeV solar energetic particle events using GOES satellite data. arXiv. ADS.
Garcia, H.A.: 1994, Temperature and emission measure from GOES soft X-ray measurements. Solar Phys. 154(2), 275. DOI. ADS.
Garcia, H.A.: 2004, Forecasting methods for occurrence and magnitude of proton storms with solar soft X rays. Space Weather 2(2), S02002. DOI. ADS.
Gopalswamy, N., Yashiro, S., Michalek, G., Stenborg, G., Vourlidas, A., Freeland, S., Howard, R.: 2009, The SOHO/LASCO CME catalog. Earth Moon Planets 104, 295. DOI. ADS.
Kahler, S.: 2001, The correlation between solar energetic particle peak intensities and speeds of coronal mass ejections: Effects of ambient particle intensities and energy spectra. J. Geophys. Res. 106(A10), 20947. DOI.
Kahler, S.W., Ling, A.G.: 2018, Forecasting Solar Energetic Particle (SEP) events with flare X-ray peak ratios. J. Space Weather Space Clim. 8, A47. DOI. ADS.
Kingma, D., Ba, J.: 2014, Adam: A method for stochastic optimization. In: International Conference on Learning Representations.
Lario, D., Aran, A., GómezHerrero, R., Dresing, N., Heber, B., Ho, G.C., Decker, R.B., Roelof, E.C.: 2013, Longitudinal and radial dependence of solar energetic particle peak intensities: STEREO, ACE, SOHO, GOES, and MESSENGER observations. Astrophys. J. 767(1), 41. DOI. ADS.
Laurenza, M., Alberti, T., Cliver, E.W.: 2018, A short-term ESPERTA-based forecast tool for moderate-to-extreme solar proton events. Astrophys. J. 857(2), 107. DOI. ADS.
Laurenza, M., Cliver, E., Hewitt, J., Storini, M., Ling, A., Balch, C., Kaiser, M.: 2009, A technique for shortterm warning of solar energetic particle events based on flare location, flare size, and evidence of particle escape. Space Weather 7(4), S04008. DOI.
Mertens, C.J., Slaba, T.C.: 2019, Characterization of solar energetic particle radiation dose to astronaut crew on deepspace exploration missions. Space Weather 17(12), 1650. DOI. ADS.
Mishev, A.L., Usoskin, I.G.: 2018, Assessment of the radiation environment at commercial jet-flight altitudes during GLE 72 on 10 September 2017 using neutron monitor data. Space Weather 16(12), 1921. DOI. ADS.
Núñez, M., Paul-Pena, D.: 2020, Predicting >10 MeV SEP events from solar flare and radio burst data. Universe 6(10), 161. DOI. ADS.
Pacheco, D.: 2019, Analysis and Modelling of the Solar Energetic Particle Radiation Environment in the Inner Heliosphere in Preparation for Solar Orbiter, Universitat de Barcelona, Facultat de Física. http://hdl.handle.net/10803/667033.
Panasyuk, M.I.: 2001, Cosmic ray and radiation belt hazards for space missions. In: Daglis, I.A. (ed.) Space Storms and Space Weather Hazards, 251. ADS.
Papaioannou, A., Sandberg, I., Anastasiadis, A., Kouloumvakos, A., Georgoulis, M.K., Tziotziou, K., Tsiropoula, G., Jiggens, P., Hilgers, A.: 2016, Solar flares, coronal mass ejections and solar energetic particle event characteristics. J. Space Weather Space Clim. 6, A42. DOI. ADS.
Papaioannou, A., Anastasiadis, A., Kouloumvakos, A., Paassilta, M., Vainio, R., Valtonen, E., Belov, A., Eroshenko, E., Abunina, M., Abunin, A.: 2018a, Nowcasting solar energetic particle events using principal component analysis. Solar Phys. 293(7), 100. DOI. ADS.
Papaioannou, A., Anastasiadis, A., Sandberg, I., Jiggens, P.: 2018b, Nowcasting of Solar Energetic Particle Events using near realtime Coronal Mass Ejection characteristics in the framework of the FORSPEF tool. J. Space Weather Space Clim. 8, A37. DOI. ADS.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: 2011, Scikitlearn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825.
Pisacane, V.L.: 2008, The Space Environment and Its Effects on Space Systems, American Institute of Aeronautics and Astronautics, Reston. ISBN 9781624103537. DOI.
Pomoell, J., Aran, A., Jacobs, C., RodríguezGasén, R., Poedts, S., Sanahuja, B.: 2015, Modelling large solar proton events with the shockandparticle model. Extraction of the characteristics of the MHD shock front at the cobpoint. J. Space Weather Space Clim. 5, A12. DOI. ADS.
Reames, D.V.: 2015, What are the sources of solar energetic particles? Element abundances and source plasma temperatures. Space Sci. Rev. 194(1–4), 303. DOI.
Robbins, H., Monro, S.: 1951, A stochastic approximation method. Ann. Math. Stat. 22(3), 400. DOI.
SalasMatamoros, C., Klein, K.L.: 2015, On the statistical relationship between CME speed and soft Xray flux and fluence of the associated flare. Solar Phys. 290(5), 1337. DOI. ADS.
Salim, M., Ahmed, A.R.: 2018, A family of quasiNewton methods for unconstrained optimization problems. Optimization 67, 1717. DOI.
Schmidt, M., Roux, N., Bach, F.: 2017, Minimizing finite sums with the stochastic average gradient. Math. Program. 162, 83. DOI.
Schrijver, C.J., Kauristie, K., Aylward, A.D., Denardini, C.M., Gibson, S.E., Glover, A., Gopalswamy, N., Grande, M., Hapgood, M., Heynderickx, D., Jakowski, N., Kalegaev, V.V., Lapenta, G., Linker, J.A., Liu, S., Mandrini, C.H., Mann, I.R., Nagatsuma, T., Nandy, D., Obara, T., Paul O’Brien, T., Onsager, T., Opgenoorth, H.J., Terkildsen, M., Valladares, C.E., Vilmer, N.: 2015, Understanding space weather to shield society: A global road map for 2015–2025 commissioned by COSPAR and ILWS. Adv. Space Res. 55(12), 2745. DOI. ADS.
Shea, M.A., Smart, D.F.: 2012, Space weather and the ground-level solar proton events of the 23rd solar cycle. Space Sci. Rev. 171(1–4), 161. DOI. ADS.
Steyn, R., Strauss, D.T., Effenberger, F., Pacheco, D.: 2020, The soft X-ray Neupert effect as a proxy for solar energetic particle injection – a proof-of-concept physics-based forecasting model. J. Space Weather Space Clim. 10, 64.
Swalwell, B., Dalla, S., Walsh, R.W.: 2017, Solar energetic particle forecasting algorithms and associated false alarms. Solar Phys. 292(11), 173. DOI. ADS.
Trottet, G., Samwel, S., Klein, K.L., Dudok de Wit, T., Miteva, R.: 2015, Statistical evidence for contributions of flares and coronal mass ejections to major solar energetic particle events. Solar Phys. 290(3), 819. DOI. ADS.
Unzicker, A., Donnelly, R.F.: 1974, Calibration of X-ray ion chambers for the space environment monitoring system. Technical report COM-75-10667, National Oceanic and Atmospheric Administration, Boulder, Colo. (USA), Space Environment Lab.
Vainio, R., Desorgher, L., Heynderickx, D., Storini, M., Flückiger, E., Horne, R.B., Kovaltsov, G.A., Kudela, K., Laurenza, M., McKennaLawlor, S., Rothkaehl, H., Usoskin, I.G.: 2009, Dynamics of the Earth’s particle radiation environment. Space Sci. Rev. 147(3–4), 187. DOI. ADS.
Van Rossum, G., Drake, F.L. Jr.: 1995, Python Tutorial, Centrum voor Wiskunde en Informatica Amsterdam, The Netherlands.
Vapnik, V.: 2000, The Nature of Statistical Learning Theory 8, 1. ISBN 9781441931603. DOI.
Vlahos, L., Anastasiadis, A., Papaioannou, A., Kouloumvakos, A., Isliker, H.: 2019, Sources of solar energetic particles. Phil. Trans. Roy. Soc. London Ser. A 377, 20180095. DOI.
Vršnak, B., Sudar, D., Ruždjak, D.: 2005, The CME-flare relationship: Are there really two types of CMEs? Astron. Astrophys. 435(3), 1149. DOI.
Vršnak, B., Ruždjak, D., Sudar, D., Gopalswamy, N.: 2004, Kinematics of coronal mass ejections between 2 and 30 solar radii. Astron. Astrophys. 423(2), 717. DOI.
Winter, L.M., Ledbetter, K.: 2015, Type II and type III radio bursts and their correlation with solar energetic proton events. Astrophys. J. 809(1), 105. DOI. ADS.
Yashiro, S., Gopalswamy, N.: 2009, Statistical relationship between solar flares and coronal mass ejections. In: Gopalswamy, N., Webb, D.F. (eds.) Universal Heliophysical Processes 257, 233. DOI. ADS.
Youssef, M.: 2012, On the relation between the CMEs and the solar flares. NRIAG J. Astron. Geophys. 1, 172. DOI. ADS.
Acknowledgements
The CME Catalog used in this work is generated and maintained at the CDAW Data Center by NASA and The Catholic University of America in cooperation with the Naval Research Laboratory. Funding for the early phase of the catalog was provided by AFOSR and NSF. Currently, the catalog resides at the CDAW Data Center at Goddard Space Flight Center and is supported by NASA’s Living with a Star program and the SOHO project. SOHO is a project of international cooperation between ESA and NASA. Angels Aran and Athanasios Papaioannou acknowledge the support from the project MDM-2014-0369 of ICCUB (Unidad de Excelencia “María de Maeztu”). Athanasios Papaioannou and Anastasios Anastasiadis further gratefully acknowledge International Space Science Institute (ISSI) support to the ISSI International Team 441: High EneRgy sOlar partICle events analysis (HEROIC, https://www.issibern.ch/teams/heroic/). In addition, Athanasios Papaioannou acknowledges ISSI support to the ISSI International Team 464: The Role Of Solar And Stellar Energetic Particles On (Exo)Planetary Habitability (ETERNAL, https://www.issibern.ch/teams/exoeternal/). Angels Aran acknowledges the support by the Spanish Ministerio de Ciencia e Innovación (MICINN) under grant PID2019-105510GB-C31 and through the “Center of Excellence María de Maeztu 2020–2023” award to the ICCUB (CEX2019-000918-M). The authors would further like to thank the anonymous referee for the valuable comments that improved the content of the paper.
Author information
Authors and Affiliations
Ethics declarations
Disclosure of Potential Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Classification Methods and Utilized Implementations
A.1 Logistic Regression
Logistic regression is a linear classification method, in the sense that the input feature space \(\mathcal{R}^{l}\) is divided into decision regions bounded by linear decision surfaces (hyperplanes) of \(l-1\) dimensions. In the case of binary classification, a single hyperplane partitions the input variable space into two decision regions, corresponding to the positive and negative class. This hyperplane can be mathematically expressed as \(\boldsymbol{f}(\boldsymbol{\Theta}, \mathbf{x}) = \boldsymbol{\theta}^{\top}\mathbf{x} + \theta_{0} = 0\) (Equation 20),
where the weight vector \(\boldsymbol{\Theta}\) is defined in a compact form composed of the direction (\(\boldsymbol{\theta}\)) and offset (\(\theta_{0}\)), i.e. \(\boldsymbol{\Theta} = (\theta_{0}, \boldsymbol{\theta}^{\top})^{\top}\).
In case the classes are linearly separable in the given feature space (Figure 7), the above formulation implies that each instance can be assigned to the positive or negative class according to \(\mathrm{sign}(\boldsymbol{\theta}^{\top}\mathbf{x} + \theta_{0})\) (Equation 22).
The cost function to be minimized in algorithms implementing logistic regression classifiers is generally based on the logistic sigmoid function \(\sigma(a) = 1/(1 + e^{-a})\).
The logistic sigmoid function’s characteristic property is that it squashes all possible outcomes into the interval (0, 1), for all input values \(a \in \mathcal{R}\). The primary term of the cost function takes the cross-entropy form \(J(\boldsymbol{\Theta}) = -C\sum_{i=1}^{N}\bigl[y_{i}\ln\sigma(a_{i}) + (1 - y_{i})\ln(1 - \sigma(a_{i}))\bigr]\), with \(a_{i} = \boldsymbol{\theta}^{\top}\mathbf{x}_{i} + \theta_{0}\),
where C is the inverse of the regularization strength, often set to the inverse of the total number of training samples N. An additional penalization term is usually included in the general form of the cost function (Equation 24).
Hyperparameters of the logistic regression classifier in the scikit-learn implementation include:

i) inverse regularization strength C: smaller values imply stronger regularization,

ii) penalty: specifies the L1/L2 penalization term,

iii) solver: the minimization algorithm used to solve the optimization problem, such as ‘sag’, the Stochastic Average Gradient (Schmidt, Roux, and Bach, 2017), ‘saga’ (Defazio, Bach, and Lacoste-Julien, 2014), another incremental gradient algorithm based on Stochastic Gradient Descent (Robbins and Monro, 1951), and ‘liblinear’, which implements a Newton method (Fan et al., 2008),

iv) random state: used to shuffle the data when the solver is ‘sag’, ‘saga’ or ‘liblinear’,

v) maximum number of iterations,

vi) class weight: controls the strength of the penalization applied to the wrong prediction of each class.
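As a minimal sketch of how such a classifier is set up, the snippet below fits scikit-learn's LogisticRegression to a synthetic imbalanced sample; the data, the class-weight ratio and the solver choice are illustrative, not the tuned values used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
# Synthetic imbalanced sample: 500 negatives around 0, 25 positives around 2.
X = np.vstack([rng.normal(0.0, 1.0, (500, 3)), rng.normal(2.0, 1.0, (25, 3))])
y = np.array([0] * 500 + [1] * 25)

clf = LogisticRegression(
    penalty="l2",
    C=1.0,                       # inverse regularization strength
    solver="saga",
    class_weight={0: 1, 1: 20},  # penalize missed positives more heavily
    max_iter=5000,
    random_state=42,
)
clf.fit(X, y)
print(clf.predict(X[:5]))        # labels for the first five training vectors
```

Raising the positive-class weight shifts the decision boundary toward the majority class, trading false alarms for a higher probability of detection, which is exactly the trade-off explored with the class-weight grids of Appendix C.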
A.2 Support Vector Machines
This is a set of supervised learning methods that use a subset of the training vectors, called support vectors, in fitting the discriminant function (Vapnik, 2000). Support vectors are instance vectors that lie within a predefined distance from either side of the decision boundary, named the functional margin. If a linear form is adopted for the discriminant function, the goal is to define a hyperplane as in Equation 20 that assigns most instances to the correct class, as indicated by sign(\(\boldsymbol{\theta}^{\top}\mathbf{x} + \theta_{0}\)) (Equation 22). At the same time, the distance from the nearest instance vectors of each class to the decision boundary is maximized, aiming to increase the model’s generalization ability. The distance d from a point \(\mathbf{x}_{*}\) to a hyperplane \(\boldsymbol{f}(\boldsymbol{\Theta}, \mathbf{x})\) defined by Equation 20 is \(d = |\boldsymbol{\theta}^{\top}\mathbf{x}_{*} + \theta_{0}| / \|\boldsymbol{\theta}\|\).
The maximum distance is achieved by minimizing \(\|\boldsymbol{\theta}\|^{2}=\boldsymbol{\theta}^{\top}\boldsymbol{\theta}\). In addition, the weights are scaled so that the values of the decision function \(\mathcal{C}\) at the points nearest to the decision surface are \(\boldsymbol{\theta}^{\top}\mathbf{x} + \theta_{0} = \pm 1\).
Taking into account Equation 22, it follows that in the case of perfect linear separability the optimal solution would satisfy \(y_{i}(\boldsymbol{\theta}^{\top}\mathbf{x}_{i} + \theta_{0}) \geq 1\) for all training vectors.
Given that classes are usually not perfectly separable by a hyperplane, each point is allowed a maximum distance \(\xi_{i}\) to its correct margin. The resulting optimization problem is formulated as \(\min_{\boldsymbol{\theta},\theta_{0},\boldsymbol{\xi}} \frac{1}{2}\|\boldsymbol{\theta}\|^{2} + C\sum_{i=1}^{N}\xi_{i}\), subject to \(y_{i}(\boldsymbol{\theta}^{\top}\mathbf{x}_{i} + \theta_{0}) \geq 1 - \xi_{i}\) and \(\xi_{i} \geq 0\) (Equation 28).
The variables \(\xi_{i}\), called slack variables, define the maximum distance allowed for samples to their correct margin boundary. The coefficient C controls the strength of penalization posed when an instance vector \(\mathbf{x}_{i}\) is misclassified or within the functional margin. Small values correspond to wider margins and vice versa (Figure 8).
A common technique utilized under the SVM approach, when the problem is highly nonlinear in the original input space, is the kernel method. This method maps the problem to a higher-dimensional space where linear separability of the classes becomes more feasible. Instead of learning an explicit discriminant function that maps support vectors \(\mathbf{x}_{i}\) to their target labels \(y_{i}\), a similarity function or kernel is used to learn the particular weight of each training sample (\(\mathbf{x}_{i}, y_{i}\)). Unlabeled vectors \(\mathbf{x}_{j}\) can subsequently be classified according to their similarity to training vectors \(\mathbf{x}_{i}\), expressed as a proper inner product through the kernel function \(K(\mathbf{x}_{i}, \mathbf{x}_{j}) = \phi(\mathbf{x}_{i})^{\top}\phi(\mathbf{x}_{j})\) (Equation 29), where training vectors are mapped to a higher- (possibly infinite-) dimensional feature space \(\mathcal{V}\) through the feature mapping function \(\phi: \mathcal{R}^{l} \rightarrow \mathcal{V}\).
The optimization problem can be formulated as a dual problem to the primal Equation 28, using the kernel trick: \(\min_{\boldsymbol{\alpha}} \frac{1}{2}\boldsymbol{\alpha}^{\top}Q\boldsymbol{\alpha} - \mathbf{e}^{\top}\boldsymbol{\alpha}\), subject to \(\mathbf{y}^{\top}\boldsymbol{\alpha} = 0\) and \(0 \leq \alpha_{i} \leq C\) (Equation 31),
where:

i) \(\mathbf{e}\) is a \(1 \times N\) vector of ones,

ii) \(Q=[Q_{ij}]=[y_{i} y_{j} K(x_{i},x_{j})]_{i,j=1,\ldots,N}\) is an \(N \times N\) positive semidefinite matrix,

iii) K is the kernel described by Equation 29,

iv) the coefficients \(\alpha_{i}\), called the dual coefficients, are upper-bounded by C.
Common kernel functions utilized in SVM classification are:

i) linear: \(\mathbf{K}=\langle x_{i}, x_{j}\rangle\), where \(\langle\cdot,\cdot\rangle\) denotes the dot product between two l-dimensional vectors,

ii) polynomial: \(\mathbf{K}=(\gamma\langle x_{i}, x_{j} \rangle+ r)^{d}\), where d indicates the polynomial degree,

iii) Radial Basis Function (RBF): \(\mathbf{K}=\exp(-\gamma\|x_{i}-x_{j}\|^{2})\),

iv) sigmoid: \(\mathbf{K}=\tanh(\gamma\langle x_{i}, x_{j} \rangle + r)\).
The solution to the optimization problem of Equation 31 corresponds to the equation describing the decision surface, so that the label of a new sample \(\mathbf{x}_{j}\) can be predicted from the sign of the decision function’s output for this sample, using only the support vectors: \(\hat{y}_{j} = \mathrm{sign}\bigl(\sum_{i \in \mathrm{SV}} y_{i}\alpha_{i}K(\mathbf{x}_{i}, \mathbf{x}_{j}) + \rho\bigr)\), where \(\rho\) denotes the intercept term.
Hyperparameters used to describe an SVM classifier with a linear kernel in the scikit-learn implementation include:

i) penalty: specifies the L1/L2 penalization term,

ii) random state: used for shuffling the data in the dual problem optimization (Equation 31),

iii) C: the penalization strength controlling the margin,

iv) class weight: controls the strength of penalization applied to the wrong prediction of each class,

v) maximum number of iterations.
Hyperparameters used to describe an SVM classifier with a nonlinear kernel in the scikit-learn implementation include:

i) the kernel function,

ii) \(\gamma\): the regularization coefficient of the kernel,

iii) \(coef_{0}\): the independent term in the kernel function,

iv) C: the penalization strength controlling the margin,

v) the polynomial degree, in the case of a polynomial kernel,

vi) class weight: controls the strength of penalization applied to the wrong prediction of each class,

vii) maximum number of iterations.
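The kernel method above can be sketched with scikit-learn's SVC on a toy problem that is not linearly separable in the original space (concentric circles); the hyperparameter values are examples only, not the tuned values of this study.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them in the 2-D input space,
# but the RBF kernel implicitly maps them to a space where a hyperplane can.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=42)

clf = SVC(kernel="rbf", C=1.0, gamma="scale", class_weight={0: 1, 1: 1})
clf.fit(X, y)
print(clf.n_support_)   # number of support vectors retained per class
```

Only the support vectors enter the decision function, so `n_support_` is typically much smaller than the training-set size.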
Examples of SVM binary classification using different kernel functions are presented in Figure 9.
A.3 Neural Network (Multi-Layer Perceptron)
Neural networks (NN) sequentially transform the input space into higher-dimensional spaces, where classes can be linearly discriminated (Figure 10). This classifier is effectively constructed as an ensemble of logistic regression classifiers (Bishop, 2006). The dimension of the original feature space corresponds to the number of nodes (neurons) of the input layer. For binary classification problems, the output layer has a single node, providing the final classification result (0/1). The number of transformations needed to achieve the best possible separability of classes defines the number of hidden layers. The number of nodes at each hidden layer indicates the dimensionality of each transformed space. Nodes are interconnected via synapses, representing transformations of variables, so that the transformation coefficients from one layer to the next are referred to as the synaptic weights.
Once we define a specific architecture for the NN classifier, it can be parametrized by the synaptic weights \(\boldsymbol{\Theta}\) that describe the sequential transformations between layers. The cost can likewise be expressed as a function of \(\boldsymbol{\Theta}\) and of the classification error between predicted and actual results, subsequently minimized by an optimization algorithm such as the stochastic average gradient (Schmidt, Roux, and Bach, 2017).
Hyperparameters of the NN classifier, in the scikit-learn implementation of MLP, include:

i) number of hidden layers and their dimensionalities: described as tuples (\(d_{1}\), …, \(d_{k}\)), where \(d_{i}\) is the dimension of the ith hidden layer and k is the number of hidden layers,

ii) solver: the method for weight optimization,

iii) learning rate: the method/schedule for synaptic weight updates (e.g., constant, gradual, adaptive),

iv) initial learning rate: controls the step for updating the weights,

v) maximum number of iterations.
Commonly used optimization methods in the MLP implementation include:

i) sgd: stochastic gradient descent (Robbins and Monro, 1951),

ii) lbfgs: an optimizer in the family of quasi-Newton methods (Salim and Ahmed, 2018),

iii) adam: a stochastic gradient-based optimizer proposed by Kingma and Ba (2014).
Neural networks are in general black-box models, as high-dimensional spaces are usually involved and the model’s structure becomes too complex to interpret or visualize. On the other hand, NNs can tackle problems involving highly nonlinear underlying relations.
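A compact MLP sketch on a nonlinear toy problem follows; the two-moons data, the single ten-node hidden layer and the learning rate are illustrative choices, not those of the paper's hyperparameter search.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Two interleaving half-circles: a nonlinear boundary is required.
X, y = make_moons(n_samples=300, noise=0.15, random_state=42)

clf = MLPClassifier(
    hidden_layer_sizes=(10,),   # one hidden layer of 10 neurons
    solver="adam",              # stochastic gradient-based optimizer
    learning_rate_init=0.005,   # initial step size for weight updates
    max_iter=5000,
    random_state=42,
)
clf.fit(X, y)
```

The hidden layer learns a transformed representation in which the output node, itself a logistic unit, can separate the two classes linearly.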
A.4 Decision Tree Classifier
The decision tree is a non-parametric supervised classification method, in the sense that the model infers decision rules to optimally classify instance vectors by learning from the vectors themselves. The input feature space \(\mathcal{R}^{l}\) is sequentially partitioned so that instance vectors of the same class are grouped together within the created subspaces (Figure 11). The partition is performed under a set of sequential if-then-else decision rules. Each decision rule results in a split of the training data on a particular feature and corresponds to a decision node in the tree’s structure. Leaf nodes represent the tree’s final decisions/predictions, so that they correspond either to homogeneous subsets (including instances of the same class), if classes are linearly separable, or to the purest possible subsets otherwise. The first split is performed on the most informative feature in the given feature space, and the corresponding node is the tree’s root node.
Subsequent splits are performed in a sequential manner, while the rule for a certain decision node is based on the subset of vectors that satisfy the criteria defined by its predecessor nodes. More specifically, for each feature and each possible feature value, the corresponding split is evaluated according to a specified criterion. The combination (feature, value) that optimizes the adopted criterion is selected to perform the split. The commonly used criteria to quantify the quality of each split are Gini Impurity and Entropy (information gain).
Gini impurity measures the probability for a randomly chosen data vector to be wrongly classified. It equals the probability for any vector to be randomly chosen, times the probability for it to be misclassified, summed over all possible classes (two in our case): \(G = \sum_{i} p_{i}(1 - p_{i}) = 1 - \sum_{i} p_{i}^{2}\).
Entropy is similarly defined as \(H = \sum_{i} p_{i} I_{i} = -\sum_{i} p_{i}\log_{2} p_{i}\), where the information gain associated with an arbitrary event with probability of occurrence \(p_{i}\) is \(I_{i} = -\log_{2} p_{i}\).
Minimization of either Gini impurity or entropy results in more uniform (pure) nodes (Figure 12).
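The two split criteria can be computed directly for a binary node; here p denotes the fraction of positive-class samples in the node.

```python
import math

def gini_impurity(p):
    # G = 1 - sum_i p_i^2, for the two classes with probabilities p and 1 - p
    return 1.0 - (p**2 + (1.0 - p)**2)

def entropy(p):
    # H = -sum_i p_i * log2(p_i); a pure node (p = 0 or 1) has zero entropy
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

print(gini_impurity(0.5), entropy(0.5))  # maximally impure node: 0.5 and 1.0
print(gini_impurity(1.0), entropy(1.0))  # pure node: 0.0 and 0.0
```

Both measures peak at p = 0.5 and vanish for pure nodes, which is why minimizing either drives the splits toward uniform subsets.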
Hyperparameters used to modify the decision tree classifier in the scikit-learn implementation include:

i) split criterion: Gini impurity or information gain,

ii) minimum number of samples in a leaf node,

iii) minimum number of samples required to perform a split,

iv) maximum depth of the tree, indicating the maximum number of decisions/splits to be performed,

v) class weight: controls the strength of penalization applied to the wrong prediction of positive- and negative-class events.
Decision trees produce white-box models, as their interpretation and visualization are straightforward. Nonetheless, the creation of over-complex trees that cannot generalize well to new data is always possible. Overfitting is avoided by finding the tree with the minimum possible depth that can solve a particular classification problem. This can be achieved by fine-tuning the maximum allowed depth and the minimum number of samples required for the tree to make decisions to relatively low and high values, respectively. Overfitting can also be tackled a posteriori, i.e. after fitting the tree to the training data: post-pruning is applied to achieve the best possible balance between tree complexity and misclassification cost.
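A sketch of these overfitting controls with scikit-learn: the depth and leaf-size limits below are deliberately restrictive example values, and the synthetic data stands in for the flare/CME feature sample.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=400, n_features=4, n_informative=2,
                           n_redundant=0, random_state=42)

clf = DecisionTreeClassifier(
    criterion="gini",       # split quality measured by Gini impurity
    max_depth=3,            # cap on the number of sequential decisions
    min_samples_leaf=10,    # no leaf may hold fewer than 10 samples
    min_samples_split=20,   # a node needs 20 samples to be split further
    random_state=42,
)
clf.fit(X, y)
print(export_text(clf))     # the learned if-then-else rules, human readable
print(clf.get_depth())      # never exceeds max_depth
```

The `export_text` dump illustrates the white-box character of the model: every prediction can be traced through an explicit chain of feature thresholds.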
A.5 Random Forest Classifier
Random forest is formed as an ensemble of multiple decision trees trained on different parts of the dataset. The bootstrap aggregating (bagging) technique is used to extract random samples with replacement and fit trees to these samples, using a random subset of features (\(\sim\sqrt {n_{features}}\)). Prediction of unseen samples is performed by taking the “majority vote” of the predictions made by the individual trees. This technique reduces the variance without increasing the bias, since the individual trees are independently trained.
Hyperparameters of the random forest classifier in the scikit-learn implementation include:

i) split criterion, to be used in the individual trees (estimators),

ii) minimum number of samples in an estimator leaf node,

iii) minimum number of samples required to perform a split in an estimator,

iv) maximum depth of the estimators,

v) class weight: controls the strength of penalization applied to the wrong prediction of positive- and negative-class events,

vi) total number of estimators to be constructed,

vii) maximum number of features to be utilized in constructing the individual estimators.
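A minimal random forest sketch on a synthetic imbalanced sample follows; the parameter values are illustrative, and the feature-importance readout is analogous to the ranking that, for the paper's sample, singles out CME speed, CME width and SXR fluence.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Imbalanced toy data: ~95% negatives, 5% positives.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           weights=[0.95, 0.05], random_state=42)

clf = RandomForestClassifier(
    n_estimators=100,          # trees fit on bootstrap samples
    max_depth=6,
    max_features="sqrt",       # ~sqrt(n_features) per split
    class_weight="balanced",   # reweight classes by inverse frequency
    random_state=42,
)
clf.fit(X, y)

# Impurity-based importance of each feature, summed over all trees.
print(sorted(zip(clf.feature_importances_, range(6)), reverse=True)[:3])
```

Prediction averages the votes of the hundred independently trained trees, which is what reduces the variance relative to a single deep tree.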
A.6 Extremely Randomized Trees (Extra Trees)
Extremely randomized trees are also ensembles of multiple decision trees. In contrast to random forest, each tree in Extra trees is fit on the whole training dataset. Decision rules are established in each tree using a randomly selected cut point, instead of searching for the one that optimizes a split criterion, resulting in reduced correlation amongst trees.
Hyperparameters of the Extremely randomized trees classifier used in the present study are similar to those of the random forest classifier, described in the previous subsection.
A.7 Extreme Gradient Boosting (XGBoost)
Another classifier built as an ensemble of multiple decision trees is extreme gradient boosting (Chen and Guestrin, 2016). XGBoost (XGB) trees are constructed not independently but sequentially, learning from previous trees’ mistakes through a clever penalization of trees. A Newton boosting technique is employed, resulting in faster convergence to cost minima than gradient descent. Additionally, an extra randomization parameter is used to reduce correlation amongst trees, so that boosting results in reduced variance and bias.
Hyperparameters of the XGBoost classifier include:

i) booster: the optimization algorithm to be used, mostly methods in the Newton family,

ii) number of estimators to be constructed,

iii) maximum depth of each estimator,

iv) subsample: the fraction of the training data to be used in individual estimators,

v) gamma parameter: the minimum loss reduction required to make a further partition on a leaf node of an estimator; larger values correspond to a more conservative algorithm,

vi) lambda parameter: the L2 regularization term on weights; larger values make the algorithm more conservative,

vii) alpha: the L1 regularization term on weights; larger values make the algorithm more conservative,

viii) scale applied to positive weights: controls the strength of penalization applied to the wrong prediction of positive-class events, relative to that applied to the negative class.
Appendix B: Tabulated Information on Experimental Results
Appendix C: Hyper-Parameter Spaces
C.1 Logistic Regression (LR)

i) [penalty:] ‘l2’, ‘l1’

ii) [solver:] ‘saga’, ‘liblinear’

iii) [random_state:] 42

iv) [C:] logspace(start=-4, end=3, step=1)

v) [max_iter:] 5000, 10000, 20000

vi) [class_weight:] ‘balanced’, {0: 1, 1: 1}, {0: 1, 1: 10}, {0: 1, 1: 20}, {0: 1, 1: 50}, {0: 1, 1: 100}, {0: 1, 1: 150}, {0: 1, 1: 300}, {0: 1, 1: 400}, {0: 1, 1: 500}
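A grid such as the one above can be searched exhaustively with scikit-learn; the snippet below is a reduced, illustrative version (smaller grid, synthetic data, balanced accuracy as the scoring proxy) rather than the paper's own cross-validation setup and skill scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=42)

param_grid = {
    "penalty": ["l2", "l1"],
    "solver": ["saga", "liblinear"],
    "C": np.logspace(-4, 3, 8),           # the logspace(-4, 3) grid above
    "class_weight": ["balanced", {0: 1, 1: 10}],
}
search = GridSearchCV(
    LogisticRegression(random_state=42, max_iter=5000),
    param_grid,
    scoring="balanced_accuracy",  # imbalance-aware metric
    cv=5,                         # 5-fold cross-validation
)
search.fit(X, y)
print(search.best_params_)
```

Every combination in the grid is fit and scored on each of the five folds, and `best_params_` holds the combination with the highest mean validation score.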
C.2 Support Vector Machines with Linear Kernel (linSVM)

i) [penalty:] ‘l2’, ‘l1’

ii) [random_state:] 42

iii) [C:] logspace(start=-4, end=3, step=1)

iv) [max_iter:] 5000, 10000, 20000

v) [class_weight:] ‘balanced’, {0: 1, 1: 1}, {0: 1, 1: 10}, {0: 1, 1: 20}, {0: 1, 1: 50}, {0: 1, 1: 100}, {0: 1, 1: 150}, {0: 1, 1: 300}, {0: 1, 1: 400}, {0: 1, 1: 500}
C.3 Support Vector Machines with Nonlinear Kernel (SVM)

i) [kernel:] ‘rbf’, ‘poly’, ‘sigmoid’

ii) [C:] logspace(start=-4, end=3, step=1)

iii) [gamma:] ‘scale’, ‘auto’, 0.001, 0.01, 0.1, 1, 10

iv) [degree:] 2, 3, 4

v) [coef0:] -10, -1, -0.1, -0.01, -0.001, 0.0, 0.001, 0.01, 0.1, 1, 10

vi) [max_iter:] 5000, 10000, 20000

vii) [class_weight:] ‘balanced’, {0: 1, 1: 1}, {0: 1, 1: 10}, {0: 1, 1: 20}, {0: 1, 1: 50}, {0: 1, 1: 100}, {0: 1, 1: 150}, {0: 1, 1: 300}, {0: 1, 1: 400}, {0: 1, 1: 500}
C.4 Neural Network (NN)

i) [hidden_layer_sizes:] (50,), (40,), (30,), (20,), (10,), (8,), (6,), (50, 2), (40, 2), (30, 2), (20, 2), (10, 2), (8, 2), (6, 2), (50, 3), (40, 3), (30, 3), (20, 3), (10, 3), (8, 3), (6, 3), (50, 4), (40, 4), (30, 4), (20, 4), (10, 4), (8, 4), (6, 4), (50, 20), (50, 10), (50, 5), (60,), (60, 20), (60, 10), (60, 5), (60, 4), (60, 3), (60, 2), (4,), (3,), (2,), (4, 4), (4, 2), (2, 1), (2, 2), (3, 1), (3, 2), (60, 30), (50, 25), (40, 20), (30, 15), (20, 10), (10, 5)

ii) [learning_rate:] ‘constant’, ‘invscaling’, ‘adaptive’

iii) [learning_rate_init:] 0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.03, 0.04, 0.05

iv) [max_iter:] 5000, 10000, 20000

v) [solver:] ‘adam’, ‘lbfgs’
C.5 Decision Tree (DT)

i) [criterion:] ‘gini’, ‘entropy’

ii) [min_samples_leaf:] range(start=2, end=100, step=2)

iii) [min_samples_split:] range(start=5, end=130, step=5)

iv) [max_depth:] 2, 4, 6, 8, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None

v) [class_weight:] ‘balanced’, {0: 1, 1: 1}, {0: 1, 1: 10}, {0: 1, 1: 20}, {0: 1, 1: 50}, {0: 1, 1: 100}, {0: 1, 1: 150}, {0: 1, 1: 300}, {0: 1, 1: 400}, {0: 1, 1: 500}
C.6 Random Forest (RF)

i) [criterion:] ‘gini’, ‘entropy’

ii) [n_estimators:] 10, 20, 40, 60, 80, 100, 200, 400, 600, 800, 1000

iii) [min_samples_leaf:] range(start=2, end=100, step=2)

iv) [min_samples_split:] range(start=5, end=130, step=5)

v) [max_depth:] 2, 4, 6, 8, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None

vi) [max_features:] ‘auto’, ‘sqrt’, ‘log2’

vii) [class_weight:] ‘balanced’, {0: 1, 1: 1}, {0: 1, 1: 10}, {0: 1, 1: 20}, {0: 1, 1: 50}, {0: 1, 1: 100}, {0: 1, 1: 150}, {0: 1, 1: 300}, {0: 1, 1: 400}, {0: 1, 1: 500}
C.7 Extremely Randomized Trees (XT)

i) [random_state:] 42

ii) [criterion:] ‘gini’, ‘entropy’

iii) [n_estimators:] 10, 20, 40, 60, 80, 100, 200, 400, 600, 800, 1000

iv) [min_samples_leaf:] range(start=2, end=100, step=2)

v) [min_samples_split:] range(start=5, end=130, step=5)

vi) [max_depth:] 2, 4, 6, 8, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None

vii) [max_features:] ‘auto’, ‘sqrt’, ‘log2’, None

viii) [class_weight:] ‘balanced’, {0: 1, 1: 1}, {0: 1, 1: 10}, {0: 1, 1: 20}, {0: 1, 1: 50}, {0: 1, 1: 100}, {0: 1, 1: 150}, {0: 1, 1: 300}, {0: 1, 1: 400}, {0: 1, 1: 500}
C.8 Extreme Gradient Boosting (XGB)

i) [max_depth:] range(start=2, end=60, step=2)

ii) [n_estimators:] 10, 20, 40, 60, 80, 100, 200, 400, 600, 800, 1000

iii) [scale_pos_weight:] range(start=1, end=400, step=50)

iv) [alpha:] 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10

v) [gamma:] 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10

vi) [lambda:] range(start=1, end=22, step=1)
About this article
Cite this article
Lavasa, E., Giannopoulos, G., Papaioannou, A. et al. Assessing the Predictability of Solar Energetic Particles with the Use of Machine Learning Techniques. Sol Phys 296, 107 (2021). https://doi.org/10.1007/s11207-021-01837-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11207-021-01837-x
Keywords
 Solar energetic particles
 Nowcasting
 Machine learning methods
 Forecasting