Validation of the CME Geomagnetic Forecast Alerts Under the COMESEP Alert System

Under the European Union 7th Framework Programme (EU FP7) project Coronal Mass Ejections and Solar Energetic Particles (COMESEP, http://comesep.aeronomy.be), an automated space weather alert system has been developed to forecast solar energetic particles (SEP) and coronal mass ejection (CME) risk levels at Earth. The COMESEP alert system uses the automated detection tool called Computer Aided CME Tracking (CACTus) to detect potentially threatening CMEs, a drag-based model (DBM) to predict their arrival, and a CME geoeffectiveness tool (CGFT) to predict their geomagnetic impact. Whenever CACTus detects a halo or partial halo CME and issues an alert, the DBM calculates its arrival time at Earth and the CGFT calculates its geomagnetic risk level. The geomagnetic risk level is calculated based on an estimation of the CME arrival probability and its likely geoeffectiveness, as well as an estimate of the geomagnetic storm duration. We present the evaluation of the CME risk level forecast with the COMESEP alert system based on a study of geoeffective CMEs observed during 2014. The validation of the forecast tool is made by comparing the forecasts with observations. In addition, we test the success rate of the automatic forecasts (without human intervention) against the forecasts with human intervention using advanced versions of the DBM and CGFT (independent tools available at the Hvar Observatory website, http://oh.geof.unizg.hr). The results indicate that the success rate of the forecast in its current form is unacceptably low for a realistic operation system. Human intervention improves the forecast, but the false-alarm rate remains unacceptably high. We discuss these results and their implications for possible improvement of the COMESEP alert system.


Introduction
Coronal mass ejections (CMEs) and their interplanetary counterparts (interplanetary coronal mass ejections, ICMEs) are regarded as the main drivers of space weather that cause the most intense geomagnetic storms (e.g. Gonzalez et al., 1998; Plunkett et al., 2001; Zhang et al., 2003; Koskinen and Huttunen, 2006; Echer et al., 2008; Yermolaev et al., 2012, and references therein). Therefore, one of the main aspects of space weather prediction is the estimation of the possible CME geomagnetic impact, i.e. its geoeffectiveness. Based on the widely accepted concept proposed by Dungey (1961), the enhanced CME geoeffectiveness is related to the effective reconnection with the geomagnetic field and therefore with the southward component of the ICME magnetic field, $B_s$, and the corresponding y component of the convective electric field, $E_y = v B_s$ (where $v$ is the solar wind speed). This is confirmed by statistical studies based on in situ properties of ICMEs and indices of geomagnetic activity, such as the disturbance storm time index, Dst (e.g. Kane, 2005; Richardson and Cane, 2011; Verbanac et al., 2013, and references therein).
The present prediction scheme of the geomagnetic storm magnitude is generally reliable and depends on measured ICME in situ properties. The prediction of the Dst index is done in real time by various groups (e.g. Feldstein, 1992; Fenrich and Luhmann, 1998; O'Brien and McPherron, 2000) on the basis of inputs from the original formula of Burton, McPherron, and Russell (1975). However, in situ measurements of ICMEs are provided at the L1 Lagrangian point, i.e. about one hour before the start of the disturbance (for typical ICME speed), providing very limited "response time" (e.g. Koskinen and Huttunen, 2006; Richardson and Cane, 2011). Our current knowledge restricts us from predicting the crucial $B_s$ component of the ICME magnetic field at earlier times, e.g. from remote solar observations. Some studies have tried to compare the magnetic field of the ICME to its solar source region magnetic field in the initiation phase (e.g. Bothmer and Schwenn, 1994; Möstl et al., 2008). However, even if the original orientation of the magnetic field inside the CME were known in the initiation phase, the prediction of the $B_s$ component at Earth would be severely hampered by the fact that CMEs rotate while propagating (e.g. Vourlidas et al., 2011; Isavnin, Vourlidas, and Kilpua, 2013).
From the solar perspective, analysis of many years of solar images recorded by the Solar and Heliospheric Observatory (SOHO: Domingo, Fleck, and Poland, 1995) has revealed that fast full-halo CMEs associated with strong flares and originating close to the central meridian at low and mid-latitudes are potentially favourable candidates for producing strong geomagnetic storms (see, e.g., Zhang et al., 2003; Srivastava and Venkatakrishnan, 2004; Gopalswamy, Yashiro, and Akiyama, 2007; Zhang et al., 2007; Richardson and Cane, 2010; Dumbović et al., 2015, and references therein). The prediction schemes that rely on the remotely measured CME properties are found to be much less reliable when the interplanetary conditions are not taken into account (e.g. Srivastava, 2005; Valach et al., 2009; Kim et al., 2010; Uwamahoro, McKinnell, and Habarulema, 2012); their advantage, however, is the early warning they provide. This type of advance forecasting requires the identification of key solar parameters that determine the geoeffectiveness of a CME and an empirical probabilistic model to estimate its geomagnetic impact. The development of the Coronal Mass Ejections and Solar Energetic Particles (COMESEP, http://comesep.aeronomy.be) alert system is a significant step in this direction. This operational space weather alert system has been running since the beginning of 2014, fully automatically, without human intervention, and among other things, it forecasts geomagnetic storms and performs risk analysis for selected user groups.
In the present article, we have analysed all the COMESEP alerts issued in 2014 with the objective of testing the performance of the COMESEP forecasting tools, in particular, those implemented to forecast the geomagnetic impact of the geoeffective CMEs. Verification of the forecast of space weather mainly implies a quantitative and qualitative assessment of the performance of space weather prediction alerts, as has been done for terrestrial weather forecasting in the past. Recent efforts on various space weather forecast validations include those performed by, e.g., Berghmans et al. (2005), Balch (2008), Crown (2012), and Devos, Verbeeck, and Robbrecht (2014). In addition, the evaluation is continuously performed for several world-wide space weather prediction centres (e.g. the Space Weather Prediction Center, SWPC, in the US, or the Solar Influences Data Analysis Centre, SIDC, in Belgium) by the National Institute of Information and Communications Technology (NICT, http://seg-web.nict.go.jp/cgi-bin/forecast/eng_forecast_score.cgi). This type of analysis not only reveals the strengths and the limitations of the tools used for forecasting, but also provides scope for advancing new and improved versions of tools to provide better and more accurate forecasting of space weather.

Description of Forecast Tools
The COMESEP alert system is the first fully automatic system for the detection of CMEs and solar flares, forecasting of CME arrival, and assessing their potentially hazardous impact (Crosby et al., 2012). It was developed within the European Union 7th Framework Programme (EU FP7) project Coronal Mass Ejections and Solar Energetic Particles (COMESEP) and runs fully automatically, i.e. without human intervention. It consists of several interconnected tools, model or data based, that work together to automatically provide geomagnetic and solar energetic particle (SEP) radiation storm alerts. In general, the tools running under the COMESEP alert system can be divided into two categories: first-level producers (first-level tools) and tools that are both consumers and producers (second-level tools). First-level tools use near real-time data to monitor and automatically detect potentially hazardous events, on the basis of which they issue alerts. The alerts issued by first-level tools trigger second-level tools, which then, based on the input provided by first-level tools, produce their own alerts. The process is visualised in the flow diagram in Figure 1.
Figure 1 Flow diagram of the tools used in the COMESEP alert system (for an explanation, see the main text).
In this article we focus on the part of the COMESEP alert system that forecasts geomagnetic impact of CMEs several days in advance. It consists of two second-level tools:
• The Drag-based Model (DBM) is a model for heliospheric propagation of CMEs, based on the assumption that the dominant force in the heliospheric dynamics of ICMEs is the magnetohydrodynamical equivalent of the aerodynamic drag. In its basic form (which is implemented in the COMESEP system), it provides the ICME Sun-Earth transit time, the arrival time, and the impact speed for a given set of input parameters (see Vršnak et al., 2013, 2014). The tool and its description are available at http://oh.geof.unizg.hr/DBM/dbm.php, as is the advanced form of the DBM, which provides this output for any target in the heliosphere and also takes the shape of the ICME into account by employing the so-called cone geometry (described in Žic, Vršnak, and Temmer, 2015).
• The CME Geomagnetic Forecast Tool (CGFT) is an empirical statistical model that determines the CME geomagnetic risk level based on a probability estimation of the CME arrival and its likely geoeffectiveness (both are derived based on remote parameters of a CME and its associated flare). In addition, the estimated storm duration is based on the estimated geoeffectiveness and the month of the eruption, with the start time determined by the DBM ICME arrival. A more comprehensive description of the tool is given below.
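The basic DBM described above can be sketched in a few lines. The sketch assumes the standard drag equation of motion, dv/dt = −γ(v − w)|v − w|, with a constant drag parameter γ and a constant solar wind speed w (as in Vršnak et al., 2013), for which distance and speed have closed-form expressions. The function names, the starting distance of 20 solar radii, and the values of γ and w are illustrative assumptions and are not the settings of the COMESEP implementation.

```python
import math

RSUN_KM = 6.957e5       # solar radius in km
AU_KM = 1.495978707e8   # astronomical unit in km


def dbm_state(t, r0, v0, w, gamma):
    """Heliocentric distance (km) and speed (km/s) at time t (s).

    Closed-form solution of dv/dt = -gamma * (v - w) * |v - w|
    for constant gamma (1/km) and constant solar wind speed w (km/s).
    """
    u0 = v0 - w                            # initial speed relative to the wind
    sign = 1.0 if u0 >= 0 else -1.0
    denom = 1.0 + gamma * abs(u0) * t
    v = w + u0 / denom                     # speed relaxes towards w
    r = r0 + w * t + sign * math.log(denom) / gamma
    return r, v


def dbm_arrival(r0, v0, w, gamma, target=AU_KM, tol=1.0):
    """Transit time (s) and impact speed (km/s) at the target distance.

    Uses bisection on the monotonic r(t); assumes v0 > 0, w > 0, gamma > 0.
    """
    lo, hi = 0.0, 1.0
    while dbm_state(hi, r0, v0, w, gamma)[0] < target:
        hi *= 2.0                          # expand until the target is bracketed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dbm_state(mid, r0, v0, w, gamma)[0] < target:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t, dbm_state(t, r0, v0, w, gamma)[1]


# Illustrative (assumed) input: a fast CME launched from 20 solar radii.
transit_s, impact_speed = dbm_arrival(20 * RSUN_KM, 1000.0, 450.0, 0.2e-7)
print(f"transit time: {transit_s / 3600.0:.1f} h, impact speed: {impact_speed:.0f} km/s")
```

In this simplified picture a CME faster than the ambient wind decelerates towards w and a slower one accelerates towards it, so the impact speed always lies between the initial speed and the wind speed.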
Figure 1 shows that these second-level tools receive alerts from three first-level tools that detect a potentially hazardous (i.e. Earth-directed) CME and flare event:
• The Computer Aided CME Tracking (CACTus) is a software package that autonomously detects CMEs (see, e.g., Berghmans, Foing, and Fleck, 2002; Robbrecht and Berghmans, 2004) in image sequences from the Large Angle Spectroscopic Coronagraph (LASCO: Brueckner et al., 1995) onboard SOHO, and whenever a halo or partial-halo CME is detected, it sends an alert to the COMESEP system. The catalogue of CMEs detected by CACTus and its description are available at http://sidc.be/cactus/.
• The Solar Dimming and EUV Wave Monitor (Solar DEMON) is a software package that automatically and in real time detects solar flares using data from the Atmospheric Imaging Assembly (AIA: Lemen et al., 2012) onboard the Solar Dynamics Observatory (SDO: Pesnell, Thompson, and Chamberlin, 2012) and provides information such as location, time, and relative intensity (Kraaikamp and Verbeeck, 2015). The event catalogue of the tool and its description are available at http://solardemon.oma.be.
• Flaremail is a tool that monitors GOES soft X-ray flux data in near real-time, and whenever an M- or X-class flare is detected, it sends an alert to the COMESEP system (http://sidc.oma.be/products/flaremail/).
To supplement the early geomagnetic impact forecast given by CGFT, there is also a tool that estimates the risk level of geomagnetic storm occurrence for the next 24 hours (Geomag24, http://www.spaceweather.space.dtu-dk/forskning/Geomag24). Geomag24 takes into account recent alerts issued by the DBM and CGFT, observations of the last months of geomagnetic activity, and solar wind observations combined with estimates of the background solar wind speed and coronal hole area (to include the risk of high-speed solar wind streams), as well as current in situ solar wind and geomagnetic data to estimate the risk of a geomagnetic storm for the next 24 hours. In addition, there are supplementary tools for SEP forecast. The first-level producer called the ground-level enhancement tool (GLE alert, available at http://cosray.phys.uoa.gr/index.php/glealertplus) monitors possible GLE events and upon detection issues an alert to the COMESEP system, and the SEP forecast tool predicts the probability and level for a radiation storm with proton energies > 10 MeV and > 60 MeV resulting from a flare. It primarily uses an input from Flaremail, but when available, it also uses input from Solar DEMON, CACTus, and GLE alert. Since the Geomag24 and SEP forecast tools are not the focus of this study, further details are not given, but they can be found at the COMESEP webpage (http://comesep.aeronomy.be).
We focus our analysis on CGFT alerts that are triggered by DBM and receive input from CACTus, Solar DEMON, and Flaremail (when available). CGFT can be divided into three modules: module I, which estimates the probability of CME arrival, module II, which estimates CME geoeffectiveness, and module III, which estimates the storm duration based on the estimated geoeffectiveness and the month of the eruption (based on an analysis similar to Yokoyama and Kamide, 1997). In this study we analyse and estimate the correctness of the forecast of the geomagnetic impact of the CME and not the estimated storm duration, i.e. we analyse the forecast capability of only the first two CGFT modules.
CGFT module I: The probability of a CME arrival is estimated through a statistical model that relates the flare source position with a potential CME arrival at Earth. Each bin of longitude is associated with a probability of arrival. The probability of arrival is categorised into the following levels: very unlikely (0-10%), unlikely (10-40%), possible (40-70%), likely (70-90%), and very likely (90-100%). The arrival estimation of CMEs is based on the position of the source location, i.e. bin of longitude: very unlikely (< −60 or > 60 deg), unlikely (−60 to −30 or 30 to 60 deg), possible (−30 to −10 or 10 to 30 deg), and very likely (−10 to 10 deg). The source position is known only when the CME is associated with a detected flare of at least X-ray class M. In the absence of an associated flare, the likelihood of arrival is set to "possible". This statistical model is based on the general observation that CMEs originating closer to the centre of the solar disc are more likely to arrive at Earth (e.g., Gopalswamy et al., 2000), but is not quantified based on a robust statistical measure. Therefore, the ranges of values for specific longitudinal bins, as well as the corresponding probabilities, are chosen in a somewhat arbitrary way.
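The longitude binning of module I can be written as a simple lookup. This is a minimal sketch based only on the bins quoted above; the function name is hypothetical, and the treatment of the bin edges (which side a value of exactly 10, 30, or 60 deg falls on) is an assumption, since the text does not specify it.

```python
def arrival_likelihood(source_longitude_deg=None):
    """Arrival category from the flare source longitude (deg from central meridian).

    None means that no associated flare of class M or higher was found,
    in which case the likelihood defaults to "possible".
    """
    if source_longitude_deg is None:
        return "possible"
    distance = abs(source_longitude_deg)   # the bins are symmetric in longitude
    if distance <= 10:                      # edge handling is an assumption
        return "very likely"
    if distance <= 30:
        return "possible"
    if distance <= 60:
        return "unlikely"
    return "very unlikely"


for lon in (5, -20, 45, -75, None):
    print(lon, "->", arrival_likelihood(lon))
```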
CGFT module II: The geoeffectiveness of a CME is related to the disturbance storm time index, Dst, and is determined by a statistical model relating each single solar parameter to the absolute value of Dst, |Dst|. The full set of parameters that can be used by this tool includes the CME apparent width and speed, the associated flare soft X-ray class and source position, and the CME-CME interaction parameter (for details see Dumbović et al., 2015). Since in a fully automatic system the calculation of the CME-CME interaction parameter is not trivial, this parameter is not used in the currently operating COMESEP alert system. In addition, the tool can work with only a partial set of available parameters (e.g. only with CME speed). The probabilities for each parameter are combined to one probability for each |Dst| bin (for details see Dumbović et al., 2015). The probability distribution across the |Dst| bins is converted into the estimate of one single |Dst| bin, using specific thresholds on calculated probabilities, P. For example, the set of thresholds [0, 0.20, 0.12, 0.10] contains a threshold 0 on P(|Dst| < 100), 0.20 on P(100 < |Dst| < 200), 0.12 on P(200 < |Dst| < 300), and 0.10 on P(300 < |Dst| < 400). The following steps are applied in this specific order to assign the estimated impact to a single |Dst| bin. When a condition is fulfilled, that |Dst| bin is chosen and the process is stopped. The thresholds are as follows: P(300 < |Dst| < 400) > 0.10 → 300 < |Dst| < 400, P(200 < |Dst| < 300) > 0.12 → 200 < |Dst| < 300, P(100 < |Dst| < 200) > 0.20 → 100 < |Dst| < 200, and P(50 < |Dst| < 100) > 0 → 50 < |Dst| < 100.
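The ordered threshold test of module II can be illustrated as follows. This is a sketch of the cascade only: the thresholds and their order are those quoted above, while the function name, the bin labels, the example probabilities, and the behaviour when no condition is met (returning None) are assumptions made for illustration.

```python
# (bin label in nT, probability threshold), tested from the strongest storm bin down
THRESHOLD_CASCADE = [
    ("300-400", 0.10),
    ("200-300", 0.12),
    ("100-200", 0.20),
    ("50-100", 0.0),
]


def estimate_dst_bin(probabilities):
    """Return the first |Dst| bin whose probability exceeds its threshold.

    `probabilities` maps bin labels to the combined probability for that bin.
    Returns None if no condition is fulfilled (assumed behaviour).
    """
    for label, threshold in THRESHOLD_CASCADE:
        if probabilities.get(label, 0.0) > threshold:
            return label                   # stop at the first fulfilled condition
    return None


# Made-up probability distribution for illustration
example = {"300-400": 0.05, "200-300": 0.15, "100-200": 0.50, "50-100": 0.30}
print(estimate_dst_bin(example))
```

Because the strongest bins are tested first and have the lowest thresholds, a bin can be selected even when a weaker bin carries a larger probability, as in the example, where the 100-200 nT bin has the highest probability but the 200-300 nT bin is chosen.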
For each combination of arrival probability and storm level, a risk level is defined according to the CME geomagnetic risk matrix (Figure 2). There are four possible risk levels that provide an indication of the severity of the alerts sent: extreme risk (red), high risk (orange), moderate or medium risk (yellow), and low risk (green). The CGFT issues an alert when a risk level is estimated to be greater than "low". The process is started by a CME alert from CACTus that is sent to both the CGFT and DBM. The DBM calculates the arrival time of the CME at Earth and triggers the CGFT, but the CGFT does not use DBM parameters as an input. The CGFT receives the CME parameter width and speed from CACTus, and when the CME is associated with a flare, the flare strength is provided by Flaremail and the position of the source region of the flare from Solar DEMON. The CGFT modules I and II calculate arrival probability and geoeffectiveness levels, respectively, and estimate the risk of a geomagnetic storm, based on the CME geomagnetic risk matrix shown in Figure 2. If the estimated risk is higher than "low", the CGFT issues an alert.

Data and Method
We compiled a list of all alerts issued by COMESEP/CGFT in 2014 (a total of 72 alerts). For each event, COMESEP/DBM issued a forecast of CME arrival time and speed, and COMESEP/CGFT issued an alert with geomagnetic storm risk level based on arrival probability and estimated storm level. We cross-checked the list with that of Richardson and Cane (2010), which is continuously updated at http://www.srl.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm, and with in situ measurements by the Magnetic Field Investigation (MFI: Lepping et al., 1995) and Solar Wind Experiment (SWE: Ogilvie et al., 1995) instruments onboard the Wind spacecraft in a time frame of ± 24 hours around the forecast DBM arrival time. In time periods when Wind observations were unavailable, we used the Advanced Composition Explorer (ACE: Stone et al., 1998) spacecraft measurements. In addition, we checked the Dst levels (available at http://wdc.kugi.kyoto-u.ac.jp/dst_provisional/index.html) around the observed ICME arrival times (when the ICME arrival is observed), or around forecast ICME arrivals when the ICME arrival is not observed. In this way, for each issued alert, we obtained information on whether it was correctly forecast. In order to make a full evaluation of the forecasts we also needed information on geomagnetic storms that occurred, but were not forecast. We therefore compiled a list of all geomagnetic storms with Dst < −90 nT caused by ICMEs in 2014. Although the Dst threshold in the CGFT module II for relevant geomagnetic activity is −100 nT, here we allow a −10 nT error in the determination of the geoeffectiveness level and set −90 nT as a threshold value for the relevant geomagnetic activity. We found that in 2014 there were only three geomagnetic storms with Dst < −90 nT: a −116 nT storm on 19 February, a −91 nT storm on 20 February, and a −94 nT storm on 28 February.
Table 1 Event statistics for 2014 for a) COMESEP alert system (without human intervention), and b) COMESEP + human scheme (with human intervention). CGeFT stands for CME Geoeffectiveness Forecast Tool, an independent, more recent version of the module II of CGFT (for an explanation, see the main text).
Number of CGeFT alerts with |Dst| < −300 nT: 3
All three storms were forecast by COMESEP/CGFT, although the estimated |Dst| level was not correct for every event (in two events the COMESEP/CGFT module II calculated |Dst| < 100 nT, but because of module I calculation and application of the risk matrix, alerts were issued). Finally, we compiled a list of all CACTus CMEs with angular width larger than 120 deg to determine the number of potential alerts that could have been issued, but were not issued because the risk level was calculated to be "low". The corresponding numbers are given in Table 1. For each COMESEP/CGFT alert we checked what the forecast would be with human intervention, i.e. if upon receiving the alert from CACTus, the forecaster would use the most recent and advanced version of DBM and CGFT together with all available observations (e.g. measurements from the Solar-Terrestrial Relations Observatory, STEREO: Kaiser et al., 2008). For this purpose, we used a number of available online event catalogues under the assumption that the same information would be obtained by forecasters using available satellite observations. The SOHO/LASCO CME catalogue (Yashiro et al., 2004, http://cdaw.gsfc.nasa.gov/CME_list/) was used as an observer catalogue of CMEs and their parameters (plane-of-sky speed, apparent angular width).
In addition, we used the COR1 CME catalogue from the two STEREO spacecraft (available at http://cor1.gsfc.nasa.gov/catalog/) to eliminate backside events (in 2014 both STEREO spacecraft had a field of view on the back side of the Sun) and movies from the Sun Earth Connection Coronal and Heliospheric Investigation (SECCHI) suite onboard STEREO (available at http://secchi.nrl.navy.mil/index.php?p=js_secchi) for time periods where the COR1 CME catalogue had no entries. In order to estimate the source region of the CME, we used the Solar DEMON flare catalogue (available at http://solardemon.oma.be/science/flares.php?min_seq=1&min_flux_est=0.000000001&days=0&science=1), the NOAA GOES flare list (available at http://www.ngdc.noaa.gov/stp/space-weather/solar-data/solar-features/solar-flares/x-rays/goes/xrs/), and the SDO/AIA filament eruption list (available at http://aia.cfa.harvard.edu/filament/). These catalogues provided information needed to forecast whether a CME was expected to arrive at Earth, together with its arrival and geoeffectiveness level. To forecast the arrival of CMEs, an advanced version of the DBM was used (ADBM, available at http://oh.geof.unizg.hr/DBM/dbm.php, for details see Žic, Vršnak, and Temmer, 2015, or the same webpage under "Documentation"), and for the geoeffectiveness level, we used a more recent version of the CGFT module II (CGeFT, available at http://oh.geof.unizg.hr/CGEFT/cgeft.php, for details see the same webpage under "Documentation"). It should be noted that CGeFT makes a rougher estimation of the |Dst| level than CGFT module II with the aim to be more reliable (possible |Dst| levels are < 100 nT, 100-200 nT, and > 200 nT). The number of alerts in this COMESEP + human scheme is reduced to 66 because the CGeFT also takes the CME-CME interaction parameter into account (which is determined by the observer, in this case by M. Dumbović).
In several cases, we found that COMESEP/CGFT issued two alerts for two separate CMEs, whereas the observer would issue only one alert because the two detected CMEs are likely or very likely to interact and result in one storm event. In total we found that out of these 66 events, 13 were likely or very likely involved in a CME-CME interaction. Furthermore, out of these 66 alerts, four were found to be CACTus false alerts (reported CMEs were not found on either the revised level 2 CACTus or SOHO/LASCO CME lists), and 29 were found to be DBM false alerts (either backside events or the advanced version of DBM, which takes into account CME geometry, calculated that the CME will miss Earth). Therefore, in this COMESEP + human scheme there were in fact only 33 alerts that would be triggered using CGeFT. The corresponding numbers for COMESEP + human scheme are given in Table 1.
We evaluated the forecast by comparing the predicted value with the observed value using verification measures for binary events, i.e. events with two possible outcomes. Each binary event can have only two possible forecast outcomes (it was either forecast or not) and two possible observed outcomes (it was either observed or not), which were then combined into four possible outcomes: a hit (forecast and observed), a miss (not forecast, but observed), a false alarm (forecast, but not observed), and a correct rejection (not forecast and not observed). For a list of events (i.e. an evaluation sample), the number of hits, misses, false alarms, and correct rejections can be represented in a contingency table for a binary event, which is then used to calculate verification measures (see, e.g., Devos, Verbeeck, and Robbrecht, 2014; Dumbović, Vršnak, and Čalogović, 2016). As verification measures we used the probability of detection (POD), the false-alarm ratio (FAR), bias (BIAS), and Heidke skill score (HSS), each of which gives only partial information on the quality of the forecast. The POD measures the success rate (regardless of the number of false alarms), the FAR measures the false-alarm rate (regardless of the success rate), BIAS measures the ratio of the frequency of forecasts to the frequency of observations (i.e. whether the system is more inclined to forecast more or fewer events), and the HSS estimates the accuracy of the forecast relative to that of random chance (for details see, e.g., Devos, Verbeeck, and Robbrecht, 2014; Dumbović, Vršnak, and Čalogović, 2016).
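The four verification measures can be computed directly from the contingency-table counts. The sketch below uses the standard textbook definitions of POD, FAR, BIAS, and HSS for a 2×2 table; the function name is hypothetical. The example counts are an assumption: they were chosen to be consistent with the numbers quoted in the text for the automatic system (72 alerts, 47 of them forecast as |Dst| < 100 nT with 45 of those verified, and three observed storms) and are not copied from Table 2.

```python
def verification_measures(hits, false_alarms, misses, correct_rejections):
    """POD, FAR, BIAS and HSS from a 2x2 contingency table.

    Assumes at least one observed event and at least one forecast event,
    so that none of the denominators is zero.
    """
    a, b, c, d = hits, false_alarms, misses, correct_rejections
    pod = a / (a + c)                      # fraction of observed events that were forecast
    far = b / (a + b)                      # fraction of forecast events that did not occur
    bias = (a + b) / (a + c)               # forecast frequency / observed frequency
    hss = 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return {"POD": pod, "FAR": far, "BIAS": bias, "HSS": hss}


# Assumed example counts (see the note above); not taken from Table 2.
scores = verification_measures(hits=1, false_alarms=24, misses=2, correct_rejections=45)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

A perfect forecast has POD = 1, FAR = 0, BIAS = 1, and HSS = 1, whereas HSS = 0 corresponds to a forecast with no skill relative to random chance.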
Since CGFT makes a forecast of |Dst| in four different ranges, a dichotomisation must be applied, i.e. events need to be separated into two classes in order to produce a contingency table for a binary event. A geomagnetic storm can be regarded as a binary event by defining a specific threshold: it either occurs or it does not occur. For the CGFT the value of the threshold can be taken as the CGFT threshold for relevant geoeffectiveness, |Dst| > 100 nT, i.e. |Dst| > 90 nT when we allow a ± 10 nT error in the observation (for simplicity and clarity we refer to this threshold as |Dst| > 100 nT). With this dichotomisation, a hit is defined as an event where |Dst| > 100 nT was both forecast and observed, a miss is an event that was forecast as |Dst| < 100 nT, but observed as |Dst| > 100 nT, a false alarm is an event forecast as |Dst| > 100 nT, but observed as |Dst| < 100 nT, and a correct rejection is an event where |Dst| < 100 nT was both forecast and observed. A contingency table for a binary event based on this dichotomisation is given in Table 2. It should be noted that this dichotomisation approach does not discern between events with |Dst| > 100 nT that have |Dst| in different ranges (e.g. 100 nT < |Dst| < 200 nT and |Dst| > 200 nT), but since in the chosen time period only three |Dst| > 100 nT events were observed, it is the most suitable one.
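The dichotomisation described above can be expressed as a small classification routine. This is a sketch under stated assumptions: the forecast is represented by the lower edge of the forecast |Dst| bin in nT, the observation by the measured peak |Dst| in nT, the observed threshold is 90 nT to allow for the 10 nT tolerance, and the function name and the example pairs are made up for illustration.

```python
from collections import Counter


def classify_event(forecast_bin_lower_nt, observed_abs_dst_nt,
                   forecast_threshold_nt=100.0, observed_threshold_nt=90.0):
    """Assign one of the four contingency-table outcomes to an event."""
    forecast_storm = forecast_bin_lower_nt >= forecast_threshold_nt
    observed_storm = observed_abs_dst_nt > observed_threshold_nt
    if forecast_storm and observed_storm:
        return "hit"
    if observed_storm:
        return "miss"
    if forecast_storm:
        return "false alarm"
    return "correct rejection"


# Made-up (forecast bin lower edge, observed |Dst|) pairs for illustration
events = [(100, 116), (50, 91), (50, 94), (200, 35), (50, 20)]
table = Counter(classify_event(f, o) for f, o in events)
print(dict(table))
```

The resulting counts are exactly the four entries of the contingency table from which the verification measures are calculated.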

Evaluation Results and Discussion
The verification measures for COMESEP/CGFT alerts (without human intervention) and CGeFT alerts, i.e. COMESEP + human scheme (with human intervention), are shown in Figure 3. The figure shows that the POD for the COMESEP + human scheme is much higher (two times) than that of COMESEP/CGFT without human intervention. However, the FAR (given in Figure 3b) is quite high in both cases, and for COMESEP/CGFT without human intervention, it is very close to the value of the worst forecast. This indicates that the system (regardless of the human intervention) has a tendency to forecast more events and issue false alerts, which is supported by the result for BIAS, which is much larger than 1 in both cases (see Figure 3c). Finally, Figure 3d shows the HSS, which is almost 0 for the COMESEP/CGFT system without human intervention, but improves to 0.25 with human intervention. These results indicate that without human intervention the forecast of the system is not much better than a random guess, but shows some skill when a human observer intervenes. The evaluation results for the CGFT module II using verification measures for the binary events reveal many shortcomings. While the automatic system shows some ability to detect geomagnetic storms (as seen in POD), the false-alarm rate is unacceptably high for a realistic operation system, resulting in an overall skill that is not better than a random guess (as seen in HSS). The forecast skill somewhat improves with human intervention, but even with human intervention, the FAR remains unacceptably high. The reasons for this poor early prediction probably lie in the fact that the system 1) does not use "real" CME input values, but projected ones, 2) does not incorporate CME propagation and evolution effects in the calculation, and most notably 3) cannot distinguish between different magnetic field orientations in CMEs.
In addition, several other facts have to be taken into consideration. Firstly, evaluation measures were calculated for module II, which estimates the possible storm level, while CGFT also takes the probability of arrival into consideration in issuing alerts. Table 1 shows that in 47 out of the 72 issued CGFT alerts, the CGFT module II estimated a low storm level with |Dst| < 100 nT. Based on the risk matrix given in Figure 2, this means that in these cases, the alert was issued solely because the probability of the arrival was high. Out of 47 cases of low storm activity calculated by the CGFT module II, in 45 |Dst| < 100 nT was indeed observed, i.e. in these cases no alert was necessary. We therefore conclude that including the arrival probability to determine the CME risk can lead to a large number of unnecessary alerts, and it might be useful for the COMESEP alert system to revise the CGFT CME risk matrix. Secondly, the evaluation was made for 2014, when the geomagnetic activity was quite low: there were only three major geomagnetic storms and none of them reached |Dst| > 200 nT. The system should also be tested for periods with higher geomagnetic activity to verify its prediction capability for such severe geomagnetic storms.
On the other hand, 2014 corresponds to the second peak of Solar Cycle 24, which had a quite high CME activity. Table 1 shows that CACTus detected 1855 CMEs in 2014, and of these, 98 were halo and partial-halo CMEs, which are regarded as potentially threatening, but only for 72 of them alerts were issued. In addition, the CGFT module II determined that only 25 out of 72 events would be relevantly geoeffective, whereas with human intervention this number is additionally reduced to only 11 forecasts. Therefore, we conclude that the automatic COMESEP system has a high success rate regarding correct rejections. This is the most important aspect of the system, which is basically fed with a huge amount of data, out of which it is supposed to resolve which data or events need to be discarded.
Finally, our results show that with human intervention, the forecast is somewhat improved, but it should be taken into account that in this analysis the observer (M. Dumbović) had STEREO spacecraft observations available, which could more easily identify backside events. For this specific time period (2014), the view of the back side of the Sun was possible because of the STEREO spacecraft location, which might not be the case in the future, when the observer will have to rely on the same observation used by the COMESEP alert system. In addition, we used the more recent and advanced version of the tool. Furthermore, human intervention resulted in a much better input for the CGFT, where CMEs could be associated with flares with soft X-ray classes lower than M, and the CME-CME interaction parameter could also be derived. This strongly suggests that the COMESEP/CGFT forecast could be improved in the future by advancing the input it receives and by implementing more recent and advanced tools.

Summary and Conclusion
We evaluated the performance of the fully automatic COMESEP/CGFT tool for issuing alerts on geoeffective and possibly hazardous CMEs. The evaluation revealed many shortcomings of the tool, such as an exceedingly high false-alarm rate and low forecast skill, which is not much better than the random forecast. In addition, because it is combined with the CME arrival probability in order to assess the risk, the system quite often issues unnecessary alerts. The main value of the system (in its current state) lies in its ability to refine the number of potentially hazardous CMEs, as it shows a relatively good correct rejection rate, which additionally improves with human intervention. In general, we find that the CME geoeffectiveness estimation is improved with human intervention, primarily because the human observer is not limited to the observation tools implemented in the system and thus uses a more reliable and extensive input, as well as advanced tools. However, even with human intervention, the false-alarm rate is unacceptably high for a realistic operation system. It should be noted, however, that the performance of the system is yet to be tested for a time period where the geomagnetic activity is much higher. The system should be improved for possible future applications by using the most reliable possible input and incorporating CME propagation and evolution effects, but most notably by including the CME magnetic field orientation as one of the most important aspects of the geomagnetic storm forecast. Since the system is designed such that different tools communicate with each other, it could be easily upgraded by implementing new tools and more recent versions of tools that are already implemented into the system. This and the fact that it is fully automatic and works completely without human intervention makes it a promising starting point for a future operation system for space weather early prediction.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.