Improvements to the Empirical Solar Wind Forecast (ESWF) model

The empirical solar wind forecast (ESWF) model in its current version 2.0 runs as a space-safety service in the frame of ESA’s Heliospheric Weather Expert Service Centre. The ESWF model forecasts the solar-wind speed at Earth with a lead time of 4 days. The algorithm uses an empirical relation found between the area of solar coronal-holes (CHs), as observed in EUV within a 15∘\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$15^{\circ}$\end{document} meridional slice, and the in-situ measured solar-wind speed at 1 AU. This relation however, forecasts Gaussian type speed profiles, as the CH rotates in and out of the meridional slice, causing some discrepancy in the timing between forecasted and observed solar-wind speed. With adaptations to the ESWF 2.0 algorithm we improve the precision and accuracy of the ESWF speed profiles. For that we implement compression and rarefaction effects occurring between solar-wind streams of different velocities in interplanetary space. By considering the propagation times for plasma parcels between the Sun and Earth and their interactions, we achieve the asymmetrical shape of the speed profile that is characteristic of high-speed streams (HSS). By further implementing CH segmentation, co-latitude information and dynamic thresholding, we find that the newly developed ESWF 3.2 performs significantly better than ESWF 2.0. For a sample of 294 different HSSs, we derive a relative increase in hits of the timing and peak velocity by 13.9%\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$13.9\%$\end{document}. The Pearson correlation coefficient increases by 14.3%\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$14.3\%$\end{document} from 0.35 to 0.40.


Introduction
The solar surface covers a mix of open and closed magnetic field structures. Predominantly "open" magnetic fields, i.e. closing far in interplanetary space, are observed within coronalhole (CH) regions. In extreme-ultraviolet (EUV) radiation, these regions appear as dark D. Milošić daniel.milosic@uni-graz.at (Jarolim et al., 2021). In addition, within the ESWF EUV image data processing pipeline a dynamic thresholding algorithm has been developed using the CATCH tool (Heinemann et al., 2019).
In general, the improvement of ESWF can be measured against results from statistical studies. Reiss et al. (2016) compared the uncertainties in the arrival time and speed between ESWF 2.0 and the semi-empirical Wang-Sheeley-Arge model (WSA; Wang and Sheeley, 1990;Arge and Pizzo, 2000;Arge et al., 2003) and found typical uncertainties in the range of ± 48 hours and ± 100 km s −1 . From a variation of persistence solar-wind models (assuming that the stream does not change in time; see e.g., Owens et al., 2008;Temmer, Hinterreiter, and Reiss, 2018), the uncertainties are within a similar range of 80 -100 km s −1 and 1 -2 days off. Recent statistics assessing the performance of numerical magnetohydrodynamic (MHD) models, such as EUHFORIA by Hinterreiter et al. (2019), derived an RMSE of 125 km s −1 . Other papers investigating the solar-wind model performance report speed uncertainties in the range of 100 -150 km s −1 and for the arrival time up to 3 days (MacNeice, 2009;Jian et al., 2015). Moreover, we also know that the performance of MHD models decreases significantly for periods of high solar activity (Gressl et al., 2014). In that respect, we also will have a closer look at the ESWF performance stability of the new developments and improvements over the solar cycle.
The investigation of the solar-wind flow is essential when studying space weather, as the background solar-wind is the backbone of CME propagation models. The improvement of space weather related models is of high interest and needs basic research combined with applications (e.g., Temmer, 2021). The accurate modeling of the solar-wind flow in interplanetary space is identified as a crucial parameter for CME propagation models, e.g., in the COSPAR Roadmap (Schrijver et al., 2015). Though many studies were outlined on that subject, there was no clear improvement and, hence, still there is a requirement of a more accurate solar-wind flow model, as mentioned in the 2023 COSPAR Roadmap update under the iSWAT initiative.

Methods
ESWF is based on the ratio of CH areas that appear within a slice of ±7.5 degrees around the central region of the Sun (depicted in Figure 1). Using the empirical CH area -SW speed relation (Vršnak, Temmer, and Veronig, 2007), the ratio of CH areas can be related to the solar-wind speed measured at 1 AU about four days later. As the CH rotates with the Sun, the fractional area begins to rise and reaches its peak approximately when the center of mass of the CH is on the central meridian. This process then reverses as the CH moves out of the slice again. The larger the area detected within that slice, the higher the solar-wind speed forecast. The basic ESWF version is calculated by where v is the solar-wind velocity for the distance at 1 AU, c 0 and c 1 (in units of km s −1 ) are the coefficients of the linear relation, A is the fractional CH area (relative CH area measured within the meridional slice), and t − t * is the retarded time with the time-lag, t * , in days.
Since the establishment of the empirical CH area -SW speed relation, different development steps were performed. ESWF 1.0 and 2.0 assume a constant time lag of four days (t * = 4d). With ESWF 2.0, c 0 and c 1 have been recalculated to derive an adaptive method for the varying low speed component of the background solar-wind, as described in Rotter et al. (2015) and also used in Reiss et al. (2016, their Equation 2).

Figure 1
Cartoon showing the symmetric (Gaussian-like) area-time relation for an idealized CH rotating into the measurement slice at the central part of the Sun and exiting it again. As the CH moves to the central position, the maximum area, hence, forecasted solar-wind speed at 1 AU is reached. The red line at the Western part illustrates the maximum compression, that is the stream interface region. The compression and rarefaction are not covered by the symmetric area profile as generated by ESWF 2.0.
With these developments, the ESWF versions 1.0 and 2.0 reveal a rather symmetric profile in the solar-wind speed forecast (see Figure 1), leading to delayed peak forecasts. To improve, compression and rarefaction effects due to the interaction between slow and highspeed solar-wind streams appearing on the West part of the CH need to be considered. In addition, we aim to implement recent results derived from advanced analyses of CHs, such as: a) dynamic intensity thresholds for the CH segmentation (Heinemann et al., 2019) and b) the dependence of the HSS peak velocity to the co-latitude of the CH location ) -both useful for better covering solar cycle variations.
In the following, we present for each aspect: 1) compression and rarefaction, 2) dynamic thresholding, and 3) co-latitude, the development steps performed and summarize here the different versions of the ESWF, which will be described.
• ESWF 1.0 -constant time lag of 4 days; constant c 0 and c 1 ; described in Vršnak, Temmer, and Veronig (2007); • ESWF 2.0 -constant time lag of 4 days; variable c 0 and c 1 ; described in Rotter et al. (2015), Reiss et al. (2016); • ESWF 3.0 -PROP; adds non-constant time lag to mimic compression effects; each plasma parcel propagates to 1 AU with the predicted velocity; variable c 0 and c 1 ; • ESWF 3.1 -PILE; similar to PROP adding non-constant time lag to mimic compression effects; each plasma parcel propagates to 1 AU with the mean value of the predicted velocity and a constant background velocity of 400 km s −1 ; variable c 0 and c 1 ; • ESWF 3.2 -PILE; CH area extraction with dynamic thresholding; co-latitude information; variable c 0 and c 1 .

Compression and Rarefaction
The Gaussian profile, as illustrated in Figure 1, shows the symmetric speed profile stemming from the CH entering and exiting the slice over which the ratio of the area is calculated. This, however, is not capable of mimicking compression effects. To better match the steepening of the HSS as it encounters the slow solar-wind flow ahead, two different algorithms are developed. The propagation-algorithm (PROP) takes each data point and predicts its arrival time based on its velocity from Equation 1 and the Sun-Earth distance. These data are then shifted to the calculated arrival time. The time-lag, t * , is now calculated as In order to avoid plasma parcels overtaking others, the affected points are compressed using a linear scale. Using the PROP algorithm, the stream interaction is only modeled in the way that faster streams displace slower ones in the time series.
The pile-up-algorithm (PILE) works in the same way, but the arrival time is calculated slightly differently, taking into account a simple model for the stream interaction. Instead of solely basing the propagation on the velocity value of each individual data point, the PILE algorithm takes the mean value of the data point and the background solar wind, given by The background solar-wind speed is assumed to be constant at 400 km s −1 , which corresponds roughly to the mean background solar-wind speed in our time interval of interest (2012 -2021). The assumption is that an HSS engages in an inelastic collision with the surrounding matter, conserving momentum (see also Hofmeister et al., 2022). The simple calculation of the mean implies another assumption, namely that the density distribution close to the Sun is uniform.
In Figure 2 we show the resulting synthetic velocity profiles for the different algorithms, which represent each a typical profile produced by the ESWF having a background velocity (c 0 ) of 350 km s −1 . We see that indeed, in comparison to the Gaussian profile, both algorithms produce an asymmetrical profile, like we would expect from an average HSS (Geyer et al., 2021;Dumbović et al., 2022). The PROP algorithm produces a very steep rise of the curve and also shifts the peak to earlier arrival times than the PILE algorithm. Although producing a similar rise, the PILE algorithm is more gentle, being slowed down by the slow solar-wind ahead and thus mimicking the interaction region between the streams.

Dynamic Thresholding
Using image data (193 Å) from the Atmospheric Imaging Assembly (AIA: Lemen et al., 2012) on the Solar Dynamics Observatory (SDO), Heinemann et al. (2019) found that, for optimal extraction of CH areas, the threshold intensity is dependent on the phase of the solar cycle. Thus, a constant threshold intensity as a percentage of the median intensity of the solar disk is not suitable for longer periods. Whereas the ESWF 2.0 still uses 35% of the median intensity as a constant threshold of the solar disk, we here introduce the dynamic thresholding technique derived from the statistical analysis of the CATCH coronalhole catalog (Heinemann et al., 2019): where I m is the median intensity and TH is the threshold intensity in DN (Data Number).

Co-latitude Dependence
Hofmeister et al. (2018) derived, from a data set of well-observed low-latitude CHs during the period 2010 -2017, a relation between the maximum solar-wind speed measured on Earth and the co-latitude of the center of mass of the CH and its area. To implement that, the meridional slice (±7.5°) is cut horizontally into 12 evenly distributed segments before CH area extraction, each covering a latitudinal range of 15 • . The left panel of Figure 3 exemplarily shows an EUV image recorded by the AIA/SDO 193 Å filter, along with the meridional slice divided into 12 latitudinal segments. Each segment is assigned a fractional CH area with respect to the whole slice area. Then these values are given weights according to the co-latitude to Earth. Above a co-latitude of 60 • (in both hemispheres), the weights become zero. At the other ranges, they are weighted with a function depending on the cosine of the co-latitude. The right panel in Figure 3 shows the weights that are applied to the fractional CH area, which is shown in the middle panel of Figure 3. The EUV image was taken on February 12, 2015, at which the B 0 angle was negative with −6.7°, meaning the northern hemisphere of the Sun was tilted away from the observer. Thus, close to B 0 , that is for the lower latitudes of the Southern Hemisphere, the CH areas are weighted more.

Statistical Analysis
To test the various new implementations and resulting ESWF versions 3.0 -3.2, we use a statistical sample of 294 well-observed high-speed streams over the time range 2012 -2021. The list of HSSs was manually selected, compiled and investigated by Geyer et al. (2021) with enhancements by Koller et al. (2022). The given times were extracted from the solarwind measurements such that each HSS is embedded in a time period of seven days, with the velocity peak appearing 3 days after the start of that time period. For the comparison between different ESWF versions, we focus on the parameters arrival time and arrival speed of the HSS at 1 AU. We perform hit & miss statistics where we use the following criteria. As a hit we define that the modeled HSS peak speed is within a range of ± 100 km s −1 and that the arrival time of that peak is within a range of ± 48 hours. In addition, we also define a separate velocity and time hit, which gives us information whether one parameter performs better than the other. Modeled profiles that do not meet these criteria are defined as miss. These criteria are consistent with RMSE values derived from earlier studies including MHD models (Owens et al., 2008;MacNeice, 2009). As we compare a specific sample of data with verified events of HSSs, we do not cover false positives or false negatives. In that respect, we investigate the improvement of the hit statistics for the different model development steps applying the specified value ranges. Furthermore, we calculate for the arrival time and speed the mean absolute error (MAE) given by and the root mean squared error (RMSE) given by Here n is the amount of data points, v model i is the ith HSS peak velocity extracted from the model, and v ref i is the ith HSS peak velocity from the in-situ data. For estimating the match of the entire modeled HSS profile, we derive the Pearson correlation coefficient (median value of the 294 correlation coefficients derived) and the Euclidean distance (sum of differences between each point of a time series, normalized for one week; median value for the 294 time series). The aforementioned statistical tools were chosen with the intent to be comparable to studies of different models and previous studies on the ESWF (see, e.g. Rotter et al., 2015;Reiss et al., 2016). In addition, we make use of dynamic time warping (DTW). Recently the usage of DTW in quantifying the performance of solar-wind models has been explored (Samara et al., 2022a). Here, we use DTW in a simple form, by calculating the normalized DTW cost for each HSS and then comparing the median values. DTW is especially useful for our analysis, since one of the main objectives is to properly model the shape of an HSS, and DTW can give us information about the similarity of two curves. The DTW cost matrix is defined as with d ij the Euclidean distance, which is defined as After constructing an array containing the cost between each two elements of the time series we find the path with the minimum Euclidean distances between adjacent elements. The sum of all Euclidean distances along the optimal path is then normalized by the amount of data points per time series. For implementation we use the python package openly provided by Giorgino (2009); see also Niennattrakul, Ruengronghirunya, and Ratanamahatana (2009) and Samara et al. (2022a).
In addition to performing these methods on the entire data set (2012 -2021), we split the data set into two subsamples, considering periods of high (110 HSSs) and low solar activity (184 HSSs) separately. The year 2016 is used as separation time between high and low activity and reveals half the peak-to-peak difference between the Solar Cycle 24 maximum phase and the minimum towards the maximum phase of Solar Cycle 25.

Results
We performed the statistical analysis described in Section 3 on the output of each ESWF version (1.0, 2.0, 3.0, 3.1, 3.2). The main results are shown in Figure 4 and summarized in Table 1.
Panel (a) of Figure 4 shows the percentage of HSS that meet the criteria of a hit in the forecast in terms of the peak velocities and arrival times. Panels (b) and (c) show the RMSE and the MAE for the HSS arrival times and peak velocities, respectively. While the hit percentage increases, the errors steadily decrease as the versions improve. Panel (d) shows the correlation coefficient between model and measurements for the different ESWF versions. The only significant positive difference occurs in versions with the PILEalgorithm, whereas the ESWF 3.1 performs slightly better than 3.0. Panel (e) shows the median Euclidean distance for each version of the ESWF during a HSS. We see that the distance decreases for every consecutive version, which can be interpreted as improvements. Panel (f) shows the median normalized DTW-score for each version. For each HSS, the DTW-score/cost has been calculated and normalized by the number of data points, and the median of these values is plotted. Since the DTW cost is an inverse measure of the similarity of two time series, lower cost means higher similarity. Note that these numbers are only to be interpreted relative to each other. One DTW score alone gives no clue about the goodness of a model. We see that the similarity improves for most consecutive versions and is best for the ESWF 3.2.
To better understand the improvements as derived from the point-to-point statistics, we inspect three time series for which the HSS profiles are modeled by the ESWF 2.0 and 3.2. Figure 5 shows the in-situ measured solar-wind speed using OMNI data for periods during three different solar cycle phases, namely ( notice that the occurrence frequency of HSSs increases as the solar cycle is in its declining phase during 2016. For all three phases, we observe a qualitative improvement of ESWF 3.2 compared to ESWF 2.0, especially during the declining phase in panel (b) of Figure 5. During the maximum phase, in panel (a) of Figure 5, we see two distinct HSSs that previously have been detected by the model, but their profiles don't match very well with the observed data. One of those structures is in early June, the other in early July. The first is modeled quite accurately with ESWF 3.2. The second one is slightly off, but still matches better the observed profile in comparison to ESWF 2.0. In panel (b) in Figure 5 we can identify six different HSSs during the declining phase of the solar cycle. For almost each of those HSS we see that the model has improved in both timing and peak velocity. During the solar mini- mum, we see in panel (c) that the forecast of larger structures slightly improved, while some smaller ones have even worsened. It is difficult to qualitatively assess the change in performance for low solar activity. Overall we can see an improvement throughout the solar cycle. Table 1 summarizes all the statistical results for the entire sample of HSSs over the whole solar cycle, as well as for the minimum and the maximum phases separately. In addition, we calculate the statistics for either a velocity hit or a time hit separately. Interestingly, we derive that the time hits perform better in comparison to the velocity hits, which on average underestimate the peak velocity. Table 1 Performance of two different versions of the ESWF for the entire sample of HSSs (full) and separately for the events occurring prior to 2016 (Min) and after 2016 (Max). The hit statistics use the criterion that the modeled HSS peak velocity is within ±100 km s −1 and the HSS peak arrival time is within 48 hours of the observed ones. In addition, we give separate statistics for either a hit in terms of peak velocity or arrival (two bottom columns).

Discussion
Over the past years, we made substantial progress and increased our knowledge about the evolution of CHs, their specific shapes and locations on the Sun, that are clearly affecting the in-situ measured solar-wind speed profile (e.g., Garton, Gallagher, and Murray, 2018a;Heinemann et al., 2018;Hofmeister et al., 2018Hofmeister et al., , 2020Hofmeister et al., , 2022Samara et al., 2022b). The ESWF model (Vršnak, Temmer, and Veronig, 2007;Rotter et al., 2012Rotter et al., , 2015Reiss et al., 2016) is a simple but robust forecast algorithm, relating CH areas to the in-situ measured HSS on Earth with an average lead time of 4 days. Specifically, here we are interested in improving the ESWF model towards more reliable forecasts of the arrival time and speed of individual HSSs. For that we newly implemented compression and rarefaction effects. We developed two algorithms, PILE and PROP. Both are simplified approaches assuming a uniform density distribution between HSSs and slow wind close to the Sun. PROP (ESWF 3.0) represents a simple displacement between the streams, which is physically incorrect (see also Hofmeister et al., 2022). PILE (ESWF 3.1) reflects a ballistic propagation of the streams with inelastic interactions for which the background solar-wind is considered as a constant (derived with ∼400 km s −1 ) and a time-variant speed variable (c 0 in Equation 3) accounting for shortterm variations is added. Though the hit statistics are slightly better compared to ESWF 2.0, PROP does not cover much interaction between fast and slow solar-wind and, hence, it causes similar early arrival times as ESWF 2.0 with an MAE of more than 40 hours (see Figure 4). The results for PILE reveal in the MAE and correlation coefficients clear improvements over ESWF 2.0 and therefore PILE was used as basis to further develop ESWF 3.2, where the co-latitude information and dynamic coronal-hole extraction threshold was implemented. The improvements come at a low computing cost as the algorithm in total takes less than two minutes to compute.
For the comparison between ESWF 3.2 and ESWF 2.0, we use hit & miss statistics on the HSS sample with the following criteria. A hit was considered when both, modeled and observed HSS speed maximum, occurred within a 48-hour window and the maximum speed of the modeled and observed HSS matched by ± 100 km s −1 . We derive a relative increase of hits by 13.9%. Note that the amount of separate hits in timing increase more than those in velocity. In addition, we calculate for the arrival time and speed the MAE and RMSE for the HSS sample and find improvements up to 25%. In order to quantify the match of the entire modeled HSS profile within a time window of one week, we calculate the Pearson correlation coefficient, the Euclidean distance, and DTW cost. For the Pearson correlation we obtain a relative increase of 14.3%. We note here that the DTW cost is very sensitive to the boundary values and the length of the time series (here 7 days). It can only be used as a relative comparison between two time series. The decrease of the DTW cost found from ESWF 2.0 to 3.2 shows the improvement in terms of the increased similarity between modeled and observed HSS profile when using ESWF 3.2. For more details we refer to the study by Samara et al. (2022a). Table 4 in Reiss et al. (2016) shows a comparison of two different solar-wind models, ESWF 2.0 and Wang-Sheeley-Arge (WSA: Wang and Sheeley, 1990;Arge et al., 2003), and their performance of simulating high-speed enhancements (HSEs). HSEs were defined by speed increases above a minimum threshold value for the background solar-wind. For the time range of 2011 -2014 they found a Pearson correlation coefficient of 0.39 -0.42 for ESWF 2.0 and 0.37 -0.50 for WSA, depending on the definition of the threshold for HSE detection. These results are comparable to our results for the ESWF 3.2 during 2012 -2021 (Table 1), giving a Pearson correlation coefficient of 0.31 and 0.46 for the HSS list during maximum and minimum solar activity, respectively. The better performance for low solar activity phases is derived also for numerical background solar-wind models and can be explained by the low number of coronal mass ejections propagating in interplanetary space (see e.g., Owens et al., 2008;Gressl et al., 2014).
Studies show that the HSS speed measured at Earth, seems also to be affected by the actual morphology of a coronal-hole (e.g., Bromage et al., 2000;Samara et al., 2022b). In-situ measurements close to the Sun derived with the Parker Solar Probe hint towards a mixture of open and closed field from within coronal holes (Bale et al., 2019). This supports the finding that the predominantly open magnetic field stems from flux tubes covering only a small fraction of a coronal-hole's area (Hofmeister et al., 2017). Fine structures within the coronal-hole boundaries are not covered by ESWF. Definitely, more investigations are needed with respect to coronal-hole areas/boundaries and how these appear as an entity of a dark region.

Conclusion
The latest version of the empirical solar wind forecast model has the following new adaptations: • Dynamic Thresholding: CH area extraction using a threshold, which takes into account solar activity. • Co-latitude Information: segmentation of CH areas into different latitudes to account for the three-dimensionality of the solar-wind propagation. • PILE Algorithm: simple stream interaction model using constant density distribution and inelastic collision of streams.
Over the whole Solar Cycle 24, we see an improvement of the ESWF 3.2 over ESWF 2.0 in terms of: • Number of correctly forecasted HSS Peaks: 13.9% relative increase • Pearson correlation: 14.3% relative increase Furthermore, we find the following mean deviations from the in-situ data: • We derive that the prediction is usually about 1.5 days and 90 km s −1 off. These values can roughly be used as uncertainties of the model ESWF version 3.2.
Further improvement of the ESWF is needed to more accurately predict space weather conditions, especially considering the current performance for the modeled peak velocity. The largest improvement step in that regard has been achieved between ESWF versions 1.0 and 2.0, whereas the presented newer versions mainly improved the timing of the peaks (compression effect). Methods for further improvements include data assimilation from insitu measurements, which was already shown by Podladchikova et al. (2018), and which is another ESA service called ESWF24. The data assimilation however, strongly reduces the lead time (for ESWF24 to 24 hours). Other options for better forecasting are always given by machine-learning algorithms. In that respect, the morphology of CHs and in-situ measured speed profiles could be related in more detail (see e.g., Samara et al., 2022b) for enhancing the forecast quality.

Competing interests
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.