Sub‑Seasonal Experiment (SubX) Model‑based Assessment of the Prediction Skill of Recent Multi‑Year South Korea Droughts

Reliable sub-seasonal forecasts of precipitation are essential for managing the risk of multi-year droughts in a timely manner. However, comprehensive assessments of the sub-seasonal prediction skill for precipitation remain limited, particularly during multi-year droughts. This study used various verification metrics to assess the sub-seasonal prediction skill of precipitation hindcasts from five Sub-seasonal Experiment (SubX) models during two recent multi-year South Korea droughts (2007-10 and 2013-16). Results show that the sub-seasonal prediction skill of the SubX models was stage-, event-, and model-dependent over the recent multi-year droughts. According to the Brier skill scores, the SubX models were more skillful at lead times of one to four weeks during the drought onset and persistence stages than during the recovery stage. While the SubX models were more skillful in the first two lead weeks during the 2007-10 drought, the impact of the forecast initial time on the prediction skill was relatively weak during the 2013-16 drought. Overall, the EMC-GEFSv12 model, with 11 ensemble members (the largest ensemble among the five SubX models), showed the most skillful forecasts. According to a sensitivity test on the ensemble size, the EMC-GEFSv12 model showed no further gain in biweekly precipitation forecast skill with nine or more ensemble members. This study highlights the importance of a robust evaluation of the predictive performance of sub-seasonal climate forecasts via multiple verification metrics.


Introduction
South Korea experienced long-lasting precipitation deficits during 2013-2016, which caused destructive socio-economic drought impacts, such as about 24 million US dollars of crop production losses (KREI 2016). News articles from mass media and internet portals highlighted the exceptionally low flow conditions and adverse impacts of the 2013-16 drought (Park et al. 2022a). Recent severe South Korea droughts, including the 2013-16 drought, were mainly induced by abnormal large-scale atmospheric circulations over East Asia modulated by the North Pacific Oscillation (NAO), Arctic Oscillation (AO), and sea surface temperature in the North Pacific and North Atlantic (Myoung et al. 2020; Ham et al. 2022; Ma et al. 2022; Park et al. 2022b). Although South Korea has a well-developed national water management system built on numerous dams and reservoirs, recent severe droughts have caused significant socio-economic damage. An increased risk of droughts in South Korea is a concern for water resources management (Rhee and Cho 2016).
Over the last two decades, the government and drought research community have made collaborative efforts to develop and maintain operational systems for drought monitoring and forecasting over South Korea. These operational systems have been designed to detect different types of droughts, including meteorological (precipitation deficit), agricultural (soil moisture deficit), and hydrological (streamflow or reservoir water-level deficit) droughts, and to monitor the current drought condition and water use changes. They also provide outlook maps of the near-future drought condition (https://hydro.kma.go.kr/front/intro.do; https://www.drought.go.kr/main.do). These operational systems generate important information for water resources management and policies, and for the action plans of the government and local stakeholders to mitigate agricultural and economic losses. For proactive drought mitigation plans, reliable sub-seasonal drought forecasts (e.g., within one lead month) are crucial. For example, sub-seasonal forecasts provide a window of opportunity to minimize adverse effects during the persistence stage of severe droughts, particularly multi-year droughts. However, the current drought monitoring and forecasting systems are based on monthly precipitation anomalies, which makes it difficult for authorities and local stakeholders to respond to the persistence stage of multi-year droughts in a timely manner.
Drought is generated by multi-scale meteorological processes, such as land-atmosphere coupling, sea surface temperature teleconnections, and anomalous large-scale circulations. While precipitation and temperature are essential climatic factors for drought propagation (Easterling et al. 2007; Vicente-Serrano et al. 2010; Chen and Sun 2015; Luo et al. 2017; Park et al. 2020; Kam et al. 2021), persistent precipitation deficits play a key role in modulating the terrestrial water budget during the drought onset and persistence stages (Byun and Wilhite 1999; Oki and Kanae 2006; Mishra and Singh 2010; Song et al. 2014; Zhang et al. 2017; Koster et al. 2019; Parker et al. 2021). Oceans also play an important role in large-scale drought persistence at seasonal or longer temporal scales (Hoerling and Kumar 2003; McCabe et al. 2004; Kam et al. 2014a). Anomalous large-scale circulations and associated pressure systems provide favorable conditions for seasonal droughts (Li et al. 2011; Diem 2013; Kam et al. 2014b; Zi et al. 2022).
While persistent precipitation deficits initiate a drought, an intense storm event in a single day can terminate it (Byun and Wilhite 1999; Hayes et al. 1999; Wilhite et al. 2000; Heim Jr. 2002; Mishra and Singh 2010; Kam et al. 2013; Dettinger 2013). For example, a recent multi-year drought over California began slowly in late 2011 and persisted into 2017. It was ended by atmospheric rivers and associated storms in early January 2017 (within two weeks). The multi-year drought experience led authorities and water resources managers to operate dams and reservoirs for drought mitigation and recovery (e.g., maintaining a high water level in the dam reservoir). This resulted in the Oroville dam crisis in February 2017, when another atmospheric river and associated intense precipitation hit California (Vano et al. 2019). During the Oroville dam crisis, the public's interest in drought was still high because of concern about the re-emergence of the multi-year drought (Kam et al. 2019). Therefore, reliable sub-seasonal drought prediction is warranted for a timely drought response, particularly during the persistence stage of multi-year droughts.
Recently, the Sub-seasonal Experiment (SubX) project was launched to provide sub-seasonal hindcasts and forecasts from multiple climate forecast models for the globe (Pegion et al. 2019). The SubX climate forecast models provide daily forecasts out to the next 30 to 45 days. Previous studies found that the SubX models show better predictive performance when sub-seasonal circulation features, such as the Madden-Julian oscillation (MJO) and variations in the upper-tropospheric jet, are considered (L'Heureux et al. 2021; Li et al. 2021; Lim et al. 2021). For extreme events, such as droughts and floods, the sub-seasonal forecasts of the SubX models can provide reasonable sub-seasonal prediction guidance for U.S. climatic extremes (DeAngelis et al. 2020; Cao et al. 2021). In addition, the SubX models have been successfully used to provide daily meteorological forecasts for hydrologic prediction over India (Tiwari and Mishra 2022) and eastern South America (Pegion et al. 2019). However, assessments of the prediction skill of SubX sub-seasonal forecasts for South Korea droughts remain limited.
This study aims to assess the predictive performance of SubX models for precipitation anomalies in South Korea, particularly during two recent multi-year droughts, by answering the following scientific questions: 1) How does the predictive performance of SubX models differ between the two recent multi-year droughts? 2) What is the impact of the forecast initial time on the predictive performance of the SubX models? 3) How sensitive is the predictive performance to the verification metric for ensemble/deterministic and categorical forecasts? The findings of this study will provide guidance on how to make use of SubX sub-seasonal forecasts for South Korea droughts.

Observational Data and SubX Hindcasts
This study used the National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC) daily gridded precipitation data at 50-km (0.5°) resolution (https://psl.noaa.gov/data/gridded/data.cpc.globalprecip.html). The temporal coverage of the CPC daily precipitation data used here is from 1979 through 2016. The CPC precipitation data were constructed from the gauge reports of over 30,000 stations around the world. They were quality controlled against other independent measurements, such as radar and satellite observations, and numerical model outputs (Xie 2010).
In this study, monthly liquid water equivalent thickness anomalies from the Gravity Recovery and Climate Experiment (GRACE; Tapley et al. 2004) were used to assess the propagation of liquid water equivalent thickness anomalies during the recent multi-year droughts over South Korea (http://www2.csr.utexas.edu/grace/RL05_mascons.html). The spatial resolution of the GRACE data used in this study was 25 km (0.25°). The GRACE data have been widely used for tracking spatiotemporal variations in groundwater where ground observations are limited (e.g., Shamsudduha et al. 2012; Scanlon et al. 2012; Meghwal et al. 2019).
To investigate the general predictive performance for South Korea, the CPC daily precipitation and GRACE liquid water equivalent thickness were averaged over the southern part of the Korean Peninsula (34-40°N and 126-130°E). Here, observational precipitation anomalies were computed based on the 1999-2016 climatology, which is the common climatological period of the SubX models (Table 1).
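The area averaging and anomaly step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the synthetic input are hypothetical, the box mean is unweighted (the text does not say whether latitude weighting was applied), and the climatology is taken as a simple calendar-day mean.

```python
import numpy as np

def box_mean(field, lats, lons, lat_bounds=(34.0, 40.0), lon_bounds=(126.0, 130.0)):
    """Unweighted mean of field[time, lat, lon] over a lat/lon box (assumed method)."""
    lat_mask = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_mask = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    sub = field[:, lat_mask][:, :, lon_mask]
    return np.nanmean(sub, axis=(1, 2))

def daily_anomaly(series, day_of_year):
    """Subtract the mean of each calendar day (the climatology) from the series."""
    series = np.asarray(series, dtype=float)
    day_of_year = np.asarray(day_of_year)
    anom = np.empty_like(series)
    for d in np.unique(day_of_year):
        sel = day_of_year == d
        anom[sel] = series[sel] - series[sel].mean()
    return anom

# Synthetic example: three "years" of a 5-day calendar on a 0.5-degree grid
lats = np.arange(30.25, 45.0, 0.5)
lons = np.arange(120.25, 135.0, 0.5)
rng = np.random.default_rng(0)
field = rng.gamma(2.0, 2.0, size=(15, lats.size, lons.size))
doy = np.tile(np.arange(1, 6), 3)

precip = box_mean(field, lats, lons)   # area-averaged daily precipitation
anom = daily_anomaly(precip, doy)      # anomaly relative to the calendar-day mean
```

With real data, the climatology would be restricted to 1999-2016 so that observed and hindcast anomalies share the same base period.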
This study used multi-ensemble hindcasts of SubX models for precipitation during 1999-2016 (Table 1). Out of the six SubX models, five were selected based on the availability of multiple ensemble hindcasts (≥ three ensemble members) and of the climatology at the International Research Institute archive (IRI archive, http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/). The spatial resolution of the SubX hindcasts is 100 km (1.0°). SubX models provide weekly or sub-weekly sub-seasonal hindcasts and forecasts with different forecast initial times (32 to 45 lead days). The detailed configuration of the SubX models can be found in Pegion et al. (2019).
In this study, three sets of two-week hindcasts (i.e., week 1-2, week 2-3, and week 3-4) were considered, consistent with the method of the SubX system (http://cola.gmu.edu/subx/forecasts/forecasts.html). To compare the precipitation anomaly hindcasts of the SubX models with the observation, the forecasted precipitation time series of the SubX models were constructed for the week 1-2, week 2-3, and week 3-4 forecast initial times (see the schematic diagram of the two-lead-week hindcast calculation in Fig. S1). The multi-model ensemble mean (MME) on each date was calculated by averaging the values of the five models. Precipitation anomalies of the selected SubX models were computed using the climatology of each model provided at the IRI archive and were averaged over South Korea.
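A minimal Python sketch of the two-week averaging and the MME calculation is given below. The lead-day windows (days 1-14, 8-21, and 15-28), the array layout, the equal model weighting, and the ensemble sizes are assumptions made for illustration; the exact construction used in the study is the one shown in Fig. S1.

```python
import numpy as np

# Assumed lead-day windows (1-based, inclusive) for the three two-week hindcasts
WINDOWS = {"week1-2": (1, 14), "week2-3": (8, 21), "week3-4": (15, 28)}

def biweekly_mean(hindcast, window):
    """Average hindcast[..., lead_day] over an inclusive 1-based lead-day window."""
    start, end = window
    return hindcast[..., start - 1:end].mean(axis=-1)

def multi_model_mean(model_means):
    """Equal-weight multi-model ensemble mean of per-model ensemble means."""
    return np.mean(np.stack(model_means, axis=0), axis=0)

# Synthetic example: five models, each stored as array[member, init_date, lead_day]
rng = np.random.default_rng(1)
members = [4, 4, 4, 3, 11]  # hypothetical ensemble sizes
models = [rng.normal(size=(m, 10, 32)) for m in members]

mme = {}
for name, win in WINDOWS.items():
    # Ensemble mean of each model for this window, then average across models
    per_model = [biweekly_mean(h, win).mean(axis=0) for h in models]
    mme[name] = multi_model_mean(per_model)  # shape: (init_date,)
```

Averaging the ensemble means rather than pooling all members keeps a model with many members from dominating the MME, which matches the description of averaging the values of the five models.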

Verification Metrics
Various verification metrics for the ensemble/deterministic and categorical forecasts were calculated to robustly evaluate the predictive performance of the SubX models (Table 2; Harvey et al. 1992; Huang and Zhao 2021). For the ensemble/deterministic forecasts, the correlation coefficient (CC), root mean square error (RMSE), relative bias (RB), and interquartile range (IQR) were calculated from the observations and the forecasts of the SubX models. The spread of the multiple ensemble hindcasts of each SubX model was quantified by the IQR. The continuous ranked probability skill score (CRPSS; Pappenberger et al. 2015) and the Nash-Sutcliffe efficiency (NSE; Nash and Sutcliffe 1970) were computed to evaluate the prediction skill of the SubX models relative to a reference forecast. For CRPSS and NSE, the reference forecast is the climatology of daily precipitation and the climatological observed precipitation, respectively. The CRPSS and NSE values are 100 and 1.0, respectively, for a perfect forecast. Positive (negative) CRPSS and NSE values indicate that the forecasts outperform (underperform) the reference forecast.
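For orientation, the ensemble/deterministic metrics can be sketched in Python as below. These are the commonly used textbook definitions and are assumptions here, since the exact equations are those listed in Table 2: RB is expressed in percent of the observed total (so it is meant for precipitation amounts rather than anomalies), CRPS uses the standard ensemble estimator, and the reference ensemble passed to `crpss` is a hypothetical stand-in for the climatological forecast.

```python
import numpy as np

def cc(obs, fc):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(obs, fc)[0, 1])

def rmse(obs, fc):
    """Root mean square error."""
    return float(np.sqrt(np.mean((fc - obs) ** 2)))

def relative_bias(obs, fc):
    """Relative bias in percent: total forecast error over total observed amount."""
    return float(100.0 * np.sum(fc - obs) / np.sum(obs))

def nse(obs, fc):
    """Nash-Sutcliffe efficiency; the reference is the observed mean (1.0 = perfect)."""
    return float(1.0 - np.sum((fc - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def crps_ensemble(obs, ens):
    """Time-mean CRPS for ens[member, time]: E|X - y| - 0.5 * E|X - X'|."""
    term1 = np.mean(np.abs(ens - obs[None, :]), axis=0)
    term2 = 0.5 * np.mean(np.abs(ens[:, None, :] - ens[None, :, :]), axis=(0, 1))
    return float(np.mean(term1 - term2))

def crpss(obs, ens, ref_ens):
    """CRPSS in percent relative to a reference ensemble (100 = perfect)."""
    return 100.0 * (1.0 - crps_ensemble(obs, ens) / crps_ensemble(obs, ref_ens))

# Synthetic example with an 11-member ensemble
rng = np.random.default_rng(4)
obs = rng.gamma(2.0, 2.0, size=100)
ens = obs[None, :] + rng.normal(scale=1.0, size=(11, 100))
ref = np.full((11, 100), obs.mean())  # constant climatological-mean reference
scores = {
    "CC": cc(obs, ens.mean(axis=0)),
    "RMSE": rmse(obs, ens.mean(axis=0)),
    "RB": relative_bias(obs, ens.mean(axis=0)),
    "NSE": nse(obs, ens.mean(axis=0)),
    "CRPSS": crpss(obs, ens, ref),
}
```

Deterministic scores are computed on the ensemble mean, while CRPSS uses all members, which is why the two groups can rank models differently.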
For categorical forecasts of the positive and negative precipitation anomalies, several verification metrics were calculated: the hit rate (HIT) and false alarm rate (FAR) of signal detection theory (Harvey et al. 1992), the Brier skill score (BSS; Becker and van den Dool 2016), and the probability of detection (PD) and probability of false detection (PFD) for each phase of the precipitation anomaly (Table 2). The BSS is 100 when the forecast is perfect. A positive BSS indicates that the forecast is better than the observed climatology (the reference forecast). The PD and PFD were used to construct the receiver operating characteristic curve for the forecasts of the SubX models.
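The categorical metrics can be illustrated with the short Python sketch below. The definitions follow the usual two-by-two contingency table and are assumptions, as the exact equations are those in Table 2: the false alarm rate is taken as false alarms divided by all observed non-events, the forecast probability is the fraction of ensemble members predicting the event, and the BSS reference is the observed base rate.

```python
import numpy as np

def contingency(event_obs, event_fc):
    """Counts of hits, misses, false alarms, and correct negatives (boolean inputs)."""
    hits = np.sum(event_fc & event_obs)
    misses = np.sum(~event_fc & event_obs)
    false_alarms = np.sum(event_fc & ~event_obs)
    correct_neg = np.sum(~event_fc & ~event_obs)
    return hits, misses, false_alarms, correct_neg

def hit_and_false_alarm_rate(event_obs, event_fc):
    """Hit rate h/(h+m) and false alarm rate f/(f+c)."""
    h, m, f, c = contingency(event_obs, event_fc)
    return h / (h + m), f / (f + c)

def brier_skill_score(event_obs, prob_fc):
    """BSS in percent relative to the observed base rate (100 = perfect)."""
    o = event_obs.astype(float)
    bs = np.mean((prob_fc - o) ** 2)
    bs_ref = np.mean((o.mean() - o) ** 2)
    return 100.0 * (1.0 - bs / bs_ref)

def roc_points(event_obs, prob_fc, thresholds):
    """(PD, PFD) pairs obtained by thresholding the forecast probability."""
    return [hit_and_false_alarm_rate(event_obs, prob_fc >= t) for t in thresholds]

# Synthetic example for the negative-anomaly ("dry") event
rng = np.random.default_rng(2)
obs_anom = rng.normal(size=200)
ens_anom = obs_anom[None, :] + rng.normal(scale=0.8, size=(11, 200))
dry_obs = obs_anom < 0
dry_prob = np.mean(ens_anom < 0, axis=0)  # fraction of members forecasting dry
hit, far = hit_and_false_alarm_rate(dry_obs, dry_prob >= 0.5)
bss = brier_skill_score(dry_obs, dry_prob)
roc = roc_points(dry_obs, dry_prob, thresholds=np.linspace(0.0, 1.0, 12))
```

Plotting PD against PFD for the list returned by `roc_points` gives the ROC curve; points above the 1:1 line indicate skill relative to a random forecast.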

Effective Drought Index
Using the CPC daily precipitation data, the effective drought index (EDI; Byun and Wilhite 1999) was calculated to estimate the drought condition in South Korea. The EDI accounts for the accumulation effect of daily precipitation deficits and can monitor the daily temporal evolution of droughts (Kim et al. 2009; Park et al. 2015; Jain et al. 2015). The EDI is classified into three severity categories: moderate (-1.5 < EDI ≤ -1.0), severe (-2.5 < EDI ≤ -1.5), and extreme (EDI ≤ -2.5) drought. The detailed calculation procedure can be found in Byun and Wilhite (1999) and Park et al. (2022a). In this study, the onset and recovery of a drought were defined with different EDI threshold values (-0.5 and 0.5, respectively), which allows us to capture the full recovery of an ongoing drought (Kam et al. 2019).

Table 2 Verification metrics used in this study (Harvey Jr. et al. 1992; Huang and Zhao 2021). Columns: N, Metric, Equation

Hindcasts of SubX Models for Two Multi-Year South Korea Droughts
Figure 1 shows the 31-day running means of the CPC precipitation and GRACE liquid water equivalent thickness anomalies over the 2007-10 and 2013-16 droughts. The 31-day running means were computed to account for the cumulative impact of precipitation anomalies (the long-term persistence). Over 2007-2010, consecutive negative precipitation anomalies from late 2007 led to the drought onset (Fig. 1a), and the severity rapidly reached an EDI of -1.01 in March 2008 (Fig. 1c). Despite the retreat in summer 2008, the drought re-emerged and reached an EDI of -1.68 (severe drought) in April 2009. Owing to the large positive precipitation anomalies in summer 2009, this second drought episode fully recovered. However, severe persistent precipitation deficits in the following autumn and early winter caused the drought to re-emerge and persist in early 2010. Over 2013-2016, several precipitation surpluses relieved the drought conditions and terminated the drought in 2016 (Fig. 1b and d). During this period, the minimum EDI was -1.83 in April 2014 after the drought onset and -1.71 in October 2015 during the drought re-emergence (Fig. 1d), indicating that the 2013-16 drought was more severe than the 2007-10 drought. Also, the number of days with an EDI of -0.5 or below was 810 during the 2013-16 drought, which was larger than that during the 2007-10 drought (427 days).
At the daily scale of drought propagation, positive precipitation anomalies relieved the ongoing drought (e.g., after June 2008 and 2009 and near June 2015 and 2016), and the subsequent persistent negative precipitation anomalies caused abrupt transitions to the drought re-emergence stage. This result suggests that the drought condition varies strongly at the daily scale, confirming that different threshold values for drought onset (EDI of -0.5) and recovery (EDI of +0.5) are necessary in drought frequency analysis to avoid overestimating the number of drought events, as would happen with a single threshold value (-0.5).
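The effect of using separate onset and recovery thresholds can be illustrated with a small Python sketch. The function name, the inclusive comparisons, and the toy EDI series are assumptions for illustration and are not taken from the study's data.

```python
import numpy as np

def drought_events(edi, onset=-0.5, recovery=0.5):
    """Return (start, end) index pairs for drought events.

    A drought starts on the first day with EDI <= onset and ends on the first
    later day with EDI >= recovery; end is that day's index, or len(edi) if
    the drought is still ongoing at the end of the series.
    """
    events, start = [], None
    for i, value in enumerate(edi):
        if start is None and value <= onset:
            start = i
        elif start is not None and value >= recovery:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(edi)))
    return events

# Toy daily EDI series with brief retreats above -0.5 inside a longer drought
edi = np.array([0.2, -0.6, -1.2, -0.3, -0.7, -1.6, 0.1, 0.6, 0.3, -0.8, 0.7])
two_threshold = drought_events(edi)                    # onset -0.5, recovery +0.5
single_threshold = drought_events(edi, recovery=-0.5)  # one threshold for both
```

For this toy series, the two-threshold rule yields two events, `[(1, 7), (9, 10)]`, whereas the single threshold splits the first drought at its brief retreat and yields three, `[(1, 3), (4, 6), (9, 10)]`.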
Results from the CPC and GRACE data show coincident negative precipitation anomalies and liquid water thickness anomalies (Fig. S2), indicating an evolution from meteorological drought to hydrological drought, possibly causing severe hydrological damage to socioeconomic sectors in South Korea. For example, the 2013-2016 drought was a historic mega-drought in South Korea and received huge attention from the public due to its destructive damage (NDSP 2017; Park et al. 2022a).
Figure 2 shows the 31-day running means of daily precipitation anomalies for the observation (CPC), the multi-model ensemble mean (MME), and the ensemble mean of each SubX model over 2007-10. Results show that the prediction skill decreases gradually from the week 1-2 to the week 3-4 forecast initial times. In the week 1-2 forecast initial time (Fig. 2b), the MME and most of the SubX models were able to predict well the timing of significant positive and negative precipitation anomalies. In particular, the abrupt phase transitions of precipitation anomalies near June 2008 and June 2009 were reasonably predicted. In the week 2-3 forecast initial time (Fig. 2c), the MME and most of the SubX models were not able to predict the negative precipitation anomalies near September 2008 and in April and May 2009. In the week 3-4 forecast initial time (Fig. 2d), the timing and magnitude of the predicted precipitation anomalies were relatively unreliable compared to the observation.
The predictive performance of the SubX models for the 2013-16 drought was consistent with that for the 2007-10 drought (Fig. 3). The best predictive performance was found in the week 1-2 forecast initial time, and it gradually decreased as the forecast was initialized earlier (i.e., at longer lead times). In particular, for the week 1-2 forecast initial time (Fig. 3b), the SubX models were able to predict abrupt phase transitions of precipitation anomalies (e.g., June 2016).

SubX Multi-Model based Predictive Performances
The CC, IQR, RMSE, RB, NSE, and CRPSS were calculated for the two multi-year drought periods to quantitatively estimate the prediction skill of the SubX models in terms of ensemble and deterministic forecasts (Fig. 4). For both droughts, the CC was statistically significant and larger than 0.6 even at the week 3-4 forecast initial time, indicating that the forecasts of the SubX models captured the overall variation in observed precipitation during the ongoing drought relatively well. The positive RB values indicate that the SubX models generally overestimated the observed precipitation amount. The positive NSE and CRPSS also indicate that the SubX models were more skillful than the reference forecast over both drought periods. According to the multiple verification metrics (CC, IQR, RB, NSE, and CRPSS), the SubX models showed more skillful forecasts for the 2007-10 drought (0.9 of CC, 1.9 of IQR, 6.2% of RB, 0.8 of NSE, and 45.4% of CRPSS) than for the 2013-16 drought (0.8 of CC, 2.1 of IQR, 8.4% of RB, 0.6 of NSE, and 42.6% of CRPSS).
The prediction skill decreased rapidly as the forecast initial time increased. The prediction skill of the SubX models for the 2007-10 drought decreased rapidly in the week 2-3 and 3-4 forecast initial times (e.g., dCC/dt = -0.13 per week, dRMSE/dt = 0.6 per week, dRB/dt = 3.4% per week, dNSE/dt = -0.2 per week, and dCRPSS/dt = -16.1% per week). Interestingly, the prediction skill of the SubX models decreased relatively slowly for the 2013-16 drought in the week 2-3 and 3-4 forecast initial times (e.g., dCC/dt = -0.09 per week, dRMSE/dt = 0.3 per week, dRB/dt = 1.2% per week, dNSE/dt = -0.13 per week, and dCRPSS/dt = -9.6% per week). These results imply that the generating mechanisms of the 2007-10 and 2013-16 droughts might be different, which requires further investigation but is beyond the scope of this study.
Figure 5 shows HIT, FAR, and BSS for each (negative and positive) anomaly phase during the drought periods. The BSS scores show that the predictive performance of the SubX models for the negative precipitation anomaly (> 60% BSS) was more skillful than that for the positive precipitation anomaly (33.5-55.0% BSS), indicating that the SubX models might take advantage of the long-term memory/persistence during the negative precipitation anomaly phase (e.g., the drought onset and persistence stages) or of the dry initial condition during the drought. The HIT, FAR, and BSS estimates also show a relatively more rapid decrease in the predictability of the SubX models as the forecast initial time increases for the 2007-10 drought (e.g., dBSS_P/dt = -10.2% per week and dBSS_N/dt = -6% per week) than for the 2013-16 drought (e.g., dBSS_P/dt = -8.8% per week and dBSS_N/dt = -5.6% per week).

Fig. 2 The 31-day running mean of precipitation anomalies for (a) the observation (CPC) and (b)-(d) the multi-model ensemble mean (MME) and the ensemble mean of each SubX model for each forecast initial time in the 2007-10 drought case. For the SubX data, the 14-day-average anomaly was used corresponding to the week 1-2 (b), week 2-3 (c), and week 3-4 (d) forecast initial times, and its center position on the abscissa is the seventh date of the 14-day forecast (see the visualized methodology in Fig. S1). The shaded area was plotted only if the sign of the SubX model data is the same as that of the observation data and more than half of its ensemble members show the same sign
The receiver operating characteristic (ROC) curves were constructed based on the PD and PFD from the two-by-two contingency table. The ROC curves allow us to evaluate the predictive performance of the categorical forecasts in terms of the probability among ensemble members (Fig. 6). A point in the upper left corner (bottom right corner) indicates a better (worse) forecast than the reference forecast, which is the 1:1 line of PD and PFD. Overall, the ROC curves lie toward the upper left corner, showing more skillful predictions by the SubX models than the reference forecast. However, the ROC curves of the week 3-4 forecast for the 2007-10 drought and the week 2-4 forecast for the 2013-16 drought were close to the 1:1 line of PD and PFD, indicating no difference from the prediction skill of the reference forecast. In other words, the prediction skill of the SubX models was sustained much longer for the 2007-10 drought than for the 2013-16 drought, which is inconsistent with the results from the verification metrics for ensemble/deterministic forecasts. This result implies that a robust evaluation of the predictive performance of climate forecast models needs to be based on multiple verification metrics.

Fig. 6 Receiver operating characteristic (ROC) curve for each categorical forecast type (i.e., positive and negative precipitation anomalies) and drought case

Figure 7 shows the mean score for each model, averaged over the metrics belonging to the same forecast type: the mean score of the ensemble/deterministic (categorical) forecast is calculated by averaging the relative scores of CC, IQR, RMSE, RB, NSE, and CRPSS (HIT, FAR, BSS, PD, and PFD). This mean score is a relative score between models based on multiple verification metrics and summarizes the overall performance among the five SubX models through the following steps: 1) calculating all the verification metrics for each individual SubX model's ensemble mean (Tables S1 and S2) and 2) normalizing the score of each verification metric by the range between the maximum and minimum metric values among the SubX models (Tables S3 and S4).
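The two steps above amount to a min-max normalization across models followed by an average across metrics, which might be sketched in Python as below. The orientation flags (whether a larger metric value counts as better), the handling of a zero range, and the metric values are assumptions made for illustration; the values are not taken from Tables S1 to S4.

```python
import numpy as np

def relative_scores(values, higher_is_better=True):
    """Min-max normalize one metric across models to the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.full(values.shape, 0.5)  # assumption: identical models tie at 0.5
    scaled = (values - values.min()) / span
    return scaled if higher_is_better else 1.0 - scaled

def mean_score(metric_table, orientation):
    """Average the relative scores over all metrics of one forecast type."""
    per_metric = [relative_scores(metric_table[name], orientation[name])
                  for name in metric_table]
    return np.mean(per_metric, axis=0)

# Hypothetical metric values for five models (one entry per model)
metrics = {
    "CC": [0.90, 0.85, 0.80, 0.88, 0.92],
    "RMSE": [1.2, 1.5, 1.8, 1.3, 1.1],
    "NSE": [0.70, 0.60, 0.50, 0.65, 0.80],
}
orientation = {"CC": True, "RMSE": False, "NSE": True}  # assumed orientations

scores = mean_score(metrics, orientation)
best = int(np.argmax(scores))  # index of the model with the highest mean score
```

Because each metric is rescaled by the spread among the five models, the resulting score ranks models against each other and says nothing about absolute skill.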

SubX Individual Model based Predictive Performances
In general, the ESRL-FIMr1p1 and EMC-GEFSv12 models showed more skillful predictive performance than the other models in all forecast initial times (Fig. 7a-c). Based on the average of the mean scores for the ensemble/deterministic forecasts and the categorical forecasts (Table 3), the EMC-GEFSv12 and ESRL-FIMr1p1 models were the best and second-best models, respectively. The ECCC-GEPS6 and RSMAS-CCSM4 models were the worst models for the ensemble/deterministic forecasts and the categorical forecasts, respectively. This result confirmed that the predictive performance of a model can be evaluated differently depending on the applied verification metric. Similarly, the spread of relative forecast performance was inconsistent between the verification metrics of the ensemble/deterministic forecasts and those of the categorical forecasts (Fig. 7d-f). Interestingly, the spread of relative forecast performance between the SubX models was narrower for the 2007-10 drought than for the 2013-16 drought.
Overall, the EMC-GEFSv12 model with 11 ensemble members showed more skillful performance than the other models with three or four ensemble members, suggesting a possible benefit of a large ensemble size for sub-seasonal prediction skill. To test this hypothesis, the CRPSS, NSE, and BSS scores were calculated from 10,000 bootstrap samples of the 11 ensemble members of EMC-GEFSv12 for different ensemble sizes (Figs. 8 and 9). Results show that the mean values of CRPSS, NSE, and BSS increased significantly in all forecast initial times and drought events as the ensemble size increased, particularly when the ensemble size was small (e.g., three to five ensemble members). The range of the boxplots gradually narrowed with increasing ensemble size, indicating a reduction of uncertainties. Results also showed an upper limit of the ensemble size beyond which the prediction skill no longer improved, at around eight or nine ensemble members. This result might be model-dependent, which requires further investigation.
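A resampling experiment of this kind can be sketched in Python as follows. This is an illustration under assumptions rather than the study's procedure: members are drawn without replacement (the text does not state the sampling scheme, so `replace` is left as a parameter), only the NSE of the sub-ensemble mean is scored, the data are synthetic, and the number of resamples is reduced from 10,000 to keep the example fast.

```python
import numpy as np

def nse(obs, fc):
    """Nash-Sutcliffe efficiency of a deterministic forecast."""
    return 1.0 - np.sum((fc - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def skill_vs_ensemble_size(obs, ens, sizes, n_boot=1000, replace=False, seed=0):
    """Resampled distribution of the NSE of the sub-ensemble mean for each size."""
    rng = np.random.default_rng(seed)
    n_members = ens.shape[0]
    out = {}
    for k in sizes:
        scores = np.empty(n_boot)
        for b in range(n_boot):
            # Draw k member indices and score the mean of the selected members
            idx = rng.choice(n_members, size=k, replace=replace)
            scores[b] = nse(obs, ens[idx].mean(axis=0))
        out[k] = scores
    return out

# Synthetic 11-member ensemble: truth plus independent member noise
rng = np.random.default_rng(3)
obs = rng.normal(size=300)
ens = obs[None, :] + rng.normal(scale=1.0, size=(11, 300))

dist = skill_vs_ensemble_size(obs, ens, sizes=range(3, 12), n_boot=500)
summary = {k: (v.mean(), v.std()) for k, v in dist.items()}  # mean and spread by size
```

In this synthetic setting the member errors are independent, so the skill keeps rising with ensemble size; a saturation near eight or nine members, as reported above, would instead point to errors shared among members.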

Discussion
The monsoonal circulation in East Asia plays a dominant role in determining the timing and amount of precipitation in South Korea (Liu et al. 2022). However, even in the major monsoon season, precipitation in South Korea does not occur continuously throughout the season, and it can depend on the movement of the monsoonal front at the sub-seasonal time scale (Chen et al. 2004; Kim and Kim 2020; Park et al. 2021). In addition, precipitation over South Korea can be strongly driven by circulation features at the sub-seasonal time scale, such as extratropical cyclones (Moon et al. 1994; Park et al. 2021), typhoons (Kim and Jain 2011), and Okhotsk Sea blocking (Song and Ahn 2021). Since variation in precipitation is closely connected to the evolution of drought, these sub-seasonal circulation features should be replicated well in climate forecast models, such as the SubX models, to produce robust sub-seasonal forecasting skill for South Korea droughts. Currently, only a limited set of forecast variables (e.g., the upper-level tropospheric pressure field) is available in the SubX forecast archive, but it may be necessary to further investigate the simulated relationship between precipitation and surrounding circulation patterns in the SubX models against the observed relationship, which remains unknown.
The predictive performance of the SubX models was verification metric-dependent and event-dependent. While the SubX models forecasted the 2007-10 drought better than the 2013-16 drought, their prediction skill decayed much more rapidly for the former drought as the forecast initial time increased. The predictive performance of each individual SubX model can be evaluated differently, depending on the type of forecast (ensemble/deterministic vs. categorical).
The findings of this study suggest the need to use multiple verification metrics for robust evaluations of the predictive performance of climate forecast models, which is necessary to avoid biased assessments and to objectively evaluate model performance for extreme events.
This study confirmed that a model with a larger number of ensemble members could enhance the prediction skill in terms of both the mean value and the uncertainties. These results imply that the limited sub-seasonal prediction skill caused by imperfect model physics can be further improved by simply adding models and ensemble members. The results clearly showed an upper limit of the effective ensemble size (herein, eight ensemble members), which means that advances in the physics of climate forecast models are required to improve the prediction skill beyond a certain predictive performance level. Therefore, the ensemble size of climate forecast models should be carefully optimized for an efficient tradeoff between prediction skill and computational cost.
This study assessed the forecasting skill of the SubX climate forecast models during the two major drought events over the southern part of the Korean Peninsula. This study used the climatology of the SubX models provided by each modeling group. The period of the climatology was relatively short (about 20 years) and slightly different among the SubX models, which could bias the precipitation anomalies. Risbey et al. (2021) suggested that the prediction skill of models can be overestimated in hindcasts relative to real-time forecasts due to a biased approach in the anomaly assessment. This implies that a "fair" evaluation of prediction skill may require a careful design of the operational system that is consistent between hindcasts and real-time forecasts.
This study focused on a multiple verification metric-based assessment of the forecasting skill of the SubX models. The SubX models provide a limited set of climate variables, including sea surface temperature, near-surface temperature, and 500-hPa geopotential height, due to data storage constraints, which makes it difficult to explore the detailed mechanisms behind the limited prediction skill of the SubX models against the observed mechanisms. A climate model-based study (Ham et al. 2022) found that large-scale sea surface temperature forcing over the subtropical central Pacific contributed to the 2013-16 South Korea drought. Other generating mechanisms can be further investigated via land-sea-atmosphere interactions, using reanalysis products such as the ECMWF reanalysis version 5 (ERA5).

Conclusions
This study conducted a comprehensive evaluation for the predictive performance of the five SubX models for two recent multi-year droughts over South Korea. By using various verification metrics for ensemble/deterministic and categorical forecasts, the prediction skills of SubX sub-seasonal hindcasts were quantitatively estimated. The positive CRPSS, NSE, and BSS scores in all forecast initial times of the SubX models indicate that the SubX models were obviously more skillful than the reference forecast. According to the mean scores for ensemble/deterministic and categorical forecasts, EMC-GEFSv12 was superior to other models for sub-seasonal forecasts of South Korea droughts.
The SubX sub-seasonal forecasting system operates for the globe in near real-time. Currently, the Korea Meteorological Administration (KMA) operates its short- through long-range forecast systems using a single climate forecast model. This study provides insight into how to make use of the SubX sub-seasonal forecasting system alongside the current KMA forecasting system. Forecasted daily precipitation from the SubX models can be used to provide an outlook on the propagation of an ongoing drought via daily drought indices, such as the EDI. A multiple SubX model-based operational system for drought forecasting will help robustly evaluate uncertainties in the outlook of multi-year droughts. This study highlights the importance of a robust assessment of the predictive performance of sub-seasonal climate forecasts via multiple verification metrics.
by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021M3I6A1086808). Chang-Kyun Park was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1C1C2004711).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.