High-resolution Projection Dataset of Agroclimatic Indicators over Central Asia

To understand the potential impacts of projected climate change on the vulnerable agriculture in Central Asia (CA), six agroclimatic indicators are calculated based on the 9-km-resolution dynamically downscaled results of three different global climate models from Phase 5 of the Coupled Model Intercomparison Project (CMIP5), and their changes in the near-term future (2031–50) are assessed relative to the reference period (1986–2005). The quantile mapping (QM) method is applied to correct the model data before calculating the indicators. Results show that the QM method largely reduces the biases in all the indicators. Growing season length (GSL, day), summer days (SU, day), warm spell duration index (WSDI, day), and tropical nights (TR, day) are projected to significantly increase over CA, and frost days (FD, day) are projected to decrease. However, changes in biologically effective degree days (BEDD, °C) are spatially heterogeneous. The high-resolution projection dataset of agroclimatic indicators over CA can serve as a scientific basis for assessing the future risks to local agriculture from climate change and will be beneficial in planning adaptation and mitigation actions for food security in this region.


Introduction
Central Asia (referred to as CA, Fig. 1), which consists of five countries (Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, and Uzbekistan), is highly agrarian, with 60% of its population living in rural areas and agriculture accounting for over 45% of total employment and nearly 25% of the gross domestic product (Babu and Djalalov, 2006). Agricultural land in this region is mostly desert and mountainous pastures. Arable land suitable for crop production is about 20% of the agricultural land (and as low as 4% in Turkmenistan). The main crops produced in CA are cotton and wheat. Kazakhstan is one of the world's major wheat and flour exporters. Uzbekistan is the sixth largest producer and the second largest exporter of cotton in the world. Semi-arid to arid climates prevail in the lowland areas of CA. Thus, in the central and western parts of CA, agriculture is only possible with irrigation. Major areas under rainfed agriculture are only found in the very north of Kazakhstan (Sommer et al., 2013). Every year, natural disasters such as droughts and extreme temperatures (cold and heat waves) bring risks to agricultural production in this region (Thurman, 2011).
Climate change may pose challenges to the vulnerable agriculture in CA. Sommer et al. (2013) assessed the impacts of projected climate change on wheat in this region and found that the projected increase in temperature is the most important factor leading to earlier and faster crop growth and higher biomass accumulation and yield. Conversely, the effect of the projected increase in precipitation is expected to be insignificant because of the increasing evaporative demand of the crops. In addition, higher temperatures were found to bring an increased risk of flower sterility and thus lead to lower yields. Mirzabaev (2018) suggested that adaptive actions should be taken to strengthen the resilience of agricultural producers in CA to increased weather variability.
Because of the lack of high-resolution climate data and less attention from the community relative to its surrounding areas, like East Asia, South Asia, and the Mediterranean (Piao et al., 2010; Iglesias et al., 2011; Bandara and Cai, 2014; Cramer et al., 2018; Aryal et al., 2020; Huang et al., 2020), studies on the impacts of climate change on agriculture in CA are scarce, especially on the potential impacts of projected climate change under warming scenarios in the near-term future.
Recently, we carried out a study that involves the dynamical downscaling of multiple global climate models (GCMs) for the CA region with an unprecedented horizontal resolution of 9 km (Qiu et al., 2022). The reference and future periods of the simulations are 1986-2005 and 2031-50, respectively. In this study, the model data from the downscaled results are used to calculate some key agroclimatic indicators (referred to as AIs), which are proxies for the effect of weather and climate on specific agricultural activities (Arnell and Freeman, 2021) and are both practical and understandable to farmers and policy makers (Trnka et al., 2011). As absolute threshold-based temperature indices are highly sensitive to systematic model biases, statistical bias correction (adjustment) methods are recommended for correcting the raw model outputs (Dosio, 2016; Iturbide et al., 2022). Here, the quantile mapping method (Themeßl et al., 2011) is applied to correct the simulated temperature before calculating the AIs relating to absolute temperature thresholds.
The aim of this paper is to describe the high-resolution projection dataset of AIs over CA and investigate projected changes in these indicators in the near-term future. This study can serve as a scientific basis for assessing the future risks to the local agriculture from climate change and will be beneficial in planning adaptation and mitigation actions for food security in this region. The remainder of this paper is organized as follows: section 2 describes the data and methods. Projected changes in the indicators are presented in section 3, as well as the evaluation of the QM method. Section 4 provides usage notes. Discussion and conclusion are in section 5.

Data and method
The model data
The AIs assessed in this study are calculated based on daily mean/maximum/minimum temperature (TG/TX/TN) from the 9-km-resolution dynamical downscaling of three different GCMs in CA with a regional climate model (RCM), the Weather Research and Forecasting (WRF) model (Skamarock et al., 2008). The three GCMs are MPI-ESM-MR (referred to as MPI), CCSM4 (CCSM), and HadGEM2-ES (Had) from Phase 5 of the Coupled Model Intercomparison Project (CMIP5). Before the downscaling, the bias-correction technique developed by Bruyère et al. (2014) is utilized to correct the climatology of the GCMs while allowing synoptic and climate variability to change. The WRF simulations are labeled as MPI_WRF_COR, CCSM_WRF_COR, and Had_WRF_COR ("COR" means using the bias-correction technique), respectively. The reference-period simulations are from 1986 to 2005, and the future runs are between 2031 and 2050 under the moderate emission scenario RCP 4.5.
The simulated TG/TX/TN over CA has been extensively evaluated, and basic features of the projected temperature changes have been investigated (Qiu et al., 2022a, 2022b). Results show that the RCM simulations driven by three different GCMs can well capture the local temperature on time scales from daily to annual in CA during the reference period. For instance, the spatial correlation coefficients (SCCs) of the simulated seasonal and annual mean TG/TX/TN over CA are all above 0.95 against the observations, and their root mean square errors (RMSEs) are all below 2.50°C. The three WRF simulations indicate that annual mean TG averaged over CA will increase by 1.6°C-2.0°C in the near-term future (2031-50) relative to the reference period (1986-2005). Stronger warming is detected north of ~45°N in CA from autumn to spring. Enhanced warming is projected in many mountainous regions (like the Tibetan Plateau/Himalayas and Alps) around the world (Rangwala et al., 2013; Mountain Research Initiative EDW Working Group, 2015; Palazzi et al., 2019). However, the projected warming in the high-elevation areas of CA is not stronger than that in the plain areas.

Agroclimatic indicators
Specific crops grow well in specific climate regions, and the success of a crop can be related to climate factors (e.g., frequency of frost damage, length of growing season, heat stress) as well as physical factors (e.g., soil, slope, aspect) and farm management (e.g., irrigation, fertilization) (Petr, 1991; Rijks, 1994; Holden and Brereton, 2004). Understanding the complex interactions between crops and regional climate allows for better management decisions (Trnka et al., 2011). AIs are widely used to convey aspects of climate variability and change that are meaningful to the agricultural sector. For instance, the Global Agriculture Sectoral Information System project developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) has produced contemporary and future AIs for climate change impact studies for global agriculture (SIS, 2019).
Here, we chose six of the ECMWF indicators that are most meaningful for assessing the potential impact of projected climate change on the agriculture in CA. Among them, growing season length (GSL, day) refers to the number of days when plant growth takes place, biologically effective degree days (BEDD, °C) provides valuable information on the local heat summation, frost days (FD, day) indicates frost damage, summer days (SU, day) and warm spell duration index (WSDI, day) indicate heat stress and heatwaves, and tropical nights (TR, day) gives information about the occurrence of various pests. Table 1 shows the detailed definitions of these AIs, which are calculated using TG/TX/TN from the WRF simulations (MPI_WRF_COR, CCSM_WRF_COR, and Had_WRF_COR). Note that GSL is calculated with the model data throughout the year, and the other indicators are calculated for the growing season (April to October, Gessner et al., 2013). The WRF simulations show that projected changes in precipitation in the near-term future are significant over only a few areas of CA (Qiu et al., 2022a, 2022b). We found that projected changes in AIs calculated based on precipitation (e.g., wet days, heavy precipitation days, and maximum number of consecutive wet/dry days) are also insignificant over most of CA. Therefore, only the results of the AIs which relate to temperature are presented in this paper.
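As an illustration of how threshold-based indicators of this kind are computed from daily temperature, the following minimal sketch counts FD, SU, and TR and accumulates BEDD for one grid cell and one growing season. Table 1 is not reproduced here, so the thresholds for FD, SU, and TR are assumptions (commonly used ETCCDI-style values), and the BEDD formula is only one plausible reading of the 10°C and 30°C limits mentioned in the text; the function name is hypothetical.

```python
import numpy as np

# ASSUMED thresholds (commonly used ETCCDI-style values), not taken from Table 1.
FROST_TN_C = 0.0       # frost day: TN < 0 degC
SUMMER_TX_C = 25.0     # summer day: TX > 25 degC
TROPICAL_TN_C = 20.0   # tropical night: TN > 20 degC
# Lower and upper BEDD limits quoted in the text.
BEDD_LOW_C, BEDD_HIGH_C = 10.0, 30.0


def count_indicators(tg, tx, tn):
    """Return FD, SU, TR (days) and BEDD (degC) for one season of daily data."""
    tg, tx, tn = (np.asarray(a, dtype=float) for a in (tg, tx, tn))
    fd = int(np.sum(tn < FROST_TN_C))
    su = int(np.sum(tx > SUMMER_TX_C))
    tr = int(np.sum(tn > TROPICAL_TN_C))
    # ASSUMED BEDD form: accumulate (TG - lower limit) only on days whose TG
    # lies between the two limits; days above the upper limit contribute zero.
    in_window = (tg > BEDD_LOW_C) & (tg < BEDD_HIGH_C)
    bedd = float(np.sum(np.where(in_window, tg - BEDD_LOW_C, 0.0)))
    return {"FD": fd, "SU": su, "TR": tr, "BEDD": bedd}


# Four illustrative days (made-up values).
example = count_indicators(
    tg=[8, 12, 20, 31], tx=[14, 20, 27, 38], tn=[-1, 5, 15, 24]
)
```

Because every indicator here depends on a fixed threshold, a systematic temperature bias shifts days across the threshold and changes the counts directly, which is why the bias correction is applied before the indicators are computed.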

Quantile mapping for postprocessing the model data
Our recent study (Qiu et al., 2022b) found that the WRF simulations have systematic biases in simulating the surface air temperature. For instance, the annual mean TG over the very northern part of Kazakhstan and the Pamirs exhibits a cold bias, and that over other areas generally exhibits a warm bias. All the chosen AIs except WSDI relate to absolute temperature thresholds (see their definitions in Table 1) and are particularly sensitive to the systematic biases in the model data. The biases in the raw model outputs can propagate down to the AIs, which may add uncertainties to their projected changes. Thus, the QM method is separately used to postprocess the simulated TG, TX, and TN. First, a transfer function depending on the quantile distribution is established by matching the model data with observations during the reference period (1986-2005). Then, the transfer function is applied to correct the model data in the future period (2031-50). In its basic form, the correction can be written as:
$$X^{\mathrm{cor}}_{m,p} = \mathrm{ecdf}^{-1}_{o,c}\left(\mathrm{ecdf}_{m,c}\left(X_{m,p}\right)\right)$$
where the model data and observations are denoted by the subscripts $m$ and $o$, respectively, the calibration period and projection (future) period are denoted by $c$ and $p$, respectively, $X$ is the variable, $\mathrm{ecdf}$ and $\mathrm{ecdf}^{-1}$ are the empirical cumulative distribution function (ECDF) and its inverse, respectively, and $X^{\mathrm{cor}}$ is the bias-corrected model data. Numerous studies have found that the QM method can effectively remove model biases, not only for the mean and interannual variability, but also for extreme events (Ashfaq et al., 2010; Piani et al., 2010; Gudmundsson et al., 2012; Teutschbein and Seibert, 2012; Tong et al., 2021).
The land component of the fifth generation of European reanalysis (referred to as ERA5-Land, Hersbach et al., 2020) with ~9-km grid spacing is used as "observations" during the postprocessing. Prior to the bias correction, the model data are interpolated to the grid of the observations with the nearest neighbor method. There are two reasons why we chose ERA5-Land as observations. First, the traditional observations that are available for the CA region, like CRU TS v4 (version 4 of the Climatic Research Unit gridded Time Series) and CPC (Climate Prediction Center) Global daily temperature, have coarser resolution (0.5° × 0.5°) relative to the 9-km-resolution downscaled results. If we used these coarse-resolution gridded observations, the model data would have to be interpolated to their grids and the small-scale climate characteristics brought by the dynamical downscaling would be largely erased. Second, ERA5-Land is found to perform well in describing the global land surface temperature with respect to the MODIS (Moderate Resolution Imaging Spectroradiometer) data (Hersbach et al., 2020) and has been applied in studies on many sectors in CA (Xue et al., 2019; Wang et al., 2020; Jiang et al., 2021; Lu et al., 2021).
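The two-step procedure described above (build a transfer function in the calibration period, then apply it to another period) can be sketched with NumPy for a single grid cell. This is a generic empirical quantile mapping under stated assumptions, not the bias_correction module used for the dataset: the function name, the number of quantiles, and the handling of values outside the calibration range (they are mapped to the observed extremes) are all choices made for this illustration.

```python
import numpy as np


def quantile_map(obs_cal, mod_cal, mod_target, n_quantiles=101):
    """Empirical quantile mapping for one grid cell (illustrative sketch).

    obs_cal    : observations in the calibration period
    mod_cal    : model data in the calibration period
    mod_target : model data to be corrected (e.g., the future period)
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    q_obs = np.quantile(obs_cal, probs)   # tabulated inverse ECDF of observations
    q_mod = np.quantile(mod_cal, probs)   # tabulated inverse ECDF of the model
    # Step 1: probability of each target value under the model's calibration ECDF.
    # np.interp clamps values outside [q_mod[0], q_mod[-1]] to probability 0 or 1.
    p = np.interp(mod_target, q_mod, probs)
    # Step 2: read the observed value at the same probability.
    return np.interp(p, probs, q_obs)


# Synthetic check: a model with a constant +2 degC warm bias.
rng = np.random.default_rng(0)
obs = rng.normal(10.0, 3.0, size=2000)
mod = obs + 2.0
corrected = quantile_map(obs, mod, mod)
```

In this synthetic case the corrected series coincides with the observations, because a constant offset shifts every model quantile by the same amount. Real applications usually need an explicit rule for future values that exceed the calibration range, which this sketch deliberately simplifies.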

Main producing areas of wheat and cotton
As introduced above, wheat and cotton are the main crops in CA. Hence, projected changes in the AIs over the main producing areas of these two crops are particularly assessed. Main producing areas of wheat and cotton are detected based on the Spatial Production Allocation Model (SPAM) 2010 v2.0 Global data (International Food Policy Research Institute, 2019), which contains the physical area, harvest area, production, and yield of 42 crops (including wheat and cotton) with 10 × 10-km grid-cell resolution. The main crop of a grid cell is defined as the crop whose harvest area is the largest and accounts for at least 5% of the grid cell area. Under this condition, the main producing area of wheat is located in northern Kazakhstan, with an area of ~3.8 × 10⁵ km², and that of cotton is along the Amu Darya and Syr Darya rivers and in Uzbekistan and Turkmenistan, with an area of ~9.4 × 10⁴ km² (Fig. 1). Other thresholds (3% and 7%) are also tested, and the results are similar to those with a threshold of 5%.
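The main-crop condition stated above (largest harvest area among all crops, and at least 5% of the grid cell area) translates directly into a boolean mask. The sketch below assumes the harvest areas have already been read into an array of shape (crop, y, x) in the same units as the cell area; the function name and array layout are hypothetical and do not reflect the SPAM file format.

```python
import numpy as np


def main_crop_mask(harvest_area, cell_area, crop_index, min_fraction=0.05):
    """Boolean mask of grid cells whose main crop is `crop_index`.

    harvest_area : array (n_crops, ny, nx), harvest area per crop
    cell_area    : array (ny, nx), total grid cell area in the same units
    """
    harvest_area = np.asarray(harvest_area, dtype=float)
    cell_area = np.asarray(cell_area, dtype=float)
    # Condition 1: this crop has the largest harvest area in the cell
    # (np.argmax returns the first crop in case of an exact tie).
    is_largest = np.argmax(harvest_area, axis=0) == crop_index
    # Condition 2: its harvest area covers at least `min_fraction` of the cell.
    is_enough = harvest_area[crop_index] >= min_fraction * cell_area
    return is_largest & is_enough


# Made-up example: two crops (0 = wheat, 1 = cotton) on a 1 x 3 grid of 100 km2 cells.
areas = np.array([[[10.0, 3.0, 2.0]],
                  [[4.0, 2.0, 8.0]]])
cell = np.full((1, 3), 100.0)
wheat_mask = main_crop_mask(areas, cell, crop_index=0)
cotton_mask = main_crop_mask(areas, cell, crop_index=1)
```

Changing `min_fraction` to 0.03 or 0.07 reproduces the kind of sensitivity test mentioned in the text.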

Evaluation of the quantile mapping method
To evaluate the QM method, the reference period (1986-2005) of the model data is divided into two parts. The first half (1986-95) is set as the calibration period, during which the transfer function is established. The transfer function is then applied to correct the model data in 1996-2005. The AIs calculated based on the raw and bias-corrected model data (Raw-AIs and Cor-AIs) during 1996-2005 are compared with those calculated based on the observations (Obs-AIs) to show the performance of the QM method. Figure 2 presents the time-averaged Obs-AIs (the left column) over CA during 1996-2005 and the biases of the time-averaged Raw-AIs (the middle column) and Cor-AIs (the right column). Because the spatial patterns of the biases are very consistent between the three WRF simulations (MPI_WRF_COR, CCSM_WRF_COR, and Had_WRF_COR), only the results of CCSM_WRF_COR are shown. The application of the QM method drastically reduces the biases for all the indicators over CA, especially for BEDD and SU (Fig. 2e vs Fig. 2f, Fig. 2k vs Fig. 2l). The RMSEs of all the indicators are generally reduced by more than half. For instance, the ensemble mean of RMSEs of BEDD decreases from 289.86 °C per year to 96.08 °C per year (Fig. 3d), and that of SU decreases from 21.01 days per year to 4.67 days per year (Fig. 3h). The Raw-AIs are close to the Obs-AIs in the spatial distribution, with the mean values of SCCs generally above 0.90 (e.g., Figs. 3a, c, and e). After the bias correction, all the SCCs increase to near 1.00 (e.g., Fig. 3i), which suggests that the QM method not only reduces the biases of the AIs but also improves the description of their spatial patterns.
We also evaluated WSDI, which relates to a relative (not absolute) temperature threshold (see its definition in Table 1) and is calculated based on the raw model data. We found that it is difficult for the regional model to accurately simulate the spatial distribution of WSDI in CA, with the SCCs in the range of 0.3-0.4. However, the biases of WSDI are minor, with the RMSEs as low as about five days per year. To sum up, the QM method is effective in improving the accuracy of the simulated AIs, which provides a good basis for assessing the projected changes in the agroclimatic indicators in CA.
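The two skill scores used in this evaluation, RMSE and the spatial correlation coefficient (SCC), can be computed for a pair of time-averaged indicator maps as in the sketch below. It is an unweighted version written for illustration: whether the original evaluation applies area weights is not stated, and the function names are hypothetical.

```python
import numpy as np


def rmse(sim, obs):
    """Root mean square error over all grid cells with valid data."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = np.isfinite(sim) & np.isfinite(obs)
    return float(np.sqrt(np.mean((sim[valid] - obs[valid]) ** 2)))


def spatial_corr(sim, obs):
    """Pearson correlation between two maps, treated as flattened vectors."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = np.isfinite(sim) & np.isfinite(obs)
    return float(np.corrcoef(sim[valid], obs[valid])[0, 1])


# Made-up maps: the "raw" field has a uniform +1 bias, the "corrected" one has none.
obs_map = np.array([[1.0, 2.0], [3.0, 4.0]])
raw_map = obs_map + 1.0
cor_map = obs_map.copy()
scores = {
    "raw": (rmse(raw_map, obs_map), spatial_corr(raw_map, obs_map)),
    "cor": (rmse(cor_map, obs_map), spatial_corr(cor_map, obs_map)),
}
```

The example also shows why both scores are reported: a uniform bias leaves the SCC at 1 while the RMSE is clearly nonzero, so a high SCC alone does not indicate an unbiased field.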

Projected changes in the agroclimatic indicators
Projected changes in the Cor-AIs during the future period (2031-50) relative to the reference period (1986-2005) are presented in this section. Besides the results of each WRF simulation, the ensemble mean is also illustrated. Figures 4 and 5 show that GSL, SU, WSDI, and TR are projected to significantly increase over CA in the near-term future, while FD is very likely to decrease over the entire region, especially in the high-elevation areas (e.g., the Tien Shan and Pamirs). Averaged over CA, GSL changes from about 201-202 days per year (1986-2005) to around 213-217 days per year (2031-50, Table 2). The average number of SU (TR) increases from about 88 (35) days per year to around 101-108 (48-54) days per year. Moreover, the simulated WSDIs are 1.8-2.4 times longer in the coming decades. Regional averages of the Cor-AIs over the climate subregions [northern CA (NCA), middle CA (MCA), southern CA (SCA), and the mountainous areas (MT)] are also summarized in Table 2. See the scopes of these subregions in Fig. 1c of Qiu et al. (2022). Figure 4 shows that BEDD increases (>100°C yr⁻¹) in northern CA and the mountainous areas and decreases (>300°C yr⁻¹) in the southern and middle parts of the plain areas. The reason why projected changes in BEDD are spatially heterogeneous is as follows: the northern part and mountain ranges of CA have relatively cold climates, and higher temperature in the future will lead to more days whose TG exceeds the lower limit (10°C) of BEDD and thus increase BEDD in these areas; in contrast, the southern and middle parts of the plain areas have relatively hot climates, and the local warming will cause more days whose TG goes over the upper limit (30°C) of BEDD and thus reduce BEDD in these areas.
Fig. 4. Projected changes (2031-50 vs. 1986-2005) in GSL, BEDD, and FD calculated based on the bias-corrected model data from three WRF simulations (WRF_MPI_COR, WRF_CCSM_COR, and WRF_Had_COR). The ensemble means of the changes are also shown. The slashed areas in subplots a-c, e-g, and i-k indicate where the changes passed the significance test at the 95% confidence level using the two-tailed Student's t-test. The slashed areas in subplots d, h, and l indicate where the signs (+/−) of the changes in the WRF simulations are consistent.
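The two kinds of hatching described in the Fig. 4 caption, a two-tailed Student's t-test at the 95% level for each simulation and sign agreement across simulations for the ensemble mean, can be sketched as follows. The array layout (year, y, x) for the annual indicator fields and the function names are assumptions made for this illustration; `scipy.stats.ttest_ind` with its default equal-variance setting is used as the Student's t-test.

```python
import numpy as np
from scipy import stats


def significant_change(ref, fut, alpha=0.05):
    """Mean change and significance mask for one simulation.

    ref, fut : arrays (n_years, ny, nx) of annual indicator values.
    """
    change = fut.mean(axis=0) - ref.mean(axis=0)
    # Two-sided, equal-variance (Student's) t-test at every grid cell.
    _, p_value = stats.ttest_ind(fut, ref, axis=0)
    return change, p_value < alpha


def sign_agreement(changes):
    """True where all members agree on a nonzero sign of change.

    changes : array (n_members, ny, nx) of mean changes.
    """
    signs = np.sign(np.asarray(changes, dtype=float))
    return np.all(signs == signs[0], axis=0) & (signs[0] != 0)


# Synthetic example: 20 years on a 2 x 2 grid, with a shift in one cell only.
rng = np.random.default_rng(0)
ref = rng.normal(200.0, 5.0, size=(20, 2, 2))
fut = ref.copy()
fut[:, 0, 0] += 15.0
change, significant = significant_change(ref, fut)
agree = sign_agreement(np.array([[[1.0, -1.0]], [[2.0, 1.0]], [[0.5, -3.0]]]))
```

Requiring sign agreement for the ensemble mean is a simple robustness check: with only three members, it flags areas where the direction of change does not depend on the choice of driving GCM.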
In particular, projected changes in the Cor-AIs over the main producing areas of wheat and cotton are assessed. The ensemble mean of the simulations shows that GSL, BEDD, SU, WSDI, and TR over the main producing area of wheat will increase by 8.7%, 15.6%, 49.8%, 112.1%, and 219.6%, respectively, and FD over this area will decrease by 40.7% (Fig. 6). Over the main producing area of cotton, GSL, SU, WSDI, and TR will increase by 3.9%, 6.5%, 91.0%, and 32.0%, respectively, and BEDD will decrease by 10.4%. There are few frost days in this area during the growing season in both the reference and future periods. Prolonged growing seasons, more local heat summation, and less frost damage over the main producing area of wheat may increase crop yields. However, more summer days, warm spells, and tropical nights over the main producing areas of both wheat and cotton will increase the risk of heat stress and the occurrence of pests, which may reduce crop yields.
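A relative change of this kind can be obtained by averaging an indicator over the masked producing area in each period and comparing the two averages. The sketch below assumes that the percentage is computed from regional averages rather than by averaging per-cell percentages, and it uses an unweighted mean; neither choice is confirmed by the text, and the function name is hypothetical.

```python
import numpy as np


def regional_percent_change(ref_mean, fut_mean, mask):
    """Percent change of the regional average inside `mask`.

    ref_mean, fut_mean : arrays (ny, nx), time-averaged indicator per period
    mask               : boolean array (ny, nx), True inside the region
    """
    ref_avg = np.nanmean(np.where(mask, ref_mean, np.nan))
    fut_avg = np.nanmean(np.where(mask, fut_mean, np.nan))
    return float(100.0 * (fut_avg - ref_avg) / ref_avg)


# Made-up values: only the first cell belongs to the region.
ref_map = np.array([[200.0, 100.0]])
fut_map = np.array([[220.0, 50.0]])
region = np.array([[True, False]])
pct = regional_percent_change(ref_map, fut_map, region)
```

Percentages computed this way become unstable when the reference average is close to zero, which is one reason a percentage for FD is not meaningful where frost days are rare in the growing season.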

Usage notes
This dataset is hosted at the National Tibetan Plateau Data Center (tpdc.ac.cn/en/) (Qiu et al., 2022). The files are stored in netCDF4 format and compiled using the Climate and Forecast (CF) conventions. It contains six agroclimatic indicators calculated based on the TG/TX/TN from three WRF simulations (MPI_WRF_COR, CCSM_WRF_COR, and Had_WRF_COR) for a spatial domain covering the CA region and its surrounding areas. The spatial resolution is 0.1° × 0.1°. The dataset covers two continuous 20-year periods, 1986-2005 and 2031-50 (Qiu et al., 2022b). The bias correction with the quantile mapping method is based on the Python module bias_correction 0.4, whose description, installation, and usage are explained at

Discussion and conclusion
A high-resolution projection dataset of agroclimatic indicators (AIs) over Central Asia (CA) is derived based on daily mean/maximum/minimum temperature (TG/TX/TN) from the 9-km-resolution dynamically downscaled results of three different GCMs. The AIs used are growing season length (GSL, day), biologically effective degree days (BEDD, °C), frost days (FD, day), summer days (SU, day), warm spell duration index (WSDI, day), and tropical nights (TR, day). The reference and future periods are 1986-2005 and 2031-50, respectively.
Model data from Phase 6 of the Coupled Model Intercomparison Project (CMIP6) has been successively released since 2019. Some studies have found that the CMIP6 models bring improvements in simulating the climate in some regions relative to those of CMIP5 (Jiang et al., 2020a;Xin et al., 2020;Dong and Dong, 2021). The dynamical downscaling in this study began in early 2019, and at that point the available model data from CMIP6 that could be used to drive the WRF model was rare. Thus, we chose the CMIP5 models to do the downscaling. The CMIP6 models will be prioritized for the studies in the next stage.
This study assessed projected changes in some key agroclimatic indicators over CA, its climate subregions, and main producing areas of wheat and cotton, to present some preliminary results on the potential impact of climate change on the local agriculture. Further studies with dynamic agroecosystem models, such as crop yield models, are recommended to make a more accurate assessment with consideration of factors limiting crop production (e.g., CO2 fertilization, soil nutrients, and fertility) and viable adaptation options (e.g., irrigation).
Table 2. Regional averages of the agroclimatic indicators calculated based on the bias-corrected model data from three WRF simulations. The ensemble mean (first number) as well as the minimum and maximum ensemble member (in parentheses) is listed. Regional averages of the indicators over the climate subregions in CA are also summarized. They are northern CA (NCA), middle CA (MCA), southern CA (SCA), and the mountainous areas (MT). Their scopes are presented in Fig. 1c of Qiu et al. (2022).
Acknowledgements. This study was supported by the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA20020201) and the General Project of the National Natural Science Foundation of China (Grant No. 41875134). The work was carried out at National Supercomputer Center in Tianjin, and this research was supported by TianHe Qing-suo Project-special fund project in the field of climate, meteorology, and ocean. The produced dataset is provided by the National Tibetan Plateau Data Center (http://data.tpdc.ac.cn).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.