Bias Adjustment of Satellite Precipitation Estimation Using Ground-Based Observation: Mei-Yu Front Case Studies in Taiwan

The Global Satellite Mapping of Precipitation (GSMaP) product was used to estimate the accumulated rainfall from the Mei-Yu front in Taiwan in May. GSMaP rainfall estimates for 2002–2017 were evaluated against more than 400 local gauge observations collected by the Taiwan Central Weather Bureau (CWB). Studies have demonstrated that GSMaP rainfall estimates can be biased, depending on the target region, elevation, and season. In this experiment, we evaluated GSMaP over three elevation ranges. The GSMaP systemic errors for each elevation range were identified and corrected using regression analysis. The results indicated that GSMaP estimates can be improved significantly through adjustment over three elevation ranges (elevation less than 50 m, elevation of 50–100 m, and elevation higher than 100 m). For these three elevation ranges, the correlation coefficients between the GSMaP estimates and CWB rainfall data were 0.76, 0.78, and 0.59, respectively. This indicated that the GSMaP estimates were more accurate for low-elevation regions than for high-elevation regions. After the proposed approaches were employed to correct the errors, the bias was reduced by 5.64 mm (13.7%), 7.33 mm (38.4%), and 10.52 mm (31.2%) for low-, mid-, and high-elevation regions, respectively. This study demonstrated that local correction approaches can be used to improve GSMaP estimation of Mei-Yu rainfall in Taiwan.


Introduction
The Mei-Yu front is the main source of rainfall in Taiwan in May. Because of continuous rainfall during the Mei-Yu season, the soil water content is high, which causes natural disasters such as rockfalls, landslides, mudflows, and floods in several low-lying areas. Moreover, when the rain is heavy, traffic accidents, including car, shipping, and aviation accidents, are more frequent because of low visibility. Nonetheless, Mei-Yu rain is a major source of water in Taiwan and is essential for water resources management, especially for agriculture and the domestic water supply.
Satellite observation is not affected by obstacles (e.g. mountains) and is therefore useful for estimating rainfall in regions with various landforms. Satellite-derived precipitation is classified into two categories by data source. One is infrared/visible observation from geosynchronous satellites (D'Souza et al. 1990; Vicente et al. 1998; Porcu et al. 1999; Delgado et al. 2008); the other is passive microwave observation from low-Earth-orbiting or polar-orbiting satellites (Petty 1994; Ferraro and Marks 1995; Ferraro et al. 1996; Yeh et al. 2015). Different sensors have different characteristics and advantages. For instance, the infrared data collected by geosynchronous satellites have high temporal resolution, whereas the passive microwave data obtained by polar- or low-orbiting satellites enable more accurate rainfall estimation than do infrared data. An introduction to and comparison of precipitation products from different channels can be found in Adler et al. (2001) and Levizzani et al. (2002). Some studies have presented rainfall retrieval algorithms that analyze data from both infrared and passive microwave channels (Todd et al. 2001; Chen and Li 2002; Liu et al. 2002; Kidd et al. 2003; Joyce et al. 2004; Huffman et al. 2007). The Global Satellite Mapping of Precipitation (GSMaP) rainfall estimation used in this study combines infrared and passive microwave data. Hence, the GSMaP product is able to exploit the advantages of these distinct channels, increasing the feasibility of rainfall retrieval (Kubota et al. 2007; Seto et al. 2009; Ushio et al. 2009; Aonashi et al. 2009; Shige 2015, 2017).
Numerous papers report that heavy rainfall during Mei-Yu seasons is closely related to the topographic complexity of Taiwan (Chen 1994; Chen 2007), and several studies have employed numerical analysis to determine topographic influences on rainfall (Chu and Lin 2000; Chen and Lin 2005; Chen et al. 2008). Accordingly, the topographic complexity of Taiwan is likely to affect GSMaP rainfall estimation, which indicates that certain adjustments are necessary when using the GSMaP product.
The biases and random errors of satellite-retrieved rainfall products differ significantly with location, season, terrain, and atmospheric conditions (Dinku et al. 2011; Sorooshian et al. 2011). Therefore, many studies have validated satellite-derived rainfall estimates at different scales and in different regimes, including Africa (Dinku et al. 2007; Hughes 2006; Thiemig et al. 2012; Zhou et al. 2014), Indonesia (Vernimmen et al. 2012), the U.S. (AghaKouchak et al. 2011; Zhou et al. 2014), Australia (Zhou et al. 2014), and Bangladesh (Rahman et al. 2012; Islam 2018). Very few studies have addressed the validation and correction of satellite-derived precipitation products over Taiwan, which is the main focus of this study.
Many methods reduce the bias of satellite-derived rainfall estimates by applying an additive or multiplicative adjustment against reference data (i.e., rain gauge observations). For example, Boushaki et al. (2009) used the differences between gauge and satellite-derived rainfall together with inverse distance weighting to reduce the bias. Lin and Wang (2011) reduced bias by blending gauge data with multiple satellite-derived rainfall products. Tesfagiorgis et al. (2011) removed rainfall estimation bias by using satellite-derived, radar, and gauge rainfall products. Yang et al. (2016) proposed a probability-based bias adjustment approach for satellite-derived rainfall using a quantile mapping technique.
This study estimated the intensity of rainfall from the Mei-Yu front in Taiwan by using satellite data. The aforementioned studies have indicated that researchers should use different GSMaP retrieval algorithms according to the target season, region, and elevation when making rainfall estimates. Such considerations enable more accurate estimation of rainfall intensity, which was the aim of the present study. Such estimates would enable the authorities to inform people of the possible rainfall intensity during Mei-Yu seasons. Furthermore, governmental units (e.g. disaster prevention and response teams and reservoir administrations) would be able to prepare for heavy rains in advance and thus prevent disasters.

Data Collection and Analysis
Data were obtained from two sources, namely (1) the GSMaP and (2) Taiwan Central Weather Bureau (CWB) rain gauge observations. We collected 16 years (2002–2017) of data on Mei-Yu fronts occurring in May–June of each year. Each selected case met the standard for heavy rain (greater than or equal to 50 mm in the past 24 h) specified by the World Meteorological Organization (WMO). A total of 22 frontal cases were collected, the dates of which are presented in Table 1.

GSMaP
The GSMaP was developed by the Japan Science and Technology Agency. This product combines microwave and infrared techniques and covers the area between latitudes 60°N and 60°S. The temporal resolution is 1 h, and the spatial resolution is 0.1° in latitude and longitude. Currently, GSMaP data are provided by the Japan Aerospace Exploration Agency. We analyzed the RNL (reanalysis) version of the GSMaP product for 2002–2014 and the NRT (near-real-time) version for 2015–2017.
The GSMaP rain rate (RR) data were retrieved from microwave imager and sounder observations. The retrieval algorithm for the imager was based on the method proposed by Aonashi and Liu (2000); that for the sounder was based on the method of Shige et al. (2009), who incorporated features of an emission-based estimate from the brightness temperature (TB) at 23 GHz and a scattering-based estimate from TB at 89 GHz. The algorithm is suitable for estimating the rainfall intensity at the sea surface.

Data from the CWB Observation Stations
The CWB observation stations are situated across Taiwan (temporal resolution = 1 h; spatial resolution = station-dependent). A total of 425 stations were involved in data collection. The station data contained weather variables of all types, from which we selected the following: longitude-latitude, hourly rainfall, and elevation. The lowest-lying station had an elevation of 4 m, whereas the highest-lying station had an elevation of 3402 m. The longitude-latitude and elevation of each station are depicted in Fig. 1. All rainfall data were subject to CWB quality control to ensure that the rainfall observations were correct.

Korean Meteorological Society
Before analyzing the GSMaP and CWB accumulated rainfall, we had to reconcile the time intervals over which the GSMaP estimates and the CWB rainfall data were accumulated. The GSMaP defines the rainfall for a given hour as the rain that accumulates from minute 00 to minute 59 of that hour. The CWB defines the rainfall for a given hour as the rain that accumulates from minute 00 to minute 59 of the preceding hour.
Therefore, any comparison of GSMaP and CWB rainfall data must ensure that the data from the two sources cover the same time intervals. Specifically, when the CWB rainfall was selected for hour N, it was compared with the GSMaP rainfall for hour N + 1. The GSMaP data are presented on a grid with a spacing of 0.1° in latitude and longitude, whereas the CWB data are in-situ point data. Accordingly, we compared the two rainfall datasets by interpolating the GSMaP grid point data to the positions of the CWB stations. In this study, we evaluated daily rainfall totals obtained by accumulating the hourly GSMaP estimates.
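The matching steps described above can be illustrated with a minimal sketch. It assumes bilinear interpolation on a regular grid (the text states only that the grid data were interpolated to the station positions), and the function names `bilinear`, `daily_total`, and `pair_hours` are hypothetical. The default one-hour shift follows the text and should be checked against the timestamp convention of the actual data files.

```python
import math

def bilinear(grid, lat0, lon0, step, lat, lon):
    """Bilinearly interpolate a regular grid to the point (lat, lon).

    grid[i][j] is the value at (lat0 + i*step, lon0 + j*step).
    The point is assumed to lie inside the grid.
    """
    fi = (lat - lat0) / step
    fj = (lon - lon0) / step
    # Index of the lower-left grid node, kept inside the valid range.
    i = max(0, min(int(math.floor(fi)), len(grid) - 2))
    j = max(0, min(int(math.floor(fj)), len(grid[0]) - 2))
    ti, tj = fi - i, fj - j
    return ((1 - ti) * (1 - tj) * grid[i][j]
            + (1 - ti) * tj * grid[i][j + 1]
            + ti * (1 - tj) * grid[i + 1][j]
            + ti * tj * grid[i + 1][j + 1])

def daily_total(hourly_grids, lat0, lon0, step, lat, lon):
    """Accumulate hourly interpolated values (mm) into a daily total."""
    return sum(bilinear(g, lat0, lon0, step, lat, lon) for g in hourly_grids)

def pair_hours(cwb, gsmap, shift=1):
    """Pair CWB hour N with GSMaP hour N + shift (shift=1 follows the text)."""
    return [(cwb[n], gsmap[n + shift])
            for n in range(len(cwb)) if 0 <= n + shift < len(gsmap)]
```

For a station at the center of a grid cell, `bilinear` returns the mean of the four surrounding grid values.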

Methods
Previous studies have shown that complex terrain affects satellite-derived rainfall estimates. Two-thirds of Taiwan's land area consists of mountains and hills, and the overall slope exceeds 30 degrees on average. This study therefore tested different correction methods. In addition to classification by elevation, the satellite-derived rainfall estimates were corrected using weighted nearest neighbor, averaging within a certain domain, inverse distance weighting, and other approaches. The method that yielded the best results is as follows. We divided the gauge data obtained by the Central Weather Bureau (CWB) of Taiwan and the estimates obtained using the GSMaP into groups on the basis of elevation. Next, the data were compared and analyzed to identify systemic errors in the GSMaP estimates. We then conducted statistical and regression analyses to correct the errors, so that the GSMaP estimates approximated the actual rain rates (RRs).

Classification of Elevation
The GSMaP and CWB rainfall data during the Mei-Yu season were mapped to explore the relationship between the two datasets. For example, in case 2, the rainfall estimated by the GSMaP (Fig. 2b) was lower than that observed by the CWB stations (Fig. 2a). This difference is possibly attributable to the topographic complexity of Taiwan. The underestimation of rainfall by satellites is similar to the findings of Aonashi et al. (2009) and Takido et al. (2016). Subsequently, using the elevation categorizations (Table 2), we investigated the relationship between topography and rainfall for the 22 selected cases. When GSMaP products were compared with observations from stations below 100 m, the correlation coefficient exceeded 0.75. As the station elevation increased, the correlation coefficient decreased, falling below 0.6 for stations above 100 m. In other words, the accuracy of satellite rainfall estimation decreases as terrain height increases, which is consistent with Takido et al. (2016). To characterize rainfall estimation in Taiwan during spring, qualitative and quantitative analyses were conducted on the 22 Mei-Yu front cases that occurred in May–June from 2002 to 2017 (16 years in total). Each case met the CWB definition of heavy rain. Our objective was to examine and compare the characteristics of the GSMaP and CWB data. Therefore, to identify the most suitable elevation categorization for GSMaP estimation in Taiwan, we divided the data into elevation categories using four types of categorization. We used the 22 cases (Table 1), comprising 9350 samples, to determine which of the four categorizations was most suitable for describing Taiwan's terrain.
The details of the elevation categorizations are listed in Table 2.

Confirmation and Correction of Systemic Errors
The four categorization types were then separately used to calculate the coefficients of correlation between the CWB and GSMaP accumulated rainfall data. According to the most adequate categorization of elevation levels determined above, we compared the rainfall data estimated by the GSMaP with the actual rainfall data collected by the CWB stations. Accordingly, the systemic errors in the four types of elevation categorization were identified, and a regression equation was determined by fitting the curve in the GSMaP-CWB scatter plot.
We compared the average correlation coefficients of the four categorization types. The average correlation coefficient of Categorization B is greater than 0.7, which is higher than those of the other three categorization types. In addition, the rainfall distribution at stations higher than 100 m is similar, and the number of stations higher than 100 m is limited. For these two reasons, we placed all stations with elevations of more than 100 m in the same group. Categorization B is therefore considered the most suitable categorization for GSMaP application in Taiwan, and this paper discusses only the results obtained with it. Finally, in the cross-validation, the data for all cases were grouped according to the most suitable categorization scheme to correct the errors in the GSMaP data. Furthermore, regression analysis was performed on the two datasets to adjust the GSMaP estimates. In this way, the accuracy of the RR distribution obtained using the GSMaP was improved.
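As an illustration of how candidate categorizations could be compared, the following sketch computes the Pearson correlation between gauge and GSMaP rainfall within each elevation group and averages the result over groups. The function names are hypothetical, and the handling of boundary elevations (a station exactly at a threshold falls in the lower group) is an assumption, because the text does not state which group such stations belong to. Only the thresholds of Categorization B (50 m and 100 m) are given in the text; those of the other categorizations are in Table 2.

```python
from bisect import bisect_left
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def group_correlations(elev, cwb, gsmap, thresholds):
    """Correlation per elevation group; thresholds are ascending, in metres.

    Assumption: a station exactly at a threshold goes to the lower group.
    """
    groups = {}
    for e, c, g in zip(elev, cwb, gsmap):
        k = bisect_left(thresholds, e)  # 0 = lowest group
        xs, ys = groups.setdefault(k, ([], []))
        xs.append(c)
        ys.append(g)
    return {k: pearson(xs, ys) for k, (xs, ys) in sorted(groups.items())}

def mean_correlation(elev, cwb, gsmap, thresholds):
    """Average of the group correlations, used to rank categorizations."""
    r = group_correlations(elev, cwb, gsmap, thresholds)
    return sum(r.values()) / len(r)
```

Calling `mean_correlation(elev, cwb, gsmap, [50, 100])` would give the score for a three-group scheme with the thresholds of Categorization B. A group in which either series is constant would raise a division error and needs separate handling.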

Identifying the Most Suitable Categorization of Elevation
In Categorization B, the data were divided into categories of elevation of 50 m and lower, 50-100 m, and 100 m and higher. Among the 425 stations, 113 stations had elevations of 50 m   (Fig. 3).
We compared the GSMaP and CWB accumulated rainfall in scatter plots for the stations in the three elevation groups according to Categorization B. Linear regression analysis was then performed to plot the trend lines, which are presented in Fig. 4. The x-axis and y-axis represent the CWB and GSMaP accumulated rainfall (mm), respectively. Each red dot corresponds to an individual rainfall datum; the green lines correspond to x = y (i.e. perfect agreement); and the blue lines denote the linear regression trend lines. The regression equation for each elevation group was used to correct the GSMaP accumulated rainfall.
Most of the red dots in Fig. 4 are distributed to the left of the green line; that is, GSMaP mostly underestimates rainfall, which is similar to the result of Taniguchi et al. (2013). This occurs because the moist air of the front flows toward the mountains and is forced to rise, causing rainfall (the topographic effect). The GSMaP is a satellite-derived rainfall product, and satellite observations cannot take the topographic effect into account. However, the topographic effect has a significant impact on satellite-derived rainfall products, and about two-thirds of Taiwan's land is mountains and hills. Therefore, when using GSMaP to estimate rainfall intensity in Taiwan, it is important to consider the topographic effect, which is one of the contributions of this study.
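The regression step can be sketched as follows, under stated assumptions. The trend line is fitted with CWB rainfall on the x-axis and GSMaP rainfall on the y-axis, as in Fig. 4, and a GSMaP value is then corrected by inverting the fitted line. The inversion and the clipping of negative results to zero are one possible reading of the procedure, not necessarily the authors' exact implementation, and the function names are hypothetical. One line would be fitted per elevation group.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def correct(gsmap_value, a, b):
    """Invert gsmap = a*cwb + b to estimate gauge-equivalent rainfall (mm).

    Assumption: negative results are clipped to zero.
    """
    return max(0.0, (gsmap_value - b) / a)

# Synthetic example in which GSMaP underestimates the gauges by half.
cwb = [10.0, 20.0, 40.0, 80.0]
gsmap = [0.5 * c + 1.0 for c in cwb]
a, b = fit_line(cwb, gsmap)
corrected = [correct(g, a, b) for g in gsmap]
```

In this synthetic example the corrected values reproduce the gauge values because the relationship is exactly linear; with real data the correction removes only the linear component of the bias.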

Probability Distribution of the Difference between the GSMaP and CWB Accumulated Rainfall and Error Analysis of the Average Rainfall
We linearly interpolated the GSMaP daily accumulated rainfall to the locations of the 425 CWB stations and grouped the stations according to Categorization B. The CWB accumulated rainfall was then subtracted from the GSMaP accumulated rainfall, yielding the probability distribution of estimation differences displayed in Fig. 5. The y-axis represents the probability (%), and the x-axis denotes the difference in rainfall (mm). The figure indicates that, in all three elevation groups, the GSMaP rainfall values were generally lower than the CWB values. The average rainfall error for each elevation group is displayed in Table 3.
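A minimal sketch of the difference distribution and the average error is given below. The bin width of 10 mm is an assumption (the binning used for Fig. 5 is not stated), and the function names are hypothetical. Differences are computed as GSMaP minus CWB, so negative values indicate underestimation by GSMaP.

```python
import math
from collections import Counter

def difference_distribution(gsmap, cwb, bin_width=10.0):
    """Return {bin lower edge (mm): probability (%)} of GSMaP - CWB."""
    diffs = [g - c for g, c in zip(gsmap, cwb)]
    counts = Counter(math.floor(d / bin_width) * bin_width for d in diffs)
    n = len(diffs)
    return {edge: 100.0 * k / n for edge, k in sorted(counts.items())}

def mean_error(gsmap, cwb):
    """Average of GSMaP - CWB (mm); negative means underestimation."""
    return sum(g - c for g, c in zip(gsmap, cwb)) / len(cwb)
```

Applying both functions separately to the samples of each elevation group would give one distribution and one average error per group.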

Average Error before and after the Correction
To test the feasibility of the proposed correction, this study used cross-validation, adopted Categorization B, and employed two approaches to correct the GSMaP accumulated rainfall: linear regression correction and correction by the average rainfall error. The biases before and after the correction were compared. When the post-correction accumulated rainfall was closer to the observed rainfall than the pre-correction rainfall, the correction approach was considered effective.
For all cases, the rainfall estimated by the GSMaP was lower than that observed by the CWB stations. We then divided the GSMaP data into three elevation levels according to Categorization B. The GSMaP errors were corrected by substituting the rainfall value into the corresponding regression equation and by subtracting the average rainfall errors (Table 3) to obtain the new rainfall value. The post-correction GSMaP data in the three elevation levels were considerably improved; an example is shown in Fig. 6. For the bias correction, the GSMaP data of the high-elevation group were the most improved, followed by the data of the mid-elevation and then the low-elevation group. The estimation error of the GSMaP regarding rainfall intensity increased as the elevation increased, which confirms that topography is highly related to rainfall intensity in Taiwan. We assumed that the GSMaP could not detect the elevation of the target region during estimation; hence, the methods proposed in this study demonstrated greater improvement in the rainfall estimation of the high-elevation group. After corrections were made using the regression equations and the values in Table 3, the bias was reduced by 5.64 mm (13.7%), 7.33 mm (38.4%), and 10.52 mm (31.2%) for low-, mid-, and high-elevation regions, respectively. The bias was greatly reduced by both correction approaches. Therefore, the correction methods proposed in this study effectively reduce errors in GSMaP estimations of accumulated rainfall in Taiwan.
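The cross-validation with the average-error correction can be sketched as a leave-one-case-out loop. This is an illustrative reading of the procedure: each case is held out in turn, the average error per elevation group is computed from the remaining cases, and that error is subtracted from the held-out GSMaP values. The data layout (a list of cases, each a list of `(group, cwb_mm, gsmap_mm)` tuples) and the function names are hypothetical.

```python
def mean_error_by_group(samples):
    """Average GSMaP - CWB error (mm) per elevation group."""
    sums, counts = {}, {}
    for group, cwb, gsmap in samples:
        sums[group] = sums.get(group, 0.0) + (gsmap - cwb)
        counts[group] = counts.get(group, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

def leave_one_case_out(cases):
    """Return one (bias_before, bias_after) pair of dicts per held-out case."""
    results = []
    for i, held_out in enumerate(cases):
        # Training samples: every case except the held-out one.
        training = [s for j, case in enumerate(cases) if j != i for s in case]
        offsets = mean_error_by_group(training)
        # Subtract the signed training error of the sample's group.
        corrected = [(g, cwb, gsmap - offsets.get(g, 0.0))
                     for g, cwb, gsmap in held_out]
        results.append((mean_error_by_group(held_out),
                        mean_error_by_group(corrected)))
    return results
```

A correction is effective for a held-out case when the magnitude of `bias_after` is smaller than that of `bias_before` in each group. The regression correction could be evaluated in the same loop by replacing the offset subtraction with a group-specific fitted equation.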

Conclusion
This study estimated the rainfall intensity in Taiwan during the Mei-Yu season by using the GSMaP product. The results demonstrated that the elevation of target regions strongly affected the GSMaP rainfall estimations for Taiwan. Regarding elevation level, the most effective categorization of the rainfall data was dividing them into the following three groups: rainfall in regions with an elevation of less than 50 m; that in regions with an elevation of 50–100 m; and that in regions with an elevation of more than 100 m. We then compared the GSMaP rainfall and the CWB rainfall within those three groups. The correlation coefficient between the data for the low-elevation group (elevation <100 m) was 0.77, and that for the high-elevation group (elevation >100 m) was 0.59. The results indicated that the GSMaP makes more accurate estimations for low-elevation regions than for high-elevation regions. The correlation coefficient between the GSMaP and CWB data decreased as the elevation increased. Subsequently, we conducted error analyses on the data categorized by elevation. The GSMaP underestimated rainfall in regions with low, middle, and high elevation by 5.78, 7.51, and 10.71 mm, respectively. As the elevation increased, the errors also increased. Moreover, by using 8925 rainfall samples (21 Mei-Yu cases), we constructed regression equations for correcting the GSMaP estimates of the remaining Mei-Yu case for regions with different elevations in Taiwan.
To verify that the proposed error correction methods (i.e., the average rainfall error and regression equation correction approaches) can effectively improve GSMaP rainfall estimation in Taiwan during the Mei-Yu season, this study performed cross-validation and examined the corrected errors. The correction using the regression equation slightly outperformed that using the average rainfall errors. Specifically, the regression approach reduced bias by 7.8 mm (27.7%) on average. The results verified that the proposed correction approaches were effective in improving GSMaP rainfall estimation during the Mei-Yu season in Taiwan.