Performance of daily satellite-based rainfall in groundwater basin of Merapi Aquifer System, Yogyakarta

The performance of daily satellite-based rainfall products (CMORPH, CHIRPS, GPM IMERG, and TRMM) was evaluated to identify satellite rainfall estimates applicable to the groundwater basin of the Merapi Aquifer System (MAS). Performance was assessed by applying descriptive statistics, categorical statistics, and bias decomposition on the basis of a daily rainfall intensity classification. This classification makes it possible to measure the performance of daily satellite-based rainfall in more detail. CM (CMORPH) shows larger underestimation than the other satellite-based rainfall products. It also mostly has the largest RMSE, while CHR (CHIRPS) has the lowest. CM performs well in detecting no rain, while IMR (GPM IMERG) performs worst. IMR and CHR perform well in detecting light and moderate rain; both have larger H frequencies and lower MB values than the other satellite products. CHR mostly performs better than TR (TRMM), especially in wet periods. CM, IMR, and TR mostly perform well in dry periods, while CHR performs well in wet periods. CM mostly has the largest MB and the lowest AHB values. CM and CHR estimate rain amount more accurately than IMR and TR. Overall, all 4 satellite-based rainfall products show a large discrepancy with rain gauge data along the mountain range, where orographic rainfall usually occurs in wet periods. Hence, it is recommended to evaluate satellite-based rainfall with time series of streamflow simulation in a hydrological modeling framework, merging rain gauge data with more than one satellite-based rainfall product rather than merging IMR and TR together.


Introduction
Ground rainfall estimation using rain gauges is one of the most common traditional methods (Brauer et al. 2016) and is used to correct the bias of satellite rainfall estimates (Ma et al. 2019; Park et al. 2019) for water resources applications and related studies. For example, it is the input for hydrological modeling at basin scale (Andersen et al. 2001) and for modeling base flow (Becker and Braun 1999). Bias correction requires ground rainfall estimates because the finer ground observations can reduce errors in the spatial precipitation gradients of satellite rainfall estimates (Zhang and Anagnostou 2019). The optimal and appropriate satellite rainfall estimates should be selected before they are corrected with ground rainfall estimates.
It is necessary to evaluate the performance of satellite rainfall estimates to limit the error and uncertainty of the corrected gridded rainfall estimates once they are merged with ground rainfall estimates. The success of bias correction depends on understanding and quantifying the systematic uncertainties inherent to satellite-based rainfall and sensors (Sorooshian et al. 2011). The evaluation of satellite-based rainfall performance shows the consistency and deficiencies of satellite rainfall estimates (Pfeifroth et al. 2015) compared to ground observation (Sorooshian et al. 2011). Rain gauges measure rainfall directly, while satellite rainfall products estimate rainfall from visible/infrared (VIS/IR), microwave (MW), and/or radar sensors (Bai and Liu 2018). Satellite observing systems also carry complex uncertainty and error because precipitation is estimated from cloud and precipitation parameters (Stephens and Kummerow 2007).
Satellite rainfall estimates of CMORPH (Climate Prediction Center Morphing Methods) are passive microwave (PMW)-based rainfall estimates (Joyce et al. 2010) that are more accurate than those of infrared (IR)-based algorithms (Ebert et al. 2007). CMORPH captures the diurnal cycle of precipitation (Janowiak et al. 2005) in the mountainous region of Bali better than other satellite rainfall estimates (Rahmawati 2020), although it tends to underestimate heavy rain rates in mountainous areas (Derin et al. 2016; Rahmawati 2020; Rahmawati and Lubczynski 2018). Because CMORPH is more accurate on the tropical island of Bali than TRMM (Tropical Rainfall Measuring Mission) and PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks), its performance is worth evaluating in the groundwater basin of the Merapi Aquifer System. Satellite rainfall estimates of CHIRPS (Climate Hazards Group Infrared Precipitation with Stations) have the finest spatial resolution (Duan et al. 2016; Funk et al. 2015). A finer spatial resolution is more likely to capture the shifting timing of rain events in the diurnal cycle of precipitation in the tropics (Qian 2008), e.g., on Bali Island (Rahmawati 2020). Since CHIRPS has the finest spatial resolution, it is important to assess its performance in the tropical groundwater basin of the Merapi Aquifer System (MAS). Satellite rainfall estimates of IMERG (Integrated Multi-satellitE Retrieval for GPM or Global Precipitation Measurement) are the latest replacement of the TRMM mission. IMERG brings together the existing satellite rainfall estimates of CMORPH, TRMM, and PERSIANN (Rozante et al. 2018). GPM IMERG has more comprehensive data and better accuracy than TRMM (Ahmed et al. 2020; Ma et al. 2020), so it is also necessary to evaluate its performance in the groundwater basin of MAS. Moreover, TRMM is evaluated in MAS as a reference against which the possible improvement of GPM IMERG can be judged.
The Merapi Aquifer System (MAS) is a groundwater basin in Yogyakarta that is rich in groundwater storage. In the north, the basin is bounded by the mountain range that separates Java Island into two parts, northern and southern. In the south, it is bordered by the Indian Ocean. The border in the east is the Gunung Sewu Mountain Range, while the border in the west is the ancient volcano of Kulon Progo. These physical borders give MAS a unique rainfall pattern and distribution. Sufficient precipitation information from rain gauges and satellite-based rainfall is necessary to describe this unique character. Therefore, the objective of the research is to evaluate the performance of daily satellite-based rainfall on the basis of 13 rain gauges over a 5-year period (1 January 2008-31 December 2012), applying descriptive statistics, categorical statistics, and bias decomposition for various classes of rainfall intensity.
The novelty of the research is that it provides, for the first time: (i) a validation of satellite rainfall estimates over MAS using 4 satellite-based rainfall products, i.e., CMORPH, CHIRPS, GPM IMERG, and TRMM; (ii) a recent and long validation period, i.e., the 5 years from 2008 to 2012; and (iii) separate assessments for different rainfall intensities of the satellite rainfall estimates, applying descriptive statistics, categorical statistics, and bias decomposition.

Study area description
Yogyakarta Special Region is located in the south of Central Java Province, Indonesia. In the north, it is bordered by the mountain range that separates Java Island into two parts, northern and southern. Yogyakarta Special Region is situated on the southern side of this mountain range, of which Merapi Volcano is a part. The top of Merapi Volcano is the highest place in Yogyakarta Special Region, at 2,925 m a.s.l. based on the 1:25,000 topographic map of 1998 (Fig. 1). Only the south slope of Merapi Volcano is part of Yogyakarta Special Province; the other slopes are under the administration of Central Java Province. Most areas of Yogyakarta Special Province are rich in groundwater storage because they are included in the Merapi Aquifer System. Based on rainfall data from 1 January 2008 to 31 December 2012 at the 13 rain gauge stations available in MAS, the maximum daily rainfall is 188.0 mm·day −1 . This rainy event is classified as extreme rain. Most rainfall occurs with an intensity of ≤ 20 mm·day −1 or 20-50 mm·day −1 , which is light to moderate rain. Very heavy rainfall, i.e., rainfall with an intensity of 100 to 150 mm·day −1 , occurs very rarely, on only one or a few days per year at a given station. This very heavy rainfall occurs not only in highland areas but also in lowland areas, probably because MAS is surrounded and blocked by the mountain range and the sea. Heavy rain can occur along the coastline, on the lee side of mountains, and on the windward side of mountains (Zhu et al. 2017). Topography and the coastal regime play an important role in the complex precipitation mechanisms of the tropical region (Kikuchi and Wang 2008), so that heavy rainfall occurs not only in highland areas but also in lowland areas (Kirshbaum and Smith 2009). Extreme rain occurs only twice within the 5 years.

Data
The time span used to evaluate daily satellite rainfall estimates in the Merapi Aquifer System (MAS) is from 1 January 2008 until 31 December 2012 (1,827 days, counting both end dates). The data were collected from the Serayu Opak River Basin Organization (locally called BBWS Serayu Opak), Yogyakarta. Rainfall is measured at automatic and non-automatic rain gauge stations. The automatic rain gauge measurements are mostly used to validate rainfall estimates. Where the automatic measurement was out of order for several months or years, the manual rain data are used for validation. There are 13 daily gauge stations within the Merapi Aquifer System that are used to validate the satellite rainfall estimates of CMORPH, CHIRPS, GPM IMERG, and TRMM (Fig. 1). These 13 gauges were selected because they have complete data. The gauges are tipping bucket rain gauges, which accumulate a volume in a small bucket corresponding to 0.1 mm (Michaelides et al. 2009). There are also 6 climatology stations (wind gauges) used to analyze daily wind speed in MAS (Fig. 1). ArcGIS is used for processing, analyzing, and finalizing the MAS rainfall estimates with the support of Python GUI (IDLE) and R software.
Satellite data of 8-min CMORPH is resampled to a daily basis for the same period as the rain gauge observation data in Yogyakarta Special Region. The 8-min CMORPH (CM) has a 0.07° spatial resolution (roughly 8 km × 8 km at the equator). This finer resolution can capture the rainfall dynamics of the diurnal cycle of tropical rainfall in Indonesia better than a coarser resolution (Rahmawati and Lubczynski 2018). CMORPH merges infrared (IR) and microwave (MW) observations, applying a Lagrangian equation to retrieve rainfall estimates (Joyce et al. 2004). The version used in this paper is derived purely from satellite sensors: no rain gauge data are used in the retrieval, and no gauge adjustment is applied. This satellite-only rainfall estimate is called CMORPH version 0.x, while the gauge-adjusted version is called CMORPH version 1.0 (Joyce et al. 2010). The satellite-only version was chosen not only because it performs best on Bali Island (Rahmawati 2020), but also because it shows the improved performance of a satellite-only product. The poor quality and the lack of spatial representation of the gauges used for bias correction are probably the reason satellite-only products estimate rainfall well compared to satellite-gauge bias-corrected products (Habib et al. 2014). The CMORPH version 0.x data is downloaded with the support of the ILWIS ISOD Toolbox (Maathuis et al. 2014).
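The resampling of sub-daily satellite estimates to daily totals can be illustrated with a minimal sketch. It assumes the sub-daily values are rain rates in mm per hour on a fixed time step; the function name, the `step_hours` parameter, and the sample values are hypothetical and not taken from the CMORPH processing chain used in this study. Day boundaries follow the timestamps as given, so any shift between UTC and local gauge reading time would need separate handling.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(timestamps, rates_mm_per_h, step_hours):
    """Sum sub-daily rain rates (mm/h) into daily totals (mm/day).

    step_hours is the (assumed constant) length of one time step in hours.
    """
    totals = defaultdict(float)
    for ts, rate in zip(timestamps, rates_mm_per_h):
        # Each rate is assumed to hold for the whole time step.
        totals[ts.date()] += rate * step_hours
    return dict(totals)


# Hypothetical half-hourly samples spanning two calendar days.
sample_times = [
    datetime(2008, 1, 1, 0, 0),
    datetime(2008, 1, 1, 0, 30),
    datetime(2008, 1, 2, 0, 0),
]
sample_rates = [2.0, 4.0, 1.0]
print(daily_totals(sample_times, sample_rates, step_hours=0.5))
```

Summing rate times step length keeps the result in mm per day regardless of the native time step, which is why the step length is passed explicitly instead of being hard-coded.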
The GPM IMERG satellite data is produced by a multi-satellite algorithm built from components of TRMM TMPA, CMORPH-KF, and PERSIANN-CCS (Huffman et al. 2019). It is a gridded rainfall product that combines precipitation estimates from GPM-CO (GPM Core Observatory) and its partner satellites with geostationary IR sensors, which fill the gaps between MW sensors by Lagrangian time interpolation (Lagrangian morphing), and with monthly precipitation data. GPM-CO carries the most advanced precipitation sensors currently in space; they are the successors of the TRMM precipitation sensors (Hou et al. 2014). GPM IMERG (IMR) provides quasi-global satellite rainfall estimates from 60° N to 60° S. It is available on a grid of 0.1° spatial resolution (roughly 10 km × 10 km at the equator) and 30-min temporal resolution (O and Kirstetter 2018; Skofronick-Jackson et al. 2018). The datasets are downloaded on a daily basis from 1 January 2008 to 31 December 2012 using this link https://gpm1.gesdisc.eosdis.nasa.gov/data/GPM_L3/GPM_3IMERGDF.06. In this research, we use the final run of the current version of GPM IMERG, version 06, because this version has (i) the inclusion of additional sensors, specifically those of TRMM, (ii) improvements to the parent GPM products, and (iii) refinement of the morphing components (Freitas et al. 2020).
TRMM TMPA 3B42 version 7 is a bias-corrected satellite rainfall product of TMPA (Guo et al. 2015). It merges infrared data from geosynchronous satellites and passive microwave data from low-orbit satellites. It covers 50° N to 50° S and is available on a grid of 0.25° spatial resolution (about 27 km × 27 km at the equator) and 3-h temporal resolution. TRMM (TR) and GPM are used to characterize changes in the earth's water cycle, freshwater fluxes, and reservoirs, and to advance the prediction of natural disasters and extreme weather (Skofronick-Jackson et al. 2018). The webpage to obtain the daily temporal resolution of this data is https://disc2.gesdisc.eosdis.nasa.gov/data/TRMM_L3/TRMM_3B42_Daily.7.
CHIRPS satellite data is a quasi-global rainfall dataset that covers 50° N to 50° S and 180° E to 180° W (Aksu and Akgül 2020). This dataset is available from 1981 to present (Sacré Regis M. et al. 2020). CHIRPS (CHR) is based on high-resolution satellite observation and imagery and provides low-latency, high-resolution, low-bias, and long-period gridded precipitation datasets for drought monitoring and climate change analysis (Funk et al. 2015; Liu et al. 2019). The sources of the CHIRPS satellite rainfall estimates are monthly precipitation data, geostationary satellite thermal IR observation, TRMM 3B42, and in situ precipitation observation (Funk et al. 2014). CHIRPS is available at a 0.05° spatial resolution (about 5 km × 5 km at the equator) and daily temporal resolution. The datasets are available at this link https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/tifs. The datasets are downloaded for the same period as the ground rainfall estimates from rain gauges.
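The approximate grid cell sizes quoted for the four products follow from converting degrees to kilometres. The sketch below is illustrative only: it assumes a spherical Earth with a mean radius of 6371 km, and the function name is hypothetical. The east-west size shrinks with the cosine of latitude, which is a small effect at the low latitude of Java.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def cell_size_km(resolution_deg, latitude_deg=0.0):
    """Return (north-south, east-west) grid cell size in km."""
    km_per_degree = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = resolution_deg * km_per_degree
    # Meridians converge toward the poles, so east-west size scales with cos(lat).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Spatial resolutions stated in the text for each product.
for name, res in [("CHIRPS", 0.05), ("CMORPH", 0.07), ("GPM IMERG", 0.1), ("TRMM", 0.25)]:
    ns, ew = cell_size_km(res)
    print(f"{name}: {ns:.1f} km x {ew:.1f} km at the equator")
```

Under this approximation one degree is about 111.2 km, so the cell sizes come out slightly larger than the rounded values in the text (for example, about 27.8 km for 0.25°).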

Methodology
The performance of the 4 satellite-based rainfall products was assessed by applying descriptive statistics, categorical statistics, and bias decomposition, adapted from Rahmawati and Lubczynski (2018). The modification consists of assessing satellite-based rainfall in 4 rainfall intensity classes adapted from BMKG (2010), so that the performance of satellite rainfall estimates can be evaluated in more detail for each class. Most rainfall falls in the light to moderate range, so rainfall intensities from heavy to extreme are grouped into one class. The modified rainfall intensity classification is shown in Table 1. The descriptive statistics consist of mean error (ME) and root mean square error (RMSE), as shown in Eqs. 1 and 2. The assessment was performed for each station and each season division separately. Since satellite-based rainfall does not perform well in the transition season in the tropical climate of Bali (Rahmawati 2020), the season division was based on the following assumptions: (i) wettest months in the wet period (January to March or Jan-March), (ii) wet months in the wet period (October to December or Oct-Dec), (iii) dry months in the dry period (April to June or Apr-Jun), and (iv) driest months in the dry period (July to September or Jul-Sep). Representative stations at different locations are assessed to evaluate the performance of satellite-based rainfall in more detail. These selected rain gauge stations are shown in Fig. 1.
Table 1 Rainfall intensity classification (mm·day −1 )
rain = 0: No rain
0 < rain < 20: Light rain
20 ≤ rain < 50: Moderate rain
rain ≥ 50: Heavy to extreme rain
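The intensity classes and season divisions described above can be expressed as two small lookup functions. This is a minimal sketch: the thresholds (0, 20, and 50 mm·day −1 ) and the month groupings come from the text, while the function names and the returned labels are hypothetical choices made for illustration.

```python
def intensity_class(rain_mm_per_day):
    """Map a daily rainfall amount (mm/day) to one of the 4 intensity classes."""
    if rain_mm_per_day < 0:
        raise ValueError("rainfall cannot be negative")
    if rain_mm_per_day == 0:
        return "no rain"
    if rain_mm_per_day < 20:
        return "light rain"
    if rain_mm_per_day < 50:
        return "moderate rain"
    return "heavy to extreme rain"


def season_division(month):
    """Map a calendar month (1-12) to one of the 4 season divisions."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    if month <= 3:
        return "Jan-March"  # wettest months in wet period
    if month <= 6:
        return "Apr-Jun"    # dry months in dry period
    if month <= 9:
        return "Jul-Sep"    # driest months in dry period
    return "Oct-Dec"        # wet months in wet period


# The maximum daily rainfall reported for MAS falls in the top class.
print(intensity_class(188.0), "|", season_division(1))
```

Grouping each day by the pair (intensity class, season division) gives the subsets over which the statistics are then computed for every station.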
In Eqs. 1-2, T is the total number of daily rainfall events in each rainfall intensity class from 2008 to 2012 (e.g., T is the number of days with 0 mm·day −1 (no rain) in January to March from 2008 to 2012), Rs_t is the rainfall value from satellite-based rainfall at time step t, and Rg_t is the rainfall value from the rain gauge at time step t.
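Using the symbols just defined, ME and RMSE can be sketched in a few lines. The sketch assumes the standard definitions, ME = (1/T) Σ (Rs_t − Rg_t) and RMSE = sqrt((1/T) Σ (Rs_t − Rg_t)²), which are taken here to correspond to Eqs. 1 and 2; the function names and sample values are hypothetical.

```python
import math


def mean_error(sat, gauge):
    """ME: average of (Rs_t - Rg_t) over the T paired days."""
    if len(sat) != len(gauge) or not sat:
        raise ValueError("need two non-empty series of equal length")
    return sum(s - g for s, g in zip(sat, gauge)) / len(sat)


def rmse(sat, gauge):
    """RMSE: square root of the average squared difference (Rs_t - Rg_t)."""
    if len(sat) != len(gauge) or not sat:
        raise ValueError("need two non-empty series of equal length")
    return math.sqrt(sum((s - g) ** 2 for s, g in zip(sat, gauge)) / len(sat))


# Hypothetical paired daily values (mm/day) within one intensity class and season.
satellite = [1.0, 5.0]
rain_gauge = [2.0, 2.0]
print(mean_error(satellite, rain_gauge), rmse(satellite, rain_gauge))
```

A positive ME indicates overestimation by the satellite product and a negative ME indicates underestimation, while RMSE measures the size of the discrepancy regardless of sign.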
Categorical statistics and bias decomposition are also classified by rain intensity and were calculated for each of the 13 available rain gauges. The contingency table for categorical statistics is given in Table 2, while the formulas for bias decomposition are shown in Eqs. 3-7, adapted from Rahmawati and Lubczynski (2018). Table 2 shows an example contingency table for zero rain (0 mm·day −1 ) and light rain (0 mm·day −1 < rain < 20 mm·day −1 ); the other rainfall intensity classes follow this example. In the categorical statistics, the performance of satellite-based rainfall in detecting no rain is assessed from CN and FA, while for rainfall intensity > 0 mm·day −1 , i.e., light rain to extreme rain, it is assessed from H and M. In bias decomposition, the assessment of no rain detection is based on Rs (which equals TB for no rain, since only FB is available), while AHB and MB are used for rainfall intensity > 0 mm·day −1 (light to extreme rain). In Table 2, FA is when satellite-based rainfall detects rainfall and the rain gauge detects no rainfall, CN is when both satellite-based rainfall and the rain gauge detect no rainfall, H is the daily event when both satellite-based rainfall and the rain gauge detect rainfall, and M is when satellite-based rainfall detects no rainfall and the rain gauge detects rainfall.
where HB is hits bias, AHB is absolute hits bias, MB is miss bias, FB is false bias, and TB is total bias.
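The contingency counts and bias components can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Eqs. 3-7: it uses a single rain/no-rain threshold of 0 mm·day −1 instead of the per-class tables, it treats MB as a positive magnitude (the gauge rainfall missed by the satellite), and it combines the components as TB = HB − MB + FB. The function names and sample data are hypothetical.

```python
def contingency_counts(sat, gauge, threshold=0.0):
    """Count hits (H), misses (M), false alarms (FA), correct negatives (CN)."""
    counts = {"H": 0, "M": 0, "FA": 0, "CN": 0}
    for s, g in zip(sat, gauge):
        sat_rain, gauge_rain = s > threshold, g > threshold
        if sat_rain and gauge_rain:
            counts["H"] += 1
        elif gauge_rain:
            counts["M"] += 1
        elif sat_rain:
            counts["FA"] += 1
        else:
            counts["CN"] += 1
    return counts


def bias_decomposition(sat, gauge, threshold=0.0):
    """Split the total bias into hit, miss, and false components (mm)."""
    hb = ahb = mb = fb = 0.0
    for s, g in zip(sat, gauge):
        sat_rain, gauge_rain = s > threshold, g > threshold
        if sat_rain and gauge_rain:
            hb += s - g        # signed bias on days both detect rain
            ahb += abs(s - g)  # absolute bias on the same days
        elif gauge_rain:
            mb += g            # gauge rain the satellite missed (assumed positive)
        elif sat_rain:
            fb += s            # satellite rain on gauge-dry days
    # Assumed sign convention: missed rain lowers the total bias.
    return {"HB": hb, "AHB": ahb, "MB": mb, "FB": fb, "TB": hb - mb + fb}


# Hypothetical daily series (mm/day) for one station.
satellite = [0.0, 5.0, 0.0, 3.0, 10.0]
rain_gauge = [0.0, 0.0, 4.0, 5.0, 6.0]
print(contingency_counts(satellite, rain_gauge))
print(bias_decomposition(satellite, rain_gauge))
```

With a zero threshold, TB under this convention equals the plain sum of (Rs − Rg) over all days, which gives a quick consistency check. AHB is kept separate from HB because over- and underestimation on hit days can cancel in HB while still indicating poor accuracy in rain amount.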

Result
The mean error (ME) results used to validate the 4 satellite-based rainfall products for the 4 intensity classes in MAS are presented in Figs. 2, 3, 4, and 5. The boxplots for no rain detection in the different seasons are shown in Fig. 2. All 4 satellite-based rainfall products generally overestimate no rain events in dry and wet periods, especially in the driest months (Jul-Sep) and wettest months (Jan-March). CM mostly overestimates no rain only slightly, while CHR overestimates no rain strongly compared to the others. TR and IMR perform comparably in detecting no rain. The difference in ME between those two can be 0 mm·day −1 and is mostly below 1.0 mm·day −1 .
The boxplot of ME of the 4 satellite-based rainfall products against 5 rain gauges for light rain is presented in Fig. 3 for the 4 seasons separately. The performance of the 4 products in detecting light rain is inconsistent: all of them can both overestimate and underestimate light rain relative to gauge observation. Their performance is generally similar, and no product is superior to the others in all seasons. It can be noted, however, that CHR often has the smallest under- and overestimation in the driest months (Jul-Sep) compared to CM, IMR, and TR. IMR and TR mostly perform comparably to each other relative to CM and CHR.
The ME boxplot for the 4 satellite-based rainfall products against 5 rain gauges for moderate rain is presented in Fig. 4 for the 4 season divisions separately. The satellite products underestimate moderate rain because their rainfall amounts are lower than the gauge rainfall amounts. CHR mostly has a larger discrepancy with the rain gauges in the wet periods of Jan-March and Oct-Dec, while TR has the lower discrepancy in Jan-March and IMR in Oct-Dec. CHR consistently underestimates moderate rain in the wet and wettest months. CHR has the largest underestimation of moderate rain, while TR has the lowest. The boxplot of daily ME of the 4 satellite-based rainfall products against 5 rain gauges for heavy to extreme rain is presented in Fig. 5. CHR underestimates rainfall the most in the wettest months of Jan-March and in the driest months of Jul-Sep, while CM mostly does so in Apr-Jun. IMR and TR mostly perform comparably.
The daily RMSE of the 4 satellite-based rainfall products against 5 rain gauges for no rain is shown in Table 3. The table shows that CHR performs better than the 3 other satellite products. It mostly has the lowest RMSE in almost all 4 seasons from January to December. TR performs worst in detecting no rain in the wettest months (Jan-March), while CM performs worst in the dry (Apr-Jun) and driest months (Jul-Sep). Both of them mostly have the largest RMSE values. Only on very few occasions in the driest months (Jul-Sep) does CM have a lower discrepancy with rain gauges than CHR.
Fig. 4 The boxplot of daily mean error (ME) of 4 satellite-based rainfall assessments against 5 selected rain gauges for moderate rain from 1 January 2008 to December 2012
Fig. 5 The boxplot of daily mean error (ME) of 4 satellite-based rainfall assessments against 5 selected rain gauges for heavy to extreme rain from 1 January 2008 to December 2012
CHR shows the lowest RMSE for light rain (Table 4) in the wettest months (Jan-March), dry months (Apr-Jun), driest months (Jul-Sep), and wet months (Oct-Dec). Its discrepancy with rain gauge observation is much lower than that of the other satellite-based rainfall products. The RMSE of CHR is much lower than that of the other 3 satellite products, especially CM in the driest months. CM has the largest RMSE in the dry and driest months of April to September. CM and TR perform well compared to IMR in Oct-Dec, but have larger RMSE values in the wettest months.
The RMSE of CHR is mostly the lowest for moderate rain in Jan-March (Table 5). The largest RMSE values of CM occur in Apr-Jun. CM more frequently has a large RMSE than the others in the driest months of Jul-Sep. In the wet months of Oct-Dec, the RMSE of CHR is the lowest and that of CM is the largest. CHR thus appears superior in detecting moderate rain in the wet and wettest months. IMR and TR perform comparably; each outperforms the other in detecting moderate rain about equally often. IMR and TR perform better in dry periods, i.e., in the driest or dry months.
The performance of satellite-based rainfall is mostly comparable for heavy to extreme rain (Table 6). IMR mostly has low RMSE values. CM mostly performs better in the driest months from July to September. CHR often has large RMSE values, followed by CM. Most satellite products appear to have difficulty estimating heavy to extreme rain, because there is no clear indication of which product outperforms the others in each season division.
The performance of satellite products in detecting no rain (0 mm·day −1 ) is assessed from the frequencies of correct negatives (CN) and false alarms (FA), as in Fig. 6. CM performs best in detecting no rain in the wettest months (Jan-March), followed by TR and CHR. Mostly more than 90% of no rain events can be detected by CM. IMR is the worst at detecting no rain in the wettest months. The CN frequency of IMR is the lowest, which is also the case in the wet months of October to December. CM occasionally performs well in detecting no rain in the dry and driest months from April to September, followed by TR. IMR often has more difficulty detecting no rain from April to September than CHR. The satellite products are better at detecting no rain in wet periods (wet and wettest months) than in dry periods (dry and driest months).
Table 3 The daily root mean square error (RMSE) of 4 satellite-based rainfall assessments against 5 rain gauges for rainfall intensity 0 mm·day −1 (no rain) from 1 January 2008
IMR mostly has the largest hits (H) frequencies for light rain in both wet and dry periods, followed by CHR and TR (Fig. 7). More than 88% of light rain events can be detected by IMR in the wettest months, and more than 81% by CHR. TR occasionally performs better than CHR in the dry or driest months, so that TR sometimes has a larger H and lower M than CHR in these dry periods. CM has the lowest H frequency for light rain and, as a result, the largest number of M. The M frequency of CM is lower in the dry and driest months than in the wet and wettest months, especially in the driest months. More than 50% of light rain can be detected by CM in the driest months of July to September.
The spider chart of H and M frequencies for moderate rain for the 4 satellite-based rainfall products against 5 rain gauges from 1 January 2008 to 31 December 2012 is shown in Fig. 8 for each season separately. The performance of the 4 satellite products in detecting moderate rain is almost the same as for light rain. IMR most often has the largest H frequency, followed by CHR and TR. More than 90% of moderate rain events can be detected by IMR in the wettest months, and more than 84% by CHR. Occasionally, however, the performance of IMR, CHR, and TR is comparable in the dry or driest months. The H frequency of CM is mostly the lowest in all seasons.
CM does not perform well in detecting heavy to extreme rain (Fig. 9). It has the lowest H frequency and the largest M frequency. IMR has the largest H frequency for heavy to extreme rain, followed by CHR. TR performs better than CM, with a lower M frequency.
The spatial maps of Rs or TB in detecting no rain (0 mm·day −1 ) for the 4 satellite-based rainfall products against 5 rain gauges from 1 January 2008 to 31 December 2012 are shown in Fig. 10. CM has the lowest Rs or TB in the wettest months of January to March, while CHR has the largest. IMR performs better than TR. CM mostly has the lowest TB in the driest and wet months, which means CM is the most accurate in detecting no rain. IMR and CHR have larger TB than TR in the dry months of April to June. The stations located near a mountain range or mountain rise, i.e., Tanjungtirto and Kemput, have larger TB values than other stations, because satellite products falsely detect rainy events there. Mostly, only 10-20% of no rain can be detected by satellite products in wet periods, except for CM. The wind flow from south to north is probably what makes satellite products falsely detect rainy events. Air masses flow from the sea toward the peak of the volcano and create orographic precipitation. In particular, the wind speed is mostly slower in wet periods than in dry periods. The wind speed near the mountain rise is approximately 80 km/day (~ 1.8 knots) in wet periods, while in the driest months of July to September it is 100 km/day (2.25 knots).
Table 6 The daily root mean square error (RMSE) of 4 satellite-based rainfall assessments against 5 rain gauges for heavy to extreme rain from 1 January 2008 to December 2012
The spatial map radar chart of the bias decomposition components MB and AHB in detecting light rain for the 4 satellite-based rainfall products against 5 rain gauges from 1 January 2008 to 31 December 2012 is shown in Fig. 10. IMR generally has the lowest MB and CM the largest MB for light rain in wet and dry periods. The MB values of CM are much larger than those of CHR in wet periods and those of TR in dry periods. CHR has lower MB than TR in the wet and wettest months. The opposite holds in the dry and driest months, when TR has lower MB than CHR. All satellite products have large AHB values in Jan-March (wettest months) and Oct-Dec (wet months), and all of them have large AHB values near the mountain range. The AHB of CHR is the lowest in the wet and wettest months, indicating that this product estimates rain amount well in wet periods. IMR mostly has the largest AHB values in all seasons, in both wet and dry periods, so it does not estimate the rain amount of light rain well.
IMR and CHR mostly perform comparably, and better than CM and TR, in detecting moderate rain in the wettest and dry months (Fig. 10). Both of them have low MB values. CM performs worse than TR. IMR performs well in the wet months of October to December, followed by CHR and TR, while CM has the largest MB values in these months. IMR has low MB values because it has the lowest M frequency for rainfall intensity above 0 mm·day −1 . However, IMR has the largest AHB, indicating that this product does not estimate rain amount well. CM generally has the lowest AHB in all periods, followed by CHR. For example, at Bronggang Station, the accuracy of IMR in estimating rain amount is 1/3 that of CM and 1/2 that of CHR. However, all satellite-based rainfall products have their largest AHB values at Bronggang Station near the top of Merapi Volcano. The orographic events surrounding this mountain are probably what make it difficult for satellite products to estimate rain amount.
Fig. 6 The radar chart of (FA and CN) of 4 satellite-based rainfall assessments against 5 selected rain gauges for rainfall intensity 0 mm·day −1 (no rain) from 1 January 2008 to December 2012. The scale of radar chart is similar only for each season depending on minimum and maximum frequency values in each season
CHR does not perform well in estimating heavy to extreme rain (Fig. 10). It mostly has the largest AHB values, followed by IMR or TR. Usually, IMR performs better than TR in the driest months of Jul-Sep, while TR performs better in the dry months of Apr-Jun. At high-altitude rain gauge stations, such as Bronggang, Kemput, and Tanjungtirto, CHR detects heavy to extreme rain more accurately than IMR and TR in wet periods, with lower AHB values than both. CM is the most accurate in estimating the rain amount of heavy to extreme rain; its AHB is the lowest. In contrast with its AHB values, the MB of CM is the largest. IMR mostly has the lowest MB values, and CHR is more accurate than TR in estimating heavy to extreme rain. CM has the largest M frequency, so its MB is large even though it estimates rain amount accurately. The second most accurate product in estimating rain amount is CHR.

Discussion and conclusion
A sparse density of rain gauge is a challenging task to estimate strong spatial-temporal variability of precipitation in the tropics (Rahmawati and Lubczynski 2018). Accurate and reliable satellite-based rainfall is necessary to be merged with gauge-based rainfall to catch the diurnal cycle of precipitation in the tropics (Rahmawati 2020). Before satellite-based rainfall is merged to rain gauge observation, it is important to evaluate the performance of satellite-based rainfall. One of the ways to evaluate the performance of satellite-based rainfall is by the direct comparison of satellitebased rainfall against gauge observations or point-to-point comparison as applied in MAS (Bai and Liu 2018;Belay et al. 2019;Chen et al. 2016;Luo et al. 2019). There are several considerations to evaluate the performance of satellite-based rainfall: (i) the detailed knowledge of rainfall The detailed knowledge of rainfall characteristics, i.e., rainfall intensity, is essential to evaluate remotely sensed rainfall estimates (Mandapaka and Qin 2013). It is important to provide nature and characteristics of rainfall and better prediction of hydrologic response in watersheds and urban areas (Chen et al. 2016) since rainfall exhibit different mean value and variability at daily time series (Choubin et al. 2019). The error estimation of satellite-based rainfall in frequencies and intensities of daily precipitation influences the simulation result of surface water, sub-surface water, evapotranspiration, and different amounts and proportion of simulated water balance components ). The daily satellite-based rainfall (CM, IMR, TR, and CHR) assessments mostly overestimate rainfall intensity ≤ 20 mm·day −1 . It is because satellite estimation from infrared and microwave leads to an overestimation of rainfall amount due to ignorance of the surface altitude in the algorithm. 
Evaporation of rainfall below the cloud base in complex terrain is not accounted for because rainfall is retrieved from the brightness temperature at the cloud top (Scheel et al. 2011). The satellite-based rainfall may also overestimate rainfall in MAS because of cloud microphysics, rain processes, and the moisture distribution in the environment (McCollum et al. 2000). The daily satellite-based rainfall underestimates rainfall intensity ≥ 50 mm·day⁻¹. The underestimation is much larger for extreme rainfall (≥ 150 mm·day⁻¹). This is probably because satellite-based rainfall tends to underestimate convective rain amounts (Bell and Kundu 2003). In convective systems, infrared retrievals estimate rainfall from cirrus clouds that do not produce any precipitation (Scheel et al. 2011).
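The intensity-dependent sign of the error described above can be checked with a sketch like the following, which groups days by gauge intensity and reports the mean error (satellite minus gauge) and RMSE per class. The boundaries of 20, 50, and 150 mm·day⁻¹ follow the thresholds quoted in the text; the class labels, the function names, and the sample data are hypothetical.

```python
import math


def intensity_class(gauge_mm):
    """Assign a gauge value (mm/day) to an intensity class (labels are hypothetical)."""
    if gauge_mm <= 20.0:
        return "<=20"
    if gauge_mm < 50.0:
        return "20-50"
    if gauge_mm < 150.0:
        return "50-150"
    return ">=150"


def error_by_class(gauge, satellite):
    """Return {class: (mean error, RMSE)} with error = satellite - gauge."""
    grouped = {}
    for g, s in zip(gauge, satellite):
        grouped.setdefault(intensity_class(g), []).append(s - g)
    stats = {}
    for label, errors in grouped.items():
        mean_error = sum(errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        stats[label] = (mean_error, rmse)
    return stats


# Hypothetical daily values (mm/day): overestimated light rain,
# underestimated heavy and extreme rain.
gauge = [5.0, 15.0, 60.0, 80.0, 160.0]
satellite = [9.0, 17.0, 45.0, 65.0, 100.0]
stats = error_by_class(gauge, satellite)
for label, (mean_error, rmse) in stats.items():
    print(label, round(mean_error, 2), round(rmse, 2))
```

A positive mean error in the lowest class and negative mean errors in the upper classes would correspond to the overestimation and underestimation pattern reported here.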
A multitude of techniques have been developed and are available for the estimation and retrieval of rainfall from satellite sensors (Kidd and McGregor 2007). The algorithms of satellite-based rainfall are mainly based on microwave and infrared sensors together with ground-based data (radar or rain gauge) or multiple sensors (Freitas et al. 2020; Kidd and Levizzani 2011; Tapiador et al. 2012). The nature of the error can change with updates of the retrieval algorithm and changes of the data source, leading to different performances of satellite-based rainfall for different regions, seasons, and precipitation types, i.e., rainfall intensity (Ebert et al. 2007; Guo et al. 2015). CHIRPS (CHR) performs best compared to the other satellite-based rainfall assessments. This satellite product mostly shows the lowest RMSE, especially for rainfall intensity < 50 mm·day⁻¹. CHR apparently performs best because its infrared precipitation estimates effectively represent some of the systematic climate effects of complex terrain (Funk et al. 2014). TR and IMR have almost similar performances, and their rainfall amounts are occasionally similar or comparable. This is because the new satellite sensors and algorithm introduced from TR to IMR lead to mixed performance across rainfall intensities (He et al. 2017).
Fig. 8 The radar chart of (H and M) of 4 satellite-based rainfall assessments against 5 selected rain gauges for moderate rain from 1 January 2008 to December 2012. The scale of the radar chart is similar only for each season depending on minimum and maximum frequency values in each season
The different performances observed for different rainfall intensities could be attributed to the intrinsic features of satellite-based rainfall, i.e., TR and IMR, produced by discrepancies in algorithms and sensors (He et al. 2017). CM has the best performance in detecting no rain. CM has the most frequent hits (H) for no rain, which is why this satellite product has the lowest overestimation relative to rain gauge data. Almost all satellite-based rainfall assessments perform worse in estimating rainfall intensity ≥ 50 mm·day⁻¹. CHR also performs worse in detecting heavy to extreme rain, as seen from its larger RMSE values compared to the others. This is in agreement with Funk et al. (2015). IMR generally performs well compared with TR. The advanced satellite sensors of IMR allow it to detect the occurrences of light rain and extremely heavy rain better than TR does, although IMR tends to significantly overestimate the amounts of extreme rain events (He et al. 2017).
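The link between frequent no-rain hits and low overestimation can be sketched as follows. The sketch assumes that a no-rain hit is a day on which both gauge and satellite are below a rain threshold, and that rain reported by the satellite on a dry gauge day counts as overestimation; the function name `no_rain_scores`, the 1 mm·day⁻¹ threshold, and the sample values are hypothetical.

```python
def no_rain_scores(gauge, satellite, threshold=1.0):
    """Return (no-rain hits, false alarms, rain falsely reported on dry days)."""
    hits = false_alarms = 0
    false_rain = 0.0
    for g, s in zip(gauge, satellite):
        if g < threshold:  # the gauge reports a dry day
            if s < threshold:
                hits += 1
            else:
                false_alarms += 1
                false_rain += s
    return hits, false_alarms, false_rain


# Hypothetical daily rainfall (mm/day) for one gauge and two products.
gauge = [0.0, 0.0, 0.0, 12.0, 0.0]
product_a = [0.0, 0.0, 0.0, 10.0, 0.5]  # detects dry days well
product_b = [3.0, 0.0, 6.0, 11.0, 2.0]  # reports rain on dry days

scores_a = no_rain_scores(gauge, product_a)
scores_b = no_rain_scores(gauge, product_b)
print(scores_a, scores_b)
```

In this toy case the product with more no-rain hits adds no spurious rain on dry days, which mirrors the behavior attributed to CM.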
The performance of satellite products differs between dry and wet periods. Some products, such as CHR, perform well in wet periods (wet and wettest months) and worse in dry periods (dry and driest months). This is probably because rain rate variability, especially during dry seasons, is likely to be much stronger and tends to have strong diurnal modulations that affect satellite algorithms (Bell and Kundu 2003). It is also because of the limited ability to differentiate drizzle or frozen rainfall during the later period. CM and TR mostly perform well in dry periods. This is because of the inability of the technique to retrieve precipitation during the wet season, either through greater low-intensity precipitation or because of a cold surface background affecting passive microwave (PMW) retrievals.
Fig. 9 The radar chart of (H and M) of 4 satellite-based rainfall assessments against 5 selected rain gauges for heavy to extreme rain from 1 January 2008 to December 2012. The scale of the radar chart is similar only for each season depending on minimum and maximum frequency values in each season
Satellite-based rainfall is able to detect the spatial and temporal variability of rainfall at finer resolution (Chen et al. 2016; Rahmawati 2020). The spatial resolution and time step of satellite-based rainfall influence the accuracy and outcome of precipitation-based analysis (Gupta et al. 2020), e.g., the lowest RMSE values of CHR. However, the results of this research in MAS do not fully agree with this. The advanced algorithm of satellite-based rainfall also affects the accuracy of rainfall estimates, e.g., the lowest MB values of IMR. CHR mostly ranks second in having low MB values. The traditional algorithm of satellite-based rainfall, which does not use rain gauge data to estimate rainfall, also influences the accuracy of rainfall estimates, e.g., the largest H frequency of CM for no-rain detection and the lowest AHB values of CM. This is because the rain gauge data used for bias correction or bias adjustment of satellite rainfall estimates do not represent the study area.
Eastern and western MAS are generally treated as climatically separate units that influence the modulation of precipitation in the groundwater basin of MAS. This is perhaps one of the reasons that satellite-based rainfall performs poorly in detecting certain rainfall amounts in MAS. All 4 satellite-based rainfall assessments are also poor in estimating moderate to extreme rain, although CHR mostly performs better than the others. All satellite products mostly have large AHB values along the mountain range. Rain may not fall with equal probability at different times of the day at different points, such as near the coast and near the mountain range (Bell and Kundu 2003). PMW sounding instruments are relatively insensitive to surface emissions and therefore are essentially immune to cold background issues, such as along mountain ranges or hills. In MAS, satellite-based rainfall has low accuracy in detecting rainy events along mountain ranges or hills, as shown by the large values of MB and AHB there.
Fig. 10 The spatial maps of radar chart of (Rs or TB) of 4 satellite-based rainfall assessments for no rain and the spatial maps of radar chart of (MB and AHB) of 4 satellite-based rainfall assessments for light to extreme rain against 5 selected rain gauges from 1 January 2008 to December 2012
It is recommended to validate the 4 satellite-based rainfall assessments with streamflow simulation in a hydrological modeling framework before they are applied to water resources applications and related studies. This is because point-to-point comparison or simple interpolation can create bias (Chen et al. 2016), and the error of satellite-based rainfall can show up in time series of streamflow simulation (Bai and Liu 2018). A satellite product can be corrected with rain gauge data and/or merged with other satellite products to obtain the advantages of each product. It is better not to merge IMR and TR because they often have comparable performance and IMR is an algorithmic improvement on TR. It is also possible to use a satellite product that detects no rain well for drought applications, such as CM in dry periods and CHR in wet periods, or a combination of both. This is because CM performs well in detecting no rain and mostly has the lowest AHB values from light to extreme rain, while CHR performs well in estimating light to moderate rain. CHR mostly has the lowest RMSE values, low AHB values (after CM), and low MB values (after IMR). Overall, CHR mostly performs better than the other products.