1 Introduction

Heavy rainfall nowcasting (usually defined as 0–2-h forecasting; World Meteorological Organization 2019) is an important and challenging task aimed at providing early warning of rapid-onset water-related disasters, such as flash floods and landslides. Major challenges lie not only in the demand for meteorological observations and simulations at high spatial and temporal resolution but also in the limited predictability of complex precipitation processes (Morrison et al. 2020). Therefore, researchers have sought for decades to advance precipitation nowcasting methods, incorporating high-resolution Doppler weather radar observations with sophisticated numerical models. The most popular methods have been radar echo extrapolation (e.g., Dixon and Wiener 1993; Li et al. 1995; Johnson et al. 1998; Germann and Zawadzki 2002; Bowler et al. 2004; Ruzanski et al. 2011) and numerical weather prediction (NWP) with radar data assimilation (e.g., Sun 2005; Sugimoto et al. 2009; Tsai et al. 2014; Johnson et al. 2015; Bachmann et al. 2019; Wu et al. 2020). The former is reliable within a very short forecast lead time before existing convective cells dissipate; the latter gains the upper hand for longer lead times after the NWP model spins up to predict new cells. The crossover lead time when the latter overtakes the former was found to vary among precipitation events from 1.5 to more than 4 h (Nerini et al. 2019). In order to combine the advantages of both methods, a variety of blending schemes using weight functions of the forecast lead time have been developed (e.g., Golding 1998; Wang et al. 2015; Hwang et al. 2015; Poletti et al. 2019; Lin et al. 2020). It is worth noting that machine learning algorithms, which work in quite different ways from numerical modeling methods, have recently been applied to precipitation nowcasting (e.g., Wang et al. 2020; Mao and Sorteberg 2020; Cuomo and Chandrasekar 2021; Ravuri et al. 2021), although machine learning is outside the scope of this study.

Among the aforementioned blending studies, Lin et al. (2020) (hereafter denoted by L20) proposed the extrapolation adjusted by model prediction (ExAMP) blending scheme, the concept of which is to have full trust in the field pattern predicted by extrapolation but allow the field intensity to be adjusted by NWP model prediction. This concept is appropriate only within the crossover lead time but is likely to outperform pure extrapolation, because intensity variations of existing cells are empirically better predicted by the NWP model assimilating radar data than positions of new cells during the spin-up period. L20 concluded that, from the overall statistics for multiple heavy rainfall events, ExAMP had greater skill in 150-min reflectivity nowcasting than all of the following methods: extrapolation, NWP with radar data assimilation, the linear cross dissolve blending scheme, and the salient cross dissolve (Sal CD; Hwang et al. 2015) blending scheme. In this study, rainfall rate on the ground for the same events is to be predicted instead of the radar-observed reflectivity aloft predicted in L20. For this purpose, quantitative precipitation estimation (QPE; e.g., Marshall and Palmer 1948; Seliga and Bringi 1976; Sachidananda and Zrnić 1987; Zhang et al. 2001) algorithms, which statistically relate rainfall rate to radar variables, can be incorporated with the extrapolation procedure. There are two opposite strategies for doing so: (a) estimating rainfall rate from extrapolated reflectivity; (b) extrapolating the current rainfall rate estimated from currently available radar observations. The first strategy is necessary for reflectivity extrapolation systems and confined to using a Z-R relationship for QPE (Austin 1987; Atlas et al. 1990; Xin et al. 1997); the second strategy, by contrast, allows using different QPE relationships including more accurate ones related to dual-polarimetric radar variables (Brandes et al. 2003; Giangrande and Ryzhkov 2008; Wang and Chandrasekar 2010). The contrast leads to an assumption that the second strategy can surpass the first in rainfall nowcasting if dual-polarimetric radar observations are currently available. The assumption is to be verified in the first part of this study.

Subsequently, the ExAMP scheme is used to blend the rainfall rates predicted by the superior strategy of extrapolation and the NWP model assimilating radar data, and the skill is statistically evaluated. Also evaluated are extrapolation, NWP with radar data assimilation, and two sensitivity experiments implementing different strategies of ExAMP. One sensitivity experiment blends the predicted reflectivities as in L20 and then uses a Z-R relationship for QPE; the other blends the predicted rainfall rates but removes the empirical limitation on the innovation of intensity in L20. The overall statistics are evaluated to decide the best strategy for rainfall nowcasting. Moreover, the spatial performance for two contrasting heavy rainfall events is carefully examined, and the statistical performance at rainfall strengthening locations and weakening locations for all the events is investigated to determine whether ExAMP generally outperforms extrapolation regardless of the biases of extrapolation.

On the whole, there are three objectives in this study: (a) to compare the effects of reflectivity extrapolation and rainfall rate extrapolation, as the latter is feasible for dual-polarimetric radar QPE; (b) to verify the skill of ExAMP in rainfall nowcasting rather than reflectivity nowcasting and optimize its strategy; (c) to compare the effectiveness of ExAMP in rainfall strengthening areas and weakening areas and discuss the reasons. In the next section, the extrapolation and NWP models in use as well as the QPE and ExAMP schemes are elaborated. A paired t-test method to assess statistical significance is also introduced. Section 3 describes the studied heavy rainfall events and observation data as well as the design of rainfall nowcasting experiments. Section 4 presents the experimental results, including the strategy analysis, spatial performance, and statistical performance of the ExAMP scheme. Section 5 is devoted to the conclusions and future prospects.

2 Methodology

2.1 Extrapolation model

The McGill algorithm for precipitation nowcasting by Lagrangian extrapolation (MAPLE; Germann and Zawadzki 2002) developed by the J. S. Marshall Radar Observatory of McGill University is used as the two-dimensional extrapolation model. The algorithm consists of two stages: (a) retrieving the optimal motion field from consecutive reflectivity images via variational analysis with a scaling-guess procedure (Laroche and Zawadzki 1994); (b) advecting the current reflectivity for many small time steps via a modified semi-Lagrangian backward scheme, formulated as

$${\varvec{\upalpha}}=\Delta t\mathbf{u}\left({t}_{0}, \mathbf{x}-\frac{{\varvec{\upalpha}}}{2}\right),$$
(1)

where \({\varvec{\upalpha}}\) is the displacement vector starting with \({\varvec{\upalpha}}=0\); \(\Delta t\) is the time step; \(\mathbf{u}\) is the retrieved optimal motion field; \({t}_{0}\) is the start time of extrapolation; \(\mathbf{x}\) is the position. The scaling-guess procedure at the first stage can reduce the risk of cost function convergence towards secondary minima by retrieving the optimal motion field iteratively with increasing grid resolution. The semi-Lagrangian backward scheme at the second stage can allow for rotation by tracing streamlines upstream and reduce numerical diffusion by doing only one cubic interpolation. In this study, the two-dimensional composite reflectivity that synthesizes the maximum reflectivities measured by nine Doppler weather radars in Taiwan is used to retrieve the optimal motion field, and then either the current composite reflectivity or rainfall rate is extrapolated depending on the experiment. The simulation domain of MAPLE and the locations and coverage area of the nine radars are shown in Fig. 1. The detailed configuration settings of MAPLE follow those in L20.
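The implicit relation in Eq. 1 can be illustrated with a minimal Python sketch of a single advection step, solved here by fixed-point iteration starting from zero displacement. The function names, the number of iterations, and the analytic motion fields are illustrative assumptions and are not taken from MAPLE, which additionally splits the forecast into many small time steps and interpolates the advected field.

```python
def backward_displacement(u, x, dt, n_iter=5):
    """Iterate alpha = dt * u(x - alpha / 2), starting from alpha = 0 (Eq. 1).

    u      : callable (px, py) -> (ux, uy), a stationary motion field (assumed)
    x      : (x, y) arrival position of the backward trajectory
    dt     : time step
    n_iter : number of fixed-point iterations (an assumed value)
    """
    ax, ay = 0.0, 0.0
    for _ in range(n_iter):
        # Evaluate the motion field at the midpoint of the trajectory.
        ux, uy = u(x[0] - ax / 2.0, x[1] - ay / 2.0)
        ax, ay = dt * ux, dt * uy
    return ax, ay


def uniform_flow(px, py):
    """Hypothetical constant motion field."""
    return 10.0, -5.0


def solid_rotation(px, py, omega=0.5):
    """Hypothetical solid-body rotation about the origin."""
    return -omega * py, omega * px


# The advected value at x is then read from the field at x - alpha.
alpha_uniform = backward_displacement(uniform_flow, (0.0, 0.0), 2.0)
alpha_rotation = backward_displacement(solid_rotation, (1.0, 2.0), 0.1)
```

Evaluating the motion field at the trajectory midpoint is what lets the scheme follow curved streamlines, which is why the rotational test field is included alongside the uniform one.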

Fig. 1

Simulation domain of MAPLE, which has \(921\times 881\) horizontal grid points with a \(0.0125^\circ\) spacing. The red triangles and circles mark the locations of the nine Doppler weather radars, consisting of one S-band dual-polarimetric (RCWF), three S-band (RCHL, RCCG, and RCKT), and five C-band dual-polarimetric (RCCK, RCNT, RCMK, RCGR, and RCLY) radars. The white area represents the coverage area

2.2 QPE relationships

For the experiments estimating rainfall rate from extrapolated composite reflectivity, the Z-R relationship derived by Xin et al. (1997) is used, formulated as

$${Z}_{h}=32.5{R}^{1.65},$$
(2)

where \({Z}_{h}\) is the reflectivity factor in units of \({\text{mm}}^{6}\,{\text{m}}^{-3}\); \(R\) is the rainfall rate in units of \({\text{mm}}\,{\text{h}}^{-1}\). For the experiments extrapolating the current rainfall rate, different QPE relationships are used at a surface point depending on the variable and intensity of the lowest radar observation over the point. If the lowest radar observation is dual-polarimetric with logarithmic reflectivity \({Z}_{H}\ge 30\mathrm{ dBZ}\) and specific differential phase \({K}_{DP}\ge 0.3^\circ\,{\text{km}}^{-1}\) simultaneously, the current rainfall rate is estimated using the R-\({K}_{DP}\) relationships derived by Chen et al. (2021), formulated as

$$R=47.5998{{K}_{DP}}^{0.7605}$$
(3)

for S-band radars and

$$R=26.2343{{K}_{DP}}^{0.7485}$$
(4)

for C-band radars; otherwise, the Z-R relationship (Eq. 2) is still used. The motivation for the hybrid approach is that specific differential phase, compared with reflectivity, is a more accurate rainfall rate estimator immune to radar miscalibration, attenuation, and partial beam blockage but less sensitive to light rain (Giangrande and Ryzhkov 2008; Figueras i Ventura et al. 2012). As a side note, the lowest radar observation must be below a 4-km height above ground level with \({Z}_{H}\ge 10\mathrm{ dBZ}\). If every radar observation over the surface point is too high or too weak, the current rainfall rate is declared zero.
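The selection logic of the hybrid QPE approach can be summarized in a short Python sketch. It assumes that the lowest usable radar observation over the surface point has already been chosen; the function names, argument names, and band labels are hypothetical, while the coefficients and thresholds are those of Eqs. 2–4 and the surrounding text.

```python
def rain_rate_from_z(zh_dbz):
    """Invert Z_h = 32.5 R^1.65 (Eq. 2); input in dBZ, output in mm/h."""
    z_linear = 10.0 ** (zh_dbz / 10.0)  # dBZ -> mm^6 m^-3
    return (z_linear / 32.5) ** (1.0 / 1.65)


def rain_rate_from_kdp(kdp, band):
    """R-K_DP relationships (Eqs. 3 and 4); kdp in deg/km, output in mm/h."""
    coef, expo = {"S": (47.5998, 0.7605), "C": (26.2343, 0.7485)}[band]
    return coef * kdp ** expo


def hybrid_rain_rate(zh_dbz, height_km, kdp=None, band=None):
    """Rainfall rate at a surface point from its lowest radar observation.

    kdp is None for a radar without dual-polarimetric measurements.
    """
    # Observation too high or too weak: rainfall rate declared zero.
    if height_km >= 4.0 or zh_dbz < 10.0:
        return 0.0
    # Dual-polarimetric observation in sufficiently heavy rain: use K_DP.
    if kdp is not None and zh_dbz >= 30.0 and kdp >= 0.3:
        return rain_rate_from_kdp(kdp, band)
    # Otherwise fall back to the Z-R relationship.
    return rain_rate_from_z(zh_dbz)
```

For instance, `hybrid_rain_rate(25.0, 1.0, kdp=1.0, band="S")` falls back to the Z-R relationship because the reflectivity is below 30 dBZ, even though a specific differential phase value is available.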

2.3 NWP model assimilating radar data

The Weather Research and Forecasting (WRF; Skamarock et al. 2008) Version 3.3.1 model with the Advanced Research WRF solver developed by the National Center for Atmospheric Research (NCAR) is used as the NWP model. This world-renowned community model integrates fully compressible, nonhydrostatic governing equations in a terrain-following coordinate system and encompasses a rich collection of dynamics and physics schemes. Figure 2 shows the two-way interactive, double-nested simulation domains of the WRF model in use. The physics schemes in use include the Morrison double-moment microphysics (Morrison et al. 2009), Rapid Radiative Transfer Model (RRTM) longwave radiation (Mlawer et al. 1997), Goddard shortwave radiation (Chou and Suarez 1994), Fifth-Generation Penn State/NCAR Mesoscale Model (MM5) similarity surface layer (Jiménez et al. 2012), Noah land surface model (Chen and Dudhia 2001), Yonsei University planetary boundary layer (Hong et al. 2006), and Kain–Fritsch cumulus parameterization (Kain 2004; off in Domain 2 with a 3-km grid spacing). Coupled with the WRF model, the WRF Data Assimilation (WRFDA; Barker et al. 2012) Version 3.4.1 system is used to assimilate the observations of the nine radars to obtain a more realistic convective-scale initial condition. The three-dimensional variational (3DVar; Barker et al. 2004) method is used with the control variable option 7 (CV7; Sun et al. 2016) background error statistics. The assimilated variables consist of radial velocity and the mixing ratios of rain, snow, graupel, and water vapor estimated from reflectivity.

Fig. 2

Two-way interactive, double-nested simulation domains of the WRF model. Domain 1 has \(280\times 280\) horizontal grid points with a 15-km spacing. Domain 2 has \(331\times 331\) horizontal grid points with a 3-km spacing. There are 45 vertical eta levels with a 30-hPa pressure top

2.4 ExAMP blending scheme

As mentioned in Sect. 1, the ExAMP blending scheme fully trusts the field pattern predicted by extrapolation but allows the field intensity to be adjusted by NWP model prediction. The concept can be formulated as

$$\Psi _{E} \left( t \right) = \left\{ {\begin{array}{*{20}l} {\Psi _{M} \left( t \right) + \left[ {1 - w\left( t \right)} \right]\left[ {\Psi _{W} \left( t \right) - \Psi _{M} \left( t \right)} \right]} & {\text{if }\Psi _{M} \left( t \right) > 0,} \\ 0 & {\text{otherwise},} \\ \end{array} } \right.$$
(5)

where \({\Psi }_{E}\), \({\Psi }_{M}\), and \({\Psi }_{W}\) are the fields predicted by ExAMP, MAPLE, and the WRF model assimilating radar data, respectively; \(w\) is the weight function ranging between 0 and 1; \(t\) is the forecast time. \(\Psi\) stands for the field of either composite reflectivity or rainfall rate depending on the experiment. \(w\) decreases linearly from 1 to 0 during the first 120 min of the forecast to reflect the declining predictability of MAPLE and then stays at 0. In L20, the innovation \({\Psi }_{W}-{\Psi }_{M}\) that exceeded an empirical positive limit of \(0.5{\Psi }_{M}\) or negative limit of \(-0.3{\Psi }_{M}\) was truncated to the limit to mitigate the impact of occasional overprediction or underprediction from the WRF model. The positive limit was given a larger magnitude to allow for the possibility of rapid intensification, which deserves more attention in early warning. In this study, the innovation limitation is removed in a sensitivity experiment to reassess its necessity because the predicted field changes from composite reflectivity to rainfall rate.
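As a minimal sketch of Eq. 5 at a single grid point, the following Python functions apply the linear weight and the optional innovation limits described above. The function and argument names are hypothetical, and the forecast time is assumed to be counted in minutes from the start of the blended forecast; the handling of forecast latency is not represented.

```python
def weight(t_min, ramp_min=120.0):
    """Weight w(t): linear decrease from 1 to 0 over the first 120 min, then 0."""
    return max(0.0, 1.0 - t_min / ramp_min)


def examp_blend(psi_m, psi_w, t_min, limit=True):
    """Blend one grid-point value following Eq. 5.

    psi_m : value predicted by extrapolation (MAPLE)
    psi_w : value predicted by the NWP model (WRF)
    limit : apply the empirical innovation limits of L20 when True
    """
    # No rainfall in the extrapolated pattern means none in the blend.
    if psi_m <= 0.0:
        return 0.0
    innovation = psi_w - psi_m
    if limit:
        # Truncate to [-0.3 * psi_m, +0.5 * psi_m].
        innovation = min(max(innovation, -0.3 * psi_m), 0.5 * psi_m)
    return psi_m + (1.0 - weight(t_min)) * innovation
```

With an extrapolated value of 10 and a model value of 30 at a forecast time of 120 min, the limited blend gives 15 whereas the unlimited blend gives 30, which shows how strongly the limits can constrain the adjustment.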

2.5 Paired t-test

To assess the statistical significance of the differences between similar skill scores of two rainfall nowcasting experiments, an appropriate paired t-test is conducted. The t-value for the paired t-test is calculated as

$$t=\frac{\overline{d}-{\mu }_{0}}{{s}_{d}/\sqrt{n}},$$
(6)

where \(\overline{d}\) is the average difference between the paired measurements; \({\mu }_{0}\) is the hypothesized mean difference; \({s}_{d}\) is the sample standard deviation of the differences; \(n\) is the number of pairs. \({\mu }_{0}\) is set to 0, which means that the null hypothesis is that there is no difference between the two groups. The t-value is then compared to the two-tailed critical value at the 95% confidence level, which is 2.028 in this study (36 degrees of freedom for the 37 samples). If the absolute value of the t-value is less than the critical value, then there is no statistically significant difference between the two groups.
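Eq. 6 and the decision rule can be written compactly in Python using only the standard library. The function names and the example scores below are hypothetical; only the formula and the critical value of 2.028 come from the text.

```python
import math
import statistics

CRITICAL_T = 2.028  # critical value quoted in the text for the 37 samples


def paired_t_value(scores_a, scores_b, mu0=0.0):
    """t-value of Eq. 6 for paired skill scores of two experiments."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return (mean_diff - mu0) / (sd_diff / math.sqrt(n))


def is_significant(t_value, critical=CRITICAL_T):
    """True when the difference is statistically significant."""
    return abs(t_value) >= critical


# Made-up scores for four pairs, for illustration only; the critical value
# above applies to 37 pairs and would differ for a sample of this size.
t_example = paired_t_value([5.0, 6.0, 7.0, 8.0], [4.0, 4.0, 6.0, 5.0])
```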

3 Experimental design

3.1 Heavy rainfall events and observation data

Similar to L20, this study samples 37 partially overlapping 3-h periods, during each of which at least one rain gauge observation exceeded 100 mm, from seven heavy rainfall events in Taiwan in 2019 for rainfall nowcasting experiments. The threshold of 100 mm per 3 h refers to the level of extremely heavy rain defined by the Central Weather Bureau (CWB) of Taiwan. The dates, numbers of samples, and weather types of the seven heavy rainfall events are listed in Table 1. The weather types include the local thunderstorm, Mei-yu front, typhoon, and periphery of low pressure, abbreviated to LT, MF, TY, and PL, respectively. Radar and rain gauge observation data are gathered via the Quantitative Precipitation Estimation and Segregation Using Multiple Sensors (QPESUMS; Zhang et al. 2011) system of CWB. The radar data come from one S-band dual-polarimetric, three S-band, and five C-band dual-polarimetric radars shown in Fig. 1. Necessary quality control procedures, such as noise removal, velocity dealiasing, and attenuation correction, are applied to the radar data before the spatial interpolation and use of the data. The rain gauge data come from more than 1000 rain gauges distributed all over the main and outlying islands of Taiwan (not shown), which serve as the verifying truth for the predicted rainfall rates in all the experiments interpolated to the rain gauge locations.

Table 1 Dates, numbers of samples, and weather types of the seven heavy rainfall events

3.2 Rainfall nowcasting experiments

There are seven 150-min rainfall nowcasting experiments as described in Table 2 for the 37 samples. The EX_Z experiment extrapolates composite reflectivity and then estimates rainfall rate using the Z-R relationship. The EX_R(Z) experiment estimates the current rainfall rate using the Z-R relationship and then extrapolates it. The EX_R(hyb) experiment estimates the current rainfall rate using the hybrid QPE relationships and then extrapolates it. These three experiments using MAPLE are statistically compared to analyze the strategies of extrapolation. In the WRF experiment, the initial and boundary conditions for each sample are generated from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) 0.5° forecast closest to the 3-h period, and the observations of the nine radars are assimilated for three analysis cycles with a 30-min interval before the forecast. To mimic the prediction latency in real-time systems owing to data latency and model computation, the first 10 min of each MAPLE forecast and 30 min of each WRF forecast are discarded. The ExAMP experiment blends the rainfall rates predicted in EX_R(hyb) and WRF to evaluate the skill of the ExAMP scheme, and the two sensitivity experiments mentioned in Sect. 1 are carried out to analyze the strategies of the ExAMP scheme. The ExAMP_Z experiment blends the composite reflectivities predicted in EX_Z and WRF and then estimates rainfall rate using the Z-R relationship. The ExAMP_NoLim experiment resembles ExAMP but removes the innovation limitation.

Table 2 Descriptions of the seven rainfall nowcasting experiments

4 Results

4.1 Strategy analysis of extrapolation

The predicted rainfall rates in the seven experiments for the 37 samples are verified against the rain gauge observations during the forecast periods of 0–30, 30–60, 60–90, 90–120, and 120–150 min. The root-mean-square errors (RMSEs) and biases of 30-min rainfall are calculated and separately shown in Figs. 3, 4, and 5 to discuss the aforementioned topics. The mean RMSE and bias values averaged over the 37 samples are listed in Tables 3 and 4, respectively, for the overall evaluation. Firstly, the strategies of extrapolation are analyzed by comparing EX_Z, EX_R(Z), and EX_R(hyb). Figure 3 shows that the RMSEs and biases in EX_R(Z) are significantly smaller than in EX_Z for all the samples. The comparison indicates that, in terms of rainfall nowcasting, extrapolating rainfall rate is better than extrapolating composite reflectivity. This result implies that extrapolating the predicted field directly can lead to less uncertainty than extrapolating its nonlinearly related field under model errors. In addition, the biases in EX_R(hyb) are smaller still than in EX_R(Z) for most of the samples. The t-values for the paired t-test conducted on the RMSE differences during the forecast periods of 0–30 and 120–150 min between EX_R(Z) and EX_R(hyb) exceed the critical value (Table 5). The comparison indicates that estimating the current rainfall rate using the hybrid QPE relationships is better than using the Z-R relationship. This result can be attributed to not only the higher accuracy of the hybrid QPE relationships but also the shorter vertical distance between the lowest radar observation and the ground. For example, the maximum reflectivities in stratiform precipitation areas are often measured in the melting layer high above the ground, characterized by the so-called bright band, and therefore applying the Z-R relationship to composite reflectivity can easily overestimate the current rainfall rate on the ground. The performance ranking of EX_Z, EX_R(Z), and EX_R(hyb) from worst to best is consistent during all the forecast periods according to the mean RMSE (Table 3) and bias (Table 4) statistics. This result leads to a conclusion that extrapolating the current rainfall rate estimated using the hybrid QPE relationships is the superior strategy of extrapolation.
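For reference, the two verification scores can be computed as in the following Python sketch. It assumes that the bias is the mean of forecast minus observation over the gauges, which is consistent with its unit of mm and with positive values indicating overprediction, but the exact definition is not stated in the text; the function names are hypothetical.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error of 30-min rainfall (mm) over the gauges."""
    errors = [f - o for f, o in zip(forecast, observed)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mean_bias(forecast, observed):
    """Mean error (forecast minus observation, mm); assumed definition."""
    errors = [f - o for f, o in zip(forecast, observed)]
    return sum(errors) / len(errors)
```

Because the RMSE contains both the systematic and the random part of the error, a smaller RMSE alone does not show which part has been reduced.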

Fig. 3

(Left) RMSEs and (right) biases of 30-min rainfall during the forecast periods of (from top to bottom) 0–30, 30–60, 60–90, 90–120, and 120–150 min for the 37 samples in EX_Z (orange), EX_R(Z) (gray), and EX_R(hyb) (green). The samples are arranged in sequence of the four weather types, LT, MF, TY, and PL. The line-dot plots are used to clearly compare different experiments rather than to describe temporal evolution in each experiment

Fig. 4

As in Fig. 3, but in EX_R(hyb) (green), WRF (blue), and ExAMP (red)

Fig. 5

As in Fig. 3, but in ExAMP (red), ExAMP_Z (purple), and ExAMP_NoLim (black)

Table 3 Mean RMSE values of 30-min rainfall averaged over the 37 samples during the forecast periods of 0–30, 30–60, 60–90, 90–120, and 120–150 min in the seven experiments
Table 4 Mean bias values of 30-min rainfall averaged over the 37 samples during the forecast periods of 0–30, 30–60, 60–90, 90–120, and 120–150 min in the seven experiments
Table 5 T-values for the paired t-test conducted on the RMSE differences of 30-min rainfall during the forecast periods of 0–30, 30–60, 60–90, 90–120, and 120–150 min for the 37 samples between EX_R(Z) and EX_R(hyb), between EX_R(hyb) and ExAMP, and between ExAMP and ExAMP_NoLim

4.2 Skill of the ExAMP scheme

Secondly, the skill of the ExAMP scheme is evaluated by comparing EX_R(hyb), WRF, and ExAMP. Figure 4 shows that, during the forecast period of 0–30 min, the RMSEs in EX_R(hyb) and ExAMP are equivalent and significantly smaller than in WRF for most of the samples. The comparison indicates that the first 30-min WRF forecasts are unreliable before the model spins up and consequently little trusted by the ExAMP scheme. As the forecast time proceeds to 150 min, the mean RMSE value in WRF decreases from 5.3 to 3.5 mm, even smaller than the final 3.7 mm in EX_R(hyb) (Table 3), while the mean bias value in EX_R(hyb) increases from 0.55 to 0.82 mm (Table 4). Such concurrent growth of WRF and decline of MAPLE in predictability favor the use of the ExAMP scheme, and the crossover lead time of approximately 120–150 min corresponds with the weight function \(w\) assigned to Eq. 5. Figure 4 shows that, as the forecast time proceeds, the RMSEs and biases in ExAMP become smaller than in EX_R(hyb) for all the samples. The t-values on the RMSE differences during all the forecast periods between EX_R(hyb) and ExAMP are much larger than the critical value (Table 5). The comparison indicates that, on the basis of the superior strategy of extrapolation, the skill in rainfall nowcasting can be further enhanced by gradually absorbing the intensity variations predicted by the NWP model assimilating radar data. The mean RMSE (Table 3) and bias (Table 4) values in ExAMP are significantly smaller than in EX_R(hyb) and WRF during all the forecast periods. This result leads to a conclusion that, following the successful reflectivity nowcasting in L20, the ExAMP scheme also has greater skill in 150-min rainfall nowcasting than extrapolation and NWP with radar data assimilation.

4.3 Strategy analysis of the ExAMP scheme

Thirdly, the strategies of the ExAMP scheme are analyzed by comparing ExAMP, ExAMP_Z, and ExAMP_NoLim. Figure 5 shows that the RMSEs and biases in ExAMP are significantly smaller than in ExAMP_Z for most of the samples. The comparison indicates that blending the predicted rainfall rates is better than blending the predicted composite reflectivities. This result restates the disadvantage of extrapolating composite reflectivity. On the other hand, as the forecast time proceeds, the RMSEs and biases in ExAMP_NoLim become smaller than in ExAMP for most of the samples. This trend is more obvious in Fig. 6, which shows the RMSE improvement percentages in ExAMP_NoLim against ExAMP. The t-values on the RMSE differences during all the forecast periods between ExAMP and ExAMP_NoLim are much larger than the critical value (Table 5). The comparison indicates that, contrary to being beneficial in L20, the innovation limitation is harmful in this study. This result implies that the limits of \(0.5{\Psi }_{M}\) and \(-0.3{\Psi }_{M}\) exerted on the innovation \({\Psi }_{W}-{\Psi }_{M}\) in Eq. 5 can be too restrictive for blending the predicted rainfall rates. For example, if \({\Psi }_{M}\) is 30 \({\text{mm}}\,{\text{h}}^{-1}\), which is equivalent to 39.5 dBZ following Eq. 2, exerting an upper limit of 45 \({\text{mm}}\,{\text{h}}^{-1}\) on \({\Psi }_{W}\) in this study is far more restrictive than the upper limit of 59.3 dBZ in L20. The mean RMSE (Table 3) and bias (Table 4) values in ExAMP_NoLim are the smallest in all the experiments during all the forecast periods. This result leads to a conclusion that blending the rainfall rates predicted by MAPLE, which extrapolates the current rainfall rate estimated using the hybrid QPE relationships, and WRF via the ExAMP scheme without the innovation limitation is the best strategy for rainfall nowcasting.
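The arithmetic behind this example can be checked with a few lines of Python based on Eq. 2. The sketch assumes that the limit in L20 was applied to composite reflectivity expressed in dBZ, as the quoted value of 59.3 dBZ suggests; the function and variable names are hypothetical.

```python
import math

A, B = 32.5, 1.65  # coefficients of the Z-R relationship (Eq. 2)


def rate_to_dbz(rate):
    """Rainfall rate (mm/h) -> reflectivity (dBZ) via Z = A * R**B."""
    return 10.0 * math.log10(A * rate ** B)


def dbz_to_rate(dbz):
    """Reflectivity (dBZ) -> rainfall rate (mm/h), the inverse of Eq. 2."""
    return (10.0 ** (dbz / 10.0) / A) ** (1.0 / B)


psi_m_rate = 30.0                      # extrapolated rainfall rate, mm/h
psi_m_dbz = rate_to_dbz(psi_m_rate)    # about 39.5 dBZ

# Upper limit when blending rainfall rates (this study): 1.5 * 30 = 45 mm/h.
limit_rate = 1.5 * psi_m_rate
limit_rate_as_dbz = rate_to_dbz(limit_rate)

# Upper limit when blending reflectivities (as in L20): 1.5 * 39.5 dBZ.
limit_dbz = 1.5 * psi_m_dbz
limit_dbz_as_rate = dbz_to_rate(limit_dbz)
```

Under these assumptions, the 45 mm/h cap corresponds to only about 42.4 dBZ, whereas a cap near 59 dBZ corresponds to a rainfall rate of several hundred mm/h, so the same proportional limit is much tighter when applied to rainfall rate.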

Fig. 6

RMSE improvement percentages (%) of 30-min rainfall during the forecast periods of 0–30, 30–60, 60–90, 90–120, and 120–150 min for the 37 samples in ExAMP_NoLim against ExAMP. The samples are arranged in sequence of the four weather types, LT, MF, TY, and PL

4.4 Spatial performance for two contrasting events

Fourthly, after the overall statistics, the spatial performance in EX_R(hyb), WRF, ExAMP, and ExAMP_NoLim for two contrasting heavy rainfall events of different weather types is examined to more concretely illustrate the characteristics of different methods. To compare the performance throughout the radar coverage area both on land and at sea, the rainfall estimated from the radar observations during both events using the hybrid QPE relationships, denoted by Obs_R(hyb), is taken as the benchmark instead of the discrete rain gauge observations only on land. Figure 7 shows the consecutive maps of 30-min rainfall in Obs_R(hyb), EX_R(hyb), WRF, ExAMP, and ExAMP_NoLim for the first event during 1000–1230 UTC, July 2, 2019. The event was characterized by a strengthening offshore convective line approaching southwestern Taiwan (Area A) and a weakening convective system in northern Taiwan (Area B). It can be seen that the strengthening in Area A and the weakening in Area B are both lacking in EX_R(hyb), indicating the inability of extrapolation to simulate intensity variations. On the contrary, both variations are simulated in WRF, but widespread overprediction occurs in Area A and explains the positive bias in Fig. 4. With respect to ExAMP and ExAMP_NoLim, the widespread overprediction is avoided in both experiments by trusting the pattern in EX_R(hyb), and the intensity variations are better presented in ExAMP_NoLim than in ExAMP by removing the innovation limitation. Next, the performance for the second event during 1030–1300 UTC, August 2, 2019 is shown in Fig. 8. The event was characterized by dissipating local thunderstorms in central and southern Taiwan (Area C). Similarly, the dissipation in Area C is simulated in WRF rather than in EX_R(hyb), but overprediction occurs in eastern Taiwan (Areas D and E). Once again, ExAMP_NoLim has the best performance in presenting the dissipation and avoiding the overprediction.

Fig. 7

Maps of 30-min rainfall during the forecast periods of (from top to bottom) 0–30, 30–60, 60–90, 90–120, and 120–150 min in (from left to right) Obs_R(hyb), EX_R(hyb), WRF, ExAMP, and ExAMP_NoLim for the first event during 1000–1230 UTC, July 2, 2019. Areas A and B are discussed in the text

Fig. 8

As in Fig. 7, but for the second event during 1030–1300 UTC, August 2, 2019. Areas C, D, and E are discussed in the text

4.5 Performance at rainfall strengthening gauges and weakening gauges

Lastly, even with the promising statistical and spatial performance above, the effectiveness of the ExAMP scheme is still doubtful because the smaller RMSEs in ExAMP than in extrapolation may be attributed to coincident bias reduction rather than random error reduction. More specifically, if extrapolation has generally positive biases as in this study, the ExAMP scheme may alleviate the mean bias owing to intensity reduction over numerous points where rainfall is predicted by extrapolation but not the NWP model. Therefore, to prove the effectiveness of the scheme, it is necessary to investigate whether ExAMP also outperforms extrapolation when extrapolation has small or negative biases. One way to do so, motivated by the examination of the strengthening Area A and weakening Area B in the previous subsection, is to pick out the rainfall strengthening gauges (measuring increasing rainfall rates) and weakening gauges (measuring decreasing rainfall rates) for further statistics. It can be expected that extrapolation has negative biases at the strengthening gauges but positive biases at the weakening gauges because of its inability to simulate intensity variations. In this study, a rain gauge is defined as a strengthening gauge or weakening gauge if the slope of the linear regression line that fits its rainfall rate measurements during 0–150 min is larger than \(0.4\,{\text{mm}}\,{\text{h}}^{-2}\) or smaller than \(-0.4\,{\text{mm}}\,{\text{h}}^{-2}\), respectively. Figure 9 shows the numbers of rainfall strengthening gauges and weakening gauges for the 37 samples. It can be seen that the numbers are generally greater for the Mei-yu front event and typhoon event, reflecting the larger range of impact from the two weather types. The strengthening gauges are outnumbered by the weakening gauges for all the samples, implying that precipitation systems usually have smaller strengthening areas and larger weakening areas regardless of the weather types.
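The gauge classification can be sketched in Python with an ordinary least-squares slope. The function names and the sampling times in the usage note are assumptions for illustration; only the threshold of 0.4 mm h−2 and the 0–150-min window come from the text.

```python
def regression_slope(times_h, rates):
    """Least-squares slope of rainfall rate (mm/h) against time (h)."""
    n = len(times_h)
    t_mean = sum(times_h) / n
    r_mean = sum(rates) / n
    num = sum((t - t_mean) * (r - r_mean) for t, r in zip(times_h, rates))
    den = sum((t - t_mean) ** 2 for t in times_h)
    return num / den


def classify_gauge(times_h, rates, threshold=0.4):
    """Label a gauge from the trend of its rainfall rate (slope in mm h^-2)."""
    slope = regression_slope(times_h, rates)
    if slope > threshold:
        return "strengthening"
    if slope < -threshold:
        return "weakening"
    return "neither"
```

For example, with hypothetical measurements every 30 min, `classify_gauge([0.0, 0.5, 1.0, 1.5, 2.0, 2.5], [2.0, 3.0, 4.0, 5.0, 6.0, 7.0])` labels the gauge as strengthening because the fitted slope is 2 mm h−2.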

Fig. 9

Numbers of rainfall strengthening gauges and weakening gauges for the 37 samples, whose dates and times are labeled. The samples are arranged in sequence of the four weather types, LT, MF, TY, and PL

Now let us further evaluate the skill of the ExAMP scheme separately at rainfall strengthening gauges and weakening gauges. Figure 10 shows the RMSEs and biases during all the forecast periods in EX_R(hyb), WRF, and ExAMP_NoLim calculated at rainfall strengthening gauges only. As expected, most of the biases in EX_R(hyb) are neutral at the start time and become negative as the forecast time proceeds, reflecting that extrapolation cannot grasp the strengthening. By contrast, most of the biases in WRF are positive at the start time and then become neutral, representing the spin-up of model predictability. Based on the pattern of EX_R(hyb) and adjusted by the intensity of WRF, ExAMP_NoLim exhibits similar biases but distinguishable improvement in RMSEs against those in Ex_R(hyb), especially during the latter forecast periods. This result proves that the ExAMP scheme is also effective when extrapolation has little or negative biases. Contrary to Figs. 10, 11 shows the RMSEs and biases calculated at rainfall weakening gauges only. As expected again, most of the biases in EX_R(hyb) are neutral at the start time and then become positive, reflecting that extrapolation cannot grasp the weakening, either. The comparison between Figs. 10 and 11 as well as the statistics of mean RMSEs (Table 6) and mean biases (Table 7) leads to a conclusion that the ExAMP scheme is more effective in rainfall weakening areas than strengthening areas. This is also why the overall statistics are promising since the weakening areas are larger. There are two possible reasons for this: (a) for rainfall strengthening gauges that result from the initialization of overhead new cells, the ExAMP scheme has no chance of improvement because the new cells are absent from the start of extrapolation; (b) for rainfall weakening gauges, the ExAMP scheme has a good chance of improvement because blending either low or absent rainfall predicted by the NWP model can lead to the weakening effect. 
Besides the RMSE and bias, the threat score (TS) above a rainfall threshold is also commonly used for rainfall verification. The TSs above the threshold of 10 mm per 30 min, calculated separately at rainfall strengthening gauges and weakening gauges, are shown in Fig. 12. In the TS results, the ExAMP scheme has no obvious advantage over extrapolation. This is because both methods have an identical pattern error, which is a more dominant factor than the intensity error in the TS calculation.
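To illustrate why an intensity correction alone may leave the TS unchanged, the following minimal sketch uses the standard definition TS = hits / (hits + misses + false alarms). It assumes that an event is counted when rainfall is at or above the threshold. The function name `threat_score` and all rainfall values are hypothetical and are not taken from the experiments in this study.

```python
import numpy as np

def threat_score(forecast, observed, threshold=10.0):
    """Threat score: hits / (hits + misses + false alarms).

    Assumption: an event occurs where rainfall >= threshold
    (here in mm per 30 min).
    """
    f = np.asarray(forecast, dtype=float) >= threshold
    o = np.asarray(observed, dtype=float) >= threshold
    hits = np.sum(f & o)           # forecast event, observed event
    misses = np.sum(~f & o)        # no forecast event, observed event
    false_alarms = np.sum(f & ~o)  # forecast event, no observed event
    denom = hits + misses + false_alarms
    # TS is undefined when no event is forecast or observed.
    return float(hits / denom) if denom > 0 else float("nan")

# Hypothetical 30-min rainfall (mm) at five gauges.
observed = [15.0, 12.0, 0.0, 0.0, 11.0]
extrapolated = [25.0, 0.0, 14.0, 0.0, 30.0]  # pattern error plus intensity error
corrected = [16.0, 0.0, 11.0, 0.0, 12.0]     # same pattern, intensities adjusted

print(threat_score(extrapolated, observed))
print(threat_score(corrected, observed))
```

In this made-up example the two forecasts exceed the threshold at exactly the same gauges, so they yield the same TS even though the corrected intensities are much closer to the observations.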

Fig. 10

(Left) RMSEs and (right) biases of 30-min rainfall during the forecast periods of (from top to bottom) 0–30, 30–60, 60–90, 90–120, and 120–150 min for the 37 samples in EX_R(hyb) (green), WRF (blue), and ExAMP_NoLim (black) calculated at rainfall strengthening gauges only. The samples are arranged in sequence of the four weather types, LT, MF, TY, and PL. The line-dot plots are used to clearly compare different experiments rather than to describe temporal evolution in each experiment

Fig. 11

As in Fig. 10, but calculated at rainfall weakening gauges only

Table 6 Mean RMSE values of 30-min rainfall averaged over the 37 samples during the forecast periods of 0–30, 30–60, 60–90, 90–120, and 120–150 min in EX_R(hyb), WRF, and ExAMP_NoLim calculated at rainfall strengthening gauges only or weakening gauges only
Table 7 Mean bias values of 30-min rainfall averaged over the 37 samples during the forecast periods of 0–30, 30–60, 60–90, 90–120, and 120–150 min in EX_R(hyb), WRF, and ExAMP_NoLim calculated at rainfall strengthening gauges only or weakening gauges only
Fig. 12

TSs above the threshold of 10 mm per 30 min during the forecast periods of (from top to bottom) 0–30, 30–60, 60–90, 90–120, and 120–150 min separately at (left) rainfall strengthening gauges and (right) weakening gauges for the 37 samples in EX_R(hyb) (green), WRF (blue), and ExAMP_NoLim (black). The samples are arranged in sequence of the four weather types, LT, MF, TY, and PL. The line-dot plots are used to clearly compare different experiments rather than to describe temporal evolution in each experiment

5 Conclusions and future prospects

This study follows up on L20, which proposed the ExAMP scheme for reflectivity nowcasting, to further analyze its skill and strategies for rainfall nowcasting. Based on the contrasting characteristics of extrapolation and NWP model prediction, the ExAMP scheme fully trusts the field pattern predicted by the former but allows the field intensity to be adjusted by the latter. Using MAPLE as the former and the WRF model as the latter, seven 150-min rainfall nowcasting experiments implementing different strategies are carried out for 37 sampled periods from seven heavy rainfall events. The overall statistics lead to the following conclusions: (a) the best-performing MAPLE strategy is to extrapolate the current rainfall rate estimated from the lowest dual-polarimetric radar observations using the hybrid QPE relationships; (b) the ExAMP scheme that blends the MAPLE and WRF forecasts can surpass both components in 150-min rainfall nowcasting; (c) the empirical innovation limitation that is beneficial for blending composite reflectivities can be too restrictive for blending rainfall rates, which have a much larger range of values. Moreover, from the spatial examination of two contrasting events, the ExAMP scheme without the innovation limitation is found to perform best in capturing the rainfall strengthening and weakening in different areas. The skill statistics calculated separately at rainfall strengthening gauges and weakening gauges for all the samples further support the effectiveness of the ExAMP scheme regardless of the biases of extrapolation, although the scheme corrects intensity rather than pattern.

Despite these promising conclusions, some limitations of this study need to be mentioned: (a) the conclusions may not hold globally because the results are based only on the limited 37 samples in Taiwan; (b) the ExAMP scheme is only appropriate within the aforementioned crossover lead time, approximately 120–150 min; (c) the reasons for the lower effectiveness of the ExAMP scheme in rainfall strengthening areas remain to be verified. As for future prospects, more research can be done to enhance the skill of the ExAMP scheme, such as replacing the linear weight function \(w\) with statistical weight functions of space and time adapted to different weather types. More events and configuration settings of MAPLE and the WRF model can be tested to further validate the consistency of the scheme. As heavy rainfall nowcasting plays an increasingly important role under extreme weather scenarios, the authors will continue to advance the scheme and seek operational applications in the future.