Background

Low-level moisture in the troposphere is essential for the initiation and development of deep moist convection. Kuo et al. (1993) assimilated precipitable water vapor (PWV) data observed by radiosonde into a mesoscale numerical model by relaxing the predicted PWV toward the observed PWV. They reported that assimilation of PWV improved the accuracy of short-range precipitation forecasts. Guo et al. (2000) assimilated PWV data detected by the global navigation satellite system (GNSS) by using a four-dimensional variational assimilation system and succeeded in reproducing the observed precipitation pattern associated with a squall line. The Geospatial Information Authority of Japan (GSI) operates a nationwide GNSS observation network called GNSS Earth Observation NETwork (GEONET), which has a mean inter-station distance of about 20 km over Japan. Assimilation experiments of GEONET-PWV were performed using four-dimensional variational assimilation systems (Seko et al. 2004; Kawabata et al. 2007; Shoji et al. 2009). Yan et al. (2009) performed assimilation of GNSS-derived zenith tropospheric delay (ZTD) data, that is, atmospheric delay above the receiver, observed by a dense GNSS network during the Convective and Orographically-induced Precipitation Study (COPS) campaign, where the shortest distance between the assimilated ZTD observations was longer than 10 km. They reported improvement in forecasts during weak precipitation but reduced accuracy in heavy precipitation. Kawabata et al. (2013) conducted an assimilation experiment of GEONET-derived slant path delay (SPD) data, that is, the atmospheric delay along the ray path of a GNSS radio signal, and succeeded in reproducing the heavy rainfall event that occurred over Okinawa Island on August 19, 2009.

Although many assimilation studies of GNSS-derived water vapor data with variational assimilation systems have been conducted so far, there have been few studies that assimilated PWV using an ensemble Kalman filter (EnKF; Evensen 1994) system. Seko et al. (2011) conducted an assimilation experiment of GEONET-derived PWV by using a local ensemble transform Kalman filter (LETKF; Hunt et al. 2007) method based on the Japan Meteorological Agency Non-Hydrostatic Model (JMA-NHM; Saito et al. 2007). Seko et al. (2013) developed a two-way nested NHM-LETKF system and investigated the synergistic effects of simultaneous assimilation of the Doppler radar radial wind velocity and water vapor data observed by GEONET (i.e., PWV and slant water vapor (SWV) that is the accumulated water vapor amount along the ray path of a GNSS radio signal). They succeeded in increasing the number of ensemble forecasts that reproduced localized heavy rainfall by assimilating the GNSS data and the Doppler radar data.

The EnKF is a state estimation technique based on a Monte Carlo method. The EnKF uses short-term ensemble forecasting to estimate flow-dependent background error covariance, assuming each ensemble member result is a statistical sample. However, typical EnKF studies use up to 100 ensemble members, partly because of limited computational resources. This limited ensemble size introduces sampling errors into the background error covariance and degrades the accuracy of the analysis ensemble. To address this problem, the covariance localization method has been used to remove spurious error correlations between distant locations (Houtekamer and Mitchell 1998; Hamil et al. 2001). In recent years, some assimilation studies applied multiple localization scales. Zhang et al. (2009) applied a small localization scale to high-resolution Doppler radar observation data and a large localization scale to synoptic scale radiosonde observation data. Miyoshi and Kondo (2013) applied a multi-scale localization approach by changing the localization scale, depending on the scales of error correlations, and reported promising results.

Oigawa et al. (2014) simulated large PWV fluctuations at the local scale less than 10 km during heavy rainfall. Aonashi (2008) reported that the horizontal scale of background error correlations in a precipitating region is smaller than that in a non-precipitating region. These earlier studies suggest that a smaller horizontal localization radius should be used to assimilate PWV data over a precipitating region. However, as far as the authors know, there has been no study that assimilated PWV data with a horizontal resolution less than 10 km, using a small horizontal localization radius for PWV data over a precipitating region.

The objective of this study is to investigate the assimilation effects of the high-resolution PWV data derived from a hyper-dense GNSS receiver network in an effort to improve the simulation accuracy of heavy rainfall. Here we use a two-way nested NHM-LETKF system and apply it to a heavy rainfall event over Uji, Kyoto, Japan, on August 14, 2012. Figure 1 shows the hyper-dense GNSS receiver network with a mean inter-station distance of 1.7 km near the Uji campus of Kyoto University (Sato et al. 2013), hereafter called the Uji network. Horizontal resolution of the retrieved PWV derived from the Uji network was improved by estimating the PWV from a slant delay at the highest elevation angle. Large horizontal inhomogeneities of PWV, even at the local scale less than 10 km, have been observed by the Uji network in periods of heavy rainfall over the Uji network. Considering the difference in the characteristic length scale of PWV fluctuations between precipitating regions and non-precipitating regions, we applied multiple horizontal localization radii, the scales of which depend on precipitation intensity. We also investigated the optimum inter-station distance of GNSS receivers for reproducing heavy rainfall by thinning out PWV data at several stations in the Uji network.

Fig. 1

a Simulation domain of inner model. Red triangles and plus symbols indicate selected and unselected GEONET stations for data assimilation, respectively. The rectangle A is the domain for evaluating the simulation accuracy of hourly accumulated rainfall amounts. Seven GEONET stations inside the circle and the GNSS stations of the Uji network were used to investigate the spatial inhomogeneity of PWV. b The dense GNSS receiver network around Uji, Kyoto, Japan. Blue and red triangles indicate stations of the Uji network and GEONET, respectively. Contour lines indicate elevations of the ground (m). A rain gauge was installed at RISH station

The structure of this paper is as follows. First, the heavy rainfall event on August 14, 2012, and the associated PWV variations observed by the Uji network are described. Second, the design of the data assimilation system and the assimilation method of PWV are explained. Next, results of the data assimilation experiment using high-resolution PWV data are described. Finally, a summary and discussion are presented.

Heavy rainfall event on August 13–14, 2012, in Uji, Kyoto, Japan

We focused on the heavy rainfall event occurring on August 13–14, 2012, in Uji, Kyoto (called “Uji heavy rainfall” hereinafter). The Uji heavy rainfall is considered the most devastating event to take place during the observation period of the Uji network between 2010 and 2015. In this event, two people were killed and 2275 houses were damaged due to stationary back-building-type convective systems. Figure 2 shows the hourly rainfall derived from precipitation data observed by the radar network of the Japan Meteorological Agency (JMA). Mesoscale convective systems (MCSs) were stationary over the Uji network from 2100 local time (LT) on August 13 to 0600 LT on August 14. Ishihara and Takara (2013) reported that the nature of the MCSs over the Uji network changed with time. The MCS before 0000 LT on the 14th was a back-building type. After a weak rain period during 0000–0200 LT on the 14th, a back-building-type MCS was formed again during 0200–0500 LT. The MCS changed into a squall line after 0500 LT on the 14th. As shown in Fig. 2, the MCS became active after 0300 LT, and the precipitating region started to move southward after 0500 LT as the MCS was transformed into a squall line. The precipitation accumulated over the 10 h until 0600 LT on the 14th was 322 mm, as measured at the building of the Research Institute for Sustainable Humanosphere (RISH). The heavy rainfall at RISH was caused by convective clouds that were continuously generated one after another west of the Uji network. During the heavy rainfall event, the Uji network was located at the northern edge of the MCSs.

Fig. 2

Maps of hourly accumulated rainfall from 2200 LT on August 13, 2012, to 0600 LT on August 14, 2012, derived from radar data of JMA. Triangles indicate stations of the Uji network

GNSS-derived PWV observed by the Uji network

In the conventional GNSS meteorology technique, the delays from all available GNSS satellites above an elevation angle of 5°–10° are averaged to arrive at a single-value estimate for PWV. As a result, GNSS-derived PWV is estimated as a spatially averaged water vapor amount within an inverse cone defined by the elevation cutoff angle, resulting in smoothed local-scale PWV fluctuations. Sato et al. (2013) succeeded in improving the horizontal resolution of PWV derived from the Uji network, estimating PWV from a single slant delay at the highest elevation angle. High horizontal resolution PWV data retrieved by this method are called PWVSPD-H, hereinafter. When we retrieved PWVSPD-H, we projected the slant delay at the highest elevation angle onto the zenith direction by using the global mapping function (GMF) (Böhm et al. 2006). We also retrieved PWV data by the conventional GNSS meteorology technique by using a low elevation angle cutoff of 10°, which is called “PWVCON,” hereinafter. Details about the strategy of the GNSS analysis used in this study are described in Appendix. The satellite constellation of the Quasi-Zenith Satellite System (QZSS) is preferable for retrieving PWVSPD-H because the QZSS provides at least one satellite continuously close to the zenith over Japan. The errors of PWVSPD-H due to the variable geometry of satellite–receiver line of sights during heavy rainfall could be reduced by about 20% if PWVSPD-H is retrieved by using the QZSS satellite at the highest elevation instead of using the GPS satellite at the highest elevation (Oigawa et al. 2014). During the Uji heavy rainfall event, the QZSS satellite did not exist close to the zenith. Therefore, the GPS satellites above 60° elevation angle were used to estimate PWVSPD-H.

Figure 3a indicates the time variation of the 10-min rainfall observed by a rain gauge at RISH. To investigate the degree of the spatial inhomogeneity of PWV around the Uji network, we calculated the root-mean-square (RMS) values of PWV using the GNSS data observed by 15 GNSS stations around Uji, i.e., seven GEONET stations inside the circle in Fig. 1a and eight GNSS stations of the Uji network. The RMS values were calculated every 2 km,

$${\text{RMS}}(2(i - 1) \le r < 2i) = \sqrt {\frac{1}{n(i)}\sum\limits_{j = 1}^{n(i)} {({\text{dPWV}}_{j} )^{2} } } \quad (i = 1,2, \ldots ,12)$$
(1)

where n(i) is the number of pairs of GNSS stations inside the circle whose inter-station distance (r) is greater than or equal to 2(i − 1) km and smaller than 2i km, with i varying from 1 to 12. Variable dPWV is the difference in PWV between the two GNSS stations of a pair. During 0300–0600 LT on August 14 when the MCS became active, RMS values of PWVCON and PWVSPD-H also became large. Precipitation intensity at RISH became strong after 0240 LT. RMS of PWVSPD-H became large after 0140 LT at a horizontal scale between 5 and 10 km, while RMS of PWVCON became large after 0205 LT. In addition, RMS values of PWVCON and PWVSPD-H reached their maximum values before the peak time of the surface rainfall at RISH. Oigawa et al. (2015) analyzed 250-m mesh model data simulated by JMA-NHM that successfully reproduced the observed rapid increase in PWV prior to surface rainfall during the Uji heavy rainfall event. It was found that in the model, the local PWV maximum began to form about 16 min before the surface rainfall due to wind convergence near the ground. Based on the earlier study, it can be inferred that the RMS of PWV, i.e., horizontal inhomogeneity of PWV, became large because the local PWV maximum formed by wind convergence near the ground was detected by the Uji network. The RMS of PWVCON and PWVSPD-H was especially large for a mean inter-station distance between 5 and 10 km (Fig. 3b), suggesting that a GNSS receiver network with spatial separation denser than GEONET is useful for observing the variations in water vapor associated with convective precipitation. Figures 3b and c show that the RMS values of PWVSPD-H were larger than those of PWVCON at scales of 1–10 km, indicating that PWVSPD-H is more suitable for detecting water vapor fluctuations at the meso-γ scale (2–20 km).
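The binning in Eq. 1 can be illustrated with a minimal sketch. It assumes station positions are already given as flat Cartesian coordinates in km (a simplification; the actual analysis would work from station latitude and longitude), and the function name `binned_rms` and the input layout are hypothetical, not taken from the study.

```python
import math
from itertools import combinations


def binned_rms(stations, bin_width_km=2.0, n_bins=12):
    """RMS of pairwise PWV differences, grouped by inter-station distance.

    stations: list of (x_km, y_km, pwv_mm) tuples (hypothetical layout).
    Returns a list of length n_bins; entry i-1 corresponds to index i in Eq. 1,
    i.e. distances in [2(i-1), 2i) km. Bins with no station pair hold None.
    """
    sum_sq = [0.0] * n_bins
    count = [0] * n_bins
    for (x1, y1, p1), (x2, y2, p2) in combinations(stations, 2):
        r = math.hypot(x2 - x1, y2 - y1)   # inter-station distance in km
        b = int(r // bin_width_km)         # zero-based distance bin
        if b < n_bins:
            sum_sq[b] += (p1 - p2) ** 2    # (dPWV_j)^2
            count[b] += 1                  # contributes to n(i)
    return [math.sqrt(s / n) if n else None for s, n in zip(sum_sq, count)]
```

For three synthetic stations at 0, 1, and 5 km along a line with PWV of 60, 62, and 57 mm, the 0–2 km bin holds one pair and the 4–6 km bin holds two, while all other bins are empty.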

Fig. 3

a Time variation of 10-min rainfall observed by a rain gauge installed at RISH. b Time–distance variations of RMS of PWVCON around Uji. c Time–distance variations of RMS difference of PWVSPD–H around Uji

Figure 4 indicates spatial distributions of PWVSPD-H over the Uji network during 0300–0430 LT at about every 30 min. The PWV distribution is relatively homogeneous at 0300 LT. However, with time the distribution became increasingly complex, and a strong meridional gradient formed in which the PWV values in the north were higher than those in the south.

Fig. 4

PWVSPD-H distributions observed by the Uji network about every 30 min from 0300 LT to 0430 LT on August 14, 2012

To investigate the horizontal scale of PWV variations, we analyzed the distance dependency of correlation coefficients of zenith wet delay (ZWD), that is, the vertically integrated signal delay by water vapor, of the Uji network. According to Askne and Nordius (1987), the ZWD is proportional to the PWV and the proportionality factor can be estimated from the surface temperature at the GNSS receiver (Bevis et al. 1992; Rocken et al. 1993). To eliminate the effect of errors derived from the conversion procedure from ZWD to PWV, we analyzed ZWD instead of PWV. Figure 5 shows the correlation coefficient of ZWD of the Uji network as a function of horizontal distance. The correlation coefficient was relatively high during 0600–0900 LT, the period when raining had ceased at Uji. On the other hand, when there was weak rain during 0000–0300 LT and heavy rain during 0300–0600 LT at Uji, the correlation coefficients were smaller as the precipitation intensity became strong. This observational fact means that the characteristic length scale of water vapor variability became smaller, affected by convective activity. Fitted curves indicate that the e-folding scales, where e is Napier’s constant, were 3.5 km during 0000–0300 LT and 1.9 km during 0300–0600 LT. This result is comparable to the result of Shoji et al. (2004), which analyzed the distance dependency of correlation coefficients of GNSS post-fit residuals and reported that the horizontal distance at which the correlation coefficients equaled 1/e was about 2–3 km.

Fig. 5

Horizontal distance dependency of correlation coefficient of ZWD within the Uji network. The ZWD was converted from the SPD with the GNSS satellite at the highest elevation angle. Observational periods during 0000–0300 LT (green), 0300–0600 LT (red), and 0600–0900 LT (blue) on August 14, 2012, corresponded to weak rain, heavy rain, and no rain periods at RISH, respectively. Curves are fitted second-degree polynomials

Design of the data assimilation experiment

Assimilation system

We used the LETKF assimilation method implemented in the JMA-NHM. The NHM-LETKF system used in this study was developed by Seko et al. (2013), in which mesoscale assimilation is conducted in the outer domain with a 15-km mesh, and convective scale assimilation is conducted in the inner domain with a 1.875-km mesh. The model domains were centered at Uji (135.8°E, 34.88°N) on a Lambert conformal projection with horizontal grid points of 80 × 80 and 120 × 120 for outer and inner domains, respectively. We used a hybrid terrain-following coordinate system with 50 layers and a model top of 22.6 km. The depth of the layers increased from 40 m to 886 m as their height increased. In the outer domain, the Kain–Fritsch cumulus parameterization scheme (Kain and Fritsch 1993) was used. In the inner domain, we employed bulk cloud physics, which predicts the mixing ratios of cloud, rain, ice crystals, and graupel without a cumulus parameterization scheme.

Figure 6 shows the flowchart of the data assimilation experiment. The number of ensemble members was 40 in both cases. For the first cycle, the initial conditions of the outer domain were derived from a JMA mesoscale analysis every 6 h from 1500 LT on August 1, 2012, to 0900 LT on August 11, 2012. The outer LETKF cycle was repeated from 0900 LT on August 11, 2012, until 0900 LT on August 14, 2012, with a 6-h assimilation window. Observed data were assimilated every 1 h. Lateral boundary conditions were derived from the JMA mesoscale analysis every 6 h from 1500 LT on August 11, 2012, to 0900 LT on August 14, 2012. In the inner domain, the initial conditions in the first cycle and hourly boundary conditions were derived from the ensemble simulation results of the outer domain. The assimilation window was 1 h, and observation data were assimilated every 10 min. Nine cycles were conducted from 2100 LT on August 13, 2012, to 0600 LT on August 14, 2012. Surface and upper air sounding data used in the operational analysis at JMA, i.e., upper air sounding data (horizontal wind, temperature, and relative humidity), aircraft data (horizontal wind and temperature), wind profiler data (horizontal winds), ship and buoy data (pressure), and surface station data (pressure), were assimilated in both outer and inner domains. PWV data derived from the Uji network and nearby GEONET were assimilated only in the inner domain.

Fig. 6

Flowchart of the data assimilation experiment using the nested NHM-LETKF system. Thick arrows indicate ensemble simulations with 40 members

Assimilation method of PWV

It is not easy to assimilate PWV by using LETKF because analysis is independently performed at each model grid point in LETKF, while PWV is not a local variable. One method is to assimilate PWV as observation data at the surface where water vapor amount is large. This method modifies water vapor amount mainly near the surface by using the vertical localization because the sampling error in the background error correlation between PWV and water vapor amount generally becomes larger as the layer of the model is distant from the surface. However, we needed to modify the water vapor amount of the model also at the middle troposphere because Uji was located at the south side of a stationary front and the middle troposphere was very humid during the Uji heavy rainfall event. Therefore, in this paper, we used the assimilation method of PWV proposed by Seko et al. (2011), which retrieves relative humidity (RH) at all layers of the model above a GNSS station before conducting the LETKF analysis. This method retrieves an RH profile above a GNSS station by modifying RH of the first-guess ensemble mean of the LETKF, considering the ensemble spread of RH and the correlation between PWV and water vapor amount estimated from the ensemble perturbations. These intermediate profiles of RH were assimilated by LETKF. The procedure to produce the intermediate profile is as follows: (1) calculate the ensemble mean relative humidity (RHmean(k)), mixing ratio (Qmean(k)), density (ρmean(k)), and ensemble spread of relative humidity (RHspread(k)) at the position of GNSS receivers; (2) estimate the correlation coefficient (Corr(k)) between PWV and water vapor amount by using first-guess ensembles; and (3) modify the first-guess relative humidity using the following equations:

$${\text{RH}}_{\bmod } (k) = {\text{RH}}_{\text{mean}} (k) + \alpha \times {\text{RH}}_{\text{spread}}(k) \times {\text{Corr}}(k)\quad (k = 1, \ldots ,50)$$
(2)
$${\text{PWV}}_{\bmod } = \sum\limits_{k = 1}^{50} {\frac{{{\text{RH}}_{\bmod } (k)}}{{{\text{RH}}_{\text{mean}}(k)}} \times Q_{\text{mean}} (k) \times \rho_{\text{mean}} (k) \times {\text{d}}z}$$
(3)

where RHmod(k), PWVmod, and dz are the modified relative humidity, modified PWV, and thickness of layers, respectively. The coefficient α is determined so that PWVmod has the same PWV value as the observation. Considering the height difference between the real and model ground altitude at the position of the GNSS receivers, we calculated the product between the water vapor amount at the lowest layer and the altitude difference, and we added or subtracted it from the observed PWV data. This modification to the observed PWV data can be applicable only when the height difference between the real and model ground altitude at the position of the GNSS receivers is small. Therefore, PWV data were not assimilated when height differences between the real and model ground altitude exceeded 50 m. Locations of the eliminated stations for this reason are indicated in Fig. 1a. PWV data were also not assimilated when the PWV difference between the model and observation was larger than 5 mm. For the GNSS stations shown in Fig. 1b, both PWVCON and PWVSPD-H were used for data assimilation, while, for other GEONET stations not included in Fig. 1b, only PWVCON was used for data assimilation.
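Because PWVmod in Eq. 3 is linear in α, the condition that PWVmod equals the observed PWV can be solved for α in closed form. The sketch below illustrates this under stated assumptions: the text does not say how α is actually computed in the NHM-LETKF system, the function name `solve_alpha` is hypothetical, and inputs are per-layer lists with mixing ratio in kg/kg, density in kg/m³, and layer thickness in m, so that the layer sum is in kg/m² (numerically equal to mm of PWV).

```python
def solve_alpha(pwv_obs, rh_mean, rh_spread, corr, q_mean, rho_mean, dz):
    """Solve Eqs. 2-3 for alpha so that PWV_mod matches pwv_obs.

    All profile arguments are lists over model layers k (hypothetical layout).
    Returns (alpha, rh_mod), where rh_mod is the modified RH profile of Eq. 2.
    """
    # Water vapor amount of each layer for the unmodified ensemble mean
    layer_wv = [q * rho * h for q, rho, h in zip(q_mean, rho_mean, dz)]
    pwv_mean = sum(layer_wv)
    # d(PWV_mod)/d(alpha), obtained by substituting Eq. 2 into Eq. 3
    sensitivity = sum(
        s * c / rh * w
        for s, c, rh, w in zip(rh_spread, corr, rh_mean, layer_wv)
    )
    if sensitivity == 0.0:
        raise ValueError("PWV_mod does not depend on alpha for this profile")
    alpha = (pwv_obs - pwv_mean) / sensitivity
    rh_mod = [rh + alpha * s * c for rh, s, c in zip(rh_mean, rh_spread, corr)]
    return alpha, rh_mod
```

As a synthetic two-layer check, a first-guess PWV of 24.5 mm with an observed value of 27.75 mm gives α = 2 and raises RH from (80, 50) to (88, 60) when the spreads are (5, 10) and the correlations are (0.8, 0.5). The sketch applies no limit to the modified RH, since the text does not describe one.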

Settings of localizations

As noted in earlier studies (Houtekamer and Mitchell 1998; Hamil et al. 2001), covariance localization is needed in the ensemble Kalman filter to handle the problem of sampling errors due to the limited ensemble member size. In the LETKF, an observation localization is adopted such that the inverse of the localization function is multiplied by the observation error covariance (Hunt et al. 2007, Miyoshi et al. 2007). The localization function used in this study was the following Gaussian function:

$$w(r) = \exp \left( { - \frac{{r^{2} }}{{2\sigma^{2} }}} \right),$$
(4)

where σ is the parameter that determines the localization scale and r indicates the physical distance between the analysis grid point and the observation point. Because this function does not go to zero, the covariance values were forced to equal zero outside the following localization radius:

$$r = 2\sqrt {\frac{10}{3} }\sigma$$
(5)

That is, only the observed data inside a circle with radius r are assimilated. The σ values of five grids in the horizontal direction and three layers in the vertical direction were used in the previous study, which corresponded to a horizontal localization radius of 34.2 km in the inner LETKF. The vertical error correlation of relative humidity is generally large over a rainfall region. To reduce the vertical error correlation of the assimilation data, we thinned out the retrieved humidity data every four layers. In addition, we set the vertical localization parameter “σ” equal to the length of one layer.
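Equations 4 and 5 can be checked numerically with a short sketch. The function names are hypothetical, and treating an observation exactly at the cutoff distance as still inside the circle is an assumption, since the text does not specify the boundary case.

```python
import math


def cutoff_radius(sigma):
    """Localization radius of Eq. 5 for a given localization scale sigma."""
    return 2.0 * math.sqrt(10.0 / 3.0) * sigma


def localization_weight(r, sigma):
    """Gaussian weight of Eq. 4, forced to zero beyond the radius of Eq. 5."""
    if r > cutoff_radius(sigma):
        return 0.0
    return math.exp(-(r * r) / (2.0 * sigma * sigma))


# Inner domain: sigma of five grid intervals on the 1.875-km mesh
sigma_inner_km = 5 * 1.875
radius_inner_km = cutoff_radius(sigma_inner_km)
```

With σ equal to five inner-domain grid intervals (9.375 km), `radius_inner_km` evaluates to about 34.2 km, which agrees with the horizontal localization radius quoted for the inner LETKF.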

In the conventional setting of the NHM-LETKF, a single horizontal localization radius of 34.2 km was used. However, this radius is larger than the horizontal scale of convection, i.e., a few km, and scale of background error correlation of PWV around convection. When the ensemble simulation reproduces observed precipitation, assimilation of PWV observed in the precipitation area makes the simulation accuracy worse at distant grids from the observation site. This is because of sampling errors due to limited ensemble size. In contrast, when the ensemble simulation does not reproduce observed precipitation, a wide area around the GNSS receiver is incorrectly moistened by assimilating PWV with the large localization radius. To deal with these problems, we make the localization radius smaller when precipitation is observed. Figure 7 shows the background error correlation of PWV derived from a numerical model as a function of the horizontal distance. The model-derived PWV values were calculated by vertically integrating the products of the density and water vapor mixing ratio at each model layer. The error correlations were analyzed using a 500-m mesh ensemble dataset with 40 members. Correlation scales were analyzed for a rain-free area, weak rain area, and heavy rain area, by dividing the rain rates (R [mm/h]) into the three intensity levels, i.e., R < 0.1, 0.1 ≤ R < 10, and R ≥ 10. The e-folding scales for the correlations were 30.8, 7.5, and 4.8 km for the rain-free, weak rain, and heavy rain areas, respectively. This result suggests that if we use the localization radius of 34.2 km, the assimilation of PWV observed in the precipitating region introduces analysis errors because of the effects of spurious error correlations that were not eliminated by the large localization. 
Therefore, we investigate whether the simulation accuracy can be improved by using smaller localization radii to assimilate RHmod data converted from GNSS-derived PWV over the precipitating region. Although increasing the ensemble size or model resolution is an effective alternative way to estimate the appropriate background error covariance, we fixed the ensemble size and model resolution in order to evaluate the effects of using small localization radii. We adopted multiple horizontal localization radii, depending on the rain rates observed by weather radar. This multi-localization setting is called “SLOC,” hereinafter (Table 1), while the experiment using the single localization of 34.2 km is named “CNTL.” Figure 8 is a flowchart showing how the localization radii are set in the SLOC experiment. In the SLOC experiment, the horizontal localization radius at each grid point for assimilating RHmod data converted from GNSS-derived PWV was determined by estimating the precipitation intensity at the analysis grid point from the interpolation of JMA weather radar data. The RHmod data converted from GNSS-derived PWV were assimilated with the horizontal localization radius of SLOC only when the radar-derived precipitation intensity at the GNSS station belonged to the same intensity level as that at the analysis grid point. Before conducting the LETKF analysis, we made a list file that recorded combinations of analysis grid (i, j, k), RHmod data and its latitude/longitude information, and localization radius “r.” This file is read by the LETKF program to change the localization radius used to assimilate RHmod data.
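The selection rule described above can be sketched as follows. The three rain-rate classes use the thresholds quoted in the text (R < 0.1, 0.1 ≤ R < 10, and R ≥ 10 mm/h). Everything else is an assumption made for illustration: the function names are hypothetical, the radii in `PLACEHOLDER_RADII_KM` are made-up values and are not the Table 1 settings, and returning `None` when the two classes differ is one reading of "assimilated only when" the intensity levels match.

```python
def rain_class(rate_mm_per_h):
    """Classify a radar rain rate into the three intensity levels in the text."""
    if rate_mm_per_h < 0.1:
        return "rain_free"
    if rate_mm_per_h < 10.0:
        return "weak"
    return "heavy"


def sloc_radius(rate_at_grid, rate_at_station, radii_km):
    """Pick the localization radius for one (analysis grid, GNSS station) pair.

    radii_km maps each rain class to a radius in km and is supplied by the caller.
    Returns None when the grid point and the station fall in different classes,
    meaning the observation is skipped for that grid point (an assumption).
    """
    grid_class = rain_class(rate_at_grid)
    if rain_class(rate_at_station) != grid_class:
        return None
    return radii_km[grid_class]


# Placeholder values for illustration only; NOT the Table 1 settings.
PLACEHOLDER_RADII_KM = {"rain_free": 30.0, "weak": 8.0, "heavy": 5.0}
```

With these placeholder radii, a grid point at 12 mm/h paired with a station at 15 mm/h would use the heavy-rain radius, whereas the same grid point paired with a rain-free station would return `None`.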

Fig. 7

Horizontal distance dependency of error correlation of PWV. Black, blue, and red colors indicate correlation coefficients at rain-free, weak rain, and heavy rainfall areas, respectively

Table 1 Setting of localization radii for SLOC
Fig. 8

Flowchart illustrating how the localization radii in the SLOC experiment were determined. Rm and Ro indicate precipitation intensity observed by weather radar at the analysis grid point and at the GNSS station, respectively. The character “r” indicates the localization radius needed to assimilate RHmod data converted from the GNSS-derived PWV

Experimental results were evaluated by the ensemble mean of the hourly accumulated rainfall amount in the inner domain, where the radar rainfall data of JMA were referenced. We analyzed the spatial average of the root-mean-square error (RMSE) in the enclosed area that included the MCS over the Uji network, i.e., the area delimited by the rectangle A shown in Fig. 1a. We calculated the improvement rates (IR) of RMSE using the equation below:

$${\text{IR}} = \frac{{{\text{RMSE}}_{{{\text{w}}/{\text{o}}\;{\text{PWV}}}} - {\text{RMSE}}_{{{\text{w}}/\;{\text{PWV}}}} }}{{{\text{RMSE}}_{{{\text{w}}/{\text{o}}\;{\text{PWV}}}} }} \times 100,$$
(6)

where “w/o PWV” indicates an experiment without assimilating PWV data and “w/ PWV” indicates an experiment that assimilated PWV data. The sensitivity of the simulation accuracy of the MCS to the mean inter-station distance was investigated by thinning out the PWV data of the Uji network. Figure 9 shows the distribution of GNSS stations used in each thinning experiment. The code for each experiment and the settings of the thinning experiments are described in Table 2. We aimed to clarify the following two effects by the experiments: (1) the effect of using multiple localization scales to remove sampling errors and (2) the effect of the number of assimilated PWV data (i.e., information content) and of the observation error correlation. We investigated the first issue by comparing CNTL8.0km with SLOC8.0km and CNTL3.5km with SLOC3.5km. Next, we compared the results of the five SLOC experiments to discuss the trade-off between the information content and the observation error correlation.
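The improvement rate of Eq. 6 can be written as a short sketch, together with a plain RMSE helper. Both function names and the flat-list input layout are hypothetical; the numbers in the example are synthetic and are not results from the experiments.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between two equal-length lists of rainfall (mm)."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(simulated)
    return math.sqrt(mean_sq)


def improvement_rate(rmse_without_pwv, rmse_with_pwv):
    """Improvement rate (%) of Eq. 6; positive when assimilating PWV lowers RMSE."""
    return (rmse_without_pwv - rmse_with_pwv) / rmse_without_pwv * 100.0
```

For example, a synthetic RMSE of 10 mm without PWV and 8 mm with PWV corresponds to an improvement rate of 20%, and a negative rate means that assimilating PWV increased the error.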

Fig. 9

Distributions of GNSS stations around Uji used in the thinning experiments with mean inter-station distances of 8.0, 4.2, 3.5, 2.9, and 1.7 km. Blue- and red-colored triangles indicate stations of the Uji network and GEONET, respectively

Table 2 Code for each experiment and the settings of the thinning experiments

Results of the data assimilation experiment

Figure 10 shows the mean RMSE of PWV at 0200, 0300, 0400, and 0500 LT on the 14th, analyzed by the inner LETKF. The RMSE values were calculated by using the GNSS-derived PWV data observed at the ten GNSS stations in Fig. 1b. RMSEs of the analyzed PWV of the SLOC experiments were smaller than those of the CNTL experiments. This result suggests that small horizontal localization radii are needed to reproduce small-scale, i.e., less than 10 km, PWV distributions by ensemble data assimilation. In the SLOC experiments, the RMSE of the analyzed PWV was smallest when the horizontal resolution of the assimilated PWV data was 2.9 and 1.7 km. Figure 11 shows the horizontal distribution of the analysis increment of PWV for each SLOC experiment, which assimilated PWVSPD-H data at 0500 LT on the 14th. At this moment, a strong meridional gradient of PWV formed over the Uji network (Fig. 4). The observed meridional gradient of PWV was not well reflected in the analysis increment of the PWV of the SLOC8.0km experiment in which only GEONET-PWV data were assimilated. In contrast, the observed meridional gradient of PWV was well reflected in the analysis increment of PWV when the horizontal resolutions of the assimilated PWV data were 3.5, 2.9, and 1.7 km.

Fig. 10

RMSE of PWV analyzed by the inner LETKF. RMSE values are averaged during 0200–0600 LT on August 14, 2012. Plus symbols indicate experiments that assimilated PWVSPD-H with the SLOC localization setting. Cross symbols indicate experiments that assimilated PWVCON with the CNTL localization setting. The GNSS-PWVSPD-H data used to evaluate RMSEs were observed at the 10 stations shown in Fig. 1b, which are the same data assimilated in the SLOC experiment with the mean inter-station distance of 1.7 km

Fig. 11

Distributions of the analysis increment of PWV at 0500 LT on August 14, 2012, analyzed by the SLOC experiments that assimilated PWVSPD-H at the stations indicated by triangle symbols. Open triangles and filled triangles indicate stations of the Uji network and GEONET, respectively

Figure 12 shows the effect of assimilating high-resolution PWV on the simulation accuracy of the hourly rainfall amount. When the horizontal resolution of the assimilated PWV data was 8.0 km around Uji, the simulation accuracy of the SLOC experiment was better than that of CNTL. This is because applying small localization radii over the precipitating region eliminated the spurious error correlations of PWV between distant locations. This result suggests that the horizontal localization scale should be smaller in the precipitating region than in the non-precipitating region when the GEONET-derived PWV data are assimilated. In the CNTL experiment, the simulation accuracy of the hourly rainfall amount was degraded when the mean inter-station distance of the assimilated PWV around Uji was decreased from 8.0 to 3.5 km. In the nested NHM-LETKF, data assimilation is conducted on the assumption that observation errors are uncorrelated. The simulation accuracy of the CNTL experiment decreased at a horizontal spacing of 3.5 km probably because the errors of the assimilated PWV data around Uji were correlated. In contrast, in the SLOC experiment, the simulation accuracy was most improved when the inter-station distance of the PWV data around Uji was 3.5 km. We inferred that the influence of the observation error correlation of PWV was small because the small horizontal localization radii reduced the number of PWV data used in the LETKF analysis at each grid point.

Two kinds of SLOC experiments were conducted here (Fig. 12): assimilation of PWVCON (black) and of PWVSPD-H (red). In all thinning experiments, assimilating PWVSPD-H gave better results than assimilating PWVCON. As already explained in section "GNSS-derived PWV observed by the Uji network", PWVCON is estimated as a spatially averaged water vapor amount within an inverse cone defined by the low elevation cutoff angle. Therefore, the PWVCON values at the individual GNSS stations of the Uji network sampled overlapping portions of the atmosphere and probably had a larger observation error correlation than the PWVSPD-H data. We inferred that the SLOC experiments assimilating PWVSPD-H performed better than those assimilating PWVCON because the observation error correlation of PWVSPD-H was smaller than that of PWVCON. However, the observation error correlation of PWV should be examined quantitatively in a future study.

Fig. 12

Improvement rates of RMSE of hourly rainfall amount simulated by the inner domain model as a function of the inter-station distances of the assimilated PWV data. To calculate the mean improvements of RMSE, we used the ensemble mean data of each inner model cycle during 0200–0600 LT on August 14, 2012, i.e., four samples. Red- and black-colored symbols indicate that assimilated PWV data around Uji shown in Fig. 1b were PWVSPD-H and PWVCON, respectively, using the SLOC localization setting. The blue-colored symbol indicates that assimilated PWV data were all PWVCON, using CNTL localization setting
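As a rough illustration of the quantity plotted in Fig. 12, the sketch below computes an RMSE improvement rate for each cycle and averages it over the cycles. The definition used here (percentage reduction of RMSE relative to a reference run) is an assumption, because the exact formula is not restated in this section. The function names and the sample RMSE values are hypothetical.

```python
import numpy as np

def rmse(simulated, observed):
    """Root-mean-square error between two rainfall fields (mm/h)."""
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((simulated - observed) ** 2)))

def improvement_rate(rmse_experiment, rmse_reference):
    """Assumed definition: percentage reduction of RMSE relative to a reference run."""
    return 100.0 * (rmse_reference - rmse_experiment) / rmse_reference

def mean_improvement_rate(rmse_experiments, rmse_references):
    """Average the per-cycle improvement rates (four cycles in the caption above)."""
    rates = [improvement_rate(e, r)
             for e, r in zip(rmse_experiments, rmse_references)]
    return float(np.mean(rates))

# Hypothetical RMSE values (mm/h) for four inner-model cycles.
rmse_ref = [4.0, 5.0, 8.0, 10.0]
rmse_exp = [3.0, 5.0, 6.0, 9.0]
mean_rate = mean_improvement_rate(rmse_exp, rmse_ref)
print(f"mean improvement rate: {mean_rate:.1f} %")
```

A positive rate means the experiment has a smaller RMSE than the reference, and a negative rate means it is degraded.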

Figure 13a shows the horizontal distribution of the 1-h rainfall amount at 0600 LT on August 14 simulated by SLOC3.5km, and Fig. 13b shows the horizontal distribution of the difference in 1-h rainfall amount between SLOC3.5km and CNTL8.0km. The rainfall amount simulated by SLOC3.5km was larger than that of CNTL8.0km along the northern side of the MCS, where the Uji network was located (Fig. 13b). In contrast, it was smaller on the southern side of the MCS. The simulation accuracy improved mainly on the leeward side of the Uji network.

Fig. 13

a Horizontal distributions of 1-h rainfall amount at 0600 LT on August 14, 2012, simulated by SLOC3.5km. b Horizontal distributions of difference of 1-h rainfall amount between SLOC3.5km and CNTL8.0km (shade) at 0600 LT on August 14, 2012. Solid (broken) line indicates that RMSE of 1-h rainfall of the SLOC3.5km is 1 mm smaller (larger) than that of CNTL8.0km. Open triangles and black squares indicate GNSS stations used in SLOC3.5km of the Uji network and GEONET, respectively. c Horizontal distributions of 1-h rainfall amount at 0600 LT on August 14, 2012, observed by JMA weather radar

Summary and discussion

We first investigated the characteristic length scale of water vapor variability observed by the Uji GNSS network. The e-folding distance of the correlation coefficients of the observed ZWD was 1.9–3.5 km when precipitation was observed around the network. We also analyzed the scale of the background error correlations of PWV by using 500-m mesh ensemble data with 40 members. Correlation scales were analyzed for rain-free areas, weak rain areas, and heavy rain areas by dividing the rain rates (R [mm/h]) into three intensity levels, i.e., R < 0.1, 0.1 ≤ R < 10, R ≥ 10. It was found that the e-folding scales of correlations were 30.8, 7.5, and 4.8 km for the rain-free, weak rain, and heavy rain areas, respectively. Therefore, a smaller horizontal localization radius is recommended to assimilate PWV data over the precipitating regions.
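The e-folding scales quoted above can be illustrated with a minimal sketch that fits an exponential decay, r(d) = exp(-d/L), to correlation coefficients sampled at several separation distances. The exponential form, the log-linear least-squares fit through the origin, and the function name are assumptions made for illustration; the fitting procedure actually used in the study is not described in this section.

```python
import numpy as np

def efolding_distance_km(distances_km, correlations):
    """Estimate L in r(d) = exp(-d / L) by least squares on log(r).

    Only pairs with positive distance and positive correlation are used,
    because log(r) is undefined otherwise.
    """
    d = np.asarray(distances_km, dtype=float)
    r = np.asarray(correlations, dtype=float)
    valid = (d > 0.0) & (r > 0.0)
    d, log_r = d[valid], np.log(r[valid])
    # Fit log(r) = slope * d with no intercept, so that r(0) = 1.
    slope = np.sum(d * log_r) / np.sum(d * d)
    return -1.0 / slope

# Synthetic correlations generated with a 4.8 km scale (the heavy-rain value above).
distances = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
synthetic_corr = np.exp(-distances / 4.8)
scale = efolding_distance_km(distances, synthetic_corr)
print(f"estimated e-folding distance: {scale:.2f} km")
```

With real ensemble or observation data, the correlations would first be binned by separation distance, and separately for each rain-rate class, before the fit is applied.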

Using the nested NHM-LETKF system, we conducted an assimilation experiment of high-resolution PWV data from the Uji network. Although a single localization radius of 34.2 km was used in the earlier conventional LETKF experiment, we applied small localization radii, depending on the rain rates (R [mm/h]) observed by weather radar, i.e., 30.8 km (R < 0.1), 7.5 km (0.1 ≤ R < 10), and 4.8 km (R ≥ 10). By using multiple localization radii over the rainfall area, the accuracy of both PWV analyzed by LETKF and the simulation results of the hourly rainfall was improved. The result was improved because we eliminated the spurious error correlations of PWV between distant locations by applying small localization radii over the precipitating region. The use of small localization radii was also effective in reducing the influence of observation error correlation of PWV around the hyper-dense GNSS receiver network.
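The rain-rate-dependent choice of localization radius described above can be sketched as follows. The thresholds and radii are those given in the text. The Gaussian weight function is a common choice in LETKF implementations and is included only as an assumption, as is the idea that the rain rate is evaluated at the observation location. The function names are hypothetical and this is not the nested NHM-LETKF code.

```python
import numpy as np

def localization_radius_km(rain_rate_mm_h):
    """Return the horizontal localization radius for each rain rate R (mm/h).

    R < 0.1        -> 30.8 km (rain-free)
    0.1 <= R < 10  ->  7.5 km (weak rain)
    R >= 10        ->  4.8 km (heavy rain)
    """
    r = np.asarray(rain_rate_mm_h, dtype=float)
    return np.where(r < 0.1, 30.8, np.where(r < 10.0, 7.5, 4.8))

def gaussian_localization_weight(distance_km, radius_km):
    """Assumed Gaussian weight: 1 at zero distance, decaying with distance."""
    d = np.asarray(distance_km, dtype=float)
    return np.exp(-0.5 * (d / radius_km) ** 2)

# Hypothetical radar rain rates (mm/h) at six observation locations.
rain = np.array([0.0, 0.05, 0.1, 5.0, 10.0, 40.0])
radii = localization_radius_km(rain)
# Weight given to each observation by a grid point 10 km away.
weights_at_10km = gaussian_localization_weight(10.0, radii)
print(radii)
print(weights_at_10km)
```

In this sketch, an observation 10 km from a grid point keeps a substantial weight in rain-free conditions but is almost ignored in heavy rain, which is the intended effect of shrinking the radius over precipitating regions.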

Miyoshi et al. (2014) conducted a 10,240-member LETKF experiment with an intermediate atmospheric general circulation model (AGCM) and revealed meaningful long-range error correlations at continental scales. Kunii (2014) investigated the influence of sampling noise on the background error covariance by using 1000-member ensemble forecasts with the JMA-NHM and reported that an ensemble size of 500 would be large enough to approximate the error covariance at a horizontal resolution of 15 km. Because we adopted an ensemble size of only 40, localization was needed. By using multiple localization scales, we tried to address the problem of sampling errors in the background error covariance. A grid size of less than a few kilometers is needed to capture the structure of the convective-scale background error covariance. In this study, we set the grid size to 1.875 km to reproduce the heavy rainfall.

In the assimilation experiment, we used PWV data that were converted from slant delay data at the highest elevation angle. This conversion procedure may introduce additional observational errors, so we consider that the direct assimilation of SWV or SPD (e.g., Kawabata et al. 2013) is preferable for obtaining more accurate analysis results. However, assimilating SWV directly in the SLOC experiments is computationally expensive because we would need to estimate the precipitation intensity at each point where the slant path of the SWV intersects a model layer and apply a different localization scale at each point. Therefore, we converted SWV at the highest elevation angle into PWVSPD-H to shorten the time needed to determine whether the assimilated relative humidity converted from GNSS water vapor data was located in the rainfall area. During the Uji heavy rainfall event on August 13–14, 2012, the QZSS satellite was not at a high elevation angle. Therefore, we used GPS satellites at elevation angles above 60° to estimate the PWV distribution within the Uji network. The error due to the mapping will be greatly reduced when the four-satellite QZSS constellation is completed and at least one QZSS satellite remains near the zenith. Shoji et al. (2014, 2015) showed that the mapping error of PWVSPD-H reaches its minimum at the location where the line of sight reaches the scale height of water vapor. It is therefore recommended to assimilate PWVSPD-H as PWV directly above that location. However, we assimilated PWVSPD-H directly above the GNSS stations because the elevation angles were greater than 60°.
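To give a feel for the geometry discussed above, the following sketch maps a slant water vapor value to the zenith with a simple flat-layer sin(elevation) factor and computes how far from the station the line of sight is when it reaches the water vapor scale height. Both the mapping function and the 2 km scale height are assumptions chosen for illustration; they are not the mapping function or the value used in the study, and the function names are hypothetical.

```python
import math

def slant_to_zenith(slant_water_vapor_mm, elevation_deg):
    """Map SWV to PWV with an assumed flat-layer mapping: PWV = SWV * sin(e)."""
    return slant_water_vapor_mm * math.sin(math.radians(elevation_deg))

def horizontal_offset_km(elevation_deg, scale_height_km=2.0):
    """Horizontal distance from the station to the point where the ray
    reaches the (assumed) water vapor scale height."""
    return scale_height_km / math.tan(math.radians(elevation_deg))

# Hypothetical slant water vapor of 60 mm observed at a 60 degree elevation angle.
pwv = slant_to_zenith(60.0, 60.0)
offset = horizontal_offset_km(60.0)
print(f"mapped PWV: {pwv:.2f} mm, horizontal offset: {offset:.2f} km")
```

Under these assumptions the offset at 60° is about 1.15 km, which is smaller than the 1.875 km grid spacing, and it shrinks toward zero as the satellite approaches the zenith. This is consistent with the choice to assimilate PWVSPD-H directly above the stations, although the actual offset depends on the true scale height.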

The thinning experiments showed that in the case of the Uji heavy rainfall event, the simulation accuracy was most improved when the mean inter-station distance around Uji was 3.5 km. This result is consistent with the observed characteristic length scale of ZWD (1.9–3.5 km) during the rainfall, suggesting that the optimum spatial resolution of the PWV measurement is related to the characteristic length scale of water vapor variability.

By using the PWV data observed by the hyper-dense GNSS receiver network, the present case study demonstrated that the characteristic length scale of water vapor variability changed significantly depending on precipitation intensity. It is important to consider the scales of the variability of water vapor to improve the simulation accuracy when PWV is assimilated.