Variability of sampling adjustment factors for extreme rainfall in Germany

The sampling adjustment factor (SAF) can correct the underestimation of fixed time interval maxima (F-Maxima) relative to maxima of moving time windows (M-Maxima) as a direct consequence of temporal discretization of time series. Radar data can help to understand the significance of using moving window aggregation rather than fixed window. Here, we investigate SAFs for two gridded radar quantitative precipitation estimates from the German Meteorological Service (RADKLIM-RW and RADKLIM-YW, DWD 2018) with different temporal resolutions (5 min and hourly) and a spatial resolution of 1 km × 1 km from 2001 to 2016. For each grid cell’s time series within Germany, the overall maximum intensities per chosen duration were derived with (1) moving and (2) fixed window aggregation. Correction factors (SAF) from fixed to moving window were retrieved. The findings of this study partly match previous findings, more for the 5-min product than for the hourly product. No clear dependency on topography or season could be derived from the analysis. It showed that high SAF are usually related to high(er) actual rainfall values. The study emphasises the probabilistic nature of the rainfall maxima correction and shows that it is important to not only consider average SAFs, but also take an in-depth look at the distribution when correcting maxima. As a consequence, design precipitation as used, e.g., in context of flash or urban floods could profit from spatially adjusted or uniform scaling, depending on the characteristics found in the spatial distribution of the SAFs.


Introduction
A major obstacle in extreme rainfall analysis is that values are relatively coarsely represented, because continuous rainfall cannot be measured at infinitesimal spatio-temporal scales over a long period of time (Papalexiou et al. 2016). Small but heavy rainfall events are especially difficult to capture and predict.
One common problem in identifying rainfall maxima for certain time intervals (e.g., sub-hourly to annual or historical maxima) is the way rainfall is recorded.
Currently, long and reliable records are available mainly from rain gauge networks, which lack spatial coverage. Throughout the world, there are many small to large rain gauge networks with different network characteristics, record lengths and temporal resolutions. Before the advent of data-loggers, rainfall data were always characterised by coarse temporal resolution (Morbidelli et al. 2021). Even today, accurate sub-hourly measurements from rain gauges are available at certain locations only, and data based on daily and hourly measurements are still widely used in analysing rainfall climatologies (Yoo et al. 2015; Dunkerley 2018; Darwish et al. 2021).
For example, daily or hourly values have long been, and often still are, measured over a given time interval beginning at a fixed time (e.g., every full hour or every day at 7:00 AM). It has long been known that this can introduce a bias that is difficult to ignore and can lead to a significant underestimation of extreme values. In addition, rainfall records are often collected at high resolution, but aggregated before saving to decrease computing times and storage requirements (Morbidelli et al. 2018). In the worst case, these so-called fixed maxima (F-Maxima) can underestimate rainfall records by up to 50 % compared to maxima based on a moving window (M-Maxima) (Morbidelli et al. 2017; Meier et al. 2016, 2019). F-Maxima of, for example, historical time series with lower temporal resolution can be corrected with the help of correction factors, strictly larger than 1, called, e.g., sampling adjustment factor (SAF), Hershfield factor (H) or clock hour correction factor (CHCF) (Hershfield 1961; Young and McEnroe 2003; Papalexiou et al. 2016; Ghate and Timbadiya 2022).
Since the 1950s, multiple studies have analysed the differences between M-Maxima and F-Maxima based on gauge data for different temporal aggregations and time periods and have derived empirical SAF values for different durations (e.g., Keifer and Chu 1957). Early studies compared the SAF values for calendar days and 24 h (H = 1.13 (Hershfield 1961), H = 1.15 (Harihara Ayyar and Tripathi 1973), H = 1.11 (Demarée 1985)). Van Montfort (1990) calculated F-Maxima for 1 to 3 calendar days for all months of 67 years for one station in New Zealand, and later analysed 130 stations of daily precipitation for 30 years in Eastern China (Van Montfort 1997). Other studies derived empirical correction factors for different regions, durations and temporal resolutions of the original data. Some results are based on very few or short time series, such as Jakob et al. (2005), who evaluated seven Australian cities with a representative SAF = 1.15, Dwyer and Reed (1994) with SAF values of 1.167 for daily rainfall, or Fowler et al. (2005), who derived SAF values of 1.16, 1.11, 1.035, and 1.005 for 1, 2, 5, and 10 days for the period 1961 to 1990. Other studies used multiple time series or very high resolutions down to 1 min. Morbidelli et al. (2017), for example, found that the SAFs decrease with longer aggregation windows while analysing data of 1 min resolution in Central Italy; Muñoz Proboste (2018) used 34 years of data for 52 stations in Switzerland and derived SAFs of 1.04 to 1.22 for 20 to 120 min; Llabrés-Brustenga et al. (2020) evaluated the spatial pattern of 120 weather stations; Marasini (2020) analysed 10 years of data for 809 stations in Germany with a 1 min resolution; and Hnilica et al. (2021) analysed 20 years of 1 min data for 23 gauging stations in the Prague region. Young and McEnroe (2003) focused on the ratio of the duration of interest to the sampling interval and showed a consistency across durations from 1 to 24 h.
The empirical derivations come to similar conclusions; however, there is not yet a fundamental understanding of the variability, and thus the usefulness, of these factors. There are also some theoretical assumptions on the conversion of fixed to true interval rainfall. However, several questions remain open, such as whether similar correction factors can be used for different time windows or whether they depend on weather type or on regional or seasonal features. The influence of rainfall characteristics is also unclear. Weiss (1964) introduced a very simple statistical concept that did not consider the problems mentioned, but found factors similar to those from empirical studies. Yoo et al. (2015) updated the concept of Weiss (1964) by including different temporal rainfall distributions; however, the results do not match well with empirical observations. Finally, Papalexiou et al. (2016) proposed the SAF to be a rather random factor, using hourly data from 7127 stations in the USA. They further introduced a method to correct both mean and standard deviation of F-Maxima.
With advancing technology, new possibilities arise for analysing the spatial distribution and characteristics of these correction factors. Remotely sensed rainfall, such as from radar, has the advantage of a high temporal resolution and spatial coverage. Though maximum rainfall values from rain gauge measurements can usually not be fully reproduced in remotely sensed data (Breña-Naranjo et al. 2015) and sub-pixel variability of rainfall is hard to capture (Fabry 1996; Peleg et al. 2013; Cristiano et al. 2017; Peleg et al. 2018), radar-based quantitative precipitation estimates (QPE) have proven to be a useful tool for spatio-temporal rainfall analyses, as they combine information from weather radar precipitation fields with ground truth from rain gauges (Panziera et al. 2018).
The aim of this study is to assess the variability of the SAF for the whole of Germany based on radar QPE of 1 km spatial resolution and two different temporal resolutions (5 min and hourly). Maximum rainfall values are derived from time series of around 400,000 radar cells with fixed and moving window sampling for time intervals from 10 min to 3 days. F- and M-Maxima are calculated for each considered duration, with a focus on sub-daily and sub-hourly time intervals, and are analysed with the help of SAFs regarding their spatial variability. On this basis, rain gauge-based assumptions for correcting F-Maxima are to be revisited.

Quantitative precipitation estimates (QPE)
The German National Meteorological Service (DWD) provides different rainfall products that are based on radar information from the network of 17 German C-band radars and radars from neighbouring countries. The so-called radar climatology project of the DWD (RADarKLIMatologie, RADKLIM; Winterrath et al. 2017) has reanalysed the complete radar data since 2001 with a consistent method and has made the data freely available in two products (version 3 available until 2022 as of today): (1) RADKLIM-RW is an hourly precipitation product resulting from radar-based precipitation estimates that are calibrated with ground stations (for a product evaluation, see Lengfeld et al. 2019); (2) RADKLIM-YW is a 5-min product resulting from a correction/factoring of DWD's 5-min product RADOLAN-RY with the help of RADOLAN-RH and RADKLIM-RW on a sequential hourly basis (for more information on RADKLIM-YW, compare Kreklow and Kuhnt 2019).
Both RADKLIM-YW and RADKLIM-RW in version 2017.002 were used in this study in order to account for different temporal resolutions. We evaluated the years 2001 to 2016 for comparability with another study at our institute. Both products come with a spatial resolution of 1 km and cover a total area of 1100 × 900 grid cells, of which around half contain values due to the shape of Germany.
The data contain many missing values (NA) (1) in the vicinity of the border in parts of eastern, northern and southern Germany due to changes and the ongoing extension of the radar network (some time series are, for example, only available from 2014 onwards) and (2) due to radar malfunction or general (radar) errors. Nevertheless, with less than 10 % of hours missing in most areas, the coverage is still very good (Lengfeld et al. 2019).
Generally, the RADKLIM QPE seems to underestimate values in regions with higher elevations, potentially due to radar function and sparse gauging information. We tested this for the higher-elevated Ore Mountains in the south of Saxony, where the QPE showed significantly lower precipitation sums than those we derived from rain gauge records. This is rather unproblematic for our analysis, since fixed and moving window maxima are derived from the same data set and the error therefore does not affect the comparison.

Maximum rainfall intensities
The maximum rainfall intensities for different intervals were retrieved by aggregating both data sets for each duration of interest with fixed and moving window sampling using the R package RcppRoll (Ushey 2018).
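The two aggregation schemes can be illustrated with a minimal sketch. The study itself used the R package RcppRoll; the Python version below is only a stand-in, and the function names and the short rainfall series are hypothetical. It assumes a gap-free series and does not handle missing values (NA).

```python
import numpy as np

def moving_window_sums(series, window):
    """Sums over every run of `window` consecutive time steps (stride 1)."""
    csum = np.concatenate(([0.0], np.cumsum(series, dtype=float)))
    return csum[window:] - csum[:-window]

def fixed_window_sums(series, window):
    """Sums over non-overlapping windows that start at time step 0."""
    values = np.asarray(series, dtype=float)
    n_full = len(values) // window  # incomplete trailing window is dropped
    return values[: n_full * window].reshape(n_full, window).sum(axis=1)

# Hypothetical 5-min rainfall depths (mm); the event straddles a fixed boundary.
rain = np.array([0.0, 5.0, 5.0, 0.0, 1.0, 2.0])
window = 2  # 2 x 5 min = 10 min

m_max = moving_window_sums(rain, window).max()  # moving window maximum
f_max = fixed_window_sums(rain, window).max()   # fixed window maximum
saf = m_max / f_max
print(m_max, f_max, saf)
```

In this constructed case the event is split evenly between two fixed windows, so the fixed maximum is half the moving maximum, which corresponds to the worst case of a 50 % underestimation.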

Overall maxima
For each time series (grid cell, denoted as cell) and each duration $\tau$, the fixed and moving window maxima $F^{\tau,\mathrm{cell}}_{\max}$ and $M^{\tau,\mathrm{cell}}_{\max}$ are calculated for the whole time period of 16 years. The maxima are calculated in a first step for each grid cell. The maxima over all grid cells (N ∼ 400,000 cells) for each $\tau$ are then calculated according to Eqs. 1 and 2.
We were only interested in grid cells whose time series contain high precipitation maxima, in order to be able to better analyse the "severity" of underestimating very high precipitation values later on. We therefore based the selection on the RW maxima from the moving window approach. The ∼400,000 grid cells were reduced by (1) calculating percentiles for each window and (2) only keeping those grid cells that exceed the 75th percentile for each window. This left ∼13,000 grid cells. To reduce the number further to around 10,000 grid cells, the 75th percentiles of the YW maxima (moving window) for 48 and 72 h were applied in addition. This resulted in 10,048 grid cells for both RADKLIM products, displayed in Fig. 1.
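The percentile-based selection described above could be sketched as follows. This is an illustration only: it assumes that "for each window" means a cell must exceed the 75th percentile in every window simultaneously, and the function name and the small array of maxima are hypothetical.

```python
import numpy as np

def select_high_maxima_cells(maxima, q=75.0):
    """Return indices of cells whose maxima exceed the q-th percentile
    in every window. `maxima` has shape (n_cells, n_windows)."""
    maxima = np.asarray(maxima, dtype=float)
    thresholds = np.percentile(maxima, q, axis=0)  # one threshold per window
    keep = (maxima > thresholds).all(axis=1)       # assumed: all windows at once
    return np.flatnonzero(keep)

# Hypothetical moving window maxima (mm) for 8 cells and 2 windows.
maxima = np.array([
    [10, 20], [12, 22], [14, 24], [16, 26],
    [18, 28], [20, 30], [22, 32], [24, 21],
])
selected = select_high_maxima_cells(maxima)
print(selected)
```

Requiring the threshold to be exceeded in all windows keeps far fewer than 25 % of the cells, which is at least consistent with the reduction from ∼400,000 to ∼13,000 cells reported in the text.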

Sampling adjustment and difference factors
Based on both fixed and moving maxima, the sampling adjustment factor (SAF) is calculated as the ratio of the "true" maximum $M_{\max}$ to the "fixed" maximum $F_{\max}$ for each $\tau$ (Eq. 5) as well as the average over all $\tau$ (Eq. 6) (Young and McEnroe 2003; Papalexiou et al. 2016; Meier et al. 2019).
Note that in the classical definition, the SAF is not estimated as the mean value of all estimated SAFs, i.e., $\mathrm{SAF}^{(y)} = E\left[M^{\tau}_{\max,(y)} / F^{\tau}_{\max,(y)}\right]$, but equals the ratio of the mean of the moving maxima to the mean of the fixed maxima.
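The distinction between the mean of the individual ratios and the ratio of the means can be made concrete with a small numerical sketch. The annual maxima below are invented for illustration and are not taken from the RADKLIM data.

```python
import numpy as np

# Hypothetical annual maxima (mm) for one grid cell and one duration.
m_annual = np.array([20.0, 30.0, 40.0])  # moving window maxima
f_annual = np.array([10.0, 30.0, 40.0])  # fixed window maxima

mean_of_ratios = np.mean(m_annual / f_annual)        # E[M / F]
ratio_of_means = m_annual.mean() / f_annual.mean()   # E[M] / E[F]

print(round(mean_of_ratios, 3), round(ratio_of_means, 3))
```

A single year with a large ratio but small absolute values raises the mean of ratios much more than the ratio of means, so the two estimators can differ noticeably.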
The factor is calculated irrespective of whether the F- and M-Maxima occur concurrently. For the annual analysis, this approach is meaningful, since using only values from the same event can lead to an underestimation of the correction factor. For the overall analysis, the approach is more critical, since in the worst case a value from 2001 would be corrected with a value from 2016. We consider this potential problem rather negligible: in a random sample analysis of the dates of occurrence, 95 % of the M-Maxima occurred concurrently with the F-Maxima (overall analysis).
We additionally introduced a new measure called the sampling difference factor (SDF), since the SAF does not consider the "severity" of the deviation. An SAF equal to 2 has greater consequences if 200 mm instead of 100 mm were observed for an individual time window than if 20 mm instead of 10 mm were observed. The SDF is calculated for each $\tau$ according to Eq. 7.
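One way to check whether a fixed and a moving maximum stem from the same event is to test whether their time windows overlap. The text does not describe how the random sample check was carried out, so the overlap criterion, the function name and the example series below are assumptions made purely for illustration.

```python
import numpy as np

def maxima_are_concurrent(series, window):
    """True if the windows of the fixed and the moving maximum overlap.
    Ties are resolved by taking the first occurrence of each maximum."""
    values = np.asarray(series, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(values)))
    moving = csum[window:] - csum[:-window]
    n_full = len(values) // window
    fixed = values[: n_full * window].reshape(n_full, window).sum(axis=1)
    m_start = int(np.argmax(moving))          # first step of the moving maximum
    f_start = int(np.argmax(fixed)) * window  # first step of the fixed maximum
    # Two windows of equal length overlap if their starts differ by < window.
    return abs(m_start - f_start) < window

same_event = maxima_are_concurrent([0, 5, 5, 0, 1, 2], window=2)
different_events = maxima_are_concurrent([0, 4, 4, 0, 0, 0, 7, 0], window=2)
print(same_event, different_events)
```

In the second series, the moving maximum comes from the event split across a fixed boundary, whereas the fixed maximum comes from a later, separate event.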
Both SAF and SDF were also calculated for the annual maxima (denoted as y in Eqs. 6 and 7).
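The following sketch contrasts the two measures using the numbers from the example above. Eq. 7 is not reproduced in this text, so the SDF is assumed here to be the absolute difference between the moving and the fixed maximum in mm; the function names are hypothetical.

```python
def sampling_adjustment_factor(m_max, f_max):
    """SAF: ratio of the moving window maximum to the fixed window maximum."""
    return m_max / f_max

def sampling_difference_factor(m_max, f_max):
    """SDF, assumed here to be the difference M_max - F_max in mm."""
    return m_max - f_max

# (moving maximum, fixed maximum) in mm, taken from the example in the text.
pairs = [(20.0, 10.0), (200.0, 100.0)]
results = [
    (sampling_adjustment_factor(m, f), sampling_difference_factor(m, f))
    for m, f in pairs
]
print(results)
```

Both pairs share the same SAF, but under the assumed definition their SDFs differ by a factor of ten, which is the "severity" that the SAF alone does not capture.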

Overall rainfall maxima for whole Germany
The highest maximum rainfall values retrieved with fixed and moving window aggregation for the whole of Germany, $F^{\tau}_{\max}$ and $M^{\tau}_{\max}$, are displayed in Fig. 2. Additionally, maxima from German rain gauge data for a very long period from 1855 to 2014 are shown as a reference. The rain gauge-based values follow a scaling relationship as expected from literature (e.g., Jennings 1950). A detailed evaluation of the (non-)scaling behaviour of the QPE-based maximum rainfall ($M^{\tau}_{\max}$) is carried out in Pöschmann et al. (2021).
For the most part, the $F^{\tau}_{\max}$ and $M^{\tau}_{\max}$ values are lower than those taken from the longer rain gauge time series, most likely due to the shorter measurement period of only 16 years for RADKLIM. In particular, the $F^{\tau}_{\max}$ values are more or less consistently below the rain gauge values. Interestingly, the $M^{\tau}_{\max}$ values exceed the rain gauge maxima from 24 h onwards. This indicates that a higher spatial coverage together with a moving window analysis can substantially improve the knowledge of maximum values.
When comparing the two radar-based products, RADKLIM-YW shows lower values than the hourly product up to 16 h. Above that, RADKLIM-YW (moving window) has higher values than RADKLIM-RW (moving window).
As shown in Fig. 3, the maxima from Fig. 2 do not necessarily coincide in space for RADKLIM-RW and RADKLIM-YW for some durations. However, there is almost no difference between the locations of $F^{\tau}_{\max}$ and $M^{\tau}_{\max}$. Overall, the maxima are not widely scattered, but can be attributed to six grid cells within Germany. At the sub-hourly resolution of RADKLIM-YW, there are more locations and more variation (not shown in this figure).

Spatial distribution of F-Maxima, M-Maxima and SAF
The difference between moving and fixed window sampling can also be seen in the spatial comparison in Fig. 4 (shown for RADKLIM-YW as an example). Moving window sampling gives higher maximum intensities, which are also distributed differently in space. Especially for those time scales at which single rainfall events occur, moving window sampling gives a markedly better spatial representation of the maximum intensities. For the example of 12 h displayed in Fig. 4a, the differences are pronounced. The figure shows around three fields of higher rainfall intensities that are potentially related to single events, and it indicates that "true" rainfall patterns can be identified better with moving window aggregation. One example is the large pattern in the eastern part of Germany that most likely evolved during the disastrous flood of August 2002: the area of higher rainfall is much larger for moving than for fixed window aggregation, and the areas of large rainfall values are more connected. For other time scales, this effect is especially visible in the Alpine and other mountainous regions.
The effects of the two sampling methods can be inspected further with the help of the sampling adjustment factor (SAF). Figure 4b shows the spatial variability of SAFs for the whole of Germany and for different aggregation steps. Interestingly, the SAF varies strongly both spatially and between the temporal aggregation steps. This results in different spatial SAF patterns for different time scales, with no evident reason for this high variability. The figure underlines the importance of investigating how data are collected and aggregated temporally and spatially. The SAFs appear to be rather randomly distributed over Germany, so regional characteristics were not investigated further.

SAF for annual maxima
No pattern became visible from the spatial evaluation of the overall SAF; however, the average SAF appeared to decrease the more time steps are aggregated. This can be seen in Fig. 5, where the empirical average SAFs are depicted for temporal aggregations between 2 and 72 steps (RADKLIM-RW in hours, RADKLIM-YW in 5-min steps).
There is a clear difference between the hourly-based RW curves and the 5-min-based YW curves. For RW, the average SAFs of the annual analyses are higher than those found in literature; most averages exceed 1.15. The different data lengths of RADKLIM-RW make little difference. For RADKLIM-YW, all curves are much lower than those of RADKLIM-RW, with values around 1.14, similar to those found in literature. The overall SAF values are lower than the annual-based ones. This could show that when comparing higher maxima, for example, just the highest maxima of the whole time series for one grid cell instead of annual [...] Fig. 6. Curves for RADKLIM-YW look quite similar for all seasons and show values of around 1.14, similar to those in literature. RADKLIM-RW, on the other hand, is more variable, even more so in the seasonal analysis, and reaches SAFs of up to 1.2. Also, instead of decreasing, the average SAF increases in most cases for RADKLIM-RW, contrary to our expectations and contrary to RADKLIM-YW, which shows a decline in most cases.
One possible explanation is that this study did not take into account different ways of totalizing rainfall (for fixed window sampling). Additionally, since most past studies base their findings on a rather low number of rain gauges, it is also possible that the theoretical assumption is not necessarily true and that Papalexiou et al. (2016), who analysed several thousand rain gauges and attributed a strongly random character to SAFs, are indeed correct.
The curves are fitted by using the simple parametric function proposed by Papalexiou et al. (2016) with parameters a, b and c retrieved for all cases. The annual and seasonal relationships can be fitted quite well for most cases (parameters for the annual analyses are provided in Table 1). This is not true for overall SAFs based on YW data. For the overall relationship, no curve could be fitted with the suggested Eq. 8.
The curves are fitted by using a different simple parametric function proposed by Papalexiou et al. (2016), and parameters are provided in Table 2.

Probability and occurrences of SAFs
Besides average values for SAF, it is important to look at deviations from the mean, especially with regard to very high and very low SAFs. Figure 7 shows the probability of SAF being equal to 1 for annual as well as overall SAFs and for the different RADKLIM products between 2001 and 2016. As expected, the probability decreases quickly with higher aggregation steps. The drop is quicker for RADKLIM-YW. In addition, the probability appears to increase slightly again for higher aggregation steps. Compared to the study of Papalexiou et al. (2016), SAF equal to 1 occurs more rarely in our study (annually based) and even less for the overall values. Again, the shapes of the YW curves look more alike than those of RW.
The overall distribution of SAF is displayed in Fig. 8 as conditional SAF across different time scales (in hours) for the annual analysis. RADKLIM-RW and RADKLIM-YW clearly differ for the 2-h time scale, because for YW this scale already aggregates 24 × 5-min steps, so that SAF equal to 1 occurs less often. However, this difference quickly evens out, and the remaining time scales displayed show very similar distributions (with just a little more SAF equal to 1 for RW). Interestingly, for higher time scales (e.g., 12 or 24 h), higher SAF values seem to occur a little more often. The distributions are very similar for the overall conditional SAF distribution (Fig. 10 in Appendix), except for the different number of occurrences (counted in grid cells rather than in years). The overall distributions show a lower occurrence of higher SAFs. One explanation could be that when fixed and moving maxima are taken from the whole time period, the chances are higher of finding a fixed maximum that is more similar to the moving maximum. For the yearly values, it is more likely that the fixed window by chance captures only half of the moving window maximum, for example, if there is only one big event within the year.
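The probability of SAF being equal to 1 discussed above can be estimated empirically as the share of samples in which the fixed and the moving maximum coincide. The sketch below is an illustration under assumptions: the function name, the relative tolerance and the example values are hypothetical and not part of the study.

```python
import numpy as np

def prob_saf_equals_one(m_max, f_max, rtol=1e-9):
    """Share of samples (e.g., years or grid cells) with M_max equal to F_max,
    i.e., SAF = 1, using a small relative tolerance for floating-point data."""
    m = np.asarray(m_max, dtype=float)
    f = np.asarray(f_max, dtype=float)
    return float(np.mean(np.isclose(m, f, rtol=rtol, atol=0.0)))

# Hypothetical maxima (mm) for four samples at one time scale.
m_samples = [10.0, 12.0, 8.0, 20.0]
f_samples = [10.0, 9.0, 8.0, 15.0]
p_one = prob_saf_equals_one(m_samples, f_samples)
print(p_one)
```

Repeating this for each aggregation step would give an empirical curve of the kind described for Fig. 7.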
As mentioned before, it is of special interest what SAFs equal to 1.5 or higher correspond to in actual rainfall values (mm) and how often very high values occur for high SAFs. In addition to the annual values, the overall comparison is shown in Fig. 11 of the Supplementary Information. Generally, both RADKLIM products show very similar distributions when overall and annual-based values are considered separately. Figure 9 compares SAFs for different hourly scales with their corresponding SDF.
For the annual analysis, SAFs equal to 2 correspond to SDF values of up to 100 mm in 24 h. There is also a lower boundary for all hourly scales that follows a linear course. As expected, the differences between the 2-h and 24-h scales are rather small. Maximum SDFs for SAF > 1.5 are between 40 and 80 mm, which corresponds to an underestimation of the "true" rainfall values of 3.3 to 6.7 mm/h for 12 h, but already 20 to 40 mm/h for 2 h. From the figure, we assume that in most cases the differences between moving and fixed window maxima originate from lower aggregation steps. The relationship between SDF and SAF looks quite different for the overall values. Annual-based relationships are more scattered and have a wider range. The overall relationships are narrower, and though the minimum SDFs follow a straight line, they have a much steeper slope than the annual ones. Another observation is that the maximum SDF values are lower, especially for 24 h. This deviation can again be explained as before: when the whole time period is considered, the chances are higher that the fixed and moving maxima are based on two events that are more similar; thus, lower SAF values are expected in general. However, because of the larger sample of grid cells, many more pairs are considered than for the selected grid cells of the annual analysis. That is why there are still many pairs in the overall data set with an SAF of 2, but with lower SDF values.
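The conversion from an SDF range in mm to a mean hourly underestimation is simple arithmetic and can be checked with a short sketch. The 40 to 80 mm range is taken from the text; the function name is hypothetical, and spreading the depth evenly over the duration is a simplification.

```python
def mean_intensity(depth_mm, duration_h):
    """Convert a rainfall depth (mm) over a duration (h) into mm/h."""
    return depth_mm / duration_h

sdf_range_mm = (40.0, 80.0)  # maximum SDF range for SAF > 1.5, from the text

# Mean hourly underestimation for each duration, rounded to one decimal.
intensity_ranges = {
    duration_h: tuple(round(mean_intensity(d, duration_h), 1) for d in sdf_range_mm)
    for duration_h in (2, 12, 24)
}
print(intensity_ranges)
```

For 12 h, 80 mm corresponds to about 6.7 mm/h, whereas the same depth over 2 h corresponds to 40 mm/h, which is why the same SDF is far more critical at short time scales.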

Summary and conclusion
The study performed a very large analysis of two radar QPE products of different temporal resolutions (1 h and 5 min) for the whole of Germany. Annual as well as overall fixed and moving interval maxima were estimated for several time scales, giving an unprecedentedly large data sample. The F- and M-Maxima were used to calculate the corresponding sampling adjustment factors (SAFs). Average values for each time scale were evaluated, as were yearly, seasonal and regional values and individual grid cells. Our study shows a high variability of the SAFs in space and time. The spatial distribution was more variable the lower the aggregation step, because the underlying maxima are also less spatially variable the longer the duration considered.
When comparing the results with point-based studies, similar SAFs of about 1.14 can be found for RADKLIM-YW. This is in contrast to RADKLIM-RW, where values are in most cases higher than literature averages. The proposed equations for describing the behaviour of the SAF over time scales as well as the probability of SAF equal to 1 fitted our data very well. Overall, RADKLIM-YW showed more consistent average SAFs than RADKLIM-RW.
Our study showed that it is important to not only consider average SAFs for the individual time scales, but also to take an in-depth look at the distribution of the SAF when correcting fixed maxima. As expected, all values between 1 and 2 were possible for the SAF, although higher SAFs were less likely. Nevertheless, higher SAFs are usually paired with higher "true" rainfall values, and an SAF equal to 2 can correspond to up to 40 mm/h in our data sample. Interestingly, the distribution of the SAF as well as the "true" rainfall values do not change much between 2 and 24 h. This indicates that higher SAFs are much more critical for lower time scales (e.g., 2 h), because they correspond to much higher hourly values than, for example, for 24 h.
Though it is not recommended to extract single maxima per record, the findings still revealed interesting insights. We assumed that it might be possible to trade the large number of grid cell values for annual values (thus trading space for time in a way). This, however, did not work out well. SAFs based on the overall analysis were generally lower than SAFs based on annual maxima. The most likely reason is that when longer time periods are considered and the joint occurrence of fixed and moving maxima is neglected, the probability is higher that a fixed maximum can be found in the whole data set that is more comparable to the moving maximum than the "same-event" fixed maximum is. It is thus important to consider this in further analyses.
Generally, there is quite good consistency between the RADKLIM products RW and YW. The discrepancies can be explained by the different processing as well as by the fact that RADKLIM-RW already has an underlying fixed aggregation, because it is aggregated from a 5-min radar product. This means that moving window maxima for RADKLIM-RW already deviate strongly from reality, which was not analysed further in this study. The different underlying temporal resolutions of 5 min and 1 h mainly had an effect when comparing hourly values of SAF, but much less when comparing the different aggregation steps.
The study did not carry out a dedicated sampling ratio analysis, did not consider different rainfall event types and did not try different totalization types for the fixed window sampling (compare, e.g., Marasini 2020). Still, the observations and conclusions from this study are a very good basis for further analyses of the distribution of the sampling adjustment factor (SAF). They might also help to adapt current strategies for correcting fixed values that are based on point measurements and are even interpolated in space. We suggest further developing the correction methodology proposed by Papalexiou et al. (2016), as the SAF distribution seems rather random based on our findings.
Further investigation will try to explain the different spatial patterns of SAFs in Germany as well as the scaling behaviour of rainfall maxima in space. Deeper evaluation of the radar product will help to understand the impact of data processing, especially the different temporal adjustment steps during the procedure in order to improve precipitation estimates by radar data over time scales.

Acknowledgements The authors thank Technische Universität and SLUB Dresden for the financial support of the research.

Author Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by J. Pöschmann. The first draft of the manuscript was written by J. Pöschmann, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding This research was funded by TU Dresden's Institutional Strategy, which is funded by the Excellence Initiative of the German Federal and State Governments. Open Access publication of this paper was financed by SLUB Dresden.

Declarations
Ethics approval Not applicable

Conflicts of interest
The authors declare no competing interests. Nonfinancial interests: Christian Bernhofer is Editor-in-Chief of the Journal.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.