1 Introduction

Quality assurance of precipitation data is crucial for the reliability of the applications and models that use them as input variables, as well as for the representativeness of the statistics derived from the raw time series. First, weather stations must be properly maintained, with records checked in the field at different time steps and sensors periodically calibrated when necessary (Estévez et al. 2011). Then, quality control tests are required to validate meteorological data, identifying erroneous, doubtful, or suspect records, among other kinds of labels used to flag each value. Although the World Meteorological Organization (WMO) recommends following the guidelines reported in WMO (2008), many weather station networks belonging to public organizations still lack reliable systems that guarantee the availability of high-quality, long-term precipitation series (Morbidelli et al. 2020). Therefore, scientists are often the ones who implement validation procedures and apply them to precipitation data as a prerequisite for their use (Parchure and Gedam 2019; Medina-Cobo et al. 2017; García-Marín et al. 2015; Estévez et al. 2020; Yang et al. 2020; Lana et al. 2020). Most of these quality control methods are based on algorithms, rules, tests, or procedures reported in different works. For example, Feng et al. (2004) applied different checks (range, temporal, internal, and spatial consistency) in China; Estévez et al. (2015) proposed a new method to detect spurious precipitation signals in irrigated areas using the atmospheric transmissivity coefficient; You et al. (2007) developed a test based on a multiple-interval gamma distribution, on the grounds that the normal distribution is not efficient for identifying precipitation outliers; Sciuto et al. (2009) compared rainfall values at a target station with reference estimates from an artificial neural network model; Durre et al. (2010) presented a comprehensive set of fully automated quality assurance procedures for daily surface precipitation observations, among other meteorological variables, to detect duplicate data, climatological outliers, and internal, temporal, and spatial inconsistencies; and Serrano-Notivoli et al. (2017) developed a validation method based on probabilistic predictions (occurrence of a wet or dry day and the amount of precipitation) from nearby stations. Moreover, the detection of erroneous precipitation data caused by problems in tipping-bucket rain gauge measurements has been studied (Michaelides 2008; Habib et al. 2001; Einfalt et al. 2006).

Broadly, there are two main categories of validation methods for detecting potentially erroneous precipitation data: those that use data from a single site and those that compare data from a candidate station with data from neighboring stations. Procedures based on statistical decisions are highly efficient when long-term precipitation series are available (Hubbard et al. 2005; Llabrés-Brustenga et al. 2019; Vicente-Serrano et al. 2010) and, when spatial methods are involved, when the density of stations is adequate. In this work, given the great importance of long-term validated datasets (for forecasting, trend and non-linear analyses, etc.), a complete spatial quality control procedure for daily precipitation data has been applied in the semiarid region of Andalusia (southern Spain).

The main goal of this work is to validate daily precipitation records from different networks in an extensive region (one of the largest regions in the European Union) by applying a spatial quality control procedure. The whole process includes three consecutive steps: basic, absolute, and relative quality control. It is important to note that no similar work has been carried out in a region as heterogeneous as Andalusia (mainly semiarid but also with arid and dry sub-humid areas), where different climates coexist. This region has important singularities, such as water being a limiting factor and a low density of precipitation stations, which make it critical to verify the applicability of this type of quality control mechanism.

2 Data and methods

Daily precipitation data from 1947 stations belonging to three meteorological networks operating in Andalusia (AEMET, RIA, and RAIF) were available. The AEMET network, the oldest one, is managed by the Spanish State Meteorological Agency, while RIA and RAIF correspond to the Agroclimatic Information Network and the Phytosanitary Information Alert Network, respectively. Andalusia, located in the south of the Iberian Peninsula (see Fig. 1), covers almost 90,000 km² and has a high interannual variability in rainfall. The dataset contains an average of 8145 daily values per station (minimum 6 days, maximum 45,798 days), for a total of 15,857,892 station-days.

Fig. 1 Meteorological stations of Andalusia (Spain)

Figure 2 shows the temporal evolution of the number of available series in the Andalusian network from 1870 to the present. The first available record dates from 1870, at the San Fernando station (Cádiz), which was the only operational station in the network for more than 30 years. At the beginning of the twentieth century a few other stations started operating: Jaén-Instituto in 1901; Granada (Cartuja) and Granada-Universidad, both in 1902; and Huelva and Sanlúcar de Barrameda in 1903. From 1910 onward the network began to grow, keeping the number of stations at a few tens until the interruption caused by the Spanish Civil War (1936–1939). After the war, the growth rate increased considerably and steadily until the 1970s. Around 2000, the RIA and RAIF networks of automatic weather stations started up. After 2010, the progressive dismantling of some manual stations left the network with just over half of the operational stations it had in 2000.

Fig. 2 Temporal evolution of the number of meteorological stations in Andalusia from 1870 to the present. Lines represent the number of series with a minimum annual percentage of available data

The quality control procedure applied here is based on the methodology developed by Llabrés-Brustenga et al. (2019) for rainfall datasets in Catalonia (NE Spain) and consists of three stages, outlined in the scheme of Fig. 3. The main novelty of the present study is testing the behavior of this procedure to verify whether it can be extended to geographical areas with different meteorological characteristics. The objective of the procedure is to select the series with the best quality while detecting possible errors, starting with the most serious ones and gradually increasing the level of detail. The first stage is a basic quality control consisting of the detection and removal of certain erroneous values, such as physically impossible rainfall amounts (negative values) or values too high for the climatology of the study area. An upper threshold of 600 mm in 24 h was considered; this is the highest daily rainfall amount ever recorded in Andalusia, collected on October 19, 1973 in Zurgena (a locality of Almería, the easternmost province of Andalusia) during the extraordinary precipitation events that occurred in the southeast of the Iberian Peninsula (Capel Molina 1974). These events caused rivers and Mediterranean riverbeds to overflow, leading to floods, the loss of crops and household goods, and the loss of human lives. In any case, all values above 300 mm were manually checked and verified. A filter was also applied to detect identical values repeated on consecutive days, which were then checked manually. Because it was once common practice to distribute a cumulative measurement taken over several days as equal values for each day, incorrect daily values can be found. Series with this kind of error can still be used to calculate monthly totals, but the incorrect amounts should be removed from the daily dataset.
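The basic checks described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `basic_qc`, the use of `None` for missing days, and the minimum run length `MIN_RUN` used to flag repeated values are hypothetical choices; only the 600 mm and 300 mm limits come from the text.

```python
from itertools import groupby
from typing import List, Optional, Tuple

UPPER_LIMIT_MM = 600.0   # highest daily amount recorded in Andalusia (Zurgena, 1973)
MANUAL_CHECK_MM = 300.0  # amounts above this are sent to manual verification
MIN_RUN = 3              # hypothetical: shortest run of identical wet days to flag


def basic_qc(series: List[Optional[float]]) -> Tuple[List[Optional[float]], List[int]]:
    """Return the cleaned daily series (mm) and the indices to check manually.

    Negative values and values above UPPER_LIMIT_MM are removed (set to None).
    Values above MANUAL_CHECK_MM and runs of identical non-zero values on
    consecutive days are only flagged, not removed.
    """
    cleaned: List[Optional[float]] = [
        None if v is not None and (v < 0 or v > UPPER_LIMIT_MM) else v
        for v in series
    ]
    to_check = {i for i, v in enumerate(cleaned) if v is not None and v > MANUAL_CHECK_MM}

    # Identical non-zero amounts on consecutive days may be a multi-day
    # cumulative total that was spread evenly over the days.
    pos = 0
    for value, group in groupby(cleaned):
        run = len(list(group))
        if value is not None and value > 0 and run >= MIN_RUN:
            to_check.update(range(pos, pos + run))
        pos += run
    return cleaned, sorted(to_check)


cleaned, flags = basic_qc([1.0, 5.0, 5.0, 5.0, 0.0, 0.0, 0.0, -2.0, 700.0, 350.0])
print(cleaned)  # [1.0, 5.0, 5.0, 5.0, 0.0, 0.0, 0.0, None, None, 350.0]
print(flags)    # [1, 2, 3, 9]
```

Runs of zeros are deliberately ignored, since consecutive dry days are normal; only repeated wet-day amounts are suspicious.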

Fig. 3 Scheme of the quality control procedure

2.1 Absolute quality control

After the basic quality control, an absolute quality control is performed. At this stage each series is evaluated individually, and a quality index \(Q\) is calculated for every year at every station, taking into account the annual percentage of available data, the distribution of gaps in the sample, the heterogeneity of rainfall records by day of the week, and the abundance of outliers.

Bearing in mind the most common problems in daily rainfall series, the quality index \(Q\) is used to categorize each series according to its absolute, individual quality. This index is calculated using Eq. (1) and ranges from 0 to 100. Values below 50 indicate very poor quality, while values above 80 are considered acceptable.

$$Q=\frac{1}{4}\left(P+{Q}_{\mathrm{gaps}}+{Q}_{\mathrm{zero}}^{w}+{Q}_{\mathrm{outliers}}\right)$$
(1)

The four terms in Eq. (1), contributing with the same weight, evaluate the presence of the most common problems in daily rainfall series. Parameter \(P\) measures completeness and is calculated as the number of days with data divided by 365 (or 366 in leap years), expressed as a percentage. Parameter \({Q}_{\mathrm{gaps}}\) evaluates the distribution of gaps, penalizing gaps scattered among the data more heavily than a single continuous empty period, and is calculated using Eq. (2). In Eq. (2), \({n}_{\mathrm{gap}}\) is the number of empty days out of the total number of days \(n\), and \({L}_{\mathrm{gap}}^{\mathrm{max}}\) is the length of the longest empty period, for every station-year.

$${Q}_{\mathrm{gaps}}=100-100\frac{2{n}_{\mathrm{gap}}-{L}_{\mathrm{gap}}^{\mathrm{max}}}{n}$$
(2)
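The completeness term \(P\) and the gap term can be computed for one station-year as sketched below, with missing days stored as `None`. The function names are hypothetical. The sketch takes the gap penalty as \((2{n}_{\mathrm{gap}}-{L}_{\mathrm{gap}}^{\mathrm{max}})/n\), i.e. with the longest gap subtracted; this is an assumption, chosen because it is the reading under which scattered gaps are penalized more than a single long gap and \({Q}_{\mathrm{gaps}}\) stays between 0 and 100.

```python
from typing import List, Optional


def completeness(values: List[Optional[float]]) -> float:
    """Term P: percentage of days in the year (365 or 366) that have data."""
    return 100.0 * sum(v is not None for v in values) / len(values)


def q_gaps(values: List[Optional[float]]) -> float:
    """Gap term for one station-year; None marks a missing day.

    Assumption: the longest gap is subtracted, so scattered gaps cost more
    than one continuous gap with the same number of missing days.
    """
    n = len(values)
    n_gap = sum(v is None for v in values)
    longest = current = 0
    for v in values:
        current = current + 1 if v is None else 0  # length of the current gap
        longest = max(longest, current)
    return 100.0 - 100.0 * (2 * n_gap - longest) / n


single = [None] * 10 + [0.0] * 355                                          # one 10-day gap
scattered = [None if i % 30 == 0 and i < 300 else 0.0 for i in range(365)]  # ten 1-day gaps
print(round(q_gaps(single), 2))     # 97.26
print(round(q_gaps(scattered), 2))  # 94.79
```

Both example years have the same completeness \(P\), so only the gap term distinguishes them.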

There is no significant evidence of weekly cycles that could favor or disfavor rainfall on a specific day of the week (Schultz et al. 2007; Stjern 2011), so the number of rainy days in a given year should be approximately the same for every day of the week. The term \({Q}_{\mathrm{zero}}^{w}\) in Eq. (1) detects whether a specific day of the week has a much lower number of rainy days than the rest, in order to penalize series in which some days of the week may have been systematically filled with zeros because no measurement was taken. For each year at every station, the number \({n}_{i}\) of rainy days is counted for each day of the week, with the index \(i\) ranging from 1 to 7. Each station-year thus yields a set of seven numbers (the number of rainy Mondays, Tuesdays, Wednesdays, etc.), which should ideally be similar to one another, and their coefficient of variation \(\mathrm{CV}\) is calculated. Since \(\mathrm{CV}\) is the standard deviation divided by the mean, it should be close to zero if the number of rainy days is similar for every day of the week. The term \({Q}_{\mathrm{zero}}^{w}\) is then calculated for every station-year using Eq. (3).

$${Q}_{\mathrm{zero}}^{w}=100-100\,\mathrm{CV}$$
(3)
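A sketch of the weekday term for one station-year follows. The text does not define the wet-day threshold or whether the population or sample standard deviation is used; the sketch assumes that any positive amount counts as a rainy day and uses the population standard deviation (`statistics.pstdev`). Returning 100 when there are no rainy days is likewise a hypothetical convention, since CV is undefined in that case.

```python
import datetime as dt
from statistics import mean, pstdev
from typing import List, Optional


def q_zero_week(dates: List[dt.date], values: List[Optional[float]],
                wet_threshold: float = 0.0) -> float:
    """Weekday term: 100 - 100 * CV of the rainy-day counts per weekday."""
    counts = [0] * 7  # n_i, indexed by date.weekday(): 0 = Monday ... 6 = Sunday
    for day, value in zip(dates, values):
        if value is not None and value > wet_threshold:
            counts[day.weekday()] += 1
    avg = mean(counts)
    if avg == 0:
        return 100.0  # hypothetical convention: CV is undefined without rainy days
    return 100.0 - 100.0 * pstdev(counts) / avg


start = dt.date(2021, 1, 4)  # a Monday
dates = [start + dt.timedelta(days=i) for i in range(28)]
every_day = [2.0] * 28                                          # 4 rainy days per weekday
no_mondays = [0.0 if d.weekday() == 0 else 2.0 for d in dates]  # Mondays always zero
print(q_zero_week(dates, every_day))             # 100.0
print(round(q_zero_week(dates, no_mondays), 1))  # 59.2
```

A single weekday systematically filled with zeros is enough to pull the term down sharply, which is the behavior the term is designed to penalize.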

For the last term, \({Q}_{\mathrm{outliers}}\), the percentage of days that do not exceed the outlier threshold, relative to the total number of days, is calculated. For every month, the outlier threshold is defined as the rainfall amount lying three times the interquartile range above the third quartile.
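The outlier term can be sketched as below. The sample from which the quartiles are computed is not specified in the text; this sketch assumes, hypothetically, that each calendar month's threshold is computed from all wet days (> 0 mm) of that month over the station's whole record, and that \({Q}_{\mathrm{outliers}}\) is the percentage of available days in the year not exceeding their month's threshold. `statistics.quantiles` is used with its default interpolation method, also a choice of convenience.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, Iterable, Optional, Tuple


def monthly_thresholds(records: Iterable[Tuple[int, Optional[float]]]) -> Dict[int, float]:
    """Outlier threshold Q3 + 3 * IQR per calendar month, from (month, mm) pairs.

    Assumption: only wet days (> 0 mm) enter the quartile calculation.
    """
    wet = defaultdict(list)
    for month, value in records:
        if value is not None and value > 0:
            wet[month].append(value)
    thresholds = {}
    for month, sample in wet.items():
        if len(sample) >= 2:  # quantiles() needs at least two data points
            q1, _, q3 = quantiles(sample, n=4)
            thresholds[month] = q3 + 3.0 * (q3 - q1)
    return thresholds


def q_outliers(year: Iterable[Tuple[int, Optional[float]]],
               thresholds: Dict[int, float]) -> float:
    """Percentage of the year's available days that do not exceed their month's threshold."""
    days = [(m, v) for m, v in year if v is not None]
    ok = sum(v <= thresholds.get(m, float("inf")) for m, v in days)
    return 100.0 * ok / len(days)


history = [(1, float(v)) for v in range(1, 9)]  # January wet days: 1..8 mm
thr = monthly_thresholds(history)
print(thr[1])  # 20.25
print(q_outliers([(1, 0.0), (1, 5.0), (1, 30.0), (1, None), (1, 10.0)], thr))  # 75.0
```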

The quality control procedure applied by Llabrés-Brustenga et al. (2019) also included a term \({Q}_{\mathrm{zero}}^{m}\) intended to penalize series with many entire months without precipitation, which in some areas could correspond to non-operational periods filled with false zeros. This is not the case for the Andalusian dataset: the region has several dry areas where it is common to have 2 or even 3 months each year without any precipitation. Since the automatic calculation of this term, as originally defined, cannot work properly in Andalusia, it has not been considered.

The value of the quality index \(Q\) can be a useful indicator for selecting series appropriately, as in the works by Casas-Castillo et al. (2018) and Lana et al. (2020, 2021a, b), which used series extracted from the complete rainfall dataset of Catalonia (Llabrés-Brustenga et al. 2019) that met, among other quality requirements, the condition of having an index \(Q\) higher than 80%.
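Combining the terms as in Eq. (1) and applying the \(Q>80\%\) selection criterion mentioned above is straightforward; the sketch below uses hypothetical function names and made-up term values purely for illustration.

```python
from typing import Dict, List


def quality_index(p: float, q_gaps: float, q_zero_w: float, q_outliers: float) -> float:
    """Eq. (1): equally weighted mean of the four quality terms (all in %)."""
    return (p + q_gaps + q_zero_w + q_outliers) / 4.0


def select_years(q_by_year: Dict[int, float], min_q: float = 80.0) -> List[int]:
    """Years whose quality index is strictly higher than min_q."""
    return sorted(year for year, q in q_by_year.items() if q > min_q)


q = {1990: quality_index(100.0, 96.0, 90.0, 94.0),  # 95.0
     1991: quality_index(60.0, 70.0, 95.0, 95.0)}   # 80.0, not above the limit
print(select_years(q))  # [1990]
```

Because the four terms are equally weighted, a poor completeness term alone (as in 1991) can push an otherwise clean year below the selection limit.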

The values of the four terms of Eq. (1) obtained for all years and all stations in Andalusia have been analyzed, and the results are shown in Section 3.

2.2 Relative quality control

The final stage of the procedure is a relative quality control that compares data from neighboring stations, leading to a complete evaluation of each daily value. For each station, a number of suitable auxiliary stations are selected to compare their records. For this selection, a representativeness index \(R\) is calculated for every pair formed by the studied station and a possible auxiliary station. This index \(R\) takes into account the spatial coordinates, the altitude of the stations, and their daily records, and it is calculated using Eq. (4), where \(d\) is the distance (in km) between the two stations, \(h\) is the difference in altitude (in m), and \({C}_{\mathrm{corr}}\) is Pearson's correlation coefficient between the daily data of both stations. Llabrés-Brustenga et al. (2019) considered it necessary to include the first two terms because of the complex orography of their study area (Catalonia). The same consideration applies to Andalusia, where large differences in altitude over short distances are possible, resulting in different local meteorological conditions and, consequently, different rainfall production.

$$R=\frac{100}{3}\left(\frac{D-d}{D}+{0.5}^{h/500}+{C}_{\mathrm{corr}}\right)$$
(4)

For the distance term of Eq. (4), a maximum radius of influence \(D\) has to be established, depending on the characteristics of the study area and the density of stations. For instance, a value of \(D=50 \mathrm{km}\) was considered for Catalonia, with an average density of 5.3 stations per 100 \({\mathrm{km}}^{2}\). Values of this term range from 1 (for \(d=0\)) to 0 (for \(d=D\)), decreasing linearly with distance. Values of the altitude term also lie between 1 and 0, but decrease exponentially (halving every 500 m) so that large differences in altitude are penalized more than small ones; no upper limit on the altitude difference is imposed.

The correlation term, which takes daily rainfall data into consideration, is intended to measure the similarity between the rainfall regime of the auxiliary station and that of the studied station. Although it is calculated from Pearson's correlation coefficient, negative values are not considered: a null value is assigned to the correlation term \({C}_{\mathrm{corr}}\) in the case of negative coefficients.
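Eq. (4), including the clipping of negative correlations, can be written as the sketch below. Computing the correlation only over days on which both stations have data, and treating stations farther than \(D\) as non-candidates (returning `None`), are assumptions not stated in the text; the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long daily series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired(x: List[Optional[float]], y: List[Optional[float]]):
    """Keep only days on which both stations report (assumption)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]


def representativeness(d_km: float, dh_m: float, corr: float,
                       d_max_km: float = 50.0) -> Optional[float]:
    """Eq. (4); returns None beyond the radius of influence D (assumption)."""
    if d_km > d_max_km:
        return None
    distance_term = (d_max_km - d_km) / d_max_km  # 1 at d = 0, 0 at d = D
    altitude_term = 0.5 ** (abs(dh_m) / 500.0)     # halves every 500 m
    corr_term = max(corr, 0.0)                     # negative correlations -> 0
    return 100.0 * (distance_term + altitude_term + corr_term) / 3.0


x, y = paired([0.0, 2.0, None, 10.0, 4.0], [0.0, 3.0, 1.0, 9.0, None])
c = pearson(x, y)
print(round(representativeness(10.0, 250.0, c), 1))  # 83.2
```

Since each of the three terms lies between 0 and 1, \(R\) is bounded between 0 and 100.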

To compare daily data from the studied station and a possible auxiliary station, a relative difference DIF is calculated using Eq. (5) from the daily value at the studied station, PPT, and that at the auxiliary station, \({\mathrm{PPT}}_{\mathrm{aux}}\): the absolute difference between them is divided by their average and multiplied by their maximum.

$$\mathrm{DIF}=\frac{\left|\mathrm{PPT}-{\mathrm{PPT}}_{\mathrm{aux}}\right|}{\mathrm{Av}\left(\mathrm{PPT}, {\mathrm{PPT}}_{\mathrm{aux}}\right)}\mathrm{Max}\left(\mathrm{PPT}, {\mathrm{PPT}}_{\mathrm{aux}}\right)$$
(5)
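A direct transcription of Eq. (5) is shown below. Eq. (5) is undefined when both stations record 0 mm; the sketch returns 0 in that case, which is our assumption (two dry readings agree perfectly). The worked values illustrate that DIF has units of mm and that rain recorded at only one of the two stations yields twice that amount.

```python
def relative_difference(ppt: float, ppt_aux: float) -> float:
    """Eq. (5): |PPT - PPT_aux| / mean(PPT, PPT_aux) * max(PPT, PPT_aux), in mm."""
    if ppt == 0.0 and ppt_aux == 0.0:
        return 0.0  # assumption: both stations dry -> no difference (0/0 in Eq. 5)
    average = (ppt + ppt_aux) / 2.0
    return abs(ppt - ppt_aux) / average * max(ppt, ppt_aux)


print(relative_difference(10.0, 6.0))  # 4 / 8 * 10 = 5.0
print(relative_difference(2.0, 0.0))   # 2 / 1 * 2 = 4.0
```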

To be considered valid, a pair formed by the studied station and a possible auxiliary station must have a DIF value lower than an established threshold \({T}_{m}\), calculated using Eq. (6).

$${T}_{m}={C}_{m}\ln\left(101-R\right)$$
(6)

The values of the coefficient \({C}_{m}\) in Eq. (6) have to be determined from high-quality data series and are expected to be seasonal. The procedure to estimate this coefficient for every month is as follows: (1) for each specific value of the index \(R\), the relative differences between the amounts recorded at a station in each month of each year and those recorded at neighboring stations are calculated; (2) the 95th percentile and the interquartile range of the resulting set of relative differences are determined for each \(R\); and (3) the amounts lying ten times the interquartile range above the 95th percentile (one point for each \(R\)) are fitted to Eq. (6), yielding the value of the coefficient \({C}_{m}\) for the considered month. Llabrés-Brustenga et al. (2019) found a seasonal cycle for this coefficient, with the highest values in the summer months, when local storms are likely, and the lowest values in the winter months. The lowest value of the threshold \({T}_{m}\) is 0, corresponding to a representativeness \(R\) of 100 (since \(\ln 1=0\)); this would be the case, for instance, of two stations located at the same place with perfectly correlated data. As the distance and difference in altitude between the two stations increase, or as the correlation of their data decreases, this threshold increases logarithmically. One of the aims of the present work is to verify whether the threshold coefficients \({C}_{m}\) calculated from the best-quality series of Andalusia follow a similar seasonal cycle and to compare the values obtained. These coefficients could be of great interest for other areas with a similar climate that are unable to obtain their own coefficients because they lack an operational network of automatic, quality-controlled stations.
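The three-step estimation of \({C}_{m}\) and the threshold of Eq. (6) can be sketched as follows. The text does not say how \(R\) values are grouped or how the fit is done; this sketch assumes, hypothetically, that \(R\) is rounded to integers, that each group needs at least two differences, that percentiles use `statistics.quantiles` with `method="inclusive"`, and that Eq. (6) is fitted by least squares through the origin, \({C}_{m}=\sum x_k y_k/\sum x_k^2\) with \(x_k=\ln(101-R_k)\). The example data are synthetic.

```python
import math
from collections import defaultdict
from statistics import quantiles
from typing import Iterable, Tuple


def threshold(c_m: float, r: float) -> float:
    """Eq. (6): T_m = C_m ln(101 - R); zero when R = 100."""
    return c_m * math.log(101.0 - r)


def fit_cm(pairs: Iterable[Tuple[float, float]]) -> float:
    """Estimate C_m for one month from (R, DIF) values of high-quality station pairs."""
    by_r = defaultdict(list)
    for r, dif in pairs:
        by_r[round(r)].append(dif)  # step 1: group differences by R (assumed integer bins)
    xs, ys = [], []
    for r, difs in by_r.items():
        if len(difs) < 2:
            continue
        p95 = quantiles(difs, n=20, method="inclusive")[18]   # step 2: 95th percentile
        q1, _, q3 = quantiles(difs, n=4, method="inclusive")  #         and IQR
        xs.append(math.log(101.0 - r))
        ys.append(p95 + 10.0 * (q3 - q1))  # step 3: point to fit for this R
    # Least-squares fit of y = C_m * x through the origin.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic check: differences that already follow 12 * ln(101 - R) recover C_m = 12.
pairs = [(r, 12.0 * math.log(101 - r)) for r in range(70, 100) for _ in range(5)]
print(round(fit_cm(pairs), 6))  # 12.0
print(threshold(12.0, 100.0))   # 0.0
```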

If the DIF value for a specific day of the year is below the threshold, the pair of stations is labeled as “valid” for that day; if the threshold is exceeded, the pair is labeled as “invalid.” For each studied station, this relative comparison has to be made for every day of the year and every auxiliary station. Then, in order to be declared valid, the studied station must reach an established minimum value of the weighted arithmetic mean \(\mathrm{WM}\) (see Eq. (7)), which considers the representativeness of all available pairs, valid or not. In Eq. (7), \(L\) takes the value 1 for valid pairs and 0 for invalid pairs, and \({R}_{\mathrm{min}}\) is the minimum value of representativeness considered.

$$\mathrm{WM}=100 \frac{{\sum }_{\mathrm{aux}}{\left(R- {R}_{\mathrm{min}}\right)}^{2}L}{{\sum }_{\mathrm{aux}}{\left(R- {R}_{\mathrm{min}}\right)}^{2}}$$
(7)
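Eq. (7) for one station and one day reduces to a few lines; the input format (a list of `(R, valid)` tuples for the available auxiliary stations) and the function name are hypothetical. Pairs with \(R\) close to \({R}_{\mathrm{min}}\) contribute little, because the weights grow with the square of \(R-{R}_{\mathrm{min}}\).

```python
from typing import List, Optional, Tuple


def weighted_mean(pairs: List[Tuple[float, bool]], r_min: float) -> Optional[float]:
    """Eq. (7): WM in %, weighting each auxiliary pair by (R - R_min)^2."""
    den = sum((r - r_min) ** 2 for r, _ in pairs)
    if den == 0:
        return None  # no usable auxiliary pairs
    num = sum((r - r_min) ** 2 for r, valid in pairs if valid)  # L = 1 for valid pairs
    return 100.0 * num / den


# One valid pair with R = 90 and one invalid pair with R = 80, R_min = 70:
print(weighted_mean([(90.0, True), (80.0, False)], 70.0))  # 100 * 400 / 500 = 80.0
```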

The results of the relative quality control performed on the Andalusian database are shown in the next section.

3 Results and discussion

3.1 Absolute quality control

Regarding annual completeness, Table 1 shows the number of stations with a certain percentage of completeness for a minimum number of years. Thus, almost 90% of the stations have at least 5 years of records with a completeness of at least 50%, while only 3 stations have at least 50 years of 100% complete records. Figure 4 shows the number of available stations with at least 1 day of data for every year of the considered period (1870–2018).

Table 1 Number of stations with a certain annual completeness of daily data for a given number of years
Fig. 4 Number of available stations with at least one daily value for every year

The temporal evolution of the four terms contributing to the quality index \(Q\) in Eq. (1) is shown in Fig. 5. Regarding the completeness term \(P\), some features are remarkable. The only rainfall record operating alone in the Andalusian set for more than 30 years after 1870 (San Fernando, Cádiz) had very good completeness throughout this period, always above 93% for the available years. As for the five stations put into operation in the early years of the twentieth century (Jaén-Instituto in 1901; Granada-Cartuja and Granada-Universidad, both in 1902; and Huelva and Sanlúcar de Barrameda in 1903), their records did not start on January 1 but at some point during the year, so the term \(P\) shows a visible drop in these years, to values below 80% (Fig. 5). This drop is even more pronounced a decade later, in 1912, when several new stations began operating and the total number of stations suddenly rose to more than 20: because their first operational year was incomplete, the global term \(P\) fell to values near 50%. The Spanish Civil War (1936–1939) might have contributed to the next sharp drop, in the 1930s. This effect of incomplete first years on the mean value of \(P\) is only noticeable when the number of stations is small; therefore, the later drops after 1960 are unlikely to be due to it.

Fig. 5 Temporal evolution of the four quality terms contributing to the quality index Q. Each point is the mean value, for that year, over all stations with a value in that year. The temporal evolution of the mean number of rainy days in Andalusia and that of the quality term \({Q}_{\mathrm{outliers}}\) in Catalonia are also represented. The blue line indicates the significant decreasing trend (\(p<0.01\)) of this term in Catalonia (slope \(-0.013\%\) per year)

Sharp fluctuations of the term \({Q}_{\mathrm{gaps}}\) can be seen at the beginning of the dataset, with values oscillating between 87 and 98% for the oldest stations. However, from 1915 onward this term increases, and after 1970 it remains almost steady at around 97–98%. This term and \({Q}_{\mathrm{outliers}}\) are both very sensitive to the number of operational stations and behave anomalously at the beginning of the record, when only a few stations were working. The term \({Q}_{\mathrm{outliers}}\) remains almost steady at around 95% after 1910, showing no temporal trend. In contrast, the values of this term for the Catalan dataset analyzed by Llabrés-Brustenga et al. (2019) show a clear decreasing temporal trend (Fig. 5). A Mann–Kendall test performed on both \({Q}_{\mathrm{outliers}}\) series for the years after 1920, when almost 40 stations were in operation, shows a significant decreasing trend for Catalonia (\(p<0.01\), slope \(-0.013\%\) per year), which means an increase in the presence of outliers in the sample, while no trend is detected for Andalusia. When this quality term is calculated separately for each province, decreasing trends are also found for the eastern Andalusian areas, which, like Catalonia, are more influenced by the Mediterranean Sea. These increases in the presence of outliers are in general agreement with the temporal increase in rainfall irregularity reported in the Mediterranean area during the last part of the twentieth century (Lana et al. 2022; Casas-Castillo et al. 2018, 2022).
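For readers who wish to reproduce this kind of trend test, a plain-Python sketch of the Mann–Kendall test is given below, using the normal approximation without a correction for ties. The slope is estimated with Sen's estimator (the median of pairwise slopes), a common companion to the Mann–Kendall test; the text does not state how the reported slope was obtained, so this choice is an assumption. The series in the example is synthetic.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def mann_kendall(x: Sequence[float]) -> Tuple[int, float, float, float]:
    """Return (S, Z, two-sided p-value, Sen's slope) for an annual series."""
    n = len(x)
    pairs = [(i, j) for i in range(n - 1) for j in range(i + 1, n)]
    s = sum((x[j] > x[i]) - (x[j] < x[i]) for i, j in pairs)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # no tie correction (assumption)
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from N(0, 1)
    slope = median((x[j] - x[i]) / (j - i) for i, j in pairs)
    return s, z, p, slope


series = [95.0 - 0.5 * t for t in range(30)]  # synthetic, steadily decreasing
s, z, p, slope = mann_kendall(series)
print(s, slope, p < 0.01)  # -435 -0.5 True
```

Series with many repeated values would need the standard tie correction to the variance of \(S\), which is omitted here for brevity.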

As expected, no evidence of weekly cycles favoring or disfavoring rainfall on a specific day of the week was found. The apparent general decrease over time of the term \({Q}_{\mathrm{zero}}^{w}\) might be related to a decrease in the number of rainy days in certain zones, since with fewer rainy days per year the seven weekday counts are smaller and their coefficient of variation tends to be larger by chance alone. In this sense, a decreasing trend in seasonal precipitation has been observed in some areas of eastern Andalusia (Ramos-Calzado et al. 2008; Ruiz Sinoga et al. 2010). In addition, there have been well-known drought periods affecting the whole region: 1978–1982, 1992–1995, and 2004–2010.

Figure 6 shows the distribution histogram of the mean values of the quality index \(Q\) for each meteorological network. As expected, the automatic weather station networks RAIF and RIA obtained the best absolute quality indices, with almost 87% of their stations having values of \(Q\) above 85%, while only 57% of the stations in the AEMET network, which are older and mostly manual, reached this value.

Fig. 6 Distribution histogram of the mean values of the quality index Q (%) for every meteorological network

The mean value of the global quality index \(Q\) over all station-years of the entire database was found to be 86%. No spatial pattern has been detected for \(Q\), suggesting that this index does not depend on the diverse climatic characteristics of the study area. This absolute quality control has made it possible to select the best-quality series from the Andalusian dataset to perform a simple scaling analysis under the different precipitation regimes of the region (Casas-Castillo et al. 2022).

3.2 Relative quality control

The representativeness index \(R\) has been calculated using Eq. (4), which considers location, altitude, and daily records, for every pair formed by a studied station and a possible auxiliary station in the Andalusian database. Even though the density of stations in Andalusia (2.3 per \(100 {\mathrm{km}}^{2}\)) is lower than in Catalonia, the maximum radius of influence in Eq. (4) was also set to \(D=50\mathrm{ km}\). One possible consequence of choosing a \(D\) value inappropriate for the characteristics of the study area is that the number of possible auxiliary stations may be too low to carry out an adequate relative quality control. In this case, 50 km proved to be a suitable maximum radius of influence for the Andalusian network.

For pairs of stations with a representativeness index \(R\) above 70, the relative differences (DIF) between the daily data measured at each station have been calculated using Eq. (5). As discussed in Section 2.2, pairs with DIF values below an established threshold are considered valid. The threshold \({T}_{m}\) is calculated using Eq. (6), for which the specific monthly coefficients \({C}_{m}\) of the Andalusian region, for the chosen maximum radius of influence, have to be determined. This was done using data from 97 automatic weather stations of the RIA network, all of them with values of the absolute quality index \(Q\) above 90%.

For all RIA pairs with representativeness above 70, the relative differences in daily measurements for the period 2008–2018 have been calculated. Then, following the procedure presented in Section 2.2, for every value of the index \(R\), the amount lying ten times the interquartile range above the 95th percentile of the relative differences has been determined for every month. These monthly amounts have then been fitted to the threshold functions (Eq. (6)), yielding the monthly coefficients \({C}_{m}\). Figure 7 shows four examples of observed relative differences and fitted threshold functions: September, the month with the highest relative differences; February, one of the months with the lowest differences; and the intermediate cases of April and November.

Fig. 7 Relative differences observed in daily measurements for pairs of RIA stations with a representativeness above 70. Blue points indicate the value lying 10 times the interquartile range above the 95th percentile of the set of points corresponding to each value of representativeness, fitted to the threshold functions (red lines) for February (a), April (b), September (c), and November (d)

A seasonal cycle has been obtained for the coefficients \({C}_{m}\) (Fig. 8; Table 2), with high values for late summer and early autumn (above 25 mm; 33.3 mm for September) and low values (around 10–15 mm) for winter (December to February). This outcome, very similar to that found by Llabrés-Brustenga et al. (2019) for Catalonia, is a consequence of the extremely irregular surface records, even at nearby locations, during late-summer storms and autumn convective events, whereas for the more continuous and uniform winter rainfall episodes, often of synoptic origin, differences between measurements taken at distant stations are usually low. The values for early summer are not much higher than in spring (there is even a slight decrease in July) because rainfall is usually scarce in summer in most of Andalusia. Consequently, the observed rainfall differences cannot be very large, and the low dispersion of the data reduces the interquartile range. In fact, July is the month with the fewest coincident rainy days, and therefore the month with the fewest pairs contributing to the analysis.

Fig. 8 Monthly \({C}_{m}\) values of the threshold function for Andalusia. The dashed line is a fitted third-order polynomial, drawn for better visualization

Table 2 Monthly values of the coefficient \({C}_{m}\) of the threshold function for Andalusia for a maximum radius of influence between stations of 50 km

Once each pair has been labeled by comparing its daily relative difference with the corresponding threshold, the next step consists of tagging each day at each station of the whole dataset as valid, invalid, doubtful, or insufficient information, based on the relative comparison with its auxiliary stations. To be finally declared valid for a given day, the studied station must reach a value of at least 50% of the weighted arithmetic mean \(\mathrm{WM}\) (Eq. (7)) for that day. If \(\mathrm{WM}\) is between 20 and 50%, the station is labeled as doubtful for that day, and it is considered invalid for values below 20%. For a reliable comparison, a minimum of three auxiliary stations was considered necessary for every studied station and day; days with fewer than three values to compare were labeled as having insufficient information. A total of 15,857,892 station-days have been analyzed following this procedure (Table 3). The average number of auxiliary stations per station was found to be 18.
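The daily labeling rules above can be sketched as follows, taking \(\mathrm{WM}\) (in %) and the number of auxiliary stations available for that day as inputs; the function name and the string labels are hypothetical. Treating exactly 50% as valid and exactly 20% as doubtful is our reading of the boundaries.

```python
from typing import Optional


def label_day(wm: Optional[float], n_aux: int, min_aux: int = 3) -> str:
    """Daily label for a studied station from WM (%) and its number of auxiliary stations."""
    if wm is None or n_aux < min_aux:
        return "insufficient information"
    if wm >= 50.0:
        return "valid"
    if wm >= 20.0:
        return "doubtful"
    return "invalid"


print([label_day(80.0, 5), label_day(35.0, 4), label_day(10.0, 3), label_day(90.0, 2)])
# ['valid', 'doubtful', 'invalid', 'insufficient information']
```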

Table 3 Number of station-days and percentage of the total for each quality category considered

Table 3 lists the number of station-days assigned to each of the relative quality control categories, together with their percentage of the total. The percentage of data declared invalid, i.e., detected errors, was 0.14%. This percentage is similar to those found by Llabrés-Brustenga et al. (2019) for Catalonia (0.2%) and to the values rejected by Vicente-Serrano et al. (2010) in NE Spain (0.1%), and lower than that (1.58%) reported by Serrano-Notivoli et al. (2017) for the Spanish island of Mallorca. The percentage of doubtful cases is similar (0.17%). Of the rest, a substantial percentage (18%) was found to have insufficient information, lower than the 25% found by Llabrés-Brustenga et al. (2019). This is a consequence of the low density of stations in some places: for some stations and days, not enough auxiliary stations meeting the requirements could be assigned. Moreover, when analyzing which of the 1947 available series these cases with insufficient information belong to, almost 20% correspond to stations whose measurements could not be compared with those of neighboring stations during at least their first 15 operational years. In fact, in the period 1912–1932 some stations began to have valid data (up to a maximum of 19 stations with valid data in 1924), but from 1933, in the years before and during the Spanish Civil War (1936–1939), the number of operational stations dropped drastically. It was not until 1950 that there were again about 20 stations with valid data.

Bearing in mind the results of both the absolute and the relative quality controls, the final quality of every year at each station has been assessed using three categories: (1) excellent quality, for series with a minimum absolute quality index \(Q\) of \(90\)% and a minimum percentage of valid data after the relative control of 99%; (2) good quality, for series with a minimum \(Q\) of 85% and a minimum percentage of valid data of 95%; and (3) acceptable quality, for series with a minimum \(Q\) of 80% and a minimum percentage of valid data of 90%. Of a total of 49,789 station-years, 50% (24,778) have been declared of acceptable quality or better, 35% (17,350) of good or excellent quality, and only 17% can be considered of excellent quality. Note that not all available station-years correspond to complete years. If only the series from 1970 onward are considered, there are 39,338 station-years in total, of which 57% (22,443) are acceptable or better, 41% (16,111) good or excellent, and 20% (7913) excellent. Figure 9 shows the temporal evolution of the number of series in each of these three quality categories, as well as the total number of available series.
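The final classification of a station-year can be expressed as a cascade of the three criteria above. The sketch assumes that the percentage of valid data is computed over the available days of that station-year; the function name is hypothetical.

```python
from typing import Optional


def final_category(q: float, pct_valid: float) -> Optional[str]:
    """Final quality of a station-year from Q (%) and the % of valid days."""
    if q >= 90.0 and pct_valid >= 99.0:
        return "excellent"
    if q >= 85.0 and pct_valid >= 95.0:
        return "good"
    if q >= 80.0 and pct_valid >= 90.0:
        return "acceptable"
    return None  # below the acceptable level


print(final_category(92.0, 99.5))  # excellent
print(final_category(92.0, 96.0))  # good (valid data below 99%)
print(final_category(95.0, 85.0))  # None
```

Because the criteria are nested, counting station-years of "acceptable quality or better" amounts to counting every result that is not `None`, which is how the cumulative percentages above are expressed.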

Fig. 9 Temporal evolution of the number of available and selected quality series

4 Conclusions

A quality control procedure has been applied to a large database of 15,857,892 daily rainfall measurements from 1947 stations of three meteorological networks (AEMET, RAIF, and RIA) operating in Andalusia, in the south of the Iberian Peninsula, from 1870 to the present. The procedure consists of three successive stages, each yielding a higher degree of quality in the database. In the second stage, a quality index \(Q\) is calculated individually for every year at every station, considering the percentage of annual data, the distribution of gaps in the sample, the homogeneity of rainfall records by day of the week, and the abundance of outliers. The mean value of \(Q\) for the whole database was found to be 86%, and no spatial pattern was detected, indicating that \(Q\) does not depend on the diverse climatic characteristics of the study area. In the automatic weather station networks RIA and RAIF, almost 87% of the stations obtained an absolute quality index \(Q\) above 85%, whereas only 57% of the stations of the AEMET network, which are older and mostly manual, reached this value. Remarkably, the only rainfall record in the Andalusian set for more than 30 years after 1870 (San Fernando, Cádiz) showed very good completeness throughout this period, always above 93% for the available years. Regarding the presence of outliers over time, no temporal trend was found after 1910, unlike in Catalonia, where the presence of outliers shows an increasing temporal trend, possibly related to a reported increase in rainfall irregularity caused by climate change.
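How the four components are combined into \(Q\) is not described in this section, so the sketch below does not reproduce \(Q\). It only illustrates how two of its raw ingredients could be measured for one station/year: annual completeness together with the longest run of missing days, and a weekday-homogeneity statistic. Using a chi-square comparison against a uniform weekday split is an assumed choice for illustration, and all function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def annual_completeness(year: int, observed_days: Iterable[date]) -> Tuple[float, int]:
    """Return (% of days with a record, longest run of missing days) for one year.

    observed_days must be datetime.date objects (not datetime.datetime).
    """
    start = date(year, 1, 1)
    n_days = (date(year + 1, 1, 1) - start).days  # 365 or 366
    observed = {d for d in observed_days if d.year == year}
    longest_gap = current_gap = 0
    for i in range(n_days):
        if start + timedelta(days=i) in observed:
            current_gap = 0
        else:
            current_gap += 1
            longest_gap = max(longest_gap, current_gap)
    return 100.0 * len(observed) / n_days, longest_gap


def weekday_chi_square(wet_days: Iterable[date]) -> float:
    """Chi-square statistic of wet-day counts per weekday against a uniform split.

    Large values flag years in which some weekdays (e.g. weekends) are
    systematically under- or over-represented (assumed illustration).
    """
    counts = [0] * 7
    for d in wet_days:
        counts[d.weekday()] += 1
    expected = sum(counts) / 7
    if expected == 0:
        return 0.0
    return sum((c - expected) ** 2 / expected for c in counts)


# Toy year with a 10-day gap from 10 to 19 January.
days_2021 = [date(2021, 1, 1) + timedelta(days=i) for i in range(365)]
records = [d for d in days_2021 if not date(2021, 1, 10) <= d <= date(2021, 1, 19)]
pct, gap = annual_completeness(2021, records)
print(round(pct, 1), gap)  # 97.3 10
```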

In the last stage, each daily value measured at each station is labeled as valid, doubtful, or invalid through a relative comparison with stations that are sufficiently representative of it. For this comparison, the maximum admissible differences between stations (thresholds) were determined from the analysis of reference series obtained from automatic weather stations of proven high quality (RIA, between 2008 and 2018). These maximum differences turned out to be seasonal: higher values are observed in late summer and early autumn as a consequence of the extremely irregular surface records produced by late-summer storms and autumn convective rainfall events, even at nearby locations. For the more regular winter rainfall episodes, often of synoptic origin, smaller differences are usually observed, even between distant stations. Monthly threshold functions were obtained, with fitting coefficients very similar to those obtained in Catalonia (NE of the Iberian Peninsula): maximum values for September and minimum values from December to January. The main exception is the July coefficient, which is noticeably lower than in Catalonia; in July, rainfall is usually scarce or absent over most of Andalusia. It can be concluded that, when applying this quality control procedure, it is advisable to determine the threshold functions for the specific region studied, according to its climatic characteristics. Station density must also be considered, since it may lead to different threshold functions that allow the methodology to be applied with enough stations for mutual comparison.
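The labeling step can be sketched as follows. The exact form of the monthly threshold functions and the rule that turns individual disagreements into a label are not given in this section, so both are assumptions here: the threshold is passed in as a function of month and inter-station distance, a value is labeled invalid when it disagrees with every auxiliary station and doubtful when it disagrees with at least half of them, and fewer than three auxiliary stations counts as insufficient information. The coefficients in `toy_threshold` are made up and only mimic the seasonal pattern described (largest tolerance in September, smallest in December and January).

```python
from typing import Callable, Sequence, Tuple

# Hypothetical threshold signature: (month, distance_km) -> maximum admissible
# absolute difference in mm between the target and one auxiliary station.
Threshold = Callable[[int, float], float]


def label_day(
    target_mm: float,
    neighbours: Sequence[Tuple[float, float]],  # (rainfall_mm, distance_km)
    month: int,
    threshold: Threshold,
    min_neighbours: int = 3,      # assumed minimum number of auxiliary stations
    doubtful_frac: float = 0.5,   # assumed share of disagreements for "doubtful"
) -> str:
    """Label one daily value by comparison with its auxiliary stations."""
    if len(neighbours) < max(min_neighbours, 1):
        return "insufficient"  # not enough stations to compare with
    exceed = sum(
        abs(target_mm - mm) > threshold(month, dist) for mm, dist in neighbours
    )
    if exceed == len(neighbours):
        return "invalid"   # disagrees with every auxiliary station
    if exceed / len(neighbours) >= doubtful_frac:
        return "doubtful"  # disagrees with at least half of them
    return "valid"


# Made-up monthly coefficients: largest tolerance in September,
# smallest in December and January, 1.0 for the remaining months.
TOY_COEF = {9: 1.5, 12: 0.5, 1: 0.5}


def toy_threshold(month: int, distance_km: float) -> float:
    return 5.0 + TOY_COEF.get(month, 1.0) * distance_km


# 40 mm at the target in December while three nearby stations stay almost dry.
print(label_day(40.0, [(2.0, 10.0), (1.0, 15.0), (0.0, 20.0)], 12, toy_threshold))  # invalid
```

Passing the threshold as a function keeps the labeling rule independent of the fitted coefficients, which, as noted in the text, may need to be re-estimated for each region and station density.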

A total of 15,857,892 days/station data have been processed, of which almost 82% were declared valid. Only 0.14% of the cases were found to be invalid values, while doubtful cases account for 0.17%. For a significant percentage of cases (18%), there was not enough information to perform the relative quality control under the established requirements. This percentage is reasonable given the characteristics of the analyzed database, which covers a long period starting at the very beginning of the instrumental history of rainfall measurement in Andalusia, in 1870.

Three quality categories have been considered to evaluate every year at each station, taking into account both the absolute and the relative controls. Of a total of 49,789 available station/years since 1870, 50% were rated of acceptable quality or better (a minimum absolute quality index \(Q\) of 80% and a minimum percentage of valid data after the relative control of 90%), 35% of good or excellent quality (minimum \(Q\) of 85% and minimum percentage of valid data of 95%), and only 17% of excellent quality (minimum \(Q\) of 90% and minimum 99% of valid data).

This type of comprehensive quality control procedure has been applied very successfully despite the limitations of the semiarid region of Andalusia, notably its low density of weather stations and its structural water deficit, which becomes especially relevant in the summer months.