A comparison between different methods to fill gaps in early precipitation series

The aim of this work is to analyse and compare different methodologies to fill gaps in early precipitation series, and to evaluate which time resolution is reachable, i.e. monthly or daily. The following methods are applied and tested to fill the 1764–1767 gap in the precipitation series of Padua: (1) using a relationship between monthly amounts and frequencies; (2) transforming a daily log with visual observations into numerical values through analysis, classification, and calibration; (3) substituting the missing values with an instrumental record from a nearby, contemporary station in the same climatic area. To apply the second method, the descriptions reported in the Morgagni Logs are grouped into 37 classes and transformed into numerical values, using for calibration the observed amounts in the Poleni record over the 24-year common period. As a third method, the series of Temanza and Pollaroli in Venice is used to fill the gap, and the application of a scaling factor based on the Padua/Venice ratio is attempted. The results of these three methods are compared and discussed.


Introduction
Extended datasets of past weather conditions are extremely valuable for the assessment of climate change and related consequences. The growing need for high resolution, high quality and long-term continuous records has required an enormous effort to recover and reconstruct early precipitation series.
Documentary evidence and early instrumental records constitute an important source of data for historical climatology.
Almost all long instrumental series are affected by gaps, for reasons such as instrument malfunctions, poor health or even death of the observer, political changes, wars, and so on. One option is to exclude the periods with gaps from data analysis. Another is to try to reconstruct missing values, or validate uncertain records. This is particularly important in the early instrumental periods, when the data are scarce, but their recovery, correction and reconstruction are crucial for climate studies.
The problem with precipitation is twofold, since both frequency and amount have to be reconstructed. A number of techniques have been developed over the decades to estimate missing values in precipitation time series, mainly on a monthly and seasonal basis. Methods for working at daily resolution are rare (Ruane et al. 2015; Caldera et al. 2016; Hansan and Croke 2013) because of a number of difficulties that will be discussed in the next sections.
When dealing with historical series, the task is more challenging, as the data generally have to be carefully interpreted and validated, and gaps are quite large (i.e. years). Therefore, not all the methods described in the literature can be applied to historical series, depending on the nature and amount of data and/or metadata available. This paper considers the following methods:
1) Relationship between monthly amounts and frequencies. Historically, this was the first attempt and it was based on the monthly relation between the total precipitation amount and the number of rainy days. In the presence of gaps, when the frequency was known from narrative sources, the missing amount was substituted with the matched value based on different criteria, e.g. similarities, return periods and so forth (Toaldo 1770; Crestani 1926, 1933). This method is applied at monthly and daily resolution.
2) Transformation of narrative sources into numerical values through analysis, classification, and calibration. The transformation from the narrative format to numerical proxy values is a challenging task, not only because of the difficulties in recovering and interpreting the historical sources, but also because of the very nature and quality of the proxy. The use of indices in historical climatology follows a long tradition (Macdonald 2020). This approach has been applied to both normal and extreme events (e.g. droughts, storms, floods). The latter application is favoured by the natural tendency to document unusual weather and hydrological phenomena (Brázdil et al.; Brunetti et al. 2002). The time scale used is monthly, seasonal or yearly. In this paper, this approach is tested for the first time at daily level, to reconstruct daily amounts.
3) Using records from one or more stations in the same climatic area. Several long series have been reconstructed, or had their gaps filled, by using data from one or more neighbouring contemporary stations. The first problem is that more than one simultaneous record must be available in nearby sites with similar climate, and this is very unlikely in the early period, when only a few stations operated. It must be considered, however, that precipitation has high time and space variability, and a great density of stations is needed to assess statistically significant precipitation patterns. In addition, the quality and homogeneity of the datasets are a crucial element (Lanza et al. 2022). The second key item is the choice of the interpolation method.
The aims of this paper are: i) to fill a gap in a historical series for which different methodologies can be applied; ii) to ascertain the pros and cons of these methodologies; iii) to reach the daily resolution. More precisely, this paper compares and discusses the three methodologies mentioned above to fill a four-year gap in the mid-eighteenth-century Padua series (Camuffo et al. 2020) at daily resolution, using contemporary sources. In the same period, in Padua, Morgagni left a precise visual observation and description of the precipitation. In Venice, some 30 km east of Padua, a parallel instrumental record was taken. This fortunate situation makes it possible to apply and test different methodologies on the same case study. Another aim is to see whether it is possible to reach the daily resolution, or only the monthly one, and at what confidence level. This is the first time that different procedures are compared to reconstruct the daily rainfall amounts in a historical series.

Padua: Poleni and Morgagni observations
The history of the three-century daily precipitation series in Padua has been described elsewhere, including data, instruments, exposure, relocation and observational protocols (Camuffo et al. 2020). However, the reader may find a short summary useful to compare the situation in Padua with that in Venice. Giovan Battista Morgagni, a good friend of Poleni, made a parallel series with indoor/outdoor readings for medical purposes from 1740 to 1768. Morgagni observed at home, 1 mile from Poleni's house. However, he did not measure the daily rain amount, but included in his Log some weather notes, 2 or 3 times per day, specifying whether the day was clear, cloudy, foggy, rainy, snowy or dewy, adding several adjectives useful to classify intensity, amount or duration (e.g. a few drops, drizzle, light rain, rain, continuous rain, heavy rain) and to distinguish liquid and solid precipitation from condensation.
The precipitation series has a gap from 1st April 1764 to 31st December 1767 (Camuffo et al. 2020). The Morgagni Log is apparently regularly filled from 1st April 1764 (Fig. ESM1) to 31st October 1765 (with the exception of April 1765, which is missing), but this was a late reconstruction made by Toaldo. The daily amounts are reported in a column without heading, on the right border of each page, written with a lighter ink, in the same handwriting used in a note at the bottom of the page of April 1764, stating: "NB The measurements of precipitation are estimated, in inches and decimals of the English foot, as Mr. Marquis Poleni used". Toaldo explained (1770) that he analysed the relation between monthly amount and frequency in the Poleni series and then estimated the daily amounts considering the frequency given by Morgagni. The method used for matching frequency and amount is not fully clear, as explained in section 3.
Both Poleni and Toaldo used cubic funnels. The former had an unspecified side length; the latter had a side of 1 Paris foot. To reduce evaporation, they fixed a tube to the flat bottom and carried the collected water into a vessel located below. Poleni used a cylindrical vessel and measured the precipitation depth by plunging a graduated rod. The reading was amplified by the ratio between the funnel opening and the vessel cross sections, obtaining 0.6 mm resolution. Toaldo measured the volume of the collected water with three calibrated cups with cubic shape (1-, 2- and 3-inch side length), and divided the collected volume by the cross section of the catching funnel (Toaldo 1770). His resolution was 1/12 of a line, i.e. 1/144 Paris inch = 0.19 mm.
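The resolution quoted for Toaldo's gauge can be checked with a short calculation. The sketch below assumes the conventional metric value of the Paris foot (324.84 mm), which is not stated in the text, and the function name `depth_mm` is purely illustrative. Under these assumptions, one full 1-inch cubic cup collected by a funnel of 1 Paris foot side corresponds to a depth of 1/144 Paris inch, which is consistent with the stated resolution.

```python
# Assumption: conventional metric value of the Paris foot (pied du roi).
PARIS_FOOT_MM = 324.84
PARIS_INCH_MM = PARIS_FOOT_MM / 12   # 12 inches per foot


def depth_mm(cup_side_in, funnel_side_in=12.0):
    """Rain depth (mm) given by one full cubic cup of side `cup_side_in`
    (Paris inches), collected by a square funnel of side `funnel_side_in`
    (Paris inches): depth = volume / catching area."""
    volume_in3 = cup_side_in ** 3
    area_in2 = funnel_side_in ** 2
    return volume_in3 / area_in2 * PARIS_INCH_MM


# Depth equivalent of each of the three calibrated cups (1-, 2-, 3-inch side).
cup_depths = {side: depth_mm(side) for side in (1, 2, 3)}

# 1/12 of a line = 1/144 Paris inch, expressed in mm.
resolution_mm = PARIS_INCH_MM / 144

for side, depth in cup_depths.items():
    print(f"{side}-inch cup: {depth:.2f} mm")
print(f"resolution: {resolution_mm:.2f} mm")
```

The same function does not apply to Poleni's gauge, whose funnel side length is unspecified.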

Venice: Temanza and Pollaroli instrumental series
The period of the Padua gap has contemporary observations in Venice. Temanza observed the early part, i.e. 9% of the total gap, and Pollaroli the remainder, i.e. 91%. The former is responsible for a small fraction of the gap, but is particularly relevant because he established the methodology that the latter followed.
Tommaso Temanza was a Venetian architect, engineer and art historian. He studied at the University of Padua, and his most influential teacher was Giovanni Poleni. As Poleni taught at his house, Temanza had the opportunity to become familiar with Poleni's instruments, and he too used an Amontons air thermometer, a barometer, and a rain gauge with a cubic funnel. Temanza became chief architect of the Magistrate of the Waterways of the Most Serene Republic of Venice. His interests included the measurement of meteorological variables and the sea level. Temanza was highly regarded and Toaldo reported his data on several occasions, complimenting the author (Toaldo 1770, 1797).
The location and the exposure were not specified. It is very likely that his instruments were at his house, following the example of Poleni. In Venice, the choice of exposure was very limited: a typical roof terrace used to dry laundry, named "altana" (see Fig. ESM2), or a balcony. Temanza knew well that Poleni measured on his roof, and very likely followed his example. Like Poleni, Temanza observed near noon, when he returned home for lunch, following the astronomers' practice of setting the clock at culmination. Temanza dipped the graduated rod directly into the collecting vessel, and therefore he was not interested in knowing the cross section exactly. However, in the absence of amplification, the resolution was limited to inches and lines, and was 2.4 mm.
Pollaroli succeeded Temanza in the publication of the monthly weather tables from 1st August 1764 to 31st December 1769 and added some notes about public health (Pollaroli 1764-1770). The Giornale di Medicina had a gap and returned in 1773 with the monthly tables published by another Venetian physician, Jacopo Panzani, who also added some comments about public health. In the Giornale di Medicina, the measurements taken by Pollaroli are presented without metadata (Pollaroli 1764). This suggests that the type of instrument, exposure and reading time were kept unchanged, except for the location, which became Pollaroli's house and is unknown. The resolution of the precipitation readings, i.e. 2.4 mm, is the same as Temanza's, and this confirms that Pollaroli used the same type of rain gauge, i.e. a basic cubic box into which a graduated rod was dipped. If the funnel had had a mouth larger than the flat bottom, the resolution would have been different, i.e. scaled by the ratio between the top and the bottom sections.
When, a few years later, Panzani adopted a more sophisticated pluviometer composed of a big cylindrical vessel with a funnel inserted inside to divide the volume in two parts and reduce the evaporation, he reported the characteristics of his instruments in a note to his record (Panzani 1773). At least in the Venice area, rain gauges consisting of a simple cubic catching box were commonly used by scientists who were not specialists in meteorology until the end of the 18th century, as confirmed by Trevisan (1793).

The 1st Method: Relationship Between Monthly Frequency And Amount
Historically, the first approach used to reconstruct missing daily precipitation amounts was based on the relationship between monthly frequency and amount (Camuffo et al. 2020). Toaldo (1770) devised this method to fill the gap from 1st April 1764 to 31st October 1765, and used the Poleni dataset. The method assumes that there is a strong relationship between the known monthly frequency (from local documentary sources) and the unknown amount. However, the amount was not derived from a simple linear relationship with frequency, but was reconstructed following the observed 9-year periodicity of precipitation based on the lunar cycle (Toaldo 1781). Toaldo took a further step from monthly to daily values and reproduced previous rain sequences. His method is not fully clear, and Toaldo himself declared that "the monthly precipitation amount did not correspond always to the number of rainy days, as it might rain for several days, but in small amount". He concluded that "it is necessary to measure the precipitation amount to establish if a year or month was rainy or not" (Toaldo 1770). Crestani (1935) severely criticized Toaldo's method, even at monthly level.
To examine Toaldo's method in more depth, we investigate the monthly relation between precipitation frequency and amount in the Poleni dataset. Every month is considered separately, because of the different character that precipitation assumes during the calendar year, i.e. heavy and long-lasting precipitation in spring and autumn, when the Atlantic perturbations reach northern Italy; dryness interrupted by heavy showers in summer; light winter precipitation. For each month, we make the scatter plot of the amount versus the frequency (Fig. ESM5), and then apply a linear regression. The monthly values of the determination coefficient R² are reported in Fig. 1. The values range between R² = 0.84 in February and 0.21 in August, and the average over the 1725-1760 period is 0.55. This suggests that at monthly level this reconstruction may be acceptable for February, April, June, November and December, and not acceptable at all in May, July and August. Finally, there is no way to pass from monthly to daily resolution with this approach.
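The regression step for a single calendar month can be sketched as follows. This is a minimal illustration assuming an ordinary least-squares fit of amount against frequency; the function name `fit_amount_vs_frequency` and the sample values are hypothetical and are not taken from the Poleni dataset.

```python
def fit_amount_vs_frequency(freq, amount):
    """Ordinary least-squares fit amount = slope * freq + intercept.
    Returns (slope, intercept, r_squared)."""
    n = len(freq)
    mean_f = sum(freq) / n
    mean_a = sum(amount) / n
    sxx = sum((f - mean_f) ** 2 for f in freq)
    sxy = sum((f - mean_f) * (a - mean_a) for f, a in zip(freq, amount))
    syy = sum((a - mean_a) ** 2 for a in amount)
    slope = sxy / sxx
    intercept = mean_a - slope * mean_f
    # Residual sum of squares of the fitted line.
    ss_res = sum((a - (slope * f + intercept)) ** 2
                 for f, a in zip(freq, amount))
    r_squared = 1.0 - ss_res / syy
    return slope, intercept, r_squared


# Hypothetical sample: rainy days and total amount (mm) for the same
# calendar month in five different years (illustrative values only).
rainy_days = [4, 7, 9, 12, 15]
monthly_mm = [18.0, 35.0, 38.0, 61.0, 70.0]

slope, intercept, r2 = fit_amount_vs_frequency(rainy_days, monthly_mm)
print(f"slope={slope:.2f} mm/day, intercept={intercept:.2f} mm, R2={r2:.2f}")
```

Repeating the fit for each of the twelve months gives one R² per month, which is the quantity the text compares across the year.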

The 2nd Method: Conversion Of Contemporary Weather Notes Into Quantitative Values
This method is based on the conversion of contemporary weather notes into quantitative values. The unbroken and accurate descriptions that Morgagni reported in his Logs can be transformed into quantitative values, using for calibration the observed amounts in the Poleni record over the 24-year common period. Calibration is a challenging item, because weather notes depend on the skill and accuracy of the observer, the modality of observation and/or recording, and perception of the weather phenomena.
A difficulty is the exact time correspondence between the Poleni and Morgagni observations. Poleni observed at noon, but his rain gauge recorded over the previous 24 hours. Morgagni observed three times a day, i.e. one hour after sunrise, two hours after noon, plus a note for the night. However, the times of sunrise and sunset change over the calendar year: in Padua the night is 1/3 of the day at the summer solstice and 2/3 at the winter solstice. Very likely Morgagni missed the situation overnight, when he was sleeping.
The precipitation series by Poleni (instrumental) and Morgagni (visual, but with event classification) are analysed over the common period 1740-1763. The comparison of the frequencies is useful to test the accuracy of the visual observations. Overall, there is very good agreement in the occurrence of the precipitation events according to the two series, with the lowest difference in winter and the highest in summer (Fig. 2). The lower frequency of Poleni's series in summer may be explained by the fact that he spent the hottest months in a countryside locality with a better climate than Padua, and entrusted a trained servant with keeping the weather record. Evidently, this person was not very accurate. Finally, it should be considered that several summer months are missing in the Morgagni series.
The analysis of the contemporary precipitation events of the two series makes it possible to calibrate the method, associating the weather notes reported by Morgagni with the daily precipitation amounts measured by Poleni.
Morgagni wrote several (i.e. two or even three) comments a day, often different between them according to the weather variability.
The daily precipitation amount (DPA) is given by the integral of the precipitation intensity PIN(t) over the duration of the event:

$$\mathrm{DPA} = \int_{t_0}^{t_n} \mathrm{PIN}(t)\,dt$$

where t_0 and t_n are the initial and final times of the precipitation event. Poleni measured the daily amounts DPA. Morgagni made an effort to give an accurate evaluation of PIN, using a number of different adjectives and adverbs, including a variety of their combinations. It was therefore not easy to extract classes from them, and sometimes the "label" of the class itself was composed of more than one adjective and/or adverb. Sometimes PIN is represented by the precipitation type, e.g. drizzle, rain, shower. Less clear is the duration, i.e. t_0 and t_n. Morgagni often referred to the duration in terms of "long", "continuously", "ceaseless", without specifying the exact number of hours.
Some weather notes include only PIN; some others include some information about the duration. Therefore, it was necessary to adopt two classifications: one for PIN alone, and another for the combination of the two variables, e.g. "continuous little rain" was considered a different class from "little rain". When duration is missing, which is the majority of cases, the events characterized by the same intensity are considered within the same class (PIN alone) and supposed to have the same duration. This leads to apparent paradoxes when matching the Morgagni definitions to the Poleni DPA; e.g. a continuous drizzle, defined simply as "drizzle" by Morgagni but lasting the whole day, could be associated with the same large amount as an intense but short shower, even if a shower belongs to a higher intensity class than drizzle.
In total, 37 classes are recognized (Table ESM1). To increase the representativeness of the statistical approach, when an original definition is used only a few times, the related events are merged with classes with a similar definition, based on slightly different terms but with the same meaning. Every class is identified with one or two letters. The English translation of the original Italian definitions is only indicative, as it is impossible to find a precise correspondence for the many diminutives, nicknames and terms of endearment in which the Italian language is extremely prolific.
The most populated class, i.e. 47% of the cases, is "rain" (N), without further specification. The next most numerous classes are "light rain" (U), 11% of the total, and "big rain" (G), 6%. According to this result, about half of the daily amounts reconstructed using this method are characterized by the same value, the one associated with class N.
Once each precipitation event is attributed to a specific class, the next step is to associate a quantitative value with each class. Each class is composed of the ensemble of the matched amounts, i.e. those read by Poleni on the days when precipitation events belonging to that class occurred. This establishes for each class a broad and skewed distribution, characterized by a specific mode, median and mean. The mode is scarcely representative, being determined by the most frequent precipitation type, i.e. fine and light rains. The mean and the median are more representative for distinguishing one class from another, and are shown in Fig. ESM6.
The results for the three most numerous classes, i.e. "big rain", "rain" and "light rain", are shown in Fig. 3 with the indication of the most significant statistical values, i.e. median, mean and mode. The precipitation amounts associated with each class are quite scattered, and the mode does not seem to be the best representative of each class. For example, according to the results, the mode of the class "rain" is lower than the mode of the class "light rain", i.e. on a rainy day the amount of rain collected would be lower than in the case of light rain. This is physically implausible, and it can be explained by the missing information on the duration of each event: if a light rain lasted long enough, the daily total amount could be even higher than the amount collected on a rainy day.
Between the mean and the median, the choice may be subjective. In this paper, we assume the latter as representative of each class.
The daily precipitation amounts, with the values in the gap reconstructed using all the 37 classes, are shown in Figure 4. The result is not fully satisfactory because the method misses the lowest and highest values.
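The calibration and reconstruction steps of this method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the class codes N, U and G are those quoted in the text, and the amounts are made-up values rather than Poleni readings. Each class is assigned the median of the amounts matched to it over the common period, and each day in the gap then receives the median of its class.

```python
from collections import defaultdict
from statistics import median


def calibrate_classes(matched_days):
    """matched_days: iterable of (class_code, amount_mm) pairs for days of
    the common period, pairing a Morgagni class with a Poleni amount.
    Returns {class_code: median amount in mm}."""
    amounts_by_class = defaultdict(list)
    for code, amount_mm in matched_days:
        amounts_by_class[code].append(amount_mm)
    return {code: median(vals) for code, vals in amounts_by_class.items()}


def reconstruct(gap_codes, class_median):
    """Replace each class code in the gap with its calibrated median.
    Codes never seen in calibration give None (left unfilled)."""
    return [class_median.get(code) for code in gap_codes]


# Illustrative calibration sample (made-up amounts, in mm).
common_period = [
    ("N", 3.0), ("N", 7.5), ("N", 5.0), ("N", 21.0),
    ("U", 1.2), ("U", 2.4), ("U", 6.0),
    ("G", 14.0), ("G", 30.0),
]
class_median = calibrate_classes(common_period)

# Class codes read from the log for four days inside the gap.
gap_days = ["N", "U", "G", "N"]
print(reconstruct(gap_days, class_median))
```

Because every day of a class receives the same value, the reconstructed series cannot reproduce the tails of each class distribution, which is in line with the loss of the lowest and highest values noted in the text.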

Use of the Venice series
This method is based on the use of a contemporary record from a nearby location in the same climatic region. In the early instrumental period, it is not always possible to find a satisfactory solution, or even another record. In the literature, several long precipitation series were obtained by combining records taken at different sites in the same geographic area and/or at different levels from the ground, although this method raises issues concerning the homogeneity of the resulting series (Wales-Smith 1971; Craddock 1976, 1979). In the present case, the measurements benefit from a basic homogeneity, because the first observer, Temanza, did his best to follow the protocol of his former teacher and friend Poleni, and the second, Pollaroli, did his best to follow Temanza. All observers used a cubic funnel, very likely on the roof, and read at noon. However, precipitation is a local phenomenon, so even if two cities are quite close together, the frequency and intensity of precipitation may be quite different (Berndtsson 1988; Li et al. 2014). In addition, the wind field distortion caused by each particular roof could be responsible for unpredictable departures.
The gap in the Padua series can be filled with the contemporary observations by Temanza from 1st April to 31st July 1764 and by Pollaroli from 1st August 1764 to 31st December 1767. A correction factor may be needed, because the precipitation in Padua and Venice is very similar, but not exactly the same. Therefore, the ratio Padua/Venice is calculated over a common period. It must be said, however, that on the yearly or monthly timescale the precipitation in Venice is highly correlated with that in Padua, while on the daily scale local phenomena may lower the correlation.

Comparison of precipitation frequency and amounts in Padua and Venice
The comparison of the monthly precipitation frequencies and amounts in Venice and Padua indicates that in the 1764-1767 period there were about 424 rainy days in Padua and 421 in Venice, of which 302 (about 70%) occurred simultaneously in the two cities. The main differences are in the summer months, and are explained by local instability and cumulus cloud development (Fig. 5).
The plot of the daily precipitation amounts in Padua over the wider 1725-1811 period is shown in Fig. 6a, with the 1764-1767 gap filled with the Venice data. The two highest values of the whole series occurred in the gap, i.e. were recorded in Venice, but on both days there was intense rain in Padua as well. In fact, on 14th September 1764 Pollaroli measured 92 mm and Morgagni wrote "gran pioggia" (great rain), and on 10th July 1765 Pollaroli measured 130 mm and Morgagni reported "coperto poi vento grandissimo con pioggia e tuoni" (overcast, then very strong wind with rain and thunder).
If the amounts are represented with dots instead of vertical lines (Fig. 6b), it is evident that the resolution of the instrument used by Pollaroli is lower than those used by Poleni before and Toaldo after (as discussed in section 2), resulting in less variability in the daily values.
The 4-year gap is too short to apply homogeneity tests to the series, as well as to analyse the cumulative values of Padua vs Venice. The same can be said for the calculation of the yearly percentiles from the 10th to the 90th of the reconstructed series (Fig. 6c) (Camuffo et al. 2021).
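The share of simultaneous rainy days can be computed from two lists of rainy dates, as in the following sketch. The function name `rainy_day_overlap` and the sample dates are hypothetical; only the counts 424, 421 and 302 come from the text.

```python
from datetime import date


def rainy_day_overlap(padua_days, venice_days):
    """Count the days that are rainy in both records and express them as a
    share of the rainy days of each city."""
    padua, venice = set(padua_days), set(venice_days)
    both = padua & venice
    return {
        "both": len(both),
        "share_of_padua": len(both) / len(padua),
        "share_of_venice": len(both) / len(venice),
    }


# Illustrative dates only (not taken from the historical records).
padua_sample = [date(1764, 9, 13), date(1764, 9, 14), date(1764, 9, 20)]
venice_sample = [date(1764, 9, 14), date(1764, 9, 20), date(1764, 9, 25)]
print(rainy_day_overlap(padua_sample, venice_sample))

# With the counts quoted in the text: 302 simultaneous days out of
# 424 (Padua) and 421 (Venice).
share_padua = 302 / 424
share_venice = 302 / 421
print(f"{share_padua:.2f} {share_venice:.2f}")
```

Both shares are close to the "about 70%" quoted in the text, whichever city is taken as the reference.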

The Padua/Venice ratio as scaling factor
A challenging problem is to scale the daily amounts of the Venice series with the ratio Padua/Venice (PD/VE). Tests made over different sub-periods, determined by the homogeneity and availability of data, show that PD/VE = 0.95 for the period 1751-1758 (this study); PD/VE = 1.24 for 1880-1895 (Eredia 1908); PD/VE = 1.10 for 1920-1932 (Crestani 1933); PD/VE = 1.01 for 1960-1990 (this study) (Table ESM2). All of these sub-periods have a different instrument and exposure in Padua, Venice, or both, showing that several contributing factors considerably alter the result, e.g. local roof turbulence, elevation, influence of wind drag, rain-gauge threshold, wetting (i.e. dew) and evaporative losses. Only in modern times has precipitation been measured according to the WMO (2018) recommendations, which make a reliable comparison between two sites possible. Wind field deformation can account for 2-10% underestimation, which can exceed 50% for solid precipitation (Goodison et al. 1998; WMO 2018). Wetting loss on the collector walls and in the measuring cylinder when it is emptied can reach 15% in summer. Evaporation from the container can be responsible for the loss of another 4%, and in- and out-splashing for up to 2%. These errors were arguably larger in the earliest instrumental periods (Brugnara et al. 2020). Any change in the instrumental threshold has a dramatic impact on the precipitation frequency, much higher than on the amount, in particular for the fine and light rains, which are dominant (Camuffo et al. 2021). For example, if only the precipitation values above 10 mm are considered, 70% of the data will remain undetected, but they are responsible for 25% of the total amount. In conclusion, considering that the PD/VE ratio is around 1 but its exact value depends on different factors, and that the main part of the Padua gap (i.e. 91%) is covered by the Pollaroli series, which cannot be calibrated with other contemporary records, the application of a scaling factor to adjust the daily values of Venice would be affected by subjectivity and is not justified.
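The sensitivity of the result to the choice of sub-period can be illustrated with a short sketch. It assumes that PD/VE is defined as the ratio of the total amounts over a common period, which the text does not state explicitly; the function name `scaling_ratio` is hypothetical. The four ratios are those quoted above, and 130 mm is the Venice reading of 10th July 1765.

```python
def scaling_ratio(padua_mm, venice_mm):
    """PD/VE, here assumed to be the ratio of the total amounts measured
    in the two cities over the same common period."""
    return sum(padua_mm) / sum(venice_mm)


# PD/VE values quoted in the text for the four sub-periods.
published_ratios = {
    "1751-1758": 0.95,
    "1880-1895": 1.24,
    "1920-1932": 1.10,
    "1960-1990": 1.01,
}

# Range spanned by the published ratios.
spread = max(published_ratios.values()) - min(published_ratios.values())

# Effect of each candidate ratio on one daily value measured in Venice.
venice_reading_mm = 130.0
scaled = {period: venice_reading_mm * ratio
          for period, ratio in published_ratios.items()}

print(f"spread of PD/VE: {spread:.2f}")
for period, value in scaled.items():
    print(f"{period}: {value:.1f} mm")
```

Depending on the sub-period chosen, the same Venice reading would be scaled to values that differ by tens of millimetres, which illustrates why the choice of a single factor would be subjective.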

Conclusions
In this paper, three different methodologies are applied and compared to reconstruct the daily rainfall amounts of the 1764-1767 gap in the Padua series. The case of historical series is quite challenging, as only a limited number of techniques can be applied, due to the nature of the data and metadata available, as well as the character of the gaps and their length.
The first approach is based on the relationship between monthly frequency and amount. Results indicate that a linear relation is acceptable only at monthly level, and only for some months, because precipitation has a seasonal character.
The second method attempts the conversion of contemporary weather notes into quantitative values. The unbroken and accurate descriptions reported in the Morgagni Logs are transformed into numerical values, using for calibration the observed amounts in the Poleni record over the 24-year common period. The comparison of the rain frequencies of the two series over that period indicates that Morgagni's visual observations were quite accurate, with the highest difference from the Poleni records in summer. The analysis of the contemporary precipitation events allows 37 classes of precipitation to be established, quite scattered in their values. Every class has a wide range, so that it is impossible to find a precise relationship between a selected class and a characteristic value. A characterization of the classes should exclude the peak value (i.e. the mode), but both the mean and the median are possible choices; we select the latter as more representative of each class. The main disadvantage of this method is that Morgagni's descriptions are based on the precipitation intensity at the moment of the observation, but the duration of the event is missing. This is misleading, because a long-lasting rain of light intensity may accumulate a higher amount than a more intense, but shorter precipitation. This explains the broad ranges found in the calibration of the classes. In conclusion, the daily values reconstructed even on the basis of an accurate daily description of the rain intensity are not fully satisfactory.
The third method consists in filling the missing values with a record from a nearby, contemporary station in the same climatic area, i.e. the series of Temanza and Pollaroli in Venice. In these two nearby cities, at monthly level, the precipitation values are closely related, both in frequency and amount. However, at daily level, this method shows some problems: i) the precipitation may differ from one site to another, due to the local character of the rainfall; ii) the two records are not homogeneous, as observations were taken by different observers with different instruments; iii) the application of a scaling factor does not improve the results. In particular, a scaling factor cannot be uniquely determined, as it depends strongly on a number of variables related to the period in which it is calculated. Therefore, while this last method may be acceptable at monthly level, it is not reliable at daily resolution. Results clearly show that, in the case of historical series, the use of nearby, contemporary series has to be evaluated carefully, as any change in the instrumental threshold has a dramatic impact on the precipitation frequency, much higher than on the amount, in particular for the fine and light rains, which are dominant.

Figure 1 Determination coefficient R² between monthly precipitation frequency and amount in the 1725-1760 Poleni dataset
Figure 2 Scatter plot of the precipitation monthly frequency observed by Poleni versus Morgagni over the common period 1740-1760 and linear regressions. The seasons are indicated with different symbols and colors: winter with blue diamonds, summer with red dots, spring/autumn with violet circles
Figure 3 Normalized frequency of daily precipitation amounts (mm) of the three most numerous classes: a) rain; b) light rain; c) big rain