Fishing historical sources: a snapshot of 19th-century freshwater fauna in Spain

Historical information is needed to robustly describe long-term changes in the distribution of organisms, although it is in general scarce or contained in non-scientific sources. Gazetteers (or geographical dictionaries) constitute a potential source of historical species records, which has not been accurately explored yet. The dictionary edited by Pascual Madoz between 1845 and 1850 extensively described the geography, population and socioeconomic aspects of Spain. The dictionary included abundant information on wild animals and plants, with a special focus on socioeconomically relevant species. Here, we present a database generated by collecting and georeferencing the mentions of freshwater fauna in the Madoz, which includes 10,750 occurrence records of 39 freshwater-associated taxa from 5,472 localities. This database has been made public and usable (following FAIR criteria) in GBIF. Most of the records correspond to fish (10,201 records, 94.9% of total; 33 taxa), followed by crayfish (418 records, 3.9% of total; one species). Annelids (one taxon), amphibians (one taxon), reptiles (one taxon) and mammals (three species) sum up to 132 records (1.2% of total). The database presented here can be used to estimate the baseline ranges of many freshwater species, which should inform present-day management for the conservation and recovery of endangered species and freshwater communities.


Introduction
Human activities are driving the decline of populations of a myriad of species worldwide, initiating the sixth mass extinction event (Barnosky et al. 2011; Cowie et al. 2022). In this context, freshwater organisms are declining with notable intensity compared with terrestrial or marine ones (Living Planet Index; WWF 2020). As an example, a recent estimation of the sampled Red List Index for fish shows that this index is notably lower for freshwater fishes than for marine species (Miranda et al. 2022). The generalised negative trend of freshwater biodiversity is driven by multiple, complex and interacting anthropogenic impacts on riverine, lake and wetland biodiversity (Reid et al. 2019). Although notably large, the real magnitude of the decline of freshwater biodiversity might be largely underestimated, because humans have impacted freshwater systems for centuries (Limburg and Waldman 2009), while biodiversity indicators rarely account for processes longer than a few decades (e.g. the Living Planet Index uses data starting in 1970). The underestimation of biodiversity losses can mislead the definition of conservation baselines and the conservation targets derived from them (Clavero et al. 2022a), potentially generating a shifting baseline syndrome (Lovell et al. 2020).
Assessing long-term biodiversity change requires historical information, which is scarce in general. Long-term data series exist for some commercially exploited freshwater species and can be used to estimate temporal dynamics in the abundance of local or regional populations (Yoshiyama et al. 1998; Aschonitis et al. 2017; Brevé et al. 2022). However, long-term fisheries datasets are rare and in general have a reduced spatial extent, informing about landings in specific spots. Mining historical sources is a promising approach to describe past species distributions and design conservation baselines and management efforts (Clavero and Hermoso 2015; Duarte et al. 2018; Viana et al. 2022). In Europe, geographic description initiatives have been developed at least since the eleventh century (Duarte et al. 2018; Jones 2018), often in the form of geographic dictionaries (Clavero and Revilla 2014). Many of these historical documents provide information on natural resources, including vegetation and fauna, which can be used to describe past ecosystems and to identify conservation baselines (Clavero and Hermoso 2015; Viana et al. 2022).
Here, we present a database containing over 10,000 occurrence records of freshwater fauna from Spain in the mid-nineteenth century. Records were compiled from the geographic dictionary edited by Madoz (1845-1850) (henceforth 'the Madoz'), an extensive, exhaustive and standardized socioeconomic survey of Spain that provides abundant information on wildlife occurrences. The mid-nineteenth century is a relevant time-frame for freshwater conservation in the Iberian Peninsula because by that time freshwater ecosystems were still minimally impacted by the most relevant, large-scale human disturbances, such as water pollution, invasive species or damming (e.g. Elvira and Almodóvar 2001; Clavero and Hermoso 2015). The resulting database has been made publicly available (Blanco-Garrido and Clavero 2022) and provides an unprecedented amount of information to model the past distribution of species that can be used to define reference conditions for the conservation of freshwater biodiversity in the Iberian Peninsula. This database can constitute a model for the development of similar data generation initiatives elsewhere.

Data source
We extracted records of freshwater fauna (mainly, but not exclusively, fish) from the Madoz. This geographical dictionary summarised geographic, historical, population and socioeconomic information for Spanish villages and larger administrative units, also describing rivers, mountains, capes and other geographical features. The Madoz was published in 16 volumes between 1845 and 1850 and contains some 11,800 pages and around 70,000 articles. It incorporated information from previous sources (e.g. Martínez Marina 1802; Miñano 1826-1828), although the bulk of the information provided was generated from its own sampling, which involved the participation of more than 1400 local collaborators over a 15-year period. The articles of the Madoz followed a systematic structure, with different sections, including a description of the cultivated areas and natural vegetation (under the section terreno, "terrain"), the urban area, the rivers, fountains and mills, the industrial and trading activities, the municipal budget and taxes, and crops, livestock and wildlife (under the section producciones, "productions"). Most mentions of freshwater fauna in the Madoz are reported either when describing rivers and wetlands or within the producciones section in articles dealing with population centres. Some mentions of freshwater fauna are also associated with the heading terreno, although this is much less frequent.

Building the freshwater fauna database
We searched for mentions of freshwater fauna included in the Madoz using digitised copies of the dictionary with text recognition, available at the Virtual Library of Andalusia (http://www.bibliotecavirtualdeandalucia.es), and following a snowball-like active search procedure. The search began with frequently cited species, such as European eel, brown trout or barbels (anguila, trucha or barbo in Spanish, respectively) throughout the 16 volumes of the Madoz. When finding a mention of one of those species, we noted any other mention of freshwater fauna elements, which were the focus of subsequent search rounds. Searches included name variants used for the different species as well as common mistakes made by the pdf reader in interpreting species names. The detection of name variants was made easier because fauna mentions were usually provided as short lists, while errors were identified by copying lists of fish names and pasting them in a text processing program (the most common mistake was the inclusion of blank spaces among letters, e.g. b a r b o for barbo). The search process continued until no new names appeared.
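The handling of the most common text-recognition error (blank spaces inserted among letters) can be illustrated with a short sketch. This is not the procedure used to build the database, which relied on manual searches in a pdf reader; it only shows, under stated assumptions, how a search pattern could tolerate such blanks. The function names, the optional plural "s" and the sample sentence are hypothetical.

```python
import re

def spaced_pattern(name: str) -> re.Pattern:
    """Regex matching `name` even if blanks were inserted between its letters.

    Assumption: an optional trailing "s" covers simple Spanish plurals.
    """
    letters = [re.escape(ch) for ch in name]
    body = r"\s*".join(letters)  # optional whitespace between consecutive letters
    return re.compile(r"(?<!\w)" + body + r"(?:\s*s)?(?!\w)", re.IGNORECASE)

def count_mentions(text: str, names: list[str]) -> dict[str, int]:
    """Count how many times each vernacular name appears in `text`."""
    return {name: len(spaced_pattern(name).findall(text)) for name in names}

# Invented sentence imitating a short faunal list with an OCR error.
sample = "PROD.: trigo y cebada; pesca de truchas, b a r b o s y anguilas."
counts = count_mentions(sample, ["trucha", "barbo", "anguila"])
print(counts)
```

A pattern of this kind would still require manual checking, since permissive whitespace matching can produce false positives in running text.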
Taxonomy was harmonised by assigning the variants of vernacular names to a unique scientific name based on the GBIF taxonomic backbone. Whenever possible, records were identified to species level. If this was not possible, the lowest possible taxonomic level was assigned (e.g., genus, family, order). Vague terms not fitting a clear taxonomic category were discarded (e.g., "fish" or "fishery"; "peces" or "pesca" in Spanish, respectively).
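The harmonisation step can be pictured as a lookup from vernacular variants to a single scientific name. The sketch below is a simplified illustration that uses only name variants reported in this paper; the dictionary, the function name and the decision to raise an error for unmapped names are assumptions, and the sketch does not query the GBIF backbone.

```python
from typing import Optional

# Vernacular variants reported in the text, mapped to one scientific name each.
VERNACULAR_TO_TAXON = {
    "trucha": "Salmo trutta",
    "anguila": "Anguilla anguilla",
    "sollo": "Acipenser sturio",
    "soyo": "Acipenser sturio",
    "zoyo": "Acipenser sturio",
    "bermejuela": "Achondrostoma arcasii",
    "bermeja": "Achondrostoma arcasii",
    "bermejo": "Achondrostoma arcasii",
    "bermejuelo": "Achondrostoma arcasii",
    "bermijuela": "Achondrostoma arcasii",
    "pardilla": "Iberochondrostoma lemmingii",
    "tenca": "Tinca tinca",
    "carpa": "Cyprinus carpio",
}

# Vague terms that do not fit a clear taxonomic category are discarded.
VAGUE_TERMS = {"peces", "pesca"}

def harmonise(vernacular: str) -> Optional[str]:
    """Return the scientific name for a vernacular name, or None if it is vague."""
    key = vernacular.strip().lower()
    if key in VAGUE_TERMS:
        return None
    # Hypothetical choice: unmapped names raise KeyError so they get reviewed.
    return VERNACULAR_TO_TAXON[key]

print(harmonise(" Soyo "))
print(harmonise("peces"))
```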
Mentions of freshwater fauna were translated into spatially explicit records by georeferencing localities (articles) using Google Earth (Decimal Degrees Coordinates, unprojected WGS84). Georeferencing was done for localities including villages, small topographical features, and small rivers. To homogenise georeferencing procedures, coordinates assigned to villages correspond roughly to the centroid of the built area, so they do not necessarily represent the exact location of the taxa mentioned by the Madoz. Considering the average area of Spanish municipalities (60 km²; http://www.ine.es) and that often multiple records (villages) were cited within a single municipality, the spatial accuracy of these records can therefore be confidently placed within a 5 km radius around the cited locality. Articles dealing with large geographical (e.g. stretches within a river) or administrative units (e.g. judicial districts, provinces) were not georeferenced. For further hydrological context, the information on freshwater fauna was also linked to the specific hydrological basin to which each georeferenced locality belongs, using QGIS (QGIS Development Team 2022).
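The text does not state how the 5 km figure was derived. One plausible reading, offered here only as an assumption, is that a municipality of average area, idealised as a circle, has a radius below 5 km, so a point placed at a village within it cannot be far from the true location. The short check below computes that equal-area radius; the circular shape and the names used are illustrative.

```python
import math

# Average area of Spanish municipalities quoted in the text (km^2).
MEAN_MUNICIPALITY_AREA_KM2 = 60.0
# Spatial accuracy stated in the text (km).
STATED_RADIUS_KM = 5.0

def equivalent_radius_km(area_km2: float) -> float:
    """Radius of a circle with the given area (assumes a circular municipality)."""
    return math.sqrt(area_km2 / math.pi)

radius = equivalent_radius_km(MEAN_MUNICIPALITY_AREA_KM2)
print(f"equal-area radius: {radius:.2f} km")  # about 4.37 km
print(radius <= STATED_RADIUS_KM)
```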
The database on nineteenth-century Spanish freshwater fauna is hosted in GBIF (Blanco-Garrido and Clavero 2022), from where it is freely accessible and downloadable. It is structured as: (i) Event database, composed of 5,472 rows (sites) and 22 columns (variables), (ii) Occurrence database, with 10,750 rows (species × sites) and 30 columns (variables), downloadable as a DwC-A file. Henceforth, we report and discuss the distribution of the main species and faunal groups resulting from the mining of the information included in the Madoz.
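The two-table structure (sites as events, species × sites as occurrences) can be combined by linking each occurrence to its site. The sketch below uses invented toy rows rather than the real download, and it assumes that the tables are tab-delimited and share the standard Darwin Core terms eventID, decimalLatitude, decimalLongitude and scientificName; the actual column names of the published files should be checked before reuse.

```python
import csv
import io
from collections import defaultdict

# Toy tables (invented rows) imitating an event core and an occurrence extension.
EVENTS_TSV = (
    "eventID\tlocality\tdecimalLatitude\tdecimalLongitude\n"
    "E1\tSite A\t42.0\t-3.5\n"
    "E2\tSite B\t38.9\t-6.3\n"
)
OCCURRENCES_TSV = (
    "occurrenceID\teventID\tscientificName\n"
    "O1\tE1\tSalmo trutta\n"
    "O2\tE1\tAnguilla anguilla\n"
    "O3\tE2\tTinca tinca\n"
)

def read_tsv(text: str) -> list[dict[str, str]]:
    """Parse tab-delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def taxa_by_event(events, occurrences) -> dict[str, list[str]]:
    """Group scientific names by the site (event) where they were recorded."""
    known_events = {event["eventID"] for event in events}
    grouped = defaultdict(list)
    for occ in occurrences:
        if occ["eventID"] in known_events:  # skip occurrences without a site
            grouped[occ["eventID"]].append(occ["scientificName"])
    return dict(grouped)

result = taxa_by_event(read_tsv(EVENTS_TSV), read_tsv(OCCURRENCES_TSV))
print(result)
```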

Overview of the dataset
We generated 10,750 occurrence records of 39 freshwater taxa from 5,472 localities (Fig. 1A). Records are mainly distributed through Peninsular Spain, although they also include the Canary (three localities) and Balearic (two localities) archipelagos, as well as one locality in Portugal (Pulo do Lobo, a cascade in the lower Guadiana River). The availability of records shows an important spatial bias due to different human settlement patterns across Spain, where there are larger and sparser municipalities towards the south, resulting in fewer Madoz articles (Fig. 1A, B).

Brown trout Salmo trutta Linnaeus 1758
Brown trout (trucha, in Spanish) was the most frequently cited freshwater species in the Madoz (3943 records; 36.7% of total), with another 32 records of the migratory form, the sea trout (discussed below, together with other migratory fish). The species was cited in all main river basins, although records of the species were rarer to the south (Fig. 2). The six trout records in the Guadiana and the mention of its presence in the Guadalhorce basin are especially noteworthy, because the species does not occur naturally in these basins at present (Doadrio et al. 2011). In both cases, the Madoz dictionary provides explicit mentions of the presence of trout both in tributaries of the Guadiana River (for example Gévora, Estena, Jola and others) and in the Guadalhorce River itself.
Trout records from the Madoz were used by Clavero et al. (2017) to evaluate the impacts of recent warming trends (around 1.5 °C since the beginning of the twentieth century), showing that the species has declined when compared with the mid-nineteenth century situation and that this decline was accurately predicted by temperature-distribution relationships modelled with the Madoz data.

European eel Anguilla anguilla Linnaeus 1758
The Madoz dictionary provided 2848 records of the European eel (26.5% of total). The eel was present and widespread in all Iberian basins (Fig. 3), occurring also in the Balearic and Canary archipelagos. The presence of eels in Gran Canaria Island is remarkable, since these are, together with the Khnifiss lagoon in Morocco, the southernmost records of the entire species distribution range (see Qninba et al. 2021). The Madoz mainly used the Spanish term "anguila". Other vernacular names, although less commonly used, were "orihuelo", "meixones" (referring to elvers) and "congrios". The latter is also widely used to designate the conger eel Conger conger Linnaeus 1758 in marine environments.
Eel records extracted from the Madoz were used by Clavero and Hermoso (2015) to produce a baseline scenario for the conservation of the eel in the Iberian Peninsula and to propose management action in relation to the mitigation of river fragmentation.

Barbels (Luciobarbus and Barbus genera)
Barbel species, generally reported as barbos but also picones in a few cases, account for 1867 records in the Madoz (17.4% of all records). There are eight barbel species in the Iberian Peninsula (Doadrio et al. 2011). In general, Iberian barbels share a similar morphology (Gante et al. 2015) and are not differentiated through vernacular names, although this differentiation was occasionally possible (e.g. codirroyo for Barbus haasi Mertens 1925 in the Ebro Basin, where it coexists with Luciobarbus graellsii Steindachner 1866). Thus, we assigned possible species identity of barbel records following the biogeographical distribution of barbel species in
Large nases comprise six different Iberian species. As in the case of barbels, we deduced the species linked to nases recorded in the Madoz based on the current distribution of these fish across different Iberian basins (Doadrio et al. 2011; SIBIC 2017).

Small nases (Iberochondrostoma lemmingii Steindachner 1866 and Achondrostoma arcasii Steindachner 1866)
The Madoz provided 110 records of Achondrostoma arcasii. The species was mainly mentioned with the present-day common name "bermejuela", although it had several variations ("bermeja", "bermejo", "bermejuelo", "bermijuela"), and the Madoz also recorded the names "sarda", currently assigned to Achondrostoma salmantinum Doadrio & Elvira 2007, and "sardina", this latter located close to the known distribution of this congeneric species (Doadrio et al. 2011). Records of this species were found in the Duero and Ebro River basins, as well as across several small Cantabrian basins, with one record within the Tagus River basin (Fig. 6).
Twelve fish records from the Madoz were assigned to Iberochondrostoma lemmingii, mentioned with the name "pardilla" in the Guadiana and Tagus River basins (Fig. 6).
Minnow and gudgeon records were located mainly in the Ebro River basin, but both species also occurred in Cantabrian basins. Both minnow and gudgeon have a much wider present distribution, which has originated through multiple human-mediated introductions (Amat-Trigo 2017; García-Raventós et al. 2020). The freshwater blenny record was located in the Fluvià River basin (Fig. 8), in which the species still occurs today (Méndez et al. 2019), in a very particular mention

Loaches (genus Cobitis)
We found 27 records in the Madoz assigned to Cobitis loaches, either C. paludica de Buen 1930 or C. calderoni Băcescu 1962 (Fig. 9). The most common term used to name loaches was "lampreas", the same vernacular name used for the migratory sea lamprey Petromyzon marinus Linnaeus 1758. We assigned to lampreys those records that were accompanied by other migratory fish species (excepting eel, given its ubiquitous distribution, see above), or when any reference to its large size was provided. The body size criterion was useful to differentiate loaches from lampreys, as loaches rarely exceed 100 mm in total length (Perdices and Doadrio 1997a, b). Mateus et al. (2012) apparently did not take this criterion into account, considering that the term "lamprea" referred exclusively to sea lampreys when interpreting historical references. This could have generated an overestimation of the distribution of this migratory species. The term "lamprea" is still used today in Spanish to refer to loaches, as is its variation "lamprehuela", used to refer to C. calderoni. Cobitis paludica was mentioned twice as "colmillos", a name very similar to "colmilleja", the term currently used in Spanish. Records from the Ebro basin assigned to C. calderoni could also refer to Barbatula hispanica Lelek 1987 (Denys et al. 2021; Fig. 9), a benthic species with morphological characteristics very similar to Cobitis.
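The rule used to separate lampreys from loaches among "lamprea" records can be written as a small decision function. This is only a schematic restatement of the two criteria described above; the function name, the labels in the set of migratory companions and the returned strings are illustrative assumptions, and the real assignments were made by reading each article.

```python
# Migratory taxa whose co-occurrence points to sea lamprey. The eel is
# deliberately left out because of its ubiquitous distribution.
MIGRATORY_COMPANIONS = {
    "Salmo salar",
    "Salmo trutta (sea trout)",
    "Acipenser sturio",
    "Alosa spp.",
    "Mugilidae",
}

def classify_lamprea(co_mentioned: set[str], large_size_noted: bool) -> str:
    """Assign a 'lamprea' mention to sea lamprey or to Cobitis loaches."""
    if large_size_noted:
        return "Petromyzon marinus"  # loaches rarely exceed 100 mm
    if co_mentioned & MIGRATORY_COMPANIONS:
        return "Petromyzon marinus"  # listed with other migratory fish
    return "Cobitis"

print(classify_lamprea({"Anguilla anguilla"}, False))
print(classify_lamprea({"Salmo salar"}, False))
```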

Migratory fish [Acipenser sturio Linnaeus 1758, Alosa spp., Petromyzon marinus, Salmo salar Linnaeus 1758, Salmo trutta reo Linnaeus 1758 and Grey mullets (Mugilidae)].
We compiled 539 records referring to migratory fish (5.0% of the total), excluding those of the eel, reported above. We assigned these records to six fish taxa that could include up to 12 different species.
The Madoz reported 259 records of Atlantic salmon Salmo salar, always using the Spanish standard name "salmón", mainly from watercourses from north and northwestern Spain (Fig. 10). There was an interesting mention of this species from the Portuguese section of the Guadiana River, stating that "marine fish, such as sturgeon, salmon, lampreys and others, also go up through its mouth to the Salto del Lobo" (Pulo do Lobo, a 20 m high waterfall in the lower Guadiana River). This mention suggests that migratory salmonids (salmon, sea trout, or both) might have had a larger range towards the south in the nineteenth century than in the present.
Sea lamprey Petromyzon marinus was cited 199 times, named as "lamprea". Lamprey records could also refer to migratory Lampetra fluviatilis Linnaeus 1758, although at least since the second half of the twentieth century this species has been rare in the Iberian Peninsula (Mateus et al. 2012). Sea lamprey records were frequent in the lower Miño River and in some Cantabrian basins, and the presence of the species was also frequent in the lower sections of the Guadiana, Guadalquivir, Duero, Ebro and Ter rivers (Fig. 10). One notable record mentions the presence of sea lamprey in the Duero River more than 200 km upstream from its mouth, where, in the words of Madoz, there was an "abundant fishery of large lampreys, as well as smaller ones, and delicate eels".
Madoz records for the sea trout Salmo trutta reo (mentioned as "reo"), 32 in total, are concentrated in the northwest of Spain, partially coinciding with records of salmon (Fig. 10).
The now critically endangered Atlantic sturgeon Acipenser sturio (Gessner et al. 2022) was cited 21 times, mainly in the lower stretches of the Guadalquivir, Ebro and Miño Rivers, but with records in several other river systems (Guadiana, Ulla, Río Grande de Xubia, Ría de Villaviciosa, Lérez and Fluvià Rivers). Sturgeon was mentioned as "sollo", or with related variations ("soyo", "zoyo"), with the now popular name "esturión" being absent from the Madoz.

Fig. 10 Records of migratory fish extracted from the Madoz's dictionary (n = 539 in total). Mentions of sea lamprey Petromyzon marinus from those basins that drain into the Cantabrian Sea may also correspond to P. marinus and/or species belonging to the genus Lampetra

Non-native species
The Madoz provided records of three non-native freshwater species: two fish, common carp Cyprinus carpio Linnaeus 1758 and tench Tinca tinca Linnaeus 1758, and the Italian crayfish Austropotamobius fulcisianus Ninni 1886, a species that cannot be confused with any other because it was the only crayfish present in Spain at that time (Clavero and Villero 2014).
The Madoz included 418 records of the Italian crayfish (3.9% of total records of freshwater fauna), mainly concentrated in north-central Spain, with isolated records towards the south. The tench, mentioned as "tenca" in the Madoz, was found in 223 sites, showing a relatively wide distribution, and being especially frequent in the Duero, Tagus, Guadiana and Ebro River basins, although it was also mentioned to occur in other basins. Carp, mentioned as "carpa", was much less frequently cited than tench, being cited at only 21 locations, although distributed across several Iberian basins (Fig. 11).

Leeches, frogs, terrapins and mammals
The Madoz provided 54 records of leeches, mentioned as "sangüijuelas" or, more rarely, "sangujas". We identified these leech records as Hirudo troctina Johnson 1816, assuming that they would refer to medicinal leeches in the genus Hirudo (Arias et al. 2021). Leech records were distributed across the Spanish mainland territory (Fig. 12).
The 12 frog records, "ranas" in the Madoz, were tentatively assigned to Pelophylax perezi Seoane 1885 (Fig. 12), the most common and abundant frog species in the country (Llorente et al. 2002), although we also considered the possibility that mentions referred to different species in the genus Rana (Esteban and García-París 2002; Esteban and Martínez-Solano 2002; Manenti and Bianchi 2011; Fig. 12).
The Madoz provided 18 terrapin records, which could refer to Mauremys leprosa Schweiger 1812, to Emys orbicularis Linnaeus 1758 or to both species simultaneously. Mentions of terrapins appear widely distributed across Spain, excepting the archipelagos (Fig. 12), using the name "tortugas".
Mammals linked to freshwater habitats were represented by three species in the Madoz, the water vole Arvicola sapidus Miller 1908 (one record), the Iberian desman Galemys pyrenaicus Geoffroy Saint-Hilaire 1811 (two records) and the Eurasian otter Lutra lutra Linnaeus 1758 (44 records). The water vole was referred to as "rata de agua", and the record of the species appears in the northeast, in the Ebro basin. The Iberian desman was named as "topo", the same Spanish term used to refer to moles (Talpa spp.). However, we linked topo with desman when the Madoz cited it together with other freshwater fauna (fish, crayfish or terrapins). Curiously, the two records of the desman were located in relative lowlands far from the mountains, the typical known habitat for the species (Nores 2017). Similar citations are known in the Duero River basin until the mid-twentieth century (González and Román 1988 and references therein). The records of otter, "nutrias" in the Madoz, were widely distributed across the Iberian Spanish territory (Fig. 12).

On interpreting historical biodiversity records
The biodiversity records generated from the Madoz present various information gaps and biases that should be acknowledged (Clavero et al. 2022a, b). On the one hand, mentions of freshwater fauna included in the Madoz did not aim to be exhaustive faunal inventories, but to identify socioeconomically relevant species, which in the case of freshwaters involved mainly fisheries-targeted species (or other uses, including medicinal practices in the case of leeches and the fur market for otters). Thus, the absence of records should not be directly interpreted as the absence of any taxon, particularly for taxa with low economic profitability. For example, small-bodied fish species would be more frequently mentioned in the absence of large, profitable taxa (e.g. eel or trout), independently of their coexistence with the latter. On the other hand, as mentioned above, the availability of records has important spatial biases, related to a combination of different human settlement patterns across Spain (with larger, sparser municipalities towards the south, resulting in fewer Madoz articles) and potential biases of local informants to report wildlife-related information (Clavero et al. 2022a, b). These features and biases imply that historical records should not be directly translated into past species ranges, calling for the application of modelling techniques in that process (Clavero and Hermoso 2015; Jetz et al. 2019). In any case, gaps and biases, both taxonomic and spatial, are not a specific issue of historical datasets, being shared with contemporary repositories of biodiversity information (Beck et al. 2014; Callaghan et al. 2021; Hughes et al. 2021).
Despite the aforementioned limitations, some quality aspects should be highlighted when analysing and interpreting this historical information. The species covered in this dataset are generally part of popular culture and thus easily identified, as most of the species mentioned were directly exploited (fisheries). Moreover, the people who responded to the questionnaires lived from and interacted with their natural resources, so it can be argued that they transmitted first-hand information. Also, the exceptional spatial accuracy of the records presented in this database is noteworthy, being comparable to or even better than that offered in the widely used atlases with current species distribution data (for example, Doadrio 2002). Finally, the distribution of historical records is, with few exceptions, consistent with the existing knowledge on the biogeography of Iberian freshwater fauna. In fact, data from Madoz's dictionary have already been used in several international scientific publications, such as Nores and López-Bao (2022), Ramos-Merchante et al. (2021), Clavero et al. (2017), Clavero and Hermoso (2015), Clavero and Villero (2014) or Granado-Lorencio (1991).

On the potential of historical data
The dataset on historical freshwater records presented here, and already made publicly available, has no precedent in its combination of the amount of information provided and its antiquity, the taxonomic coverage, the spatial extent of the available data, and the temporal and spatial precision of the records. It offers an extraordinary opportunity to model the historical distribution of different elements of the freshwater biota to generate reference conditions for freshwater biodiversity (see for example Ramos-Merchante et al. 2021). This historical baseline distribution of species is necessary to assess their conservation status, understand their declines, manage their recovery and establish the ecological status of freshwater habitats (Clavero and Hermoso 2015). As recently shown, the compilation of historical species records allows producing baseline range scenarios with much higher resolution than usually available to inform species status and set recovery targets (Clavero et al. 2022a, b). For these reasons, historical data should be considered in national environmental legislation when establishing effective plans for the protection and recovery of threatened species.

Fig. 2 Records of brown trout Salmo trutta, excluding records of migratory sea trout, extracted from the Madoz's dictionary (n = 3,943)

Fig. 11 Records of introduced species extracted from the Madoz's dictionary (n = 662 in total)