Using botanical resources to select wild forage legumes for domestication in temperate grassland agricultural systems

The need for better understanding and conservation of wild plant resources with potential for domestication or utilization in crop improvement has been highlighted in recent years. Botanical resources such as herbaria, databases, and floras offer an information-rich platform from which to select species of interest based on desirable traits. To demonstrate the potential of these resources, wild, native forage legumes were screened for inclusion in northern Swedish grassland agricultural systems (leys). The poor persistence of red clover in multi-year leys is a limitation to the current management strategy in the region. Wild, native forage legumes with the potential for longer persistence were considered for inclusion as minor components in leys to contribute to the system in later years of production. Using the Umeå University Herbarium, local floras, and both regional and international biodiversity databases, seven wild forage legume species were selected based on phenology, morphology, and native range. Particular focus was given to the potential for species to provide pollinator resources early in the season, leading to species with early flowering being preferred. Biodiversity databases were also used to locate wild populations of the selected species to facilitate seed collection for future cultivation, as additional study of the agronomic potential of the selected species is necessary. Here, we have shown that the rich biodiversity data stored in botanical institutions can jumpstart the selection of wild species for utilization in the agriculture sector based on various traits of interest.


Introduction
Globally, it is estimated that there are nearly 400,000 species of flowering plants (Pimm and Joppa 2015). Over the last 12,000 years, humans have domesticated about 2500 of these species, though only 250 are considered fully domesticated (Dirzo and Raven 2003; Gruber 2017; Fernie and Yan 2019). Given that upwards of 50,000 plant species are considered edible, a large gap exists between currently domesticated crops and the wild species that may have potential for cultivation (Warren 2015). The potential of wild plant species is particularly relevant when current crops fail to fit the agricultural systems in which they are integrated. Through their adaptations to their native region, wild species with agronomic potential may be able to make crop systems better suited to a changing environment and to alternative agricultural practices.
Botanical resources are a greatly underused avenue of research into wild species with agricultural potential. Globally, there are over 3000 botanical gardens with herbaria that house an estimated 390 million plant specimens (Miller et al. 2015; Thiers (updated continuously)) (Fig. 1).
Additionally, numerous botanical databases compile data on the taxonomy, morphology, phenology, and ethnobotanical use of nearly all described plant species (Missouri Botanical Garden; Kattge et al. 2020; Molina-Venegas et al. 2021). These data, compiled over centuries, provide an ideal platform for the study of crop wild relatives. Two of the world's most prominent botanical gardens, the Missouri Botanical Garden and the Royal Botanic Gardens, Kew, have initiated projects focusing on the identification and conservation of crop wild relatives within the last 10 years (Dempewolf et al. 2014; Ciotir et al. 2019). These projects serve as models for the use of botanical knowledge in crop wild relative research.
The legume family (Fabaceae) contains over 19,500 species, making it the third largest family of flowering plants (Azani et al. 2017). Though much biodiversity exists in the family, only 65 species are commercially important and traded globally, 50 of which are forage legumes (Howieson et al. 2008; Kulkarni et al. 2018; Schlautman et al. 2018). This discrepancy suggests that some species with unique adaptations to their native environment and great potential for cultivation may have been overlooked for use in agriculture. Perennial forage legumes are an essential source of protein in sustainable livestock production throughout Europe and the rest of the world. Through their ability to fix atmospheric nitrogen and subsequently contribute usable organic nitrogen to the crop system, legumes increase the sustainability of feed and food production (Carlsson and Huss-Danell 2003). In Europe, the main perennial legume forages planted for harvesting are red clover (Trifolium pratense L.) and lucerne (Medicago sativa L.), while white clover (Trifolium repens L.) is the most commonly sown for grazing (Halling et al. 2004; Geleta et al. 2019). Considering the diversity of wild legumes in the region, increasing the agrobiodiversity of forage legumes has the potential to improve the adaptability and sustainability of forage production.
The potential for new forage legume species is of particular interest in forage production in northern Sweden. Leys, a system in which forages are grown for animal feed as a break from annual crops in a rotation, play an important role in food production systems in northern Sweden, as they provide the forage necessary for dairy and meat production (Kipling et al. 2016). In Västerbotten and Norrbotten, the two northernmost provinces in Sweden, leys made up 68% and 75% of the total arable land in 2021, respectively (Jordbruksverket 2021). In the north, leys are generally harvested two to three times per season for three to four years, though they are sometimes harvested for up to eight years before being resown to an annual grain crop (Ericson 2018). In northern Sweden, these leys are generally multispecies swards containing various grass species, with red clover as the dominant forage legume. Though red clover's productivity is unmatched in the region during the first two years, root rot and clover rot reduce its persistence and therefore its long-term yield (Frankow-Lindberg et al. 2009; Marshall et al. 2017). These diseases are difficult to control and have not been fully resolved through the breeding of disease-resistant clover varieties or the application of chemical agents. In addition to the reduction in forage yield following the loss of red clover, ley productivity is also hindered by the decrease in biological nitrogen fixation (Riesinger and Herzon 2010). Ley persistence must be increased to fit the current management strategy in the region. A potential solution may be the inclusion of wild forage legume species with longer persistence that would continue to contribute fixed nitrogen to the ley system in the later years of production.
These wild legumes, when grown as minor components of a ley alongside red clover and forage grasses, could help solve ley persistence issues after year three while still maintaining yields in years one and two. A study on 26 species and four subspecies of native legumes in Sweden demonstrated nodulation in all 30 taxa (Ampomah et al. 2012). These results likely signify that the evaluated species are capable of fixing nitrogen in their native environments. Further study on nitrogen fixation and nodulation of new forage legume species will be essential in evaluating their persistence.
The inclusion of alternative native legumes and the consequent increase in biodiversity in leys can act not only to improve their persistence, but also to contribute additional ecosystem services to create more sustainable agricultural systems (Bianchi et al. 2013). As monocultures have the largest negative impact on pollinators, an increase in agrobiodiversity can act to alleviate some of this threat by providing diverse food sources (Wratten et al. 2012). When managed with biodiversity in mind, grassland systems have the potential to house high levels of plant diversity and thus pollinator resources. Additionally, higher plant diversity in leys would help ensure pollen and nectar sources for pollinators throughout much of the season, with early flowering species being vital due to the lack of floral resources early in the season, particularly in northern Sweden (Johansen et al. 2019).
In an effort to demonstrate the potential for utilizing botanical resources in agricultural research, we used Sweden's major botanical databases and the Umeå University Herbarium to select candidate species of native forage legumes from northern Sweden based on characteristics such as morphology, phenology, range, and habit.

Study system
Fabaceae is distributed throughout Sweden, with a total of 25 genera and 84 species that are native or naturalized in the country (Krok et al. 1994). All 84 species are placed within the subfamily Papilionoideae DC., the largest legume subfamily with 503 genera and ca. 14,000 species (Azani et al. 2017). In Papilionoideae, leaves are pari- or imparipinnate to palmately compound, but are also commonly uni- or trifoliate, and leaflets can be modified into tendrils. As the name suggests, the corolla is typically papilionate, with an adaxial standard petal, two lateral wing petals, and two abaxial keel petals. Root nodules, either indeterminate or determinate, are prevalent in the subfamily, with nodulation occurring in roughly 90% of the genera (Tutin et al. 1968; Sprent 2001; Azani et al. 2017).

Initial selection criteria
As the focus of this study was to select candidate species that could be grown in northern Sweden, a list of species found in the northernmost provinces was extracted from the Swedish Virtual Herbarium (http://herbarium.emg.umu.se/). The faunistic provinces included were Lycksele Lappmark, Norrbotten, Pite Lappmark, Västerbotten, Ångermanland, and Åsele Lappmark (Johansson and Klopfstein 2020). Based on herbarium records, 79 species representing 25 genera had been collected in these six northernmost provinces. As the specimens were collected over a span of roughly 215 years, an initial taxonomic check using the Tropicos database (https://www.tropicos.org/home) was done to ensure that species names were still valid. Four of the species names were no longer valid at the species level, as they had been reclassified as subspecies or were now considered synonyms of valid species. The remaining 75 species were then used as the initial candidate species list. Important characteristics were used as selection criteria to narrow down this list. As the goal of this study was to select native legumes that could be grown in leys, species had to be native to northern Sweden, perennial so as to survive throughout the lifespan of the ley, and herbaceous (non-woody) to enable conventional harvest and forage quality comparable to that of existing species. Self-regenerating annual species were not considered, as the short growing season in northern Sweden combined with multiple-harvest management strategies limits the ability of species to set seed. Information on native range, growth duration (i.e., annual, biennial, or perennial), and habit for the candidate species was extracted from the International Legume Database and Information Service (ILDIS) (https://ildis.org/LegumeWeb/). Local floras were also consulted for the collection of these data (Mossberg et al. 1992; Krok et al. 1994).
Though not a selection criterion, ethnobotanical data were also collected from ILDIS for each species to gain insight on previous uses. Additionally, occurrence was considered important, as populations would need to be easily located and abundant enough to support seed collection. To ensure this, a list of all specimens from the Swedish Virtual Herbarium of the 75 initial candidate species was compiled, and only species with a minimum of 20 herbarium specimens collected were considered. The list of candidate species was then narrowed to include only those that met the above-mentioned criteria.
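The screening steps described above amount to a name-resolution pass followed by a conjunction of filters. The Python sketch below illustrates that logic under stated assumptions: the `SpeciesRecord` fields, the synonym table, and the taxon names are hypothetical placeholders rather than real ILDIS or herbarium exports, and traits are assumed to have already been looked up for each accepted name. The specimen threshold follows the "minimum of 20" stated above.

```python
from dataclasses import dataclass


@dataclass
class SpeciesRecord:
    # Hypothetical record layout; not a real ILDIS or herbarium export format.
    name: str          # name as recorded on the specimen labels
    native: bool       # native to northern Sweden
    duration: str      # "annual", "biennial" or "perennial"
    herbaceous: bool   # non-woody habit
    n_specimens: int   # specimens in the virtual herbarium


def resolve_names(records, synonyms):
    """Map outdated names to accepted names, pooling specimen counts.

    Traits are taken from the first record seen for each accepted name
    (assumed to have been looked up for that accepted name already).
    """
    merged = {}
    for r in records:
        accepted = synonyms.get(r.name, r.name)
        if accepted in merged:
            merged[accepted].n_specimens += r.n_specimens
        else:
            merged[accepted] = SpeciesRecord(
                accepted, r.native, r.duration, r.herbaceous, r.n_specimens
            )
    return list(merged.values())


def passes_criteria(r, min_specimens=20):
    """Native, perennial, herbaceous and collected at least `min_specimens` times."""
    return (
        r.native
        and r.duration == "perennial"
        and r.herbaceous
        and r.n_specimens >= min_specimens
    )


# Hypothetical placeholder data.
records = [
    SpeciesRecord("Taxon A", True, "perennial", True, 120),
    SpeciesRecord("Taxon A var. old", True, "perennial", True, 5),  # synonym of Taxon A
    SpeciesRecord("Taxon B", True, "annual", True, 60),
    SpeciesRecord("Taxon C", False, "perennial", True, 40),
    SpeciesRecord("Taxon D", True, "perennial", True, 12),
]
synonyms = {"Taxon A var. old": "Taxon A"}

species = resolve_names(records, synonyms)
candidates = [r.name for r in species if passes_criteria(r)]
print(candidates)  # ['Taxon A']
```

Resolving names before counting specimens matters: a taxon whose records are split across an outdated and a current name could otherwise fall below the specimen threshold.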

Herbarium selection and measurements
Species from the narrowed candidate list were then evaluated for additional characteristics using herbarium specimens from the Umeå University Herbarium. All available specimens of each species from the Västerbottens län collection were used for measurement and data acquisition. For each specimen, the location of collection, accession number, collector, collector number, and latitude and longitude of the collection site were recorded. Traits measured included leaf length, leaf width, and plant height (the latter only for specimens with roots and a terminal bud). Leaf length was measured on the compound leaf from the leaf base to the apex of the terminal leaflet(s). Leaf width was measured at the widest point of the compound leaf. The inflorescence position was noted, as well as whether the plant was in flower or fruit on the date of collection. A flowering period range was constructed for each species from the collection dates of all specimens in flower. Final candidate species were selected based on flowering period, as the project was particularly interested in early-flowering species.
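Converting collection dates to day of year makes specimens collected in different years comparable. A minimal sketch, assuming each specimen has been reduced to a hypothetical `(species, collection_date, in_flower)` tuple; leap-year dates after February are shifted back by one day so that every date maps onto a 365-day scale, which is an assumption rather than a documented step of this study.

```python
import calendar
from collections import defaultdict
from datetime import date


def day_of_year(d):
    """Day of year on a 1-365 scale; leap-year dates after February shift back one day."""
    doy = d.timetuple().tm_yday
    if calendar.isleap(d.year) and d.month > 2:
        doy -= 1
    return doy


def flowering_ranges(specimens):
    """Return {species: (earliest, latest)} flowering day from specimens in flower."""
    days = defaultdict(list)
    for species, collected, in_flower in specimens:
        if in_flower:
            days[species].append(day_of_year(collected))
    return {sp: (min(d), max(d)) for sp, d in days.items()}


# Hypothetical specimens: (species, collection date, in flower on that date?)
specimens = [
    ("Taxon A", date(1950, 6, 2), True),
    ("Taxon A", date(1987, 6, 20), True),
    ("Taxon A", date(2001, 8, 1), False),  # in fruit only, so ignored
    ("Taxon B", date(1932, 7, 10), True),  # leap year
]
print(flowering_ranges(specimens))  # {'Taxon A': (153, 171), 'Taxon B': (191, 191)}
```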

Data analysis
Descriptive statistics of flowering day of year were calculated for each species. The minimum and median flowering days of each species were then pooled, and descriptive statistics were calculated for the combined dataset of all 17 species. Species whose minimum or median flowering day fell between the overall minimum and the first quartile of the pooled values were selected as the final candidate species.
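The selection rule can be sketched as follows. The quartile definition is an assumption: the study does not state which method was used, and the sketch uses Python's `statistics.quantiles` with `method="inclusive"`, the linear-interpolation definition common in statistical software. Whether a species must be early on one or on both measures is left as a hypothetical `require_both` parameter. Taxon names and flowering days are placeholders.

```python
import statistics


def first_quartile(values):
    # Assumed quartile definition: linear interpolation ("inclusive" method).
    return statistics.quantiles(values, n=4, method="inclusive")[0]


def select_early(flowering_days, require_both=False):
    """Select species whose minimum and/or median flowering day is at or below
    the first quartile of the pooled minima / medians of all species.
    (Every value is at or above the pooled minimum, so "<= Q1" is the same as
    "between the minimum and the first quartile".)

    flowering_days: {species: [day-of-year values of specimens in flower]}
    require_both: hypothetical switch between "either measure" and "both measures".
    """
    mins = {sp: min(d) for sp, d in flowering_days.items()}
    meds = {sp: statistics.median(d) for sp, d in flowering_days.items()}
    q1_min = first_quartile(list(mins.values()))
    q1_med = first_quartile(list(meds.values()))

    selected = []
    for sp in flowering_days:
        early_min = mins[sp] <= q1_min
        early_med = meds[sp] <= q1_med
        keep = (early_min and early_med) if require_both else (early_min or early_med)
        if keep:
            selected.append(sp)
    return selected


# Hypothetical flowering days for five placeholder taxa.
days = {
    "A": [150, 160, 170],
    "B": [152, 185, 190],
    "C": [165, 170, 200],
    "D": [170, 185, 195],
    "E": [175, 190, 210],
}
print(select_early(days))                     # ['A', 'B', 'C']
print(select_early(days, require_both=True))  # ['A']
```

With these placeholder data, the pooled first quartiles are 152 (minima) and 170 (medians), so taxa A, B, and C pass the either-measure rule, and only A passes when both measures are required.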

Selection of wild populations and seed collection
Once the list of candidate species was finalized, the collection of seed from wild populations of each species was planned.
As the major objective of this project was to study the quality and establishment of these species when grown in a ley, it was essential to acquire seed of each species. As seed of the selected species is not commercially available and the quantity needed for future experiments was not available from gene banks, seed had to be collected from wild populations in the region. Populations of the final candidate species were identified using Artportalen, a database run by the SLU Species Databank at the Swedish University of Agricultural Sciences that is used to report species observations in Sweden (https://artportalen.se/). Several populations of each species were selected and visited from June to July 2020. During this initial visit, sites for future seed collection were selected based on population size. Large populations were chosen, as they could accommodate seed collection without endangering the natural population through overcollection. Selected populations were monitored throughout the summer to track seed pod maturity. Once mature, seed pods were collected, allowed to dry fully, and then threshed. Seeds were then counted and weighed to determine the thousand-seed weight and stored under cool, dry conditions.
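The thousand-seed weight is simply the mass of a counted seed lot scaled to 1000 seeds. A minimal sketch, assuming hypothetical accession names, masses, and counts; summarizing accessions with the sample standard deviation (`statistics.stdev`) is also an assumption.

```python
import statistics


def thousand_seed_weight(mass_g, n_seeds):
    """Mass (g) of 1000 seeds, scaled from a counted and weighed seed lot."""
    if n_seeds <= 0:
        raise ValueError("n_seeds must be positive")
    return mass_g / n_seeds * 1000


# Hypothetical accessions of one species: (total mass in g, number of seeds)
accessions = {"acc1": (2.40, 2000), "acc2": (1.10, 1000), "acc3": (0.65, 500)}

tsw = {acc: thousand_seed_weight(m, n) for acc, (m, n) in accessions.items()}
values = list(tsw.values())
mean_tsw = statistics.mean(values)
sd_tsw = statistics.stdev(values)  # sample SD across accessions (assumed)
print({acc: round(w, 2) for acc, w in tsw.items()})  # {'acc1': 1.2, 'acc2': 1.1, 'acc3': 1.3}
print(round(mean_tsw, 2), round(sd_tsw, 2))           # 1.2 0.1
```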

Species selection
Information on the habit, growth duration, and native range of the 75 species on the initial candidate list was gathered. Of the 75 species, 66 were herbaceous, 54 were perennial, and 40 were native to Sweden. A count of herbarium specimens for each species showed that 36 of the candidate species had more than 20 specimens documented in the Swedish Virtual Herbarium (Fig. 2, Supplementary Material Table 1). The 17 species that met all of these selection criteria were then included in the narrowed candidate list and further investigated using herbarium specimens from the Umeå University Herbarium. Of these 17 species, 14 were documented to have previous ethnobotanical use: thirteen had been used as forage, five as a human food source, and nine medicinally (Supplementary Material Table 1). The historic uses of wild species can highlight their potential for domestication and integration into modern agricultural systems. Species with an ethnobotanical history as forage may have beneficial traits in terms of cultivation and nutrition and, as such, should be highlighted during the selection process. Previous studies have also stressed the importance of ethnobotanical records in agricultural research, particularly when examining wild species for their potential in de novo domestication (Ciotir et al. 2019; Leakey 2019; Frawley et al. 2020). Acquiring ethnobotanical records can be challenging, as much of the available data is not consolidated in a few sources but scattered across many regional sources, sometimes available only in the local language.

Fig. 2 Flow chart of the initial selection process, starting with the extraction of all Fabaceae species collected in Sweden from the Swedish Virtual Herbarium and ending with the selection of 17 species to be further evaluated in the herbarium selection stage.
Recent efforts to consolidate this knowledge through the creation of ethnobotanical databases can help diversify the use of plant-use data in various areas of research, such as agriculture, pharmacology, and conservation (Ciotir et al. […] Table 2). The specimens measured had been collected throughout Västerbotten and therefore represent the morphological and phenological variability of the species across the region (Fig. 3).
Flowering range data showed that seven species fell between the minimum and the first quartile of all 17 species for minimum and median flowering day (Fig. 4). These seven species, Anthyllis vulneraria, Astragalus alpinus, Lathyrus palustris, Lathyrus pratensis, Trifolium spadiceum, Vicia cracca, and Vicia sepium, were then selected as the candidate species.
Flowering date was chosen as a major selection criterion, as pollinator resources are an important ecosystem service in grassland systems. As Fabaceae has been cited as one of the plant families most frequently visited by bees, diversifying pollinator resources through forage legumes could substantially enhance the sustainability of forage production (Lagerlöf et al. 1992; Lagerlöf and Wallin 1993). Legume species grown in grassland systems are of great importance to pollinators, as they provide a primary pollen source in agricultural landscapes that often lack floral diversity (Decourtye et al. 2010). A study of both stable and declining bee species in Europe showed that red clover pollen was the most commonly collected pollen for half of the studied species (Kleijn and Raemakers 2008). This was likely due to the abundance of red clover grown as a forage crop in the studied regions. With forages having such a large impact on pollinator diet, diversifying the forage legume species in cropping systems has the potential to provide pollinator resources throughout the entire season.

Fig. 3 Locations throughout the province of Västerbotten, Sweden, of the herbarium specimens from the Umeå University Herbarium measured for the 17 candidate species.
The management of grassland systems, such as leys, has a major impact on pollen resources. The timing and frequency of harvest affect floral diversity, with variable harvest times providing the most continuous supply of floral resources (Johansen et al. 2019). Selection of wild forage legume species should be informed by the management strategy of the system to which they are to be added. The species selected in this study exhibit relatively early flowering times, as they are intended for inclusion in leys, which can lack floral resources early in the season because red clover is harvested before it flowers. This potential increase in floral diversity early in the season may in turn increase pollinator diversity, as species with foraging activity early in the season will have greater access to resources (Decourtye et al. 2010; Johansen et al. 2019). The impact of these floral resources will depend greatly on the management regime of the ley. Various harvest frequencies and timings should be assessed with these potential wild forage legumes to determine whether they are best included in low- or high-intensity systems.
Selecting persistent species was the other main goal of the project, but data was not available on the persistence of the species when grown in an agricultural grassland system. Due to this lack of data, only flowering date was used to narrow down the candidate list during the herbarium survey portion of the selection process.

Issues with selection
Leaf length and width were measured for each specimen to gain insight into leaf area, as leafiness is an important factor when assessing the nutritional quality of a plant (Table 1). This parameter was not included in the selection of the final candidate species, as more weight was given to early flowering than to leaf area.

Fig. 4 Flowering day of the herbarium specimens measured. The day of flowering is expressed as the day of the year (x of 365). Line a represents the first quartile of the earliest flowering day, b the median of the earliest flowering day, c the first quartile of the median flowering day, and d the median of the median flowering day. In each box plot, the left edge represents the first quartile, the black vertical line the median, and the right edge the third quartile. Black points represent outliers.
Leaf area can be an important trait when selecting wild forage species for use in production, as the leaf:stem ratio of forages can be indicative of their quality. Plants with higher leaf:stem ratios at harvest are generally higher in crude protein and digestibility and lower in neutral detergent fiber (NDF) (Terry and Tilley 1964; Kalu et al. 1988, 1990). Even with a high leaf:stem ratio, it is unlikely that any of the selected wild species would have a forage quality or yield that could compete with red clover. Current red clover cultivars have been bred extensively to maximize yield and quality, making it nearly impossible for wild species to replace them (Geleta et al. 2019). An alternative is to include these wild species only as minor components in leys. In that case, the yield and forage quality of the wild species need only be high enough not to significantly decrease the yield and quality of the harvested forage. Rather than contributing through yield and quality, these wild species may contribute to the system through increased ecosystem services and persistence.
Initial plans also included plant height as an important characteristic, as small plants would be easily outcompeted for light by other species in the ley mixture. This selection criterion was removed, as there is a potential bias towards collecting smaller specimens of a species that will easily fit on a herbarium sheet. Plant height ranges collected from the specimens did not match the ranges found in the Flora Europaea, with the mean plant height from herbarium specimens often being at the minimum end of the range given in the flora (Tutin et al. 1968). Biases in herbarium collections have been documented in several categories, including spatial, temporal, trait, phylogenetic, and collector bias (Moerman and Estabrook 2006;Blonder et al. 2012;Daru et al. 2018). Most of these come about inadvertently to ease collection difficulties or focus efforts on plants of most interest to the collector. The potential of these biases must be acknowledged when utilizing herbarium specimens to draw conclusions about species morphology, phenology, and occurrence. Though these potential biases may skew specimen data when compared to wild populations of a species, herbaria still offer the most extensive collection of plant diversity data available and should be considered invaluable resources in agrobiodiversity research.
The accuracy of data taken from botanical databases must also be considered when extracting data for selection. Melilotus albus, Trifolium aureum, Trifolium spadiceum, and Vicia sativa were listed as perennial in ILDIS but described in other databases and floras as annual or biennial (Ciotir et al. 2019; Tutin et al. 1968; Roskov et al. 2005; POWO 2022). After discussing these species with local experts, it was determined that they do not have perennial habits in northern Sweden. Data were still collected for these species, and Trifolium spadiceum was initially selected as a final candidate species. Following the first visit to populations of T. spadiceum, however, its inclusion was called into question due to plant height: plants in each population measured under 20 cm and would thus have difficulty competing with other species in the ley mixture. Considering both the lack of perenniality and the plant height, T. spadiceum was removed from the final list of candidate species. Lathyrus japonicus was added to the list in its place, as both its minimum and median flowering dates fell between the first quartile and the median of the flowering days for the species measured in the herbarium (Fig. 4). Following this change, the final candidate list comprised Anthyllis vulneraria, Astragalus alpinus, Lathyrus japonicus, Lathyrus palustris, Lathyrus pratensis, Vicia cracca, and Vicia sepium.

Table 1 Median values of data collected from herbarium specimens of the 17 candidate species. The NA in the median fruit date signifies that no specimens measured in the survey were in fruit.
The large-scale compilation of data required for botanical databases presents challenges in data validation and verification. Data quality assessment is often overlooked in biodiversity database curation, as the process is time-consuming, particularly for levels of data validation that can only be done by experts in the field (Dalcin et al. 2012). The importance of data cleaning practices in biodiversity databases was highlighted in reports commissioned by the Global Biodiversity Information Facility, one of the largest biodiversity data infrastructures in the world (Chapman 2005a, b). The guidelines set out in these reports provide a standardized way in which to detect and address errors in biological collection databases. As the level of data validation for individual databases is often unknown to users, it becomes important to cross-reference any extracted data to ensure its accuracy. When utilizing these resources for selection of wild species, expert knowledge of the region or taxonomic group can provide an additional step of verification to confirm selection traits are accurately documented.
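Part of this cross-referencing can be automated by comparing the value each source gives for the same trait and flagging disagreements for expert review, as happened with the growth-duration records discussed above. A minimal sketch, assuming each source has already been reduced to a mapping from species name to growth duration; the source names and taxa are hypothetical.

```python
def flag_conflicts(sources):
    """Return {species: {source: value}} for species whose sources disagree.

    sources: {source_name: {species: growth_duration}}
    """
    all_species = set().union(*(s.keys() for s in sources.values()))
    conflicts = {}
    for sp in sorted(all_species):
        # Collect the value reported by every source that lists this species.
        values = {name: s[sp] for name, s in sources.items() if sp in s}
        if len(set(values.values())) > 1:
            conflicts[sp] = values
    return conflicts


# Hypothetical sources and taxa.
sources = {
    "database_1": {"Taxon A": "perennial", "Taxon B": "perennial"},
    "flora_1": {"Taxon A": "perennial", "Taxon B": "biennial"},
    "flora_2": {"Taxon B": "annual"},
}
print(flag_conflicts(sources))
# {'Taxon B': {'database_1': 'perennial', 'flora_1': 'biennial', 'flora_2': 'annual'}}
```

The function only flags conflicts; deciding which source is correct for a given region still requires the expert verification described above.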

Seed collection
Between two and five populations were selected per species for seed collection to ensure that enough seed could be collected for the planned greenhouse and field experiments studying the agronomic potential of each species (Table 2). All selected populations were within a 150 km radius of Umeå, as seed maturation needed to be monitored frequently. Of the 44 populations identified in the region, 26 were selected for seed collection. Additional populations were initially identified in Artportalen, but no population was found when the documented locations were visited. This discrepancy occurred most frequently with potential populations of Vicia sepium. In nearly all instances of missing V. sepium populations, populations of Vicia cracca were found in their place. This disparity is likely due to misidentification of V. cracca as V. sepium, population decline between the date of the Artportalen record and the date of the visit, or incorrect spatial data in the database. Seeds were collected between August 10th and October 10th, 2020, with Anthyllis vulneraria having the shortest collection period (11 days) and Vicia sepium the longest (61 days) (Table 2). Collection date was determined by seed pod maturity, defined as the point at which seed pods had turned brown to black and were near dehiscence. Thousand-seed weights varied between accessions of each species. The smallest thousand-seed weight was that of Astragalus alpinus (mean, 1.12 g; standard deviation, 0.13 g), and the largest that of Lathyrus japonicus (mean, 35.16 g; standard deviation, 2.63 g) (Table 2). Seed collection is important not only for cultivating these wild species to study their agronomic potential, but also for conserving their genetic resources through preservation in a gene bank. Depositing seeds of these wild legume species can contribute to efforts to conserve the genetic diversity of crop wild relatives (Cowling et al. 2017; Fitzgerald et al. 2019).

Table 2 Collection site, seed collection date range, and 1000-seed weight of each accession collected for the seven candidate species. Seed collection dates are from the year 2020.
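The 150 km constraint on population selection can be checked from recorded coordinates with a great-circle (haversine) distance. A minimal sketch, assuming a spherical Earth of radius 6371 km and approximate coordinates for Umeå (63.83° N, 20.26° E); the population names and coordinates are hypothetical, and this is an illustration rather than the procedure used in the study.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (spherical approximation)
UMEA = (63.83, 20.26)      # approximate latitude/longitude of Umeå (assumed)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(populations, centre=UMEA, radius_km=150.0):
    """Keep (name, lat, lon) populations no farther than radius_km from centre."""
    return [
        p for p in populations
        if haversine_km(centre[0], centre[1], p[1], p[2]) <= radius_km
    ]


# Hypothetical population records: (name, latitude, longitude)
populations = [("pop_near", 64.00, 20.50), ("pop_far", 65.58, 22.15)]
print([p[0] for p in within_radius(populations)])  # ['pop_near']
```

At these latitudes one degree of longitude spans only about 49 km, so a naive "degrees times 111 km" conversion would overstate east-west distances; the haversine form avoids that.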

Future development of selected species
The selection of these seven wild forage legume species is only the first step in a long process toward potential cultivation and inclusion in leys. Much is still unknown about the selected species, and extensive study of their agronomic potential is therefore necessary. Understanding characteristics such as hard-seededness, soil-type suitability, forage quality, anti-nutritional factors, and response to varying management intensities will be imperative in determining whether these wild species can be included in cropping systems. Additionally, it is essential to identify their natural rhizobial symbionts, as commercial inoculants will need to be assessed for their suitability or new inoculants will need to be produced. Some work has been done to identify the rhizobia of Swedish legumes through molecular methods, with those of Anthyllis vulneraria, Astragalus alpinus, Vicia cracca, and Vicia sepium already characterized (… Huss-Danell 2011, 2016; Ampomah et al. 2017). Perhaps the most important factor to consider will be their potential for seed production. For these wild legumes to have a positive impact on the sustainability of ley production, they must first be capable of producing seed on a scale large enough to make their inclusion in leys economically viable (Boelt et al. 2015). Without commercial seed production, these species have no prospect of being incorporated into leys on any meaningful scale.
Following the collection of seed from wild populations, additional work is being done to assess the agronomic potential of the seven wild, forage legume species selected. Germination studies, greenhouse experiments, field trials, pollination surveys, and nitrogen fixation analyses are in progress or already completed using the collected seed. The results from this work will help to further narrow down the list of candidate species and focus domestication efforts on the species with the most potential for inclusion in northern Swedish leys.

Conclusions
Here, we have shown that the use of botanical resources allows for the empirical selection of native forage legume species based on specified characteristics of interest. Though the use of herbaria and databases to consolidate data on plant species is not new, the effort to use this data to focus on targeted agronomic traits of interest for selection of wild species to include in a specific agricultural system is novel. Botanical databases provided a time efficient way to sort through key plant traits for selection. Regional floras gave local context to the extracted data, as characteristics of a single species can vary greatly over a geographic scale. Herbarium specimens contributed information on the morphology and phenology of local plant populations growing under similar climatic conditions to agricultural production in the region. Using these resources, seven wild forage legume species native to northern Sweden were chosen due to their potential for inclusion in leys. The utilization of botanical resources as a method of wild species selection for domestication offers an information-rich platform from which previously unconsidered species can be assessed for their agricultural potential.
Additional agronomic traits, such as persistence, rhizobia specificity, and soil-type suitability, could have provided supplementary selection criteria; however, these data were either unstudied or unavailable in database form. The utilization of constructed databases allows this method of selection to be a time efficient way to identify wild species with agricultural potential. Challenges arose during data acquisition but could be resolved through acknowledging potential biases in herbarium specimens and ensuring proper data validation when utilizing botanical databases. Increasing collaboration between agronomists interested in wild species and botanists focusing on economically important plant taxa could assist in ensuring that botanical data is accurately utilized during the selection process. Although applied to an entire taxonomic group in this study, these methods have potential to further narrow down existing crop wild relative inventories by agronomic traits of interest. Additional work to obtain data on the agronomic traits of the seven selected wild, forage legume species is underway and will provide new information on important characteristics such as forage quality, potential anti-nutritional factors, response to management, and persistence when grown in a grassland agricultural system.