1 Background

The INRAE chestnut germplasm collection is located in Villenave d’Ornon near Bordeaux (Gironde, France). This collection consists of two experimental plots called “A” and “E” composed of respectively 29 and 215 chestnut trees from three species: the European chestnut (Castanea sativa), the Japanese chestnut (Castanea crenata), and the Chinese chestnut (Castanea mollissima), as well as their hybrids. We characterized the chestnut trees from these study plots, other chestnut trees planted across the INRAE experimental station, and all remaining chestnut trees found within a radius of 1 km around the chestnut collection (Fig. 1). We genotyped all these trees with SNP markers, identified 113 genotypes (“Multilocus matches” function from GenAlEx, Peakall and Smouse 2012) and assigned them to chestnut species (STRUCTURE analysis, Pritchard et al. 2000; Larue et al. 2021b). We further described the architecture, male catkins and fruit production of all trees in 2018, and repeated seed set measurements in 2019.

Fig. 1

Map of INRAE chestnut genetic collection and of all isolated chestnuts found within a radius of 1 km around the collection

2 Methods

2.1 Identification and geolocation of chestnut trees

We identified 275 chestnut trees and geolocated them with a Garmin 64st GPS receiver. Tree positions were verified and corrected in QGIS (QGIS Desktop 3.16.4) using satellite photos from IGN BdOrtho. Tree coordinates are expressed in Lambert 93. Each tree received a unique identifier based on its position, and this ID is used as the reference across all files. An introduction register for the INRAE chestnut germplasm has been maintained since 1950, but in this paper, apart from a few exceptions for illustration purposes, we made no attempt to systematically use common names for the varieties.
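Because Lambert 93 is a projected coordinate system with units in meters, straight-line distances between trees can be approximated directly from the coordinates. The following Python sketch illustrates how a 1 km radius filter could be applied. The coordinates, tree IDs and the use of a single center point for the collection are assumptions made for illustration, not values or methods taken from the dataset.

```python
import math

def distance_m(a, b):
    """Euclidean distance in meters between two (x, y) points in Lambert 93."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def trees_within(trees, centre, radius_m=1000.0):
    """Return the IDs of trees lying within radius_m of centre."""
    return [tree_id for tree_id, xy in trees.items()
            if distance_m(xy, centre) <= radius_m]

# Made-up coordinates for illustration only (not taken from the dataset).
centre = (417000.0, 6416000.0)
trees = {
    "T1": (417300.0, 6416400.0),  # 500 m from the centre
    "T2": (418200.0, 6416000.0),  # 1200 m from the centre
}
nearby = trees_within(trees, centre)
```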

2.2 Genotyping

Leaves were sampled from all identified trees and stored at −20 °C until analysis. DNA was isolated with a custom CTAB protocol (Larue et al. 2021b). Samples were genotyped at 120 single nucleotide polymorphism (SNP) markers on the Agena MassARRAY platform (Larue et al. 2021b). We identified all samples sharing the same multilocus genotype using the “Multilocus Matches” function of GenAlEx 6.503 (Peakall and Smouse 2012) and carefully inspected the results manually. We also computed “Multilocus near matches” to verify that different genotypes differ at multiple markers.
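The idea behind grouping identical multilocus genotypes and checking near matches can be sketched in Python as follows. This is not the GenAlEx implementation: the data layout, the function names and the rule of skipping missing calls when counting mismatches are assumptions made for illustration.

```python
from collections import defaultdict

def multilocus_matches(genotypes):
    """Group sample IDs that share exactly the same multilocus genotype."""
    groups = defaultdict(list)
    for sample_id, loci in genotypes.items():
        groups[tuple(loci)].append(sample_id)
    return sorted(sorted(ids) for ids in groups.values())

def n_mismatches(g1, g2):
    """Count loci at which two genotypes differ, skipping missing calls (None)."""
    return sum(1 for a, b in zip(g1, g2)
               if a is not None and b is not None and a != b)

# Toy data: three SNPs per sample, each call written as an allele pair.
genotypes = {
    "S1": ["AA", "AG", "CC"],
    "S2": ["AA", "AG", "CC"],
    "S3": ["AA", "GG", "CT"],
}
matches = multilocus_matches(genotypes)
```

In this toy example, S1 and S2 form one group, while S3 differs from S1 at two loci and would therefore be treated as a distinct genotype rather than a near match caused by a single genotyping error.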

Once the unique multilocus genotypes (the genets) were identified, we obtained consensus genotypes by summarizing the genotypic data from all ramets of each genet. Finally, we used STRUCTURE software (Pritchard et al. 2000) to assign each ramet to species, as explained in Larue et al. (2021b).
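One possible way to summarize several ramets into a consensus genotype is a per-locus majority rule that ignores missing calls. The text does not specify the rule actually used, so the Python sketch below is an assumption, and the data are made up.

```python
from collections import Counter

def consensus_genotype(ramet_genotypes):
    """Per locus, keep the most frequent non-missing call across ramets."""
    consensus = []
    for calls in zip(*ramet_genotypes):
        observed = [c for c in calls if c is not None]
        # A locus missing in every ramet stays missing in the consensus.
        consensus.append(Counter(observed).most_common(1)[0][0] if observed else None)
    return consensus

# Three ramets of one hypothetical genet; None marks a failed call.
ramets = [
    ["AA", None, "CC"],
    ["AA", "AG", "CC"],
    ["AA", "AG", None],
]
consensus = consensus_genotype(ramets)
```

With this rule, a call missing in one ramet is filled in from the other ramets of the same genet. Ties between conflicting calls are not resolved here and would need manual inspection.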

2.3 Phenotyping

The architecture of each ramet was described using the diameter at breast height (at 1.3 m) of all stems > 1 cm, total height and average canopy diameter (in meters). We then calculated the basal area (in square meters). We measured the density of male flowers of unisexual catkins (number of flowers per square meter) to estimate the capacity of each ramet to produce pollen. We identified flower type (Fig. 2) according to Solignat and Chapa (1975), measured catkin length (in centimeters) and diameter (in millimeters), and estimated relative stamen density. The phenology of all trees was recorded twice a week in late spring of 2018 (from June to mid-July) using a standardized scale developed for this purpose (Larue et al. 2021c). Briefly, at each visit, each tree received three scores: one for male flowers of unisexual male catkins, one for female flowers and one for male flowers of bisexual catkins. We estimated burr production (number of burrs per square meter) in July by counting the burrs in the canopy or on the ground underneath each tree. Finally, we collected burrs in the fall and estimated seed set by counting the number of developed nuts per burr. Female inflorescences of chestnut trees are composed of three female flowers located side by side. Each flower, if pollinated, produces a fruit surrounded by the pericarp; if pollination fails, the pericarp is still present but remains empty. If none of the three flowers of an inflorescence is pollinated, the burr contains three empty nuts.
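For a multi-stemmed tree, basal area is commonly obtained by summing the cross-sectional areas of the individual stems. The Python sketch below assumes that diameters are recorded in centimeters and that only stems above the 1 cm threshold are counted. Both the unit and the example values are assumptions, not taken from the data files.

```python
import math

def basal_area_m2(stem_dbh_cm):
    """Sum the cross-sectional areas of all stems > 1 cm DBH, in square meters."""
    return sum(math.pi * (d / 100.0 / 2.0) ** 2
               for d in stem_dbh_cm if d > 1.0)

# Hypothetical tree with two measurable stems and one stem below the threshold.
area = basal_area_m2([20.0, 10.0, 0.5])
```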

Fig. 2

Bisexual catkins from a male-sterile tree (astaminate): General view (a) and close-up view (b). Most stamens are aborted and do not protrude from male flowers. Bisexual catkins from male-fertile tree (staminate): General view (c) and close-up view (d). Stamens have long filaments and are clearly visible

3 Access to the data and metadata description

The data are available at: https://data.inrae.fr/dataset.xhtml?persistentId=doi:10.15454/GSJSWW

Associated metadata are described in a template called “original_file_metadata.xlsx”, and are available at: https://metadata-afs.nancy.inra.fr/geonetwork/srv/fre/catalog.search#/metadata/02c5ca07-1536-4f89-9a0c-9e8d44a91287

It can be cited as: Larue 2021, “Intensive study site: INRAE chestnut germplasm collection (Domaine de la Grande Ferrade, Villenave d'Ornon)”, https://doi.org/10.15454/GSJSWW, Portail Data INRAE, V9, UNF:6:ukulwpHEBuz9kZIrsE8SIQ== [fileUNF].

This dataset is composed of 17 Excel files: 1 file describing all variables entitled “0_0_Read_Me.xlsx” and 16 data files, described below:

We labeled and mapped all chestnut trees located inside and outside the INRAE plantations:

  • 1_1_List_Chestnuts.xlsx contains a unique identifier for each chestnut tree.

  • 1_2_List_INRAE_Chestnuts_Germplasm_Collection.xlsx is a list of all chestnut trees that are part of the two plantations, making-up the INRAE germplasm collection (excluding isolated trees that were not planted as part of this germplasm collection).

We genotyped all these chestnut trees at SNP markers:

  • 2_1_Genotypes.xlsx is the corresponding raw data.

  • 2_2_Genotypes_Genalex_Input.xlsx includes the SNP genotypes in Genalex format.

We identified trees having the same multilocus genotype (clones):

  • 3_1_Clonal identification.xlsx

We compiled a list of all unique multilocus genotypes (the genets):

  • 4_1_Consensus_genotypes_Genets.xlsx

We then attributed a genet to each ramet (several ramets of the same genet are grafted on different rootstocks).

  • 5_1_Genotypes_Ramets.xlsx

Using a Bayesian approach (Structure), we assigned each genet to the different gene pools corresponding to different chestnut species.

  • 6_1_Species_Identification_Genets.xlsx

Finally, we described the phenotype of all chestnut trees in the intensive study plot:

  • 7_1_Chestnuts_Phenotypes.xlsx is a description of the architecture of the trees.

  • 7_2_Male_Catkins.xlsx includes a description of the male catkins of the trees.

  • 7_3_Phenology.xlsx contains the phenology scores of the trees during spring 2018.

  • 7_4_Chestnuts_Production_Estimates.xlsx contains the burr production estimates of the trees during summer 2018.

  • 7_5_Seed_Set_2018.xlsx contains the seed set of the trees recorded in fall 2018.

  • 7_6_Seed_Set_2019.xlsx contains the seed set of the trees recorded in fall 2019.

We listed 275 chestnut trees: 244 adult trees in the INRAE germplasm collection (A plot = 29 trees/E plot = 215 trees), 24 small trees in the nursery, and 7 adult trees outside the INRAE campus. All trees are geolocated, but all the young trees of the nursery are represented by a single GPS point. Of the 275 trees identified, two died before leaves could be collected for DNA isolation, so we have the genotypes of 273 individuals. We identified 113 unique genotypes (genets), with an average of 2.4 ramets/genet. All this information is summarized in:

  • 8_1_Summary.xlsx
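The tree counts reported above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
collection = {"A": 29, "E": 215}   # adult trees per plot
nursery, outside = 24, 7           # small nursery trees, adult trees off campus

total_trees = sum(collection.values()) + nursery + outside
genotyped = total_trees - 2        # two trees died before leaf sampling
genets = 113
ramets_per_genet = genotyped / genets
```

The totals come out at 275 trees and 273 genotyped individuals, and the ratio rounds to 2.4 ramets per genet.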

All chestnuts are located on a map:

  • 8_2_Map.xlsx

4 Technical validation

We validated the dataset first by hand and then using numerical and graphical analyses with R software (R software v4.0.4). Laboratory and measurement equipment was regularly calibrated, and standards were used for each analysis. Genotyping errors with SNPs on the MassARRAY platform are extremely rare (Guichoux et al. 2011; Larue et al. 2021b), and it is therefore possible to quickly and reliably characterize a large number of samples at low cost. The genetic characterization of this collection is a first step towards the creation of a database to describe chestnut cultivars. Users will be able to genotype their samples with the same markers (or a subset of them) and compare their results with the database.

5 Reuse potential and limits

The reuse of the data presented here is simple. The Excel files can be easily imported into R by saving them as .txt or .csv files with minor modifications. Parts of the data were used successfully in previous studies, demonstrating their usefulness and portability. For instance, we successfully performed Structure analyses (Pritchard et al. 2000) with 68 SNPs and 94 SSRs (Larue et al. 2021b), showing that the SNP markers are very reliable to identify clones, species and interspecific hybrids, including advanced hybrids.
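The same text exports can be read in other environments. The sketch below uses Python rather than R, purely as an illustration. It assumes a semicolon-delimited export in which some numbers may be written with a decimal comma. The column names and values are hypothetical and should be replaced by those documented in “0_0_Read_Me.xlsx”.

```python
import csv
import io

def read_table(handle, delimiter=";"):
    """Read a delimited text export into a list of dicts (one per row)."""
    return list(csv.DictReader(handle, delimiter=delimiter))

def to_float(text):
    """Parse a number written with either a decimal point or a decimal comma."""
    return float(text.replace(",", "."))

# Stand-in for a file saved from Excel; the column names are hypothetical.
export = io.StringIO("tree_id;height_m\nT1;12,5\nT2;8.0\n")
rows = read_table(export)
heights = {row["tree_id"]: to_float(row["height_m"]) for row in rows}
```

For a real file, the `io.StringIO` object would be replaced by a handle from `open(path, newline="", encoding="utf-8")`.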

The collected phenology data are also very detailed, allowing inter-varietal and interspecific comparisons, as performed in Larue et al. (2021c). In Larue et al. (2021a), we describe in detail the phenology of two ramets for each of eight genets. We show that whereas the phenology of the different genets can vary greatly, it is very repeatable among ramets of the same genet. Seed set measurements also provide valuable data that allow us to highlight differences in the probability of fertilization according to flower type. In particular, we have shown that astaminate trees have a higher seed set than staminate trees (results not shown).

A limitation of the present work is that we have not attempted to provide common names for all accessions studied. This would require a lot of curation, which is under way. Indeed, by genotyping the chestnut collection, we have highlighted problems of varietal identification. In some cases, a single genet has been designated with several cultivar names, a case of synonymy. In other cases, different genets have received the same cultivar name, a case of homonymy. At this point, trees are therefore named only according to their position in the plot.

In the literature, male catkins are classified into four categories according to the length of stamen filaments: astaminate (no stamens emerging from the flowers), brachystaminate (stamens 1–3 mm), mesostaminate (stamens 3–5 mm) and longistaminate (stamens > 5 mm), with the pollen production of a tree depending strongly on flower type. However, pollen production varies continuously across trees, so these discrete categories capture it only approximately.
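The four categories can be expressed as a simple threshold rule, sketched below in Python. The published ranges share their boundary values and leave filaments shorter than 1 mm unassigned. The sketch assumes that filaments under 1 mm count as astaminate and that 3 mm and 5 mm both fall in the mesostaminate class. These choices are assumptions, not part of the original classification.

```python
def catkin_type(filament_mm):
    """Assign a flower type from stamen filament length in millimeters."""
    if filament_mm < 1.0:      # assumption: stamens do not emerge
        return "astaminate"
    if filament_mm < 3.0:
        return "brachystaminate"
    if filament_mm <= 5.0:     # assumption: 3 mm and 5 mm are mesostaminate
        return "mesostaminate"
    return "longistaminate"

# Hypothetical filament lengths, one per tree.
types = [catkin_type(length) for length in (0.0, 2.0, 4.0, 7.0)]
```

Two trees with filaments of 2.9 mm and 3.0 mm fall in different classes under such a rule, which illustrates why discrete categories are a coarse summary of a continuous trait.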

Note also that when we calculated the seed set, we measured the number of fruits per burr, i.e. the number of flowers in each female inflorescence that give a fruit, but not burr set, i.e. the percentage of female inflorescences that give burrs. The probability that a female flower gives a fruit is therefore overestimated. To better estimate pollination success, it would be necessary to measure both burr set and seed set, which would be very labor-intensive.
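The size of this overestimate can be illustrated with a small Python sketch. Seed set is computed here as the proportion of the three flowers per burr that gave a developed nut. Multiplying it by burr set to approximate overall success per female flower is an assumption made for illustration, and the burr set value of 0.8 is hypothetical since burr set was not measured.

```python
def seed_set(developed_nuts_per_burr, flowers_per_burr=3):
    """Proportion of female flowers in collected burrs that gave a developed nut."""
    total = sum(developed_nuts_per_burr)
    return total / (flowers_per_burr * len(developed_nuts_per_burr))

def overall_success(seed_set_value, burr_set_value):
    """Approximate fruiting success per female flower (assumed product rule)."""
    return seed_set_value * burr_set_value

# Hypothetical counts of developed nuts in four burrs from one tree.
nuts = [3, 2, 1, 0]
ss = seed_set(nuts)
overall = overall_success(ss, 0.8)   # 0.8 is a made-up burr set value
```

In this example seed set alone is 0.5, whereas accounting for inflorescences that never formed a burr lowers the estimate to 0.4.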