Introduction

While landraces are universally recognized as a highly valuable component of plant genetic resources for food and agriculture, their definition has been the subject of extensive discussion (Casañas et al. 2017; Villa et al. 2005; Zeven 1998). Broadly speaking, a crop landrace is a cultivated form of a domesticated plant species that is locally adapted mainly because of geographical isolation. More recently, greater attention has been given to the cultural value of a landrace, for example in terms of visible characteristics of identity, gastronomic value, and specific transformation processes and uses. This is to underline that the distinctive traits of a landrace are due not only to environmental adaptive traits but also to a deliberate selection by farmers (Pusadee et al. 2009), thus allowing them to be distinguished from ecotypes.

In the past, every agricultural production system was based on landraces, but they have been progressively replaced by more specialized germplasm since the nineteenth century. Currently, landraces play a role especially in the traditional production systems of developing countries, whereas in high-income countries they are associated with amateur or traditional farming (Galluzzi et al. 2010; Gibson 2009). In advanced economies, landraces usually receive attention as niche products, because consumers perceive them as being of higher quality, and as a resource for crop improvement that has yet to be fully tapped. For instance, landraces are considered important germplasm to improve yield stability in the face of climate change, suitability for low-input agriculture, resilience to stress, nutritional value, and quality (Bellon and van Etten 2014; Corrado and Rao 2017; Newton et al. 2011; Petropoulos et al. 2018).

The tomato (Solanum lycopersicum L.) is among the most widespread and economically important vegetable crops worldwide (https://www.fao.org/faostat/). Following domestication in the Americas, this species was imported to Europe in the sixteenth century and later spread across the globe (McCue 1952). Italy and Spain are considered secondary centres of diversity because of the presence of morphological variation that has not been documented elsewhere (Kulus 2022). In the cultivated tomato germplasm, genetic diversity has been mostly reduced because of founder effect and breeding (Miller and Tanksley 1990; Williams and Clair 1993), except for the more recent introgression of variability from wild relatives (Bergougnoux 2014; Liang et al. 2017). Tomato landraces are therefore a valuable genetic material for their yield stability, resilience to stress, adaptability to organic farming, and fruit quality (Andreakis et al. 2004; Boziné-Pullai et al. 2021; Rouphael et al. 2021; Sumalan et al. 2020; Villareal et al. 1978). Moreover, tomato landraces are of increasing commercial importance because of the promotion of local food systems and short food supply chains as leverage towards more sustainable and resilient agricultural systems (Enthoven and Van den Broeck 2021; Samoggia et al. 2021).

Although several authors agree that landraces are dynamic, non-uniform populations, studies documenting their intra-landrace genetic variability are few compared with those addressing their relationships or their comparison with contemporary cultivars. Limiting our attention to a few predominantly selfing species, the DNA analysis of a collection of 12 barley landraces of the Sardinia region of Italy indicated that intra-landrace variability accounted for 11% of the total variance (Papa et al. 1998). In a regional collection of 25 common bean landraces, the within-landrace genetic variability was estimated at 32%, with almost all landraces composed of slightly different genotypes (Scarano et al. 2014). Genetic variability could be higher when comparing landraces from different nations, as demonstrated for Triticum aestivum (Dreisigacker et al. 2005), or from different regions, as in Lens culinaris (Fikiru et al. 2007) and Triticum durum (Pagnotta et al. 2005). In tomato, intra-landrace molecular variability has been found in secondary centers (García-Martínez et al. 2006, 2013; Mazzucato et al. 2010) and non-centers of origin for tomato (Alzahib et al. 2021).

On-farm conservation of landraces refers to the practice of maintaining and preserving traditional crop varieties by active cultivation where they have been traditionally grown and developed over generations, and provides multiple benefits (Bellon and van Etten 2014). Nonetheless, it has also been pointed out that farming can be associated with contamination of landraces, either with other landraces or with cultivars, for instance because farmers produce their own seed (Hagenblad et al. 2012; Kyratzis et al. 2019; Zeven and Waninge 1989).

The aim of this work was to analyze the genetic variability among and within 16 tomato landraces collected in the Campania region (Italy), representing different fruit shapes. Specifically, our aims included the quantification of the levels of genetic homogeneity, and the assessment of genetic differentiation and structure, considering that previous works indicated the possible presence of spurious and/or contaminant genotypes (Corrado et al. 2014; Rao et al. 2006). Among the different molecular markers available, we used the (GATA)4 minisatellite because of its ability to detect high levels of polymorphism. In tomato this marker type successfully distinguished cultivars and landraces that are otherwise difficult to differentiate (García-Martínez et al. 2013; Kaemmer et al. 1995; Rao et al. 2006).

Material and methods

Plant material

The work was carried out on sixteen tomato (Solanum lycopersicum L.) landraces present in the Campania region of Italy. Table 1 reports the code and the main characteristics of the fruits along with the farmer’s classification. A definition of the terms used in Table 1 to describe the overall fruit shape (e.g., circular, cordate, etc.) can be found in the UPOV guidelines for S. lycopersicum TG/44/11 (https://www.upov.int; accessed 1 February 2023). In this work, fruits with a narrow (elongated) cylindric (parallel) shape in lateral view are also defined as “San Marzano type”. Fruits with an average weight below 25 g were classified as “small”.

Table 1 Plant germplasm under investigation

DNA analysis

We analysed four plants per landrace, for a total of 64 genotypes. DNA was isolated from the first pair of true leaves of plants germinated in a growth chamber. Leaves were harvested, immediately frozen in liquid nitrogen, and stored at −80 °C until use. Total DNA was isolated using the GenElute Plant Genomic DNA Miniprep Kit (Sigma Aldrich, Milan, Italy) and quantified using an agarose gel procedure (Sambrook et al. 1989). Five µg of total DNA was digested overnight with 40 units of Taq I (Promega, Milan, Italy) in a final volume of 450 µl following the manufacturer’s instructions. After ethanol precipitation and washing with 70% ethanol, the DNA was resuspended and resolved in a 0.8% agarose gel in 1× TAE buffer for 18 h at 30 V (Sambrook et al. 1989). Following a 30-min denaturation in a 1.5 M NaCl and 0.5 M NaOH solution, and a 45-min neutralization in a 1.5 M NaCl and 1 M Tris-HCl (pH 7.4) solution, the DNA was transferred overnight by Southern blotting to a Hybond N membrane (Amersham, Milan, Italy) using a 20× SSC solution (Sambrook et al. 1989), and then crosslinked with UV (150 mJ) using the GS Gene Linker (Bio-Rad, Milan, Italy). [γ-32P]dATP 5' terminal labeling (Perkin Elmer, Milan, Italy), probe purification, hybridization with the (GATA)4 probe, washing and visualization on Kodak Bio-Max X-ray films were carried out as previously described (García-Martínez et al. 2013).

Data analysis

Bands were scored for presence (1) or absence (0) and treated as dominant markers. The Shannon index was calculated with the vegan package (Dixon 2003). AMOVA was carried out with the poppr package, using 9999 permutations for testing statistical significance (Kamvar et al. 2014). Hierarchical classification of the individuals and of the populations was performed with the ape package (Paradis and Schliep 2019), using the Prevosti distance and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) clustering algorithm. The cophenetic correlation coefficient between the distance matrix and the corresponding cophenetic matrix was calculated using Mantel’s test (with 9999 permutations) implemented in the ape package.
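The distance and clustering steps described above can be illustrated with a small, self-contained sketch. It is not the code used in this study (the analysis was run in R with poppr and ape); it assumes that, for 0/1 band profiles, the Prevosti distance reduces to the proportion of band positions at which two profiles differ, and it uses four invented profiles (A to D). The function names are hypothetical. The permutation step of Mantel’s test is omitted; only the cophenetic correlation itself is computed.

```python
from itertools import combinations
from math import sqrt


def prevosti(a, b):
    """Proportion of band positions at which two 0/1 profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of bands")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_cophenetic(profiles):
    """Cluster profiles with UPGMA; return (original, cophenetic) pairwise distances."""
    names = list(profiles)
    orig = {frozenset((a, b)): prevosti(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = {i: [n] for i, n in enumerate(names)}  # cluster id -> member names
    d = {(i, j): orig[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    coph, new_id = {}, len(names)
    while len(clusters) > 1:
        # Merge the two closest clusters; the merge distance is the
        # cophenetic distance of every pair of leaves joined at this step.
        (i, j), height = min(d.items(), key=lambda kv: kv[1])
        for a in clusters[i]:
            for b in clusters[j]:
                coph[frozenset((a, b))] = height
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        updated = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            # Size-weighted mean = arithmetic mean over all leaf pairs (UPGMA).
            updated[(k, new_id)] = (ni * dik + nj * djk) / (ni + nj)
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        d.update(updated)
        clusters[new_id] = merged
        new_id += 1
    return orig, coph


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented 8-band profiles, for illustration only.
toy = {
    "A": [1, 1, 0, 0, 1, 0, 1, 0],
    "B": [1, 1, 0, 0, 1, 0, 1, 1],
    "C": [0, 0, 1, 1, 0, 1, 0, 0],
    "D": [0, 0, 1, 1, 0, 1, 1, 0],
}
orig, coph = upgma_cophenetic(toy)
pairs = list(orig)
r = pearson([orig[p] for p in pairs], [coph[p] for p in pairs])
print(round(r, 3))
```

A cophenetic correlation close to 1 means that the dendrogram preserves the original pairwise distances well, which is the sense in which the coefficient is used in the Results.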

Possible population structure was estimated using the software STRUCTURE v2.3 (Pritchard et al. 2000). The analysis was carried out using a burn-in period of 25,000 iterations and a run length of 250,000 MCMC replications. We tested a continuous series of K, from 1 to 12, in 10 independent runs each to verify the consistency across runs. The most informative K was identified using the ΔK method (Evanno et al. 2005) with the aid of STRUCTURE HARVESTER (Earl and VonHoldt 2012). The estimated cluster membership coefficient matrices of each run were permuted (n = 1,000) so that all replicates matched as closely as possible, and then averaged across the 10 runs using the LargeKGreedy algorithm of the software CLUMPP (Jakobsson and Rosenberg 2007).
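As an illustration of the ΔK criterion, the sketch below computes ΔK from replicate log-likelihood values, L(K), for each tested K. It is a simplified reading of the statistic, assuming ΔK = |mean L(K+1) − 2 mean L(K) + mean L(K−1)| divided by the standard deviation of L(K) across runs; published implementations may differ in detail. The function name and the log-likelihood values are invented for the example and are not results of this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each tested K to the list of L(K) values from replicate runs.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # the second-order difference is undefined at the edges
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicate runs are identical
        second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
        result[k] = abs(second_diff) / sd
    return result


# Invented log-likelihoods for K = 1..4, two runs each (illustration only).
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -804.0],
    3: [-790.0, -794.0],
    4: [-788.0, -796.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(best_k, {k: round(v, 2) for k, v in dk.items()})
```

Because the statistic needs both neighbours of K, it cannot be evaluated for the smallest and largest K of the tested series (here, K = 1 and K = 12).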

Results

Genetic diversity among and within landraces

(GATA)4 DNA fingerprinting of the 16 landraces revealed a total of 29 reproducible scorable bands (Fig. 1).

Fig. 1 (GATA)4 DNA fingerprinting. The figure shows, as examples, landraces with one (STRL), two (CI) and three (CB) hybridization patterns

The main genetic parameters of the scored markers calculated on the whole population (n = 64) are reported in Supplementary Table 1. There were 27 polymorphic bands (93% of the total) considering all the individuals. The number of bands per landrace ranged from a maximum of 18 to a minimum of 8, for an average value of 12.4 ± 2.8 (mean ± s.d.). In total, we detected 22 multilocus genotypes. Eleven landraces were genetically homogeneous (i.e., they displayed a single DNA profile), while five had intra-landrace variability (i.e., bands were polymorphic within the landrace) (Table 2).
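The summary statistics above follow directly from the 0/1 band matrix. The sketch below shows one way to count polymorphic bands and multilocus genotypes; the function name and the four-plant, four-band matrix are invented for illustration and do not reproduce the study data.

```python
def band_summary(profiles):
    """Count bands, polymorphic bands and distinct multilocus genotypes.

    profiles maps a plant label to its list of 0/1 band scores.
    """
    rows = list(profiles.values())
    n_bands = len(rows[0])
    if any(len(r) != n_bands for r in rows):
        raise ValueError("all profiles must have the same number of bands")
    # A band is polymorphic when it is present in some plants and absent in others.
    polymorphic = sum(1 for j in range(n_bands) if len({r[j] for r in rows}) > 1)
    # A multilocus genotype is a distinct full band profile.
    genotypes = {tuple(r) for r in rows}
    return {
        "bands": n_bands,
        "polymorphic": polymorphic,
        "pct_polymorphic": 100 * polymorphic / n_bands,
        "multilocus_genotypes": len(genotypes),
    }


# Invented scores, for illustration only.
toy = {
    "plant1": [1, 0, 1, 1],
    "plant2": [1, 0, 1, 1],
    "plant3": [1, 1, 1, 0],
    "plant4": [1, 0, 1, 0],
}
summary = band_summary(toy)
print(summary)
```

Applied to the figures reported here, the same ratio gives 100 × 27 / 29, which rounds to the stated 93%.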

Table 2 Main genetic information of the (GATA)4 molecular analysis of the tomato landraces under investigation

Two different genotypes were detected for the CB and the SMC landraces, while only one plant of the CI and of the TM landraces was different. The most variable landrace was the CB, with three different profiles (Shannon Index = 1.04). The hierarchical classification of the individuals indicated that the largest intra-landrace difference was present for the SMC, a San Marzano type. Specifically, while two plants clustered along with landraces with a similar fruit type (i.e., SMMU and SMMO), the other two SMC plants were in a distinct position, joining the tree at the last node (Figure S1). The number of private alleles was low, with only one specific band for the CB landrace and another specific for the homogeneous landrace TD. The number of less common bands among landraces (i.e., with a frequency less than or equal to 25%) was also low, on average 14% of the total number of bands (Table 2).
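The Shannon index reported for CB can be checked by hand under two assumptions that are not stated explicitly in the text: that the index was computed on the frequencies of the DNA profiles within a landrace, and that natural logarithms were used. Since four plants were analysed per landrace, three profiles can only be distributed as 2, 1 and 1 plants. The sketch below (hypothetical function name and profile labels) reproduces the calculation.

```python
from collections import Counter
from math import log


def shannon_index(profile_labels):
    """Shannon index H = -sum(p * ln p) over the profile frequencies."""
    counts = Counter(profile_labels)
    n = len(profile_labels)
    return -sum((c / n) * log(c / n) for c in counts.values())


# Four plants, three profiles: the only possible split is 2 + 1 + 1.
h_three_profiles = shannon_index(["p1", "p1", "p2", "p3"])
# A homogeneous landrace (one profile in all four plants) has H = 0.
h_homogeneous = shannon_index(["p1", "p1", "p1", "p1"])
print(round(h_three_profiles, 2), h_homogeneous == 0)
```

Under these assumptions the 2 + 1 + 1 split gives H ≈ 1.04, in agreement with the value reported for CB.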

As landraces were collected and named according to the information provided by farmers, we employed an analysis of molecular variance (AMOVA) based on the polymorphic molecular markers to identify a possible genetic differentiation. This analysis considered the pre-defined landraces as the hierarchical level for partitioning genetic variation. There was a significant (p < 0.001) differentiation among populations (Table 3). Specifically, most of the variance was attributed to differences among populations, with 14.4% of the molecular variance occurring among samples within populations.

Table 3 Summary statistics of the analysis of molecular variance of the tomato landraces

To categorize landraces based on levels of genetic relatedness, we calculated pairwise genetic distances between pre-defined populations. Distances were employed to build a dendrogram using the UPGMA algorithm (Fig. 2).

Fig. 2 Hierarchical clustering of the tomato landraces. The hierarchical clustering was performed using the UPGMA method based on the Prevosti distance between populations, shown on the horizontal scale

The distribution of the landraces indicated that all the pre-defined populations were well separated. The high Fst values indicated a significant pairwise differentiation between landraces, which was not surprising given the level of homogeneity observed within landraces (Table S2). The agreement between the genetic distances and the hierarchical classification was evaluated with Mantel’s test. The cophenetic correlation coefficient was 0.81 and significant (p < 0.001), also suggesting the presence of a genetic structure in the data.

Inference of population structure

To classify the individuals in populations without using a priori knowledge, we used the model-based clustering method developed by Pritchard et al. (2000). This analysis was also carried out to clarify the features of the divergent samples identified in the hierarchical clustering approach, as they may be admixed genotypes (e.g., deriving from the cross of two landrace populations) or contaminant genotypes (e.g., not attributable to a particular landrace population). For the whole set of markers, the Evanno test indicated that the most informative number of populations was nine (Figure S2). The Q-matrix with the individual membership coefficients indicated that the samples were strongly assigned to the clusters (Table S3). At K = 9, a biological interpretation of the sub-division can be performed considering the fruit shape and folk taxonomy (the latter also being dependent on the morphological traits of the plants and their use). As illustrated in Fig. 3, the landraces with small, circular, pointed fruits (PB and PBI) grouped together (C1, brown). The landraces with an elongated, San Marzano-type fruit (namely SMMO, SMMU, and two plants of the SMC) also clustered together (C2, grey). The other two plants of the SMC were considered a different population from all the others (C3, light green). The largest population was made of landraces with a medium-size (around 40–50 g) circular fruit (TL, VD, CI, and GD) (C7, orange), with one plant of the CI (CI3) that appears to be admixed with another landrace with circular fruits (TI; C6, azure). Another cluster was made of two landraces sharing the same fruit shape in lateral view (C8, blue). Nonetheless, the plants of the C9 group (dark blue) did not have a clear similarity considering fruit shape and size.
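The distinction drawn above between strongly assigned and admixed individuals can be made operational by inspecting each row of the Q-matrix. The sketch below is only illustrative: the 0.8 cut-off, the function name and the membership coefficients are assumptions made for the example, not values taken from Table S3.

```python
def summarize_membership(q_row, threshold=0.8):
    """Return (top cluster, status) for one individual's membership coefficients.

    q_row maps cluster labels to membership fractions, which must sum to 1.
    The individual is called 'assigned' when its largest coefficient reaches
    the threshold, and 'admixed' otherwise. The threshold is an assumption.
    """
    total = sum(q_row.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("membership coefficients must sum to 1")
    top = max(q_row, key=q_row.get)
    status = "assigned" if q_row[top] >= threshold else "admixed"
    return top, status


# Invented coefficients, for illustration only.
toy_q = {
    "plant_a": {"C2": 0.97, "C3": 0.02, "C7": 0.01},
    "plant_b": {"C6": 0.43, "C7": 0.55, "C9": 0.02},
}
calls = {name: summarize_membership(row) for name, row in toy_q.items()}
print(calls)
```

In this toy example the first plant is assigned to a single cluster, whereas the second splits its membership between two clusters and is flagged as admixed.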

Fig. 3 Estimated population structure of the tomato genotypes. Each genotype (reported on the X-axis) is represented by a horizontal line, which is partitioned into colored segments that represent the estimated membership fractions in the 9 clusters (C). See Figure S2 for the determination of the most informative K and Table S3 for the membership coefficients

Discussion

The appreciation of landraces has gained momentum in tomato because of their possible use in breeding as a source of adaptive and quality traits (Dwivedi et al. 2016). One of the main issues is that pedigree information, or more generally (co-)ancestry, is usually not available, and collections are usually made on the sole basis of farmers’ knowledge. The origin, differentiation, and relationships among and within landraces are often undetermined but this information is necessary to rationalize germplasm management and to guide future users in their sampling (Engels 2003; Spooner 2005).

This work evaluated the molecular diversity among and within locally cultivated tomato accessions using a highly informative DNA marker (Caramante et al. 2009). Many landraces were genetically uniform, as expected considering the predominantly selfing nature of tomato and the farmers’ maintenance selection (Zeven 2000). Variability within landraces was found in around 31% of the germplasm, although this estimation is based on markers scored as dominant.

The hierarchical classification of the individuals suggested that, when present, the less frequent DNA profile of one landrace was not identical to those of other landraces, expressing true variability rather than possible contamination or seed mixture. The distinct position of two genotypes of the SMC landrace was noteworthy, because they joined the dendrogram at the last node, while the other divergent profiles were scattered within the dendrogram.

Considering the a priori knowledge (i.e., the predefined groups), the genetic analysis indicated that all the landrace populations could be separated by the (GATA)4 analysis. Hierarchical classification based on population distances is useful to understand the molecular diversity of plants because it does not require assumptions (e.g., Hardy–Weinberg equilibrium) and can be applied to a wide range of datasets regardless of the biological properties of the species (being solely based on the chosen pairwise distance and clustering algorithm). At the high node level, some landrace groups (e.g., those with the San Marzano type fruits, or some with small round fruits) clustered together. Moreover, the high correlation coefficient between the genetic distance and the cophenetic matrices is also considered a clue to the presence of a population structure (Odong et al. 2011).

We then assessed the possible population structure without using predefined groups. This allowed us not only to understand the genetic diversity and relationships among different landraces but also to infer the origin and nature of the detected intra-landrace variability. The data analysis indicated that in different instances there is a correspondence between fruit type and clusters, which is consistent with previous studies that have primarily focused on cultivated tomato varieties. These studies have suggested that tomato cultivars can be grouped based on their fruit shape and market classes, which are largely fixed by breeding in contemporary cultivars (Corrado et al. 2013; Sim et al. 2012, 2011).

The number of identified populations was high, and this was supported by the high value of pairwise genetic differentiation between landraces (measured as Fst). Although Fst also depends on the sample size, natural plant populations tend to have lower Fst values compared to cultivated plant populations, especially those of predominantly autogamous crops like S. lycopersicum. This indirectly suggests that the presence of Bayesian groups with a similar fruit shape is likely to be indicative of a common origin, because gene flow among tomato landraces should be limited. Nonetheless, as the genotypes have been collected in a culturally, gastronomically, and geographically homogeneous area, an effect of the environment (including selection for common traits) over subpopulations cannot be ruled out. We used neutral DNA markers, which have limited (if any) obvious function and should not be under adaptive selection pressure. (GATA)4 tandem repeats are expected to be inherited independently of traits under selection because of their genomic organization, hence they should provide an unbiased estimate of genetic variation (Vosman and Arens 1997). However, further studies, based for instance on more extensive genome scans, will have to determine whether the sub-populations of landraces identified by the Bayesian analysis mostly reflect an evolutionary relationship related to a single founder population or are mainly due to a common human selection pressure.

The identity of a plant landrace can be altered by cross-pollination, seed mixing, or accidental or intentional introduction of unrelated genotypes (Zeven 1999). This may be due to the proximity to other varieties or to the farmers’ seed production from hybrid cultivars (Zeven 1999). The latter is likely to occur more frequently for species like the tomato, whose seed production is dominated by hybrid genotypes. The possibility has been discussed that tomato landraces in Greece were initially derived from introduced germplasm and, later, from hybrid material (Gonias et al. 2019). Moreover, the landrace from the Italian region Umbria called ‘Conserva’ is suspected to derive from recent crosses between genetically diverse lines because of its highly negative Fixation Index, while its Observed Heterozygosity was close to zero, as in the other five landraces analyzed (Castellana et al. 2020). On the other hand, the DNA analysis of the Spanish landrace ‘Pera’ revealed the presence of two related sub-populations, which may represent true intra-landrace variability (García-Martínez et al. 2013). The hierarchical clustering of the individuals and the Bayesian analysis of the subpopulations indicated the presence of genotypes that can be considered contaminants (e.g., highly unrelated to the alleged landrace and to the other landraces), as in the case of the SMC, and of genotypes deriving from admixture, as in the case of the CI landrace.

In conclusion, this study offers valuable insights for the characterization and conservation of cultivated tomato genetic resources in a secondary centre of diversity, indicating the strong differentiation of the landraces and the possible presence of landrace groups based on fruit shape. The results highlight the importance of the molecular characterization of traditional tomato varieties that are still being grown as a tool to enhance the credibility of conservation strategies, facilitate sound decision-making (especially at the regional level), and promote further actions for a genetic-based selection and purification. To protect the genetic variability of this species, our work also underlines the importance of introducing strategies that encourage the cultivation and marketing of landraces that are adapted to local conditions, while aligning with traditional gastronomy. Finally, our study also highlights the need for comprehensive and rigorous testing of tomato landraces in secondary centers of diversity, to understand the role and impact of intra-landrace variability, identify undesired genotypes, preserve appropriate genetic diversity and, ultimately, develop improved plant material.

Acknowledgments

This study was partly carried out within the Agritech National Research Center and received funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR)–MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.4–D.D. 1032 17/06/2022, CN00000022). This manuscript reflects only the authors’ views and opinions; neither the European Union nor the European Commission can be considered responsible for them.