Introduction

World food security depends on only a small fraction of the thousands of crops available (Jaenicke et al. 2009; Martin et al. 2019). Many of the underutilized crop species, also known as orphan crops, have never been extensively cultivated locally or regionally. One underlying reason is that large investments in research and breeding are driven mainly by the commercial interests of developed countries and with little profit from orphan crops (World Bank 2007). Underutilised crops do not appear to be sufficiently competitive with staple food crops grown in the same agricultural area. Therefore, farmers prefer already well-established crops to provide for the world food market (Padulosi and Hoeschle-Zeldon 2004).

The term “underutilised” may refer to an intrinsic lack of competitiveness, or to a crop that has not yet realized its full agricultural potential for particular stakeholders or in specific regions. For example, chickpeas (Cicer arietinum) are largely cultivated in Asia, but they are considered to be underutilised in Italy (Padulosi et al. 2002; Padulosi and Hoeschle-Zeldon 2004). Many orphan crops have great potential, to serve as an innovative, sustainable and safe food source under climate change conditions, especially for developing countries (Dawson et al. 2009; Raheem 2011; Chivenge et al. 2015; Mabhaudhi et al. 2019; Tadele 2019). Their nutrients, medicinal effects and biodiversity can contribute a great value to achieving the Millennium Development Goals and act against unbalanced diets. Since the large diversity of underutilised crops that are known regionally and locally is often also of great cultural value and supports social heterogeneity (Jaenicke and Hoeschle-Zeledon 2009), orphan crops adapted to their area of origin may outperform dominant world crops in many circumstances (Fahey 1998). Despite their huge potential, orphan crops have not received enough attention in terms of initial improvement in their quantity and quality, which would not only increase awareness and investment, but also enhance food security in underdeveloped countries (Tadele 2019).

Fluted pumpkin (Telfairia occidentalis Hook. F.), commonly known as Ugu, is one of many orphan crops with great potential and for which increased knowledge and resources would be beneficial to improve food security. Due to their high productivity, cultivated landraces of T. occidentalis are maintained by small farmers as a major nutritional food and source of income for their livelihood. It is a dioecious flowering plant in the Cucurbitaceae family (Fayeun et al. 2016a; Okoli and Mgbeogu 1983), whose sex expression is likely genetically determined (Grumet and Taft 2011). Thought to originate from Southern Nigeria, the species is mostly cultivated in West Africa for its nutritious and edible leaves as well as healthy oils and protein-rich seeds (Akoroda 1990; Akoroda et al. 1990; Okoli and Mgbeogu 1983; Badifu 1993). Individuals of T. occidentalis can reach a vine length of up to 4.7 m when flowering (Nwonuala and Obiefuna 2015) and female plants can produce two to five large fruits weighing 2–20 kg containing up to 200 flat, round seeds around 5 cm in diameter (Adeyemo and Tijani 2018; Okoli and Mgbeogu 1983). Since only female plants produce useful fruits containing seeds and larger succulent leaves than the males of T. occidentalis, they are considered more beneficial and male plants tend to be regarded as a waste of energy (Akoroda 1990; Chukwurah and Uguru 2010; Fayeun et al. 2016b). Despite reports of morphological and floral differences between females and males (Chukwurah and Uguru 2010; Fayeun et al. 2016a; Okoli and Mgbeogu 1983), neither morphological nor molecular traits have yet been identified to successfully support reliable sex determination of immature males versus females of T. occidentalis (Ndukwu et al. 2005; Fayeun and Odiyi 2015).

A few studies have used site-associated DNA sequencing data (RADseq) to determine plant sex (Gamble et al. 2015; Jeffries et al. 2018; Scharmann et al. 2019; Morgan et al. 2020), including Cucurbitaceae (Matsumura et al. 2014). Undoubtedly, RADseq appears to be versatile and robust enough to characterize genome-wide polymorphisms in non-model species that do not have an existing reference genome and therefore enable the reliable identification of loci associated with sex determination (Feron et al. 2021; Palmer et al. 2019; Peterson et al. 2012). In this study, we used single nucleotide polymorphisms (SNPs) to examine the structure of genetic variation within and among populations of selected landraces of T. occidentalis grown in Nigeria. We also identified possible sex-determining loci in this orphan crop and used consistent sequence variation between males and females among the landraces to develop, design and validate molecular assays for determining the sex of immature T. occidentalis plants.

Materials and methods

Plant samples and DNA extraction

For each cultivated landraces of T. occidentalis from five states in Nigeria (Lagos, Anambra, Cross River, Oyo, Ekiti; Table 1), which are representatives of the species distribution, a whole fruit was collected and between 32 and 124 seeds were obtained (Table 1). Seeds were germinated and grown in nursery bags and transplanted in a protected field in the Post and Telecommunication area of Baruwa/Ipaja, Alimosho Local Government, Lagos, and in the field at University of Lagos, Nigeria in between December 2019, and July 2020. The sampled plants were allowed to grow to maturity and young leaves were collected from a total of 139 individual plants for DNA extraction. Leaf samples were immediately preserved and stored in silica gel for later transfer from Nigeria to Switzerland. For molecular analyses, we genotyped 139 plants, out of which a total of 100 plants reached sexual maturity and were phenotyped as male vs female. The sexes were identified by the morphology of fully developed distinct male/female flowers in the field.

Table 1 Samples of Telfairia occidentalis collected from cultivated landraces in five States of Nigeria used in this study

Genomic DNA was extracted from 10 to 20 mg of dried leaf tissue for each of the 139 plant samples using the QIAGEN DNeasy plant kit as recommended by the manufacturer (Qiagen, Hombrechtikon, Switzerland). The quality and quantity of each DNA sample were checked using NanoDrop ND-1000 Spectrophotometer and Qubit 3 Fluorometer (both by Thermo Fischer Scientific). Electrophoresis on a 1% agarose gel was used to perform the DNA integrity of a subset of samples.

Generating the ddRAD sequencing data set

Three ddRAD libraries (Table S1) were prepared from the 139 DNA samples of T. occidentalis, together with one randomly selected technical replicate from each landrace (total of 144 samples), following the protocol of Peterson et al. (2012) as modified by Grünig et al. (2021). In brief, 150 ng of DNA was digested with the restriction enzymes EcoR1 and Mse1 (New England Biolabs) for 1 h at 37 °C. Reads from each sample were individually tagged by ligating digested DNA to EcoR1 adapters including one of 48 barcodes of five base pairs (bp), and to biotin-tagged MseI adapters including three different indexes (Table S2). Mse1 indexes also contained four degenerated bp which allowed PCR duplication to be identified. For each library, fragments around 550 bp were selected with AMPure XP beads (Agencourt) and stored with Dynabeads M-270 Streptavidin (Invitrogen). Each library was PCR amplified in a thermal cycler (Eppendorf Mastercycler) with initial denaturation at 98 °C, followed by ten cycles of 10 s at 98 °C, 30 s at 65 °C and 30 s at 72 °C.

The concentration of PCR-products was estimated using Qubit high-sensitivity assays (Thermo Fischer) and the distribution of fragment sizes in the library was determined using an Agilent 2100 Bioanalyzer (Agilent Technologies) following the manufacturer’s protocol. The three libraries with a total of 144 randomly distributed samples ended up being pooled at equimolarity and paired-end sequenced in 300 cycles (2 × 150 bp) on an SP flow cell of the Illumina NovaSeq 6000 at the next-generation sequencing (NGS) platform of the University of Bern.

Sequenced raw reads were examined for quality, demultiplexed and checked for intact restriction associated DNA cut sites and barcodes using process radtags (Catchen et al. 2011, 2013). PCR clones were identified by clone filter and removed (Catchen et al. 2011, 2013). Illumina adapters were trimmed, and reads were quality-filtered using trimmomatic (Bolger et al. 2014). Only reads of 100 bp or more, which also met an average Phred quality score of 15 within a four base sliding window, were kept.

Genotyping was performed following the dDocent pipeline (Puritz et al. 2014). Accordingly, a de novo catalogue was created based on reads that appeared at least twice and in four individuals (first and second cutoffs) and that were clustered into contigs using Rainbow based on at least 10 reads with a minimum similarity set to 0.5 (Chong et al. 2012). The Rainbow assembly was checked for overlaps between newly assembled forward and reverse reads using PEAR (Zhang et al. 2014). Contigs from Rainbow were then aligned with cd-hit-est, allowing for 80%, 85% and 90% sequence similarity (Li and Godzik 2006) and resulting in three different de novo catalogues upon which the one based on 85% sequence similarity was selected as most robust for downstream analyses (Table S4). Raw reads were aligned to the de novo catalogue with the programme Burrows-Wheeler Alignment tool (BWA; Li and Durbin 2009) whereas mapping quality was filtered using sambamba (Tarasov et al. 2015). Samtools (Li et al. 2009) indexed the reference sequence and HaplotypeCaller and GatherVcfs from the gatk4 package (McKenna et al. 2010) were then used for SNP calling and the gathering of generated VCF files into one VCFtools (Danecek et al. 2011) was then applied for filtering of what were earlier called SNPs.

The ddRAD sequencing error rate was calculated as the number of allelic differences between the technical replicates and the corresponding original sample divided by the total number of comparisons (Bonin et al. 2004).

Genetic diversity of the five landraces

Genetic variation from SNPs (18,507 SNPs on 8657 contigs) among individual samples of T. occidentalis was checked for population structure within and among landraces with a principal component analysis (PCA) using the R libraries “vcfR”, “adegenet”, “adegraphics”, “ggplot2” and “reshape” in R Studio (version 1.3.1093).

Genetic variation was pruned from SNPs in linkage disequilibrium using plink2 (Purcell and Chang, www.cog-genomics.org/plink/2.0/; Chang et al. 2015) and the pruned dataset (9077 SNPs out of 18,507) was used in STRUCTURE (version 2.3.3; Porras-Hurtado et al. 2013; Pritchard et al. 2000) with the admixture model of ancestry to infer the partitioning of genetic variation within T. occidentalis. Ten iterations of STRUCTURE for K = 1, 2, 3, 4 and 5 were performed with a burn-in period of 100,000 and 1,000,000 iterations. Results were processed using STRUCTURE HARVESTER (Earl and vonHoldt 2012) and the K-value that best describes data was evaluated as in Evanno et al. (2005). STRUCTURE outputs were firstly aggregated with CLUMPP (version 1.1.2; Jakobsson and Rosenberg 2007) and then used to create the bar plots in R studio.

Identifying sex-determining loci

Genetic variation associated with sex phenotypes (i.e., putative sex-determining loci) was estimated using the hierarchical AMOVA framework in ARLEQUIN (version 3.5; Excoffier and Lischer 2010). Samples from the Anambra landrace were taken out of this analysis because the low number of females and males would not have contributed to strong evidence (Table 1). The partitioning of pruned genetic variation among 96 samples of T. occidentalis with known sex (51 females + 45 males) was thus estimated among four groups (i.e., landraces), with individuals of different sexes as nested populations (Excoffier and Lischer 2015). As a result, FSC-values relate to the differentiation between males and females within and among landraces, identifying loci with high FSC-values indicative of SNPs that are strongly differentiated between sexes and that may be part of or tightly linked to sex-determining loci (Charlesworth 2016; Holsinger and Weir 2009). The top 0.1% of most differentiated loci between sexes among landraces (i.e., SNPs with FSC-values >  = 0.4) were identified. Such a threshold appears rather conservative when heterozygosity is considered (as could be expected in the case of an XY sex-determining system, where a maximum FSC of 0.5 can be reached). The SNPs that reached an FSC-value >  = 0.4 were checked for their location along contigs and consistent association with either male or female samples (checked on the unpruned genetic dataset with 18,507 SNPs on 8657 contigs).

PCR-based sex-determining assay

As is clear from the results, one contig was identified as a top candidate associated with sex determination in T. occidentalis and the locus was thus further characterized.

PCR primers specific to that locus of interest were designed using primer3.ut.ee (version 4.1.0) and synthetized by microsynth.ch. Primers (each 10 mM) were used for PCRs in 23.625 μl, with 5 × GoTaq buffer (Promega), 10 mM dNTPs (Sigma Aldrich), ddH2O and 0.125 × GoTaq polymerase (Promega) on DNA templates from the 100 phenotyped samples. PCR amplification was carried out in a thermal cycler (Eppendorf Mastercycler) with the following conditions: 3 min of denaturation at 95 °C followed by 30 cycles of 30 s at 95 °C, 30 s at 54 °C and 30 s at 72 °C. To complete the sequence, the top candidate contig left ambiguous by paired-end 2 × 150 bp sequencing reads, PCR products from five females and one male were Sanger sequenced (service provided by Mycrosynth, Balgach, Switzerland).

To develop an assay based on cleaved amplified polymorphism sequences, the PCR product was digested by adding 0.1 × of the restriction enzyme Mfe1-HF (NEB), 10 × CutSmart Buffer (NEB) and ddH2O resulting in a final volume of 25 μl for 2 h at 37 °C. Mfe1 cuts the sequence motif 5′CAATG 3′ that is mostly shared by female samples on the candidate locus, but fails to do so on the variant that is common to most male plants (i.e. 5′CCATG 3′). Digested PCR products were separated by electrophoresis on a 3% LE agarose gel in 1 × TAE buffer for 2 h 15 min at 50 Volts with a 100 bp ladder, stained by GelRed and visualized under ultraviolet light.

A faster and cheaper assay based entirely on PCR was subsequently developed, relying on sex-specific primer pairs designed based on the different sets of SNPs reported along the candidate locus in the female and male plants investigated here. PCR conditions were similar to the assay described above, with 5 × GoTaq Buffer (Promega), 10 mM dNTPs (Sigma Aldrich), 10 mM of each sex-specific primer (forward/reverse female, forward/reverse male), ddH2O and 0.125 × GoTaq polymerase (Promega) in a final volume of 25.625 μl, amplified by 3 min of denaturation at 95 °C followed by 30 cycles of 30 s at 95 °C, 30 s at 54 °C and 30 s at 72 °C. PCR products were visualized under ultraviolet light after electrophoresis on a 2% LE agarose gel in 1 × TAE buffer for 2 h 15 min at 50 Volts with a 100 bp ladder and staining by GelRed.

A detailed protocol for both tests can be accessed in Supplementary Information 2.

Following visualization of PCR products, the two assays were ascertained by counting the number of times the phenotype (i.e., sex) predicted by the observed genotype matched the observed phenotype. This ratio of observed sexes versus expected sexes was investigated for all 53 tested females and once for all 47 tested male T. occidentalis plant samples.

Results

Double digest restriction-site associated DNA sequencing data and genetic diversity

All 139 samples of T. occidentalis were successfully sequenced and genotyped following the ddRAD pipeline that resulted in 18,507 SNPs called on 8657 contigs. Although one technical replicate (A16r) was filtered out, the comparison of the four remaining ones estimates an average error rate of 4.4% (Table S5).

By using PCA (Fig. 1), variation based on those SNPs reveals three main genetic clusters among the five landraces of T. occidentalis investigated here. Samples belonging to the Anambra landrace cluster together and are differentiated from other samples along the first principal component of maximum genetic variation. The second principal component mainly differentiates Ekiti from the Cross River, Oyo and Lagos samples that thus appear more genetically coherent. One Anambra plant sample (A100) clusters together with Cross River, Oyo and Lagos (Fig. 1, supplementary material Figures S1 and S2) due to a likely mix-up of labels during either sowing of seeds or ddRAD library preparation.

Fig. 1
figure 1

Principal component analysis of SNPs from ddRAD sequencing among 143 samples of Telfairia occidentalis. The different landraces are coloured differentially according to the panel. The first principal component axis (PC1) explains 8.3% and the second principal component axis (PC2) explains 5.6% of the overall genetic variation

Population genetics

Model-based analyses of the pruned dataset with 9077 SNPs in low linkage disequilibrium using STRUCTURE indicated that genetic variation within T. occidentalis can be divided into three main clusters, confirming insights from PCA. As shown in Fig. 2, K = 3 shows the most closely the genetic variation as supported by the delta K-value. Genetic structure among individual samples under K = 3 revealed that Anambra and Ekiti each form a distinct group that can be genetically distinguished from others. Cross River, Oyo and Lagos appear to have a coherent genetic background and thus form a third genetically distinguishable population. All these findings provide strong evidence that the genetic data of T. occidentalis plant samples used in this study were can be divided into three main populations. As already mentioned, plant sample A100 (Anambra) presents very similar genetic data to samples from Cross River, Oyo and Lagos and is thus unambiguously assigned to this cluster.

Fig. 2
figure 2

Genetic structure within Telfairia occidentalis based on STRUCTURE analyses of a pruned dataset of 9077 SNPs in low linkage disequilibrium. a Mean L(K) ± SD. Visualized with STRUCTURE HARVESTER. The value of the likelihood distribution L(K) levels off as soon as the real K-value is reached. b Visualized with STRUCTURE HARVESTER. Delta K showing a peak at an optimal K = 3. c Genetic data were obtained from the STRUCTURE HARVESTER processing and clustered with CLUMPP for visualisation in R Studio as a bar plot of the genetic data from every individual. All 143 individual names are listed along the X-axis. The Y-axis shows how much genetic variation is shared with the other individuals in 3 genetic clusters

Sex-determining locus

The genetic data of 96 phenotyped plant samples (without Anambra landrace) was used to search for a sex determining locus. The programme ARLEQUIN produced FSC-values for each of the 18,507 called SNPs (average FSC-value ~ 0.05). Twenty SNPs with an FSC-value >  = 0.4 were identified of which ten were on contig number 4813 of the de novo reference catalogue. The other ten SNPs were distributed on nine different contigs. Contig 4813 is therefore the most promising sex-specific locus. All the SNPs of contig 4813 with their corresponding FSC-values are listed in Table S6b. The FSC-value of ~ 0.46 (mean FSC-value of the ten SNPs on contig 4813) is consistent with heterozygosity at the candidate locus in male plant samples. The mean expected heterozygosity is 0.08 for females and 0.53 for males (SNPs with an FSC-value >  = 0.4; Table S6a).

Figure 3 shows the most promising candidate contig 4813 as a sex-determining locus identified in this study. The alternative base for the SNP on contig 4813 at position 53 is a cytosine (Table 3). According to the genetic data, this SNP occurs in 94.3% of all plants of which the sex of the female phenotype is known in advance (50/53) and the alternative occurs in 93.6% of all plants of which the sex of the male phenotype is known in advance (44/47), which were used to generate the ddRAD libraries. The nucleotide sequence of the sex-associated candidate contig 4813 in Fig. 3 showed no significant matches on Web BLAST (last search 22.06.21).

Fig. 3
figure 3

Sex-associated candidate locus in Telfairia occidentalis (contig 4813) derived from ddRAD sequencing. Letters that are in bold and capital represent the ten SNPs with FSC-values >  = 0.4 and small, bold letters are SNPs with FSC-values < 0.4. The female variant is 422 bp long, whereas the male variant is of unknown length and is represented as dashed lines. The boxed letters indicate the Mfe1 recognition site. The primers Tocc-CAPS-F and Tocc-CAPS-R (straight underlined sequences) do not include any sex-specific SNPs and result in a 328 bp PCR product to be digested using Mfe1, with males being expected to present three bands of 34 bp, 294 bp and 328 bp (heterozygosity), whereas females are expected to present one band of 34 bp and one of 294 bp. The primers Tocc-female-F and Tocc-female-R (wavy lines) include SNPs that are specific for most female samples of T. occidentalis and result in PCR products of 297 bp in length. The primers Tocc-male-F and Tocc-male-R (dotted lines) include male specific SNPs and result in PCR products of 120 bp in length

Female-specific PCR primers designed for contig 4813 (Table 2) cover five SNPs that must be present in female plants (Fig. 3b). This is true in 49 out of 53 female plant samples (92.5%) which means that four plants have at least one variant that is most common in contig 4813 of male T. occidentalis plant samples. Genetic data shows that three of these four female plant samples carry all the male base variants. One female plant sample (C_I_29) carries just one SNP which most male plant samples share (cytosine instead of guanine) at position 41 (Table 3). In the case of the male specific PCR primers (Table 2), they cover four SNPs (Fig. 3c). The genetic data shows that 45 out of 47 male plants carry these four SNPs (95.7%). Two male plant individuals in this study have all four nucleotide variants that most female fluted pumpkin plant samples share.

Table 2 PCR primers used for sex-determining assay using Mfe1 as the restriction enzyme and two primers used for sex-determination via sex-specific PCRs in Telfairia occidentalis, with oligonucleotide sequences and corresponding names, ending with either -F (forward) or -R (reverse)
Table 3 Genetic variation (SNP) along with the top candidate contig 4813 as a sex-determining locus in Telfairia occidentalis and differentiation between sexes shown by FSC values >  = 0.4 (marked with *)

PCR-based sex-determination

The assay using cleaved amplified polymorphism sequences showed visually clear patterns of PCR products. Tested female plants all show a banding pattern consistent with homozygosity, with one clear band at 294 bp and a lighter band at 34 bp (Fig. 4a). In contrast, tested male samples are all confirmed as heterozygotes at the targeted locus (contig 4813), with two strong bands (around 294 bp and 328 bp) and a lighter third band (around 34 bp). If the males were homozygous for contig 4813, only one band around 328 bp would be visible because Mfe1 does not detect the SNP variant at position 53 (Table 3) like in most male plants.

Fig. 4
figure 4

Banding patterns of molecular assays illustrating the sex determination of Telfairia occidentalis samples for 4 males and 4 females, along with a 100 bp ladder. a Cleaved amplified polymorphism sequences on 3% agarose gel electrophoresis b Direct PCRs on 2% agarose gel electrophoresis. This is an assemblage cropped from a larger gel picture used to genotype several samples and shown as supplementary information Fig. S4

Sequence data predicts that the test could reveal the right sex 50 times out of 53 (94.3%) among T. occidentalis female plant samples and 44 times out of 47 (95.7%) among male plant samples. This indicates that one plant sample (L_I_80, Lagos) was considered male based on the flower phenotype, although its genotype predicts a female. Furthermore, the assay failed on two male samples (L_I_76 from Lagos and C_I_40 from Cross River) and the three female samples (Ib_I_52 and Ib_I_49 from Oyo, C_I_39 from Cross River). In the case of those two male plant individuals, their sequence data were the same as most female T. occidentalis plant individuals, but they were phenotyped as males. Vice versa, this applies to the three female plants where the assay failed.

The molecular assay using direct PCR via sex-specific primers successfully produced distinguishable banding patterns, with female samples showing one clear band at 297 bp, whereas males show one strong band at 120 bp and a lighter one at 297 bp, supporting a heterozygous genotype in males. Unlike the banding pattern generated via restriction enzymes, this assay via sex-specific primers generated a more complex pattern since primer dimers and other nonrelevant bands are visible. Nevertheless, sex-specific patterns could be unambiguously identified.

Sex-determination via sex-specific primers successfully tested 50 out of 53 female samples. The female individual C_I_29 (Cross River) that had one SNP at position 41 as in most male samples (i.e., cytosine instead of guanine) was genotyped as a female here. The sex of 45 out of 47 male samples was correctly determined. These results suggest an assay certainty by sex-specific primers of 94.3% for females and 95.7% for male T. occidentalis plants.

Both sex-determining tests failed to adequately predict the phenotype based on the genotype on the same samples (i.e., L_I_76 (Lagos) and C_I_40 (Cross River) that should have shown a male genotype and Ib_I_52, Ib_I_49 (Oyo) and C_I_39 (Cross River) that should have presented a female genotype). The male individuals, on which the assays failed, have shown to carry all the SNPs that are usually expected in female T. occidentalis plant samples, while the converse is true for the female plant individuals Ib_I_52, Ib_I_49 (Oyo) and C_I_39 (Cross River). All the inconsistencies listed above can be seen in the supplementary information in Fig. S4.

Noticeably, both assays were performed accurately on the two female and two male samples from the Anambra landrace (Table 1) that were not used to initially identify contig 4813 as a candidate sex-determining locus.

Discussion

Genetic structure within T. occidentalis based on ddRAD sequencing

Reducing genome complexity for genotyping an underutilized crop using ddRAD sequencing based on rare (EcoR1) and frequent (Mse1) cutter restriction enzymes has here successfully captured the genetic variation of T. occidentalis that does not yet benefit from an already existing reference (Peterson et al. 2012; Palmer et al. 2019). Here, candidate sex-determining loci were detected through the partitioning of genetic variation between sexes while taking the variation due to genetic backgrounds of different landraces into account (i.e., FSC-values in a hierarchical AMOVA framework), which advantageously relies on limited assumptions regarding the presence of sex chromosomes. Unlike approaches that assume the presence of sex-determining loci on X chromosomes but absence on Y chromosomes (or vice versa; as in Jeffries et al. 2018; Scharmann et al. 2019; Morgan et al. 2020), the analysis of FSC values also highlights sex-specific loci that suppress the development of the opposite sex, as is the case in dioecious plants (Charlesworth and Charlesworth 1978).

Landraces of Telfairia occidentalis

Samples of T. occidentalis used to create ddRAD libraries were all collected from five different states of Nigeria and they were therefore expected to be genetically structured into five distinct groups. Multivariate PCA on SNPs (Fig. 1) as well as model-based inferences using STRUCTURE rather indicated three main genetic clusters (K = 3) within the species. The Anambra and Ekiti landrace samples were observed to be genetically homogenous and most distinct from the other genetic clusters, indicating little to no genetic exchanges for a longer period of time. In contrast, the remaining samples of Cross River, Oyo and Lagos landraces appear to share more genetic variation than other landraces and therefore group into a third genetic cluster. Although such genetic similarity may seem counter-intuitive at first glance, since the Cross River state is geographically distant from Lagos state and Oyo state, it should be kept in mind that T. occidentalis has probably been long indigenous in the regions of the Delta state, Imo state, Anambra state and up to Cross river state (7°–8° east and 5°–6° north). Plants cultivated by the Igbo people native to these regions and spread Telfairia to other parts of Nigeria (Akoroda 1990), so that the introduction of T. occidentalis in Lagos and Oyo could be too recent (Akoroda 1990) to yet be marked as patent genetic differentiation among samples from Cross River, Lagos and Oyo backgrounds. Given that Lagos and Oyo are geographically closer to each other, gene flow is more likely to have contributed to the genetic similarity of Oyo and Lagos (Wang et al. 2013). On the other hand, farmers in Lagos and Oyo may have collected seeds from Calabar (capital of Cross River state). Also, these varieties from Cross River, Lagos and Oyo states exhibited similar morphology in forms of growth habit and leaf (colour, shape, size), fruits and productivity and could be preferred by farmers (Oyenike Arike Adeyemo, personal observations). In contrast, diverse morphological traits were observed in the Anambra and Ekiti landraces. More comprehensive sampling of T. occidentalis landraces across its distribution range would provide valuable light on the domestication of fluted pumpkin, the significance of divergence among landraces and the potential for genetic resources in this underutilized crop.

Sex phenotype versus genotype

The sex-determining assays developed here accurately predict the male vs female phenotype in at least 94% of all cases. In some cases, the genotype (and thus results of the sex-determining assay) predicted the opposite sex than revealed by the phenotype. Five samples (i.e., three females L_I_76, C_I_40, Ib_I_52 and two males Ib_I_49, C_I_39) revealed consistent incongruencies between genotype and phenotype. The underpinnings of such failure remain unclear and could either be due to mislabelling or be of biological origin.

Although there is only limited evidence that T. occidentalis presents sex chromosomes, a XY system was postulated based on cytogenetics, which seems here supported by the heterozygosity of the candidate contig 4813 associated with male determination that matches expectations of heterogametic XY males and homogametic XX females (Uguru and Onovo 2011). Therefore, it remains unclear to what extent heteromorphic sex chromosomes with inhibited recombination have yet evolved in a dioecious plant such as T. occidentalis (Ming et al. 2011; Renner 2014) and recombination within the candidate sex-determining locus should accordingly not be ruled out (Bergero and Charlesworth 2009; Charlesworth 2016). Similarly, it remains to be clarified to what extent sex determination in T. occidentalis is exclusively under genetic control. Genes that control the expression of ethylene, the sex-determining hormone in Cucurbitaceae, could likely be involved in T. occidentalis, as they are in melon (Cucumis melo), watermelon (Citrulus lanatus) or cucumber (Cucumis sativus; Boualem et al. 2015; Grumet and Taft 2011; Li et al. 2019; Zhang et al. 2020) that collectively support a conserved pathway of sex expression among Cucurbitaceae. A possible influence of environmental factors cannot be excluded (Golenberg and West 2013; Renner 2014) and may have had an impact on those five T. occidentalis samples showing inconsistency of genotype and phenotype. Finally, monoecious life forms of fluted pumpkin have already been reported (Akoroda et al. 1990) and, although not recently confirmed (Fayeun et al. 2016a, b), it should be considered that unlinked and undetected sex-determining loci could be involved that could have led to the unexpected phenotypes of the five inconsistent fluted pumpkin individuals from this study.

Development of a sex determination assay

The two different assays described here to genetically establish the sex of T. occidentalis samples yielded similar results (94.3% for female and 95.7% for male plants) and appear equally time-consuming. Despite the fact that the restriction enzyme approach is slightly more laborious, it also leads to clearer banding patterns than the approach based on sex-specific primers (Fig. S4). Such molecular sex-determining assays are of great importance for farmers owning medium to large Telfairia plantations as to optimize the proportion and cultivation space devoted to female plants. Although sex-determining assays were designed based on Cross River, Ekiti, Oyo and Lagos landraces, assays were performed accurately in the landrace Anambra, suggesting transferability among landraces. Further phenotyping and genotyping analyses of other T. occidentalis landraces as well as the sister species, Telfairia pedata (Hook. F.) that is also a dioecious plant growing across eastern and south-eastern regions of Africa (Okoli 2007; Okoli and Mgbeogu 1983), would be valuable to foster the potential of underutilized Telfairia crops and the use of universal sex-determining tools for their plantlets.