LsMybW-encoding R2R3-MYB transcription factor is responsible for a shift from black to white in lettuce seed

Key message
We identified LsMybW as the allele responsible for the shift in seed color from black in the wild ancestors of lettuce to white in modern cultivars.

Abstract
White seeds are a key agronomic trait selected during lettuce cultivation and breeding; however, the mechanism underlying the shift from black seeds, as found in the wild ancestor, to white seeds remains uncertain. We aimed to identify the gene(s) responsible for the white seed trait in lettuce. White seeds accumulated less proanthocyanidins than black seeds, similar to the phenotype observed in Arabidopsis TT2 mutants. Genetic mapping of a candidate gene was performed with double-digest RAD sequencing using an F2 population derived from a cross between “ShinanoPower” (white) and “Escort” (black). The white seed trait was controlled by a single recessive locus (48.055–50.197 Mbp) in linkage group 7. Using five PCR-based markers and numerous cultivars, eight candidate genes were mapped in the locus. Only the LG7_v8_49.251Mbp_HinfI marker, employing a single-nucleotide mutation in the stop codon of Lsat_1_v5_gn_7_35020.1, was completely linked to the seed color phenotype. In addition, the coding region sequences of the other candidate genes were identical in the resequence analysis of “ShinanoPower” and “Escort.” Therefore, we proposed Lsat_1_v5_gn_7_35020.1 as the candidate gene and designated it as LsMybW (Lactuca sativa Myb White seeds), an ortholog encoding the R2R3-MYB transcription factor in Arabidopsis. When we validated the role of LsMybW through genome editing, LsMybW knockout mutants harboring an early termination codon showed a change in seed color from black to white. Therefore, LsMybW was the allele responsible for the shift in seed color. The development of a robust marker for marker-assisted selection and the identification of the gene responsible for white seeds have implications for future breeding technology and physiological analysis.
Supplementary Information The online version contains supplementary material available at 10.1007/s00299-023-03124-4.


Introduction
In lettuce (Lactuca sativa L.), the major seed colors are black and white. Black seed is the wild-type trait observed in Lactuca serriola L., a wild lettuce species distributed worldwide. The white seed trait was possibly discovered during the process of domestication. We infer that individual plants with extremely reduced pigments in the seed pericarp, where the pigments are localized in lettuce, were accidentally discovered (Thompson 1942). Black seeds are difficult to see on the soil, whereas white seeds are easily seen (Fig. 1a, b). Lettuce seeds are small; therefore, the visibility of white seeds is an advantage in seeding and harvesting, and white seeds are desirable for agricultural production.

Lettuce seeds are sown near the soil surface as a standard practice (Woolley and Stoller 1978) because the germination of lettuce seeds is promoted by light radiation (Borthwick et al. 1952). Light transmission at 660 and 730 nm induces the germination of black and white seeds, respectively. The transmission spectra of black seeds show transmission of less than 20% of the incident light, whereas those of white seeds indicate transmission of more than 50% of the incident light (Widell and Vogelmann 1988). In addition, white seeds are more sensitive to temperature than black seeds, allowing white seeds to germinate at an appropriate temperature even in darkness (Borthwick et al. 1952). In maize, it has been reported that the germination rate of light-colored seeds is higher than that of dark-colored seeds under optimum temperature conditions and vice versa at high temperature (Deng et al. 2015). Darker colored seeds are considered able to adapt to poor environments owing to the antioxidant capacity of their pigments (Slavin et al. 2009). However, this trait is a rather negative factor in artificially controlled moderate environments, as it leads to, e.g., slower initial imbibition rates during germination (Chachalis and Smith 2000). Therefore, white seeds are believed to be more advantageous and to exhibit better agronomic performance and germination than black seeds.

The white seed trait is common and is present in well-known cultivars, such as cv. "New York." Several white seed varieties are available in the database; therefore, we believe that the varieties were bred through artificial selection (Table S1). Although the white seed trait is recessive (Thompson 1942; Ryder 1999; Wang et al. 2016), the seed lists of the Centre for Genetic Resources, the Netherlands (CGN: https://www.wur.nl/en/Research-Results/Statutory-research-tasks/Centre-for-Genetic-Resources-the-Netherlands-1.htm) and the Germplasm Resources Information Network (GRIN: https://www.ars-grin.gov/) show that the number of white seed cultivars is significantly higher than that of black seed cultivars (Table 1). This fact implies that lettuce breeders around the world have intentionally introduced the white seed trait into new cultivars. White seed is an important agricultural trait for lettuce breeders; however, the molecular mechanism underlying the shift from black to white remains incompletely understood (Thompson 1942; Waycott et al. 1999; Kwon et al. 2013). The white seed trait is controlled by a single recessive gene (Waycott et al. 1999) located on LG7 (Kwon et al. 2013). We applied the double-digest RAD sequencing (ddRAD-seq) method to analyze the genetic details of the white seed trait. Further mapping was performed using 84 and 131 cultivars to narrow down the gene, and the only candidate gene identified was validated for involvement in seed color using a knockout mutant generated through genome editing.

Communicated by Hiroyasu Ebinuma. Kousuke Seki and Kenji Komatsu share first authorship.

Plant material
The crisphead-type lettuce cultivar "ShinanoPower" was bred by the Nagano Vegetable and Ornamental Crops Experiment Station, and "Escort" was bred by Takii & Co., Ltd. "ShinanoPower" and "Escort" were crossed to produce an F1, which was selfed. Ninety-six F2 individuals were investigated for seed color in a greenhouse. The oilseed-type lettuce cultivar "Oilseed," derived from Upper Egypt, was introduced from CGN; the original strain number is "CGN04769."

Analysis of the pigment in seed
White and black seeds were frozen in liquid nitrogen and powdered.

Genotyping using publicly available genome resequencing data
The publicly available L. sativa genome sequencing data were obtained from the NCBI SRA as shown in Table S2. FASTQ files were imported into the CLC Genomics Workbench (QIAGEN, USA) for subsequent analysis. The trim sequence tool in the suite was used to filter out low-quality bases (< Q30), and only reads that showed a quality score of ≥ 30 were retained. Filtered sequence reads were mapped onto the L. sativa v8.0 genome (https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Lsativa_er) using the Map Reads to Reference tool, and local realignment was performed using the Local Realignment tool. Based on the mapping results, the genomic polymorphisms associated with the white seed trait were identified.
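The Q30 filtering step can be illustrated with a minimal sketch. This is not the CLC Genomics Workbench implementation; it assumes Phred+33 quality encoding and a strict rule in which a read is kept only if every base reaches the threshold. The function names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def passes_quality(quality_line, threshold=30):
    """Assumed rule: keep a read only if all of its bases are >= threshold."""
    scores = phred_scores(quality_line)
    return bool(scores) and min(scores) >= threshold


# 'I' encodes Q40, '?' encodes Q30, '5' encodes Q20 under Phred+33.
reads = {"read_ok": "IIII????", "read_low": "III5IIII"}
kept = [name for name, qual in reads.items() if passes_quality(qual)]
print(kept)
```

A real trimming tool would typically trim low-quality ends rather than discard whole reads; the all-or-nothing rule here is only for clarity.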

Designing PCR-based markers and their amplification
Polymorphisms near the locus at 48–50 Mbp in LG7, including insertions, deletions, and single-nucleotide polymorphisms (SNPs), were evaluated as potential markers. The primer names were formatted as (linkage group)_(genome version)_(genome position)_(restriction enzyme, in case of CAPS). Primers for locus amplification were designed using Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/), and KOD FX (TOYOBO, Japan) was used for amplification. PCR was performed using 0.5 μL of DNA template, 0.4 μL of each primer (50 μM), 2 μL of dNTP (2 mM), 5 μL of 2 × PCR Buffer, 0.2 μL of KOD FX (1 U/μL), and distilled water (dH2O) to a final reaction volume of 10 μL. The PCR conditions were as follows: 94 °C for 5 min, 30 cycles of 94 °C for 30 s and 62 °C for 30 s, followed by one cycle at 72 °C for 4 min. After amplification, electrophoresis was performed using 9 μL of the PCR products on a 2% agarose gel (Takara Bio, Japan) at 100 V. In case of CAPS, the PCR products were digested at 37 °C for 1 h in a 20 µL total volume with 5–15 units of the appropriate restriction enzyme before electrophoresis.
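The marker naming convention and the reaction mix described above can be sketched in a few lines. The `Marker` type and both function names are hypothetical, and the water calculation assumes exactly two primers (forward and reverse) per reaction.

```python
from typing import NamedTuple, Optional


class Marker(NamedTuple):
    linkage_group: str
    genome_version: str
    position_mbp: float
    enzyme: Optional[str]  # None for markers that are not CAPS


def parse_marker(name):
    """Split a name such as 'LG7_v8_49.251Mbp_HinfI' into its fields."""
    parts = name.split("_")
    if len(parts) not in (3, 4) or not parts[2].endswith("Mbp"):
        raise ValueError(f"unexpected marker name: {name}")
    enzyme = parts[3] if len(parts) == 4 else None
    return Marker(parts[0], parts[1], float(parts[2][:-3]), enzyme)


def water_volume(total_ul=10.0):
    """dH2O needed to reach the final volume (two primers assumed)."""
    components = [0.5, 0.4, 0.4, 2.0, 5.0, 0.2]  # template, 2 primers, dNTP, buffer, KOD FX
    return round(total_ul - sum(components), 3)


print(parse_marker("LG7_v8_49.251Mbp_HinfI"))
print(water_volume())
```

Under these assumptions the listed components add up to 8.5 μL, leaving 1.5 μL of water per 10 μL reaction.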

Genome editing
The 20-nt gRNAs specific for the target gene, LsMybW, were designed for the first exon (Fig. S1 and Table S3). Primers with the BbsI restriction site were annealed and ligated into the entry vector, 1480_MluI-1433_pUC19_AtU6oligo (Shimatani et al. 2017). The core partial fragments were excised from the entry vector using I-SceI and subcloned into the T-DNA region of the destination vector, 1432_pZD_OsU3gYSA_HolgerCas9_NPTII (Shimatani et al. 2017). All vectors were constructed through standard cloning and verified through sequencing. Lettuce "Oilseed" was transformed with Agrobacterium harboring the destination vector. Regenerated plants were selected using kanamycin, and genome integration of the T-DNA was confirmed through PCR for NPTII. To assess genome editing of LsMybW, the target sequence was amplified from gDNA through PCR and cloned into the standard sequencing vector, pSKII−. Eight clones of each T0 strain were used in this study.

Analysis of pigment accumulation in seeds
The absorbances at 500 nm of the supernatants extracted from black and white seeds were 0.061 (SE = 0.015) and 0.038 (SE = 0.008), respectively. The proanthocyanidin content in white seeds was thus 0.62 times that in black seeds. Some phenylpropanoids, such as chlorogenic acid, were found in both seed colors; however, common anthocyanins and other flavonoids were not detected in the HPLC analysis (Figs. S2, S3). White seeds appear to have a reduced capacity to accumulate proanthocyanidins.
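The 0.62 figure follows directly from the two mean absorbances. The short check below assumes, as the text does, that absorbance at 500 nm is proportional to proanthocyanidin content; the variable names are illustrative.

```python
# Mean absorbances at 500 nm reported above.
black_a500 = 0.061
white_a500 = 0.038

ratio = white_a500 / black_a500  # white relative to black
reduction_percent = (1 - ratio) * 100  # how much lower white is

print(f"white/black ratio: {ratio:.2f}")
print(f"reduction: {reduction_percent:.0f}%")
```

In other words, white seeds carry roughly 38% less of the measured pigment than black seeds, not 0.62 times less.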

Inheritance of seed color
An F2 population was derived from an initial cross between "ShinanoPower" (white seed) and "Escort" (black seed) to elucidate the inheritance of the white seed trait. The F1 plants produced black seeds. Among the 96 F2 individuals, 74 possessed black seeds and 22 possessed white seeds, which is consistent with a 3:1 ratio in the chi-square test (p = 0.64). These results suggest that the white seed trait is determined by a single recessive locus.
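The reported p-value can be reproduced with a goodness-of-fit test against the 3:1 expectation. The sketch below uses only the standard library and relies on the identity that, for one degree of freedom, the chi-square survival function equals erfc(sqrt(chi2 / 2)). The function name is hypothetical, and no continuity correction is applied, which is an assumption about how the original test was run.

```python
import math


def chi_square_3_to_1(dominant, recessive):
    """Chi-square goodness-of-fit test for a 3:1 segregation ratio (df = 1)."""
    total = dominant + recessive
    expected = (0.75 * total, 0.25 * total)
    observed = (dominant, recessive)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function for df = 1
    return chi2, p_value


chi2, p_value = chi_square_3_to_1(74, 22)  # black : white among 96 F2 plants
print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

With 72 black and 24 white seeds expected, the deviation of two plants in each class gives a small chi-square statistic, so the 3:1 hypothesis is not rejected.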

Double-digest RAD sequencing analysis of the F 2 population and linkage map development
The ddRAD-seq analysis was used to genotype the F2 population for genetic mapping of the locus for the white seed trait derived from "ShinanoPower." The genomic DNA polymorphisms between the two parental lines, "ShinanoPower" and "Escort," were assessed through ddRAD-seq analysis using PacI and NlaIII restriction enzymes (Seki et al. 2020). Illumina HiSeq sequencing of the ddRAD-seq libraries produced 9,281,482 and 8,107,399 single reads (100 bp) for "ShinanoPower" and "Escort" plants, respectively. RAD-tags were extracted from the sequence reads of individual samples. In total, 346,396 and 302,738 RAD-tags with more than 2 read counts were obtained for the "ShinanoPower" and "Escort" samples, respectively. Comparing the RAD-tags of the two parental lines, 135,129 and 91,471 unique tags were identified as "ShinanoPower"- or "Escort"-specific tags, respectively, whereas 211,267 RAD-tags appeared in both samples. Read mapping was performed with the unique RAD-tags of each parent against the reference lettuce genome sequence. A total of 2,871 pairs of RAD-tags (designated as biallelic tags) harboring SNPs or InDels between the two parental lines were identified, and these biallelic tags were employed as co-dominant markers for further genetic mapping. By summarizing co-segregating biallelic tag loci, 1,038 loci were regarded as co-dominant markers. The genotypes of these 1,038 biallelic tag loci in the 96 F2 individuals were determined through ddRAD-seq analysis, based on the presence or absence of each allelic tag. After excluding loci with missing data, genotyping data from 856 biallelic tag loci of the 96 F2 individuals were used for linkage map construction (Fig. S4 and Table 2). Based on the grouping analysis, the marker loci were distributed into nine linkage groups. Ordering the marker loci in each linkage group resulted in a linkage map comprising 988.8 cM (Fig. S4). Summary statistics for the linkage maps are presented in Table 2. Marker density ranged from 0.5 cM per marker (LG7) to 4.2 cM per marker (LG9). The number of markers ranged from 22 (LG9) to 177 (LG7).
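Genotype calling from the presence or absence of each allelic tag, followed by removal of loci with missing data, can be sketched as follows. This is a simplified illustration, not the RAD-R scripts used in the study: the genotype codes (A, B, H), the function names, and the toy data are all assumptions, and real pipelines usually apply read-count thresholds as well.

```python
def call_genotype(has_a_tag, has_b_tag):
    """Call a co-dominant genotype from the two allelic tags of one locus.

    'A' = only the first parent's tag, 'B' = only the second parent's tag,
    'H' = both tags (heterozygous), None = neither tag (missing data).
    """
    if has_a_tag and has_b_tag:
        return "H"
    if has_a_tag:
        return "A"
    if has_b_tag:
        return "B"
    return None


def drop_incomplete_loci(genotypes_by_locus):
    """Keep only loci that were genotyped in every individual."""
    return {
        locus: calls
        for locus, calls in genotypes_by_locus.items()
        if all(call is not None for call in calls)
    }


# Toy data: two loci scored in three F2 individuals (hypothetical values).
tags = {
    "locus_1": [(True, False), (True, True), (False, True)],
    "locus_2": [(True, True), (False, False), (True, False)],
}
calls = {locus: [call_genotype(a, b) for a, b in obs] for locus, obs in tags.items()}
print(drop_incomplete_loci(calls))
```

Filtering of this kind is what reduces the 1,038 co-dominant loci to the 856 loci used for map construction.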

Fine mapping of the white seed locus and candidate gene analysis
For genetic mapping of the locus for the white seed trait, ddRAD-seq analysis was conducted to construct a linkage map with RAD-R scripts using the F2 population. A single locus tightly linked to the white seed trait was located in LG7 and flanked by two markers (LG7_v8_48.055Mbp and LG7_v8_49.864Mbp) based on the genotypes of the biallelic RAD-tags. The marker designated as LG7_v8_49.398Mbp exhibited complete co-segregation with the white seed trait within the F2 population (Fig. 2). Moreover, fine mapping of the target locus was performed using five markers (Table S4) and 84 cultivars (Table S5). Forty-five cultivars had white seeds and the rest had black seeds. Only the LG7_v8_49.251Mbp_HinfI marker was associated with the white seed phenotype (Tables 3, S5). Based on the marker data, it was predicted that the responsible gene was located between 49.173 and 49.326 Mbp in LG7. Eight open reading frames (ORFs) were positioned in this region according to the annotated reference genome sequence of L. sativa v8 (Table 4). The sequences of these eight ORFs were compared between "ShinanoPower" and "Escort." There were no small InDels or nonsynonymous substitutions in seven of the ORFs between the two parental lines; however, a single-nucleotide mutation in a stop codon was found in ORF 7, which is referred to as Lsat_1_v5_gn_7_35020.1. The allele of the white seed cultivar encoded an additional 78 bp at the 3′ end that was not present in the black seed allele (Figs. 3, S5). Based on the analysis using publicly available resequencing data from 131 cultivars, only the stop codon polymorphism showed a complete correlation with seed color among the 21 polymorphisms in the genomic region from 49.173 to 49.326 Mbp (Tables 5, S2, S6). Phylogenetic analysis revealed that Lsat_1_v5_gn_7_35020.1 is closely related to the TRANSPARENT TESTA 2 (TT2) gene encoding the R2R3-MYB transcription factor (Fig. 4), which is involved in the regulation of seed color in Arabidopsis thaliana (Nesi et al. 2001); we named it LsMybW (Lactuca sativa Myb White seeds). R2R3-MYB forms the MYB-bHLH-WDR (MBW) ternary protein complex together with bHLH-type transcriptional regulators and the WD repeat protein (Lepiniec et al. 2006). The MBW complex regulates the transcription of gene subsets related to anthocyanin and proanthocyanidin synthesis, thereby modulating the pool size of these metabolites. In addition, the paralog with the greatest similarity, Lsat_1_v5_gn_5_135961.1, was related to the regulation of anthocyanin biosynthesis in the leaves of red leaf cultivars and was named Red Lettuce Leaf 2 (RLL2) (Su et al. 2020).
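How a single-nucleotide change in a stop codon can lengthen the coding region by 78 bp (26 codons) is illustrated below with a toy sequence. The sequences are entirely hypothetical and are not the LsMybW alleles; the actual stop codon and base change are not given in the text. Only the arithmetic of read-through to the next in-frame stop is shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_length(cds):
    """Number of nucleotides before the first in-frame stop codon."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None  # no in-frame stop found


# Hypothetical alleles: the 'black' allele stops at TAA; in the 'white' allele
# one base of that stop is changed (TAA -> CAA), so translation continues to
# the next in-frame stop located 25 codons further downstream.
downstream = "GCA" * 25 + "TGA"
black_allele = "ATGGCT" + "TAA" + downstream
white_allele = "ATGGCT" + "CAA" + downstream

extra_bp = coding_length(white_allele) - coding_length(black_allele)
print(extra_bp, extra_bp // 3)
```

The mutated stop codon itself becomes the first of the additional codons, which is why 25 downstream codons plus the converted stop give 26 extra residues.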

Validation of LsMybW function through CRISPR/Cas9-based genome editing
To obtain knockout strains with a loss-of-function mutation in the LsMybW gene, a gRNA was designed in the first exon to generate an early stop codon. The CRISPR/Cas9 vector was introduced via Agrobacterium into the lettuce "Oilseed," which normally produces black seeds. The transformed plants were selected for both antibiotic resistance and PCR-positive results for foreign genes. Nine acclimated T0 individuals were used for the analysis of the target sequence. The sequence variations among the eight clones allowed us to predict the genotype of each strain: monoallelic or biallelic, heterozygous or homozygous, in-frame or out-of-frame. Six strains produced white seeds in the T0 generation (Table 6). Among these, three strains (4-1, 15-1, 19-2) could be biallelic homozygous mutants with a single-base insertion. The other three strains (5-1, 30-4, 22-2) that produced white seeds were biallelic heterozygous mutants with insertions of one or two bases. All of these edits generated early stop codons and short peptides of 13 (MGRSPCLFKDWSE*) or 12 (MGRSPCLVQRLV*) residues. The three remaining strains (15-2, 25-4, 19-1) had black seeds, despite genome editing. This could be attributed to the presence of biallelic heterozygous mutants, including in-frame editing with a three-base deletion. These results confirmed that LsMybW controls seed color, with the functional allele conferring the dominant black seed trait. The inner parts, when observed without the pericarp, were brown in all genotypes (Fig. 5), suggesting that LsMybW is involved only in achene color.
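The reasoning that links indel size to seed color can be written as a small rule. This is a deliberately simplified sketch: it assumes that any frameshift abolishes LsMybW function, that any in-frame or unedited allele remains functional, and that one functional allele is enough for black seeds. The function names and example allele pairs are illustrative and are not taken from Table 6.

```python
def is_frameshift(indel_bp):
    """True if an indel (insertions positive, deletions negative) shifts the frame."""
    return indel_bp % 3 != 0


def predict_seed_color(allele_indels):
    """Predict seed color from the indel size of each of the two alleles.

    Assumption: white seeds require every allele to carry a frameshift;
    an unedited (0 bp) or in-frame allele keeps the dominant black phenotype.
    """
    if all(is_frameshift(indel) for indel in allele_indels):
        return "white"
    return "black"


examples = {
    "homozygous +1": (1, 1),
    "heterozygous +1/+2": (1, 2),
    "heterozygous +1/-3 (in-frame)": (1, -3),
    "monoallelic +1": (1, 0),
}
for label, alleles in examples.items():
    print(label, "->", predict_seed_color(alleles))
```

The rule reproduces the pattern described above: strains whose two alleles both carry one- or two-base insertions are white, whereas a strain retaining an in-frame three-base deletion stays black.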

Discussion
In this study, we succeeded in the genetic mapping and identification of the gene responsible for white seeds in lettuce. According to genetic mapping using ddRAD-seq and PCR-based markers, the white seed locus was located between 49.173 and 49.326 Mbp in LG7 (Fig. 2 and Table 3). Eight predicted genes were identified in this region (Table 4). Resequence analyses of "ShinanoPower" (white seed) and "Escort" (black seed) revealed that the coding sequences of the candidate genes were identical except for LsMybW, gene model name Lsat_1_v5_gn_7_35020.1 (Table 4). LsMybW has a single-nucleotide mutation in the stop codon in cultivars with white seeds, and the LG7_v8_49.251Mbp_HinfI marker, which employs this mutation, was completely linked to the white seed phenotype (Fig. 3 and Table 3). Analysis of the publicly available resequencing data revealed a full correlation between the stop codon polymorphism and seed color (Table S2). In addition, this marker position overlapped that of a previously reported locus for white seed color (Kwon et al. 2013; Simko et al. 2013). Pigment accumulation in plants is controlled by two gene subsets: early biosynthetic genes and late biosynthetic genes (LBGs) (Kubasek et al. 1992; Quattrocchio et al. 1993; Nesi et al. 2000). The transcription factor complex consisting of AtTT2, AtTT8, and AtTTG1 controls the expression of LBGs (Gonzalez et al. 2008, 2016), and AtTT2 mutants have an altered seed color (yellow) due to the absence of proanthocyanidin production in Arabidopsis (Shirley et al. 1995; Nesi et al. 2001). AtTT2 controls the expression of BAN (anthocyanidin reductase gene), which is involved in the divergence of proanthocyanidins and anthocyanins during flavonoid biosynthesis (Debeaujon et al. 2003). Therefore, as in Arabidopsis, proanthocyanidins rather than anthocyanins may be responsible for seed color in lettuce (Fig. S2).
LsMybW shared 28% identity with AtTT2 and had a highly conserved DNA-binding domain containing R2 and R3 repeats, consisting of '-W-(X19)-W-(X19)-W-' and '-F/I-(X18)-W-(X18)-W-', respectively. Therefore, LsMybW is a biologically plausible candidate gene. Genome editing of the target LsMybW showed that knockout mutants harboring an early termination codon produced white seeds (Table 6, Fig. 5). We infer that the orthologous proteins in lettuce probably have a conserved function; the mutation of the stop codon of LsMybW in white seeds may cause a significant conformational change and interfere with complex formation and other interactions. Dominant-negative effects of MYB due to deletions in the C-terminal end have been reported in Arabidopsis (Velten et al. 2010). Within the additional 26 amino acid residues at the C-terminus, we could not find any typical repressor motifs for MYB, such as the ERF-associated amphiphilic repression motif (LxLxL or DLNxxP), the Sensitive to ABA and Drought 2 protein-interacting motif (GY/FDFLGL), or TLLLFR (Wu et al. 2022). Analyzing the additional sequence is worth exploring, including the possibility of identifying novel inhibitory motifs. The white seeds exhibited reduced accumulation of proanthocyanidins compared to the black seeds based on the vanillin-sulfuric acid assay, which is in agreement with the conclusion that the white seed cultivars have a reduced-function mutation in LsMybW that controls the expression of LBGs in lettuce seeds. In conclusion, LsMybW is the allele responsible for the shift in seed color from black to white.
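The repeat signatures and repressor motifs named above translate directly into regular expressions, which makes the motif search easy to reproduce. The sketch below is an illustration only: the test sequences are synthetic placeholders rather than the LsMybW protein, and the dictionary keys and function name are hypothetical.

```python
import re

# Repeat signatures quoted in the text (X = any residue).
R2_REPEAT = re.compile(r"W.{19}W.{19}W")
R3_REPEAT = re.compile(r"[FI].{18}W.{18}W")

# Repressor motifs listed in the text.
REPRESSOR_MOTIFS = {
    "EAR (LxLxL)": re.compile(r"L.L.L"),
    "EAR (DLNxxP)": re.compile(r"DLN..P"),
    "SAD2-interacting (GY/FDFLGL)": re.compile(r"G[YF]DFLGL"),
    "TLLLFR": re.compile(r"TLLLFR"),
}


def find_repressor_motifs(protein):
    """Return the names of all listed repressor motifs present in a sequence."""
    return [name for name, rx in REPRESSOR_MOTIFS.items() if rx.search(protein)]


# Synthetic sequence built to satisfy both repeat signatures (not a real protein).
synthetic = ("M" + "W" + "A" * 19 + "W" + "A" * 19 + "W"
             + "GG" + "F" + "A" * 18 + "W" + "A" * 18 + "W" + "SS")

print(bool(R2_REPEAT.search(synthetic)), bool(R3_REPEAT.search(synthetic)))
print(find_repressor_motifs(synthetic))
print(find_repressor_motifs("AALELDLAA"))
```

Running the same search on the 26-residue C-terminal extension would be the computational equivalent of the negative result reported above.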
During the artificial crossing of lettuce flowers, the breeder must remove the maternal parent pollen from the flower. However, it is impossible to completely remove pollen from lettuce flowers; therefore, seeds of both the selfed progeny and the F1 hybrid are produced unintentionally owing to a compound autogamous floral structure (Simko et al. 2011). Therefore, breeders would like to utilize the inheritance patterns of easily recognizable traits to distinguish F1 hybrids from selfed plants in the following progeny. Following a cross between a white seed pure line (♀) and a black seed pure line (♂), selfed progenies produce a next generation of white seeds, whereas F1 hybrids produce an F2 generation of black seeds because black is the dominant trait (Thompson 1942; Ryder 1999). Therefore, seed color can be effectively used to distinguish between selfed and hybrid plants (Thompson 1942). The marker used to distinguish between F1 hybrids and selfed plants in populations derived from the white seed × black seed cross could also contribute to validating the phenotype of the F1 generation. The verification of inconspicuous traits that require bioassays, such as disease resistance, has been difficult in the F1 generation. Male sterility makes it possible to produce only F1 hybrid seeds (Hayashi et al. 2011; Seki 2022); however, it has not been widely used because of limited cross combinations. The heritable characteristics of the F1 generation can be examined using a bioassay with only the seeds of the heterozygous genotype. From the F2 generation onward, seed color can be selected intentionally using the marker. These approaches are valuable for the development of breeding methods that accelerate the development of lettuce cultivars. Therefore, the LG7_v8_49.251Mbp_HinfI marker targeting LsMybW could be used to distinguish almost all white seeds of lettuce worldwide and could be applied to significantly enhance lettuce breeding programs (Tables S1, S2, S4).

Fig. 4 Phylogenetic analysis of the MYB domain-containing proteins in L. sativa. The evolutionary tree was built with deduced amino acid sequences from orthologs encoding MYB transcription factors in lettuce and Arabidopsis. The red font indicates the candidate gene controlling seed color in lettuce
Lettuce was first domesticated near the Caucasus after the loss of seed shattering by spontaneous mutation (Wei et al. 2021). Therefore, cultivated lettuce has a non-seed-shattering characteristic owing to the same qSHT locus. Because domestication is estimated to have occurred around 4000 BC, the change in seed color must have occurred later. Considering that lettuce was depicted on wall paintings of Egyptian tombs around 2500 BC as one of the major vegetable crops (De Vries 1997), it is reasonable to assume that the change in seed color occurred before the global spread of lettuce seeds (Tables S2, S5). The discovery of white seeds, which have been used as an important agronomic trait for thousands of years, was indeed a great achievement.

Conclusion
The development of a robust marker for marker-assisted selection and identification of the gene responsible for white seeds has implications for lettuce breeding and agricultural aspects regarding seed color.This study not only identified a gene responsible for the white seed phenotype, but also revealed an important gene regulating a key agronomic trait for lettuce cultivation and breeding.These findings could be useful for future lettuce breeding endeavors.

Fig. 1 Comparison of black and white seeds of lettuce.a Black seeds "Escort" and white seeds "ShinanoPower" shown on the soil surface.b Black and white seeds shown on a background of intermediate color

Fig. 2 The mapped location of the white seed locus on LG7. Genetic distances (cM) are shown between the markers. "White seed" indicates the position of the gene responsible for the white seed trait. The black bar indicates the genomic position (bp) of the white seed locus in LG7

Table 1
Number of black seed and white seed cultivars of lettuce germplasm

Resequence analysis of parent genomes and sequencing of a candidate locus
The sorted BAM files were visualized using the IGV software (Robinson et al. 2011).

Table 2
Summary of integrated lettuce linkage groups

Table 3
Seed color and genotype of multiple markers in 84 lettuce cultivars

Table 4
Candidate genes in the genomic region between 49.173 and 49.326 Mbp in LG7

Table 5
Seed color and genotype of multiple markers in 131 lettuce cultivars

Table 6
Seed color and sequence variation of the genome-edited LsMybW gene in T0 plants. Red characters indicate InDel sequences introduced by genome editing. a Percent of sequenced clones to total (n = 7–8)