A Single Nucleotide Polymorphism in an R2R3 MYB Transcription Factor Triggers ms6 (Ames1) Male Sterility in Soybean

Soybean [Glycine max (L.) Merr.] is an important crop providing vegetable oils and proteins. Increasing demand on soy products heightens the urgency of soybean yield improvement. Hybrid breeding with male sterility system is an effective method to improve crop production. Cloning of genic male sterile (GMS) gene combined with biotechnology method can contribute to constructing GMS-based hybrid Seed Production Technology (SPT) to promote soybean performance and yield. In this research, we identied a soybean GMS locus, GmMS6, by combining bulked segregant analysis (BSA)-sequencing and map-based cloning technology. GmMS6 encodes an R2R3 MYB transcription factor, whose mutant allele in ms6 (Ames1) harbors a single nucleotide polymorphism (SNP) substitution, leading to the 76 th Leucine to Histidine change in the DNA binding domain. Phylogenetic analysis demonstrates GmMS6 is a homolog of Tapetal Development and Function 1 (TDF1)/MYB35 that is an anther development key factor co-evolved with angiosperm. It has a recently duplicated homolog GmMS6LIKE (GmMS6L), both of which can rescue the male fertility of Arabidopsis homologous mutant attdf1 while GmMS6 L76H cannot, denoting that both proteins are functional and L76 is a critical residue for TDF1’s function. However, compared to anther specic expressed GmMS6, GmMS6L is constitutively expressed at a very low level, explaining deciency of GmMS6 alone causes pollen abortion. Moreover, the expression levels of major regulatory and structural genes for anther development are signicantly decreased in ms6, unveiling that GmMS6 is a core transcription factor regulating soybean anther development.


Introduction
Soybean (Glycine max) is the major crop to provide plant proteins and oils in food supply, but has a relatively low yield compared to other major crops. Hybrid breeding technology that signi cantly improves crop yield has a great potential in soybean seed production (Kim and Zhang 2018;Palmer et al. 2001).
HybSoy1, the rst o cially approved and commercially applicable hybrid soybean, could increase the yield by 20.8% (Zhao et al. 2004). In hybrid breeding system, male sterile lines are indispensable for avoiding the time-consuming and tedious arti cial emasculation process.
The general hybrid breeding systems used today are three-line and two-line systems derived from cytoplasmic male sterility (CMS) and environmental sensitive genic male sterility (EGMS), respectively (Kim and Zhang 2018). However, both systems have some limitations to hinder their broad applications.
For example, it is di cult for CMS to nd suitable restorer lines. Some CMS cytoplasm types even have negative effects on crop performance such as leading to disease susceptibility (Levings 1990). As to EGMS, the sterility is relatively unstable for its high reliance on the environmental conditions, which severely affects hybrid purity sometimes . Although stable genic male sterility (GMS) can overcome these defects, lack of maintainer line has restricted its usage in hybrid production for a long time until the seed production technology (SPT) is developed recently (Perez-Prat and van Lookeren Campagne 2002; Weber et al. 2009). The main idea of this technology is to create a transgenic-based maintainer line in a recessive sporophytic male sterile mutant (ms) background by introducing a gene cluster besides the resistance gene for screening transgenic lines, and the transgenic component in SPT maintainer line is kept in heterozygote status. The gene cluster is composed by at least three fundamental genes (Wu et al. 2016). The rst one is the wild-type MS allele controlled by its native promoter for rescuing ms's detrimental effects on anther sporophytic cells. The 2nd one is a male gametophyte-killer gene, for killing the microspores carrying the transgenic component, so that only nontransgenic ms pollens are viable for hybrid production (Chang et al. 2016; Song et al. 2020). The 3rd one is a phenotypic reporter gene for monitoring the purity of obtained ms seeds, such as uorescence gene expressed in aleurone layer for monocots rice and maize (Chang et al. 2016;Zhang et al. 2018b) or anthocyanin synthesis gene expressed in early seedling stage for dicots tomato (Du et al. 2020).
SPT system broadens the germplasm choices of parental lines to breed hybrids of superior heterosis, reduces the risk caused by weather changes, and is regarded as the third generation of hybrid technology.
However, it has not been applied in soybean as lack of cloned GMS gene. So far, 13 non-allelic genic loci Mutations at these loci all confer recessive sporophytic male-sterile phenotype. Among these mutants, ms6 displays a stable non-pollen phenotype, making it an ideal material for developing soybean SPT system. There are two independent and spontaneous ms6 mutants maintained as heterozygotes in Soybean Genetic Type Collection as T295H (ms6 (Ames1)/+) (Skorupska and Palmer 1989) and T354H (ms6 (Ames2)/+) (Ilarslan et al. 1999), respectively. Comparative microscopic study of the anther development of fertile and sterile plants from T354H has showed that the cytological abnormalities of ms6 anther rstly appear at microspore mother cell (MMC) stage on tapetal and parietal layers, which possess more vacuoles in cells compared to fertile anther (Ilarslan et al. 1999). Then, tapetum in ms6 anther is severely degenerated, forming condensed tissues from meiosis to late tetrad stage, and completely degraded in the late microspore stage when tapetum in fertile anther just starts enlargement and vacuolation (Ilarslan et al. 1999). By contrast, the parietal layer in ms6 anther keeps enlarging during the later development stages and shows completely vacuolated at the end, while it remains in a rather consistent shape in fertile anther (Ilarslan et al. 1999). The reproductive cells in ms6 anther show aberrations since telophase II. Meiocytes fail cytokinesis and form partially separated microspores, which would be completely collapsed in the late microspore stage when the fertile microspores are processing the rst mitosis (Ilarslan et al. 1999). Similar phenomenon was observed during the microsporogenesis in ms6 mutant from T295H (ms6 (Ames1)/+) (Skorupska and Palmer 1989). Multi-nucleic microspores are generated after meiosis, and they are completely crushed later so that no pollen is produced in the sterile plants (Skorupska and Palmer 1989).
The ms6 locus has been mapped into a 3.7 Mb region on chromosome 13 (Chr13) between two SSR markers Satt030 and Satt149 (Yang et al. 2014), closely linked to a ower-color gene W1 (Skorupska and Palmer 1989;Lewers and Palmer 1993;Ilarslan et al. 1999). In this study, we further narrowed down the genetic region of ms6 and identi ed the mutation corresponding to ms6 via BSA-sequencing, map based cloning and complementation experiments. It is a missense mutation in the gene, named as GmMS6, which encodes a homolog of TDF1, an R2R3 MYB transcription factor critical for anther development in

Plant materials and growth conditions
The ms6 mutant used in this study is derived from T295H (PI 533601, ms6 (Ames1)/+), which was achieved from the collection of National Plant Germplasm System (NPGS) in United States. Allele ms6 (Ames1) is referred as ms6 hereafter. The BC 5 F 2 segregating population was developed for narrowing down the genetic region of ms6, by using T295H as ms6 donor and a wild-type (WT) cultivar 'JiuB', from Jilin, China, as a recurrent male parent. The mapping population was planted in the farm of Fanjiatun, Jilin in summer. For cytological and morphological studies, soybean materials were grown in pots (two plants per pot) outdoors in summer and in the greenhouse in winter at 28°C with the photoperiod of 16 h light /8 h dark, in Xi'an, China.
Arabidopsis and Nicotiana benthamiana plants were grown in soil in the greenhouse at 22°C with the photoperiod of 16 h light /8 h dark, in Xi'an, China. The Arabidopsis germplasms used in this study were WT Columbia (Col), heterozygous attdf1 (in Col background), transgenic lines in homozygous attdf1 background carrying desired transgenes for the complementary experiment. As noted, attdf1 mutant used here was obtained by the introgression of attdf1 locus from Landsberg erecta-0 into Col background through several rounds of backcrossing.

Morphological and cytological analysis
For general morphological observation of ms6 anthers, soybean owers one day before blooming were collected from fertile and sterile descendants of T295H. Stamens were dissected and imaged under the stereo microscope Nikon SMZ25. Pollens were squeezed out, stained with 1% I 2 -KI solution, and photographed under light microscope Leica DM2500. For Arabidopsis, mature anthers before anthesis collected from WT, attdf1 mutant, and various transgenic lines were stained with Alexander staining buffer (Peterson et al. 2010), and photographed under Leica DM2500.
For cytological analysis, owers at late tetrad and late microspore stages were collected from fertile and sterile descendants of T295H, and immediately immersed into the FAA xation solution. After dehydration, ower samples were imbedded into resin with Technovit H7100-GMA kit (Heraeus Kulzer, Germany), following manufactory instruction, and sliced into 2-µm transverse sections with Leica RM2265. Sections on slides were stained with 0.5% toluidine blue staining buffer, and imaged under Leica DM2500 after sealed.
DNA extraction, BSA-sequencing (BSA-seq), and ne mapping Genomic DNA samples were extracted from the young leaves with the Nuclean Plant Genomic DNA Kit (CWBIO, China) for regular PCR analysis and BSA-seq experiment. For BSA-seq analysis, two bulks were constructed from the BC 5 F 2 mapping population. One was composed of 20 homozygous WT plants and the other 20 homozygous ms6 plants. Genomic DNA isolated from each bulk was fractioned to build a 350-bp pair-end sequencing library, and sequenced on Illumina Hiseq PE150 platform at Novogene Company (China). SNP (single nucleotide polymorphisms) and InDel (insertions-deletions) of each bulk were annotated using Wm82.a2.v1 genome as the reference. The SNP-index of each bulk was calculated as previously described (Takagi et al. 2013). The SNP-index differences between two bulks, ∆(SNPindex), were calculated and plotted against their genomic positions.
Polymorphic SSR markers in the genetic window of ms6 locus identi ed by BSA-seq were further used to screen the individual ms6 plants in the BC 5 F 2 population via canonical PCR. The fragments ampli ed with SSR primers were resolved on 8% polyacrylamide gel in 1xTAE buffer by electrophoresis, and visualized by silver staining method (Bassam et al. 1991). The genetic map was constructed from the data with MAPMAKER 3.0 (Lander et al. 1987).

Bioinformatics and phylogenetic analysis
The conserved structural domain in GmMS6 was predicted by SMART (http://smart.embl-heidelberg.de/), showing it contained a typical R2R3 MYB DNA-binding domain. The conservancy in R2 motif was further analyzed by aligning the DNA-binding domain of GmMS6/GmTDF1a to the ones of well-characterized MYB proteins in Arabidopsis, and the results were drawn with Bioedit software (Tom Hall). By using BLASTP, the homologs of GmMS6 in Glycine max (GmMS6L) and Arabidopsis (AtTDF1) were identi ed in NCBI Database. The sequence conservancy of these three proteins was assessed by multiple sequence alignment with Bioedit software (Tom Hall).
For phylogenetic analysis, we used the protein sequences of GmMS6, GmMS6L, and homologs of TDF1 in 15 representative species from different land plant evolutionary lineages, including ve dicots (G. max, Medicago truncatula, Vitis vinifera, Arabidopsis thaliana and Solanum lycopersicum), six monocots (Oryza sativa, Zea mays, Sorghum bicolor, Ananas comosus, Musa acuminate and Zostera marina), one basal angiosperm (Amborella trichopoda), two gymnosperm species (Ginkgo biloba and Pinus taeda), one lycophyte (Selaginella moellendor i), and one moss (Physcomitrella patens). The sequences of TDF1 were retrieved from NCBI with BLASTP by using AtTDF1 (NP 189488.1) and OsTDF1 (XP 015630216.1) as query peptides, and the one with highest bit-score in each species was selected.
AtMYB80 (NP 200422.1) and OsMYB80 (XP 015635420.1) were used as outgroup sequences. All the protein sequences were subject to multiple sequence alignment analysis with the ClustalW2 algorithm, and the phylogenetic tree was constructed by neighboring-joint (NJ) method with bootstrap resampling (1000 replicates) by using MEGA 6 (Tamura et al. 2013).
RNA extraction, RT-PCR, and qRT-PCR analysis Total RNA was extracted from desired tissues with RNAprep Pure Plant Kit (Tiangen, China). After removing genomic DNA contamination with TURBO DNA-free™ Kit (Invitrogen, United States), 1 µg of RNA was reversely transcribed into the cDNA by PrimeScript™ II 1st Strand cDNA Synthesis Kit (Takara, Japan). Sequential PCR was conducted with rTaq DNA polymerase (Takara, Japan) for analyzing expression level and with 2x PrimeSTAR Max Premix (Takara, Japan) for cloning purpose following manufactory instructions. For quantitative RT-PCR, cDNA samples after 1:10 dilution were used as templates. The PCR reactions were conducted as previously described (Zhang et al. 2018a). For assessing the expression patterns of GmMS6 and GmMS6L, relative gene expression levels were calculated by using the 2 −∆Ct method. For assessing the differential expression of speci c genes in WT and ms6 young owers, fold changes in gene expression were calculated by using the 2 −∆∆Ct method. All data were normalized against the expression level of GmActin11 (Glyma.18g290800). For each sample, three replicates were performed.

Subcellular localization analysis
The full length coding sequence (CDS) without stop codon of GmMS6, GmMS6L, and mutant GmMS6 L76H were in frame cloned into the XbaI site upstream of the GFP gene in the binary vector of pLM-35S-GFP to create pLM-35S-GmMS6-GFP, pLM-35S-GmMS6L-GFP, and pLM-35S-GmMS6 L76H -GFP. Vectors were transformed into Agrobacterium tumefaciens strain GV3101 and in ltrated into 4-week-old N. benthamiana leaves. GFP signals were observed and imaged at 48 hours post in ltration under the Olympus Fluoview FV1000 confocal laser scanning microscope (Olympus, Japan).

Transactivation activity assay in Yeast
Full-length CDSs corresponding to GmMS6, GmMS6 L76H and GmMS6 DBD (1-191 aa of GmMS6, DNA binding domain) were in frame cloned into the pGBKT7 vectors downstream of the CDS of GAL4-BD, respectively. The obtained vectors of pGBKT7-GmMS6, pGBKT7-GmMS6 L76H , and pGBKT7-GmMS6 DBD were subsequently transformed into Saccharomyces cerevisiae stain AH109 via one-step transformation method (Chen et al. 1992). After selected on synthetic dropout medium lack of trypsin (SD/-Trp), the positive colonies were diluted into the same concentration with autoclaved ddH 2 O. Then, 5 µL of cell suspensions were placed on selective medium SD/-Trp/-His, and grown for 3-4 days under 30°C to evaluate the activation activities of target proteins.

Result
Phenotypic characterization of ms6 mutant The offspring of heterozygous ms6 plants (T295H) were planted in a greenhouse condition. During vegetative stage, all plants grew well just like wild types (WT), while in reproductive stage about a quarter of plants (ms6) were male-sterile and unable to develop pods after blooming (Fig. 1a). Compared to the anthers in WT plants, the anthers of ms6 plants were more whitish and shrinking (Fig. 1b). Pollen grains released from WT anthers were round and turned dark blue after stained by I 2 -KI solution (Fig. 1c), but no pollen grains were produced in ms6 sterile anthers (Fig. 1d). These results are consistent with the previous report (Skorupska and Palmer 1989).
WT and ms6 owers at late tetrad and late microspore stages were cross-sectioned and observed under light microscopy. At late tetrad stage, WT anther wall was composed of 5-layers, including epidermis, endothecium, middle layer, parietal layer, and tapetum from outside to inside, in which the cytoplasm of tapetum cells is highly condensed; meanwhile, callose surrounding the tetrads in locule appeared to start degeneration (Fig. 1e). Anthers at same stage in Arabidopsis and rice show similar cytological features except that their anther walls lack of the parietal layer (Sanders et al. 1999;Zhang et al. 2011). On the other hand, ms6 anther had radically enlarged and highly vacuolated parietal and tapetal layers; in the locule, callose encompassed partially or non-separated irregular microspores with multiple nuclei, indicating that cytokinesis II of meiocytes was abnormal (Fig. 1f). This is similar to the phenotype of ms6 (Ames2) (T354H), but tapetal layer of ms6 (Ames2) was degenerated more rapidly and was almost completely degraded at this stage (Ilarslan et al. 1999). By the time of early pollen stage, enlarged pollens with thick walls were observed in the locule of WT anther, and the anther wall was composed by an epidermis, an enlarged endothecium, and a narrow parietal layer with attachment of remnant tissue from degraded tapetum cells (Fig. 1g). Comparatively, in ms6 anther, tapetum cells were completely dissolved and pollens were crushed, while parietal layers were abnormally vacuolated and swollen, similar to the situation reported in ms6 (Ames2) (T354H) (Fig. 1h; Ilarslan et al. 1999 A SNP mutation of Glyma.13g066600, an R2R3-MYB transcription factor encoded gene, is likely responsible for the male sterility in ms6 The ms6 loci was mapped previously within a 3.72 Mb region on Chr13 between two SSR markers Satt149 (Chr13:13,134,055 bp) and Satt030 (Chr13:16,855,019 bp) (Yang et al. 2014). To narrow down the region, WT and ms6 bulks were constructed from a BC 5 F 2 mapping population derived from T295H and 'JiuB', and subject to BSA-seq analysis. Plotting the ∆SNP-index values between two bulks against their genomic positions showed that ms6 was associated with a 1.5 Mb region (15,853,267 − 17,349,424 bp) on Chr13 (Fig. 2a), consistent to the previously reported interval (Yang et al. 2014). In addition, 249 variations between two bulks were identi ed in this region, including 214 SNP and 35 InDel.
To verify the mutation, a 126-bp region covering the SNP site was ampli ed by PCR from homozygous WT (+/+), heterozygotes (+/ms6), and homozygous ms6 (ms6/ms6), and subsequently digested with MseI. As expected, the amplicon of WT was cleaved to a 99-bp band and a 27-bp band that was invisible on the agarose gel. Comparatively, the amplicon from ms6 plants could not be digested at all, while about half of the amplicons from heterozygotes could be cut (Fig. 2e), evidencing the mutation of T to A at Chr13:16,641,429 bp in ms6. Moreover, the 126-bp segment could be used as a CAPS marker to differentiate the plant genotype at ms6 locus at any stage.
Glyma.13G066600 encodes a typical R2R3-MYB transcription factor with two MYB motifs close to the N terminus served as DNA binding domain (Fig. 2d). This protein showed a strong homology to TDF1, amino acid sequence exhibiting 48% identity and 59% similarity to AtTDF1 in Arabidopsis (Fig. S2). TDF1 is known as a key transcription factor in regulating tapetum development and function. Null mutant of TDF1 in Arabidopsis and rice both promoted vacuolization in tapetal cells and suppressed the degradation of the callose surrounding the tetrads, resulting in squeezed microspores and no pollen formation (Zhu et al. 2008; Cai et al. 2015), which was similar to ms6. The SNP in Glyma.13g066600 CDS in ms6 leads to the amino acid substitution of leucine to histidine at the residue 76 (L76H) (Fig. 2d), which is a well-conserved residue in the R2 MYB motif (Fig. 2f). These results suggested that mutation at Glyma.13g066600 was responsible for the male sterility of soybean ms6, and therefore its wild-type allele was termed as GmMS6.

GmMS6 is a homolog of TDF1 that is only present in angiosperm
Blast search showed that GmMS6 had a homolog GmMS6L with 92% amino acid sequence identity, encoded by Glyma.19g017900 (Supplementary Fig. S2). Similar to AtTDF1, GmMS6L doesn't have the N30 extension in the sequence (Supplementary Fig. S2). To further con rm the relationship between GmMS6 and TDF1, we conducted a phylogenetic analysis to GmMS6, GmMS6L, and the homologs of AtTDF1 and OsTDF1 in different land plant evolutionary lineages. AtMYB80 and OsMYB80 were used as outgroup sequences for MYB80 is the closest homolog of TDF1, also known as MYB35 (Dubos et al. 2010).
The result showed that TDF1 homologs in angiosperm were clustered into a monophyletic group with two branches. All the dicots were grouped in a branch with the basal angiosperm species A. trichopoda, while all the monocots were grouped in the other branch with the basal or near basal monocot species A. comosus and Z. marina (Fig. 3). Comparatively, the TDF1 homologs with highest bit-score in lycophyte S. moellendor i and moss P. patens were clustered with MYB80 and those in gymnosperm species, G. biloba and P. taeda, had even a further evolutionary relationship. These data suggested that TDF1 is only present in angiosperm and has diverged at a very early stage in angiosperm evolution before monocots and dicots were differentiated.
In the phylogenetic tree, GmMS6 and GmMS6L were positioned together in the TDF1 clade, evidencing that they were homologs of TDF1 and evolved from a recent duplication (Fig. 3). It is interesting to know whether L76H in GmMS6 (GmMS6 L76H ) could lead to male sterility, and if so, why GmMS6L fails to compensate the GmMS6's function in ms6 mutant.
GmMS6 but not GmMS6 L76H could complement the attdf1 male sterile phenotype in Arabidopsis The function TDF1 was likely conserved in all the angiosperm species. Firstly, phylogenetic results showed that TDF1 in angiosperm were clustered into a monophyletic group (Fig. 3). Secondly, depletion of TDF1 in Arabidopsis and rice caused similar detrimental effects on tapetum development as lack of GmMS6 did (Zhu et al. 2008;Cai et al. 2015). Therefore, we speculated that GmMS6 could substitute the role of AtTDF1 in Arabidopsis although GmMS6 is 30 amino acids longer at the N terminus, while GmMS6 L76H could not.
GmMS6/GmTDF1a is the major functional TDF1 in soybean Complementation assay above exposed that GmMS6 and GmMS6L were both functional proteins, which rose up the question why ms6 would exhibit male sterility when there is another TDF1 coding gene in the genome. To answer that, we analyzed the expression patterns of GmMS6 and GmMS6L by qRT-PCR in roots, stems, leaves, young owers, siliques, and immature seeds. GmMS6 displayed a typical tissuespeci c expression pattern, with a much higher expression level in young owers, which is about 6-fold higher than the 2nd highest level shown in leaves (Fig. 5). We further analyzed the expression level of GmMS6 in petals, sepals, and pistils, and found that GmMS6 is barely expressed in these oral parts (Fig. 5). Therefore, the high expression level detected in young owers should be contributed by the gene expression in anthers, demonstrating that GmMS6 is an anther-speci c gene, similar to the TDF1 proteins in Arabidopsis and rice (Zhu et al. 2008;Cai et al. 2015). On the contrary, GmMS6L was expressed at a super low level in all examined tissues, indicating it is likely in the process of pseudolization (Fig. 5). Different expression patterns of GmMS6 and GmMS6L illustrated that GmMS6 was the major functional TDF1 in soybean anther development, and explained why mutation at ms6 locus would lead to male sterility.

L76H does not alter the subcellular localization or transactivation activity of GmMS6
The subcellular localization of GmMS6, GmMS6 L76H and GmMS6L were analyzed by transiently expressing their GFP fusion proteins driven by 35S promoter in N. benthamiana leaves. Free GFP was also expressed as a control, which showed signals all over the cells (Fig. 4a). Comparatively, the uorescence from GmMS6-GFP and GmMS6L-GFP were restricted in nucleus, the general subcellular distribution of transcription factors (Fig. 4a). Additionally, GmMS6 L76H -GFP was also present in nucleus, showing that L76H would not vary the subcellular localization of GmMS6.
TDF1 is a transcriptional activator in Arabidopsis and rice, therefore, GmMS6 that could compensate AtTDF1's function should possess transcription activation activity as well. We then performed a transactivation activity test in yeast AH109 strain (Fig. 4b). Yeast clones expressing GAL4 DNA binding domain (BD) could only grow on SD medium lack of Trp (SD/-Trp) but not the selective medium (SD/-Trp-His) due to no transactivation activity in BD region. Similar phenomenon was observed for yeast strain expressing BD fused GmMS6 DBD (BD-GmMS6 DBD ), which was a truncated form of GmMS6 only containing the N-terminal 191 residues (the DNA binding domain, 39-146 aa). In contrast, yeast clones expressing BD-GmMS6, BD-GmMS6L and BD-GmMS6 L76H could grow well on both SD/-Trp and SD/-Trp-His medium, showing that these three proteins all possessed transactivation activity. Therefore, L76H doesn't affect the transactivation activity of GmMS6. As L76 is a conserved residue in R2 motif of the DNA binding domain (Fig. 2d and 2f) and L76H has no effects on protein's subcellular location or the transactivation activity, we suspected that L76H mutation likely disrupted GmMS6's function by altering its DNA binding capacity. . AtTDF1p-driven GmMS6/GmTDF1a was able to recover the fertility of attdf1 mutant like AtTDF1p-driven OsTDF1 did (Fig. 4d, 4k; Cai et al. 2015). These showed that TDF1's function was conserved in Arabidopsis, rice, and soybean, and implied that DYT1-TDF1-AMS-MYB80/MYB103/MS188-MS1 genetic pathway was likely present in soybean as well. Therefore, we performed a homology search of these transcription factors in soybean genome, and found that the whole pathway was indeed present in soybean. Moreover, all the members in this cascade had multiple paralogs, particularly, four for DYT1 and two for the others (Fig. 7a). Their expression levels in WT and ms6 young owers were assessed by qRT-PCR analysis. As a result, the expression level of GmDYT1(Glyma.13g250200) increased ~ 40% in ms6 (Fig. 7b), which is consistent to that TDF1 negatively feedback-regulates the expression of DYT1 in Arabidopsis (Cai et al. 2015). However, the expression levels of the other three GmDYT1s were not changed signi cantly in ms6, indicating the functions of DYT1s are diverged. Similarity, the expressions of two GmTDF1 were not affected in ms6 (Fig. 7b), suggesting that GmTDF1a is not involved in its own expression. On the other hand, the expressions of genes encoding the downstream transcription factors, like AMS, MYB80/MYB103/MS188, and MS1, were mostly downregulated in ms6 compared to WT (Fig. 7b), which is similar to the reports in attdf1 and ostdf1 (Zhu et al. 2008;Cai et al. 2015). One exception was Glyma.01g047400, encoding a MS1 homolog, which expressed similarly in WT and ms6 (Fig. 7b). This suggests that the expression of Glyma.01g047400 is not controlled through the DYT1-TDF1-AMS-MYB80/MYB103/MS188-MS1 genetic pathway and implies that Glyma.01g047400 is functionally diverged from its paralog Glyma.02g107600 or even might lost the function in anther development. Noticeably, Glyma.02g107600 is located at Gm02:10270908...10267934 in the region downstream of Satt157 (Gm02:9240028), which is rich of fertility controlling candidate genes, including ms3, msMOS, and female-partial sterile-1 (Fsp1) (Cervantes-Martinez et al. 2009).
Additionally, we examined the expressions of two enzyme coding genes that are regulated by this transcription factor cascade and related to microspores releasing and pollen wall formation, that is, A6 and MALE STERILE 2 (MS2) (Zhang et al. 2007;Zhu et al. 2008;Wang et al. 2018). A6 protein is proposed as a member of callase complex required to degrade the callose encompassing tetrads, as its protein sequence is highly similar to β-1,3-glucanase and its gene expression is correlated to callase synthesis temporally and spatially (Hird et al. 1993). The results showed that A6 gene only had one copy in soybean genome (Fig. 7a) and was barely expressed in ms6 (Fig. 7b), explaining why the callose surrounding tetrads was not dissolved in ms6 mutant. MS2 encodes a fatty acyl reductase catalyzing the palmitoyl acyl-carrier protein into a fatty alcohol, a precursor of sporopollenin (Aarts et al. 1997;). These precursors are subsequently transported from tapetum cells into locule and deposited onto the microspore surface to produce the sexine layer (Wang et al. 2018). MS2 has two paralogs in soybean genome (Fig. 7a), both downregulated drastically in ms6 (Fig. 7b). These results showed that the DYT1-TDF1-AMS-MYB80/MYB103/MS188-MS1 genetic pathway is present and functions conservatively in soybean. . Amongst these mutants, two ms6 mutants identi ed decades ago exhibit no-pollen phenotypes (Fig. 1c-d; Skorupska and Palmer 1989;Ilarslan et al. 1999), which makes them ideal genetic materials for soybean improvement by facilitating the canonical recurrent selection (Lewers et al. 1996) or the novel GMS-based hybrid-seed production technology (SPT) (Perez-Prat and van Lookeren Campagne 2002; Weber et al. 2009). Identifying the ms6 gene is helpful to its application in recurrent selection and critical to its application in SPT development.

Discussion
In the present study, we revealed that ms6 is correlated to the mutation at the Glyma.13g066600 locus (GmMS6), which encodes a TDF1 homolog (GmMS6/GmTDF1a), an R2R3 MYB transcription factor speci cally expressed in anther and required for appropriate tapetum development (Fig. 2, 3, 4, and 5).
The ms6 allele present in T295H is caused by a point mutation, which leads to the substitution of L76 to H, a conserved residue on the R2 DNA-binding motif of GmMS6/GmTDF1a, suggesting that L76H likely alters DNA binding activity to disrupt the protein's function (Fig. 2). We also found that the transactivation activity and subcellular distribution of GmMS6 L76H were not disturbed (Fig. 6) and the mutant gene in ms6 was expressed at the WT level (Fig. 7), both supporting the above assumption from the other side. Phylogenetic and complementation analyses showed that GmMS6/GmTDF1a has a recently diverged and functional paralog GmMS6L/GmTDF1b, but the GmMS6L/GmTDF1b gene is expressed constitutively at a low level so that it cannot compensate the defective of GmMS6/GmTDF1a ( Fig. 3;  Fig. 5).
TDF1 is conservatively present in angiosperm species (Fig. 3 In ms6, the parietal layer is also vacuolated and obsessively enlarged, indicating that GmMS6/GmTDF1a plays an important role in regulating parietal layer's development progress. Moreover, mutant attdf1 and ostdf1 can process meiosis successfully to generate tetrads, whilst both ms6 mutants (Ames1 and Ames2) showed aberrations in cytokinesis following telophase II, resulting in partially-or non-separated multi-nucleic microspores ( Fig. 1f; Skorupska and Palmer 1989;Ilarslan et al. 1999).
Compared to the major crops rice and maize with dozens ms mutants, many of which have been cloned and well characterized ( One major reason should be that soybean is a paleopolyploid with two recent rounds of whole genome duplication (WGD), occurring ~ 13 and 59 million years ago, and about 75% of the genes exist with multiple copies (Schmutz et al., 2010). For example, amongst the genes we investigated in the present study, including the DYT1-TDF1-AMS-MYB103-MS1 pathway and two downstream target genes, all but A6 have ≥ 2 paralogs in the nuclear genome (Fig. 7). Therefore, it is a big chance that nature spontaneous mutation at one microsporogenesis-related gene would not affect anther development due to another functional redundant paralog(s) in the genome. However, angiosperm genomes usually undergo diploidization soon after WGD and tend to retain only a single functional copy for most duplicated genes, and the alternative gene copies might be lost, silencing, or evolving new functions (Lynch and John 2000; Soltis et al., 2015). Similar process is ongoing for the soybean genome evolution. GmMS6L coding for functional protein but constantly with a minimum expression level is likely a sign of gene silencing. Another identi ed male sterile gene in soybean, MS4, also have a nonfunctional homologous copy, MS4_homolog, which is transcribed at a low level and codes for dysfunction protein (Thu et al., 2019). Furthermore, two paralogs of a member in TDF1 regulatory pathway, MS1, exhibited different expression in ms6. Only the expression of one copy, Glyma.02g107600, is suppressed in ms6 while the other copy, Glyma.01g047400, is expressed similarly in WT and ms6 ora tissue, suggesting that the latter gene is no longer involved in the conserved DYT1-TDF1-AMS-MYB80/MYB103/MS188-MS1 pathway. Further study is needed to assess whether it become a non-functional gene or develop some new functions.
Anther development mechanism researches in model plants like Arabidopsis, rice and maize have achieved signi cant progresses over the past decades, which have shed light on the regular genetic basis regulating angiosperm anther development (Guo and Liu 2012; Wan et al. 2019). However, soybean anther develops a distinct morphological characteristic with the parietal layer in anther wall and possesses a more complexed regulatory network due to a large extent of gene duplication. Therefore, conserved transcription factors, like GmMS6 (GmTDF1a), may evolve some new regulatory pathways in anther development process. Future investigation in GmMS6 downstream network by using RNA-Seq and ChIP-seq technologies can help to further dissect its function and enrich our understanding and knowledge in soybean anther development. Moreover, identi cation of the ms6 gene in the present study provides an essential element in establishing the GMS-based SPT technology for soybean hybrid seed production.