Genome-Wide Investigation and Expression Profiling Under Abiotic Stresses of a Soybean Unknown Function (DUF21) and Cystathionine-β-Synthase (CBS) Domain-Containing Protein Family

Cystathionine-β-synthase (CBS) domain-containing proteins (CDCPs) constitute a large family in plants, and members of this family have been implicated in a variety of biological processes. However, the precise functions and the underlying mechanisms of most members of this family in plants remain to be elucidated. CBSDUF proteins belong to the CDCP superfamily, which contains one domain of unknown function (DUF21) and an N terminus that is adjacent to two intracellular CBS domains. In this study, a comprehensive genome database analysis of soybean was performed to investigate the role(s) of these CBSDUFs and to explore their nomenclature, classification, chromosomal distribution, exon–intron organization, protein structure, and phylogenetic relationships; the analysis identified a total of 18 putative CBSDUF genes. Using specific protein domains and phylogenetic analysis, the CBSDUF gene family was subdivided into eight groups. The soybean CBSDUF genes showed an uneven distribution on 12 chromosomes of Glycine max. RNA-seq transcriptome data from different tissues in public databases revealed tissue-specific and differential expression profiles of the GmCBSDUFs, and qPCR analysis revealed that certain groups of soybean CBSDUFs are likely involved in specific stress responses. In addition, GmCBSDUF3 transgenic Arabidopsis was subjected to phenotypic analysis under NaCl, PEG, and ABA stress treatments. The overexpression of GmCBSDUF3 could enhance tolerance to drought and salt stress in Arabidopsis. This study presents a first comprehensive look at soybean CBSDUF proteins and provides valuable resources for functionally elucidating this protein subgroup within the CBS domain-containing protein family. Electronic supplementary material The online version of this article (10.1007/s10528-020-09991-w) contains supplementary material, which is available to authorized users.


Introduction
Expression profiling studies in different organisms have suggested that proteins with unknown functions play important roles in many biological processes (Gollery et al. 2006). These proteins have been divided into two types: one includes proteins with obscure features that lack defined motifs or domains (POFs) and the other includes proteins with defined features that contain at least one previously defined domain or motif (PDFs). Among the latter, a group of proteins containing the cystathionine-β-synthase (CBS) domain might play important roles in stress response/tolerance in Arabidopsis under various stress conditions ). Since the CBS domain was first identified in the Archaebacterium Methanococcus jannaschii (Bateman 1997), CDCPs have been found to represent a large superfamily of evolutionarily conserved proteins. Kushwaha et al. identified CDCPs in whole-genome analyses of Oryza sativa and Arabidopsis thaliana and found that the CBS domain coexists with other functional domain(s) in most of these proteins, which may indicate their probable functions. Based on whether they have additional domain(s), these proteins were further classified into different subclasses: CBSX, CBSCLC, CBSSIS, CBSPPR, CBSIMPDH, CBSCBS, CBSCBSPB and CBSDUF. These subclasses possess various functions, including cytoplasmic targeting, subcellular localization of chloride channels (CLC), protein-protein interaction, protein regulation, sensing of cellular energy status, and maintenance of intracellular ion gradients (Bateman 1997). For example, the highly conserved structure of CBS domains from CLC plays a role in regulating the common gate (Estevez et al. 2004). AKINbc, a CDCP containing four CBS domains, contributes to SnRK1 heterotrimeric complexes and interacts with two proteins implicated in plant pathogen resistance (Gissot et al. 2006). OsCBSX4, a CDCP, could improve abiotic stress tolerance in plants (Singh et al. 2012). OsBi1, a CDCP, could be induced by BPH and is related to resistance to brown plant hopper in rice plants (Wang et al. 2004). OsCBSX3, a CDCP, is involved in rice resistance to M. oryzae (Singh et al. 2012).
However, very few studies have been reported on the CBSDUF subgroup. The CBSDUF subgroup protein contains one domain of unknown function (DUF21) (PF01595) and an N terminus that is adjacent to two intracellular CBS domains. This transmembrane region has no known function. Many of the sequences in this family are annotated as hemolysins because of their similarity to Q54318 (HLYC_BRAHO), which does not contain this domain. Therefore, the functions of DUF21 are still unknown. DUF21 often exists together with CBS domains and plays important roles in plant growth and development. The characteristics of the CBSDUFs in this subgroup are not yet clear. In our previous study, we identified CDCPs in soybean, but there was no detailed analysis of the CBSDUF subgroup. We found that overexpression of soybean GmCBS21, which belongs to the CBSDUF subgroup, possesses a novel function to improve low nitrogen tolerance in A. thaliana in our previous study (Hao et al. 2016). In addition, Sinharoy et al. found that a protein containing the CBS-DUF21 domain from Medicago Biochemical Genetics (2021) 59:  truncatula is required for rhizobial infection and symbiotic nitrogen fixation (Sinharoy and Liu 2016). Therefore, considering the above studies, we speculate that proteins in the CBSDUF subgroup may play an important role in regulating biotic and abiotic stress, especially in legumes, and are worthy of further exploration. Soybean is one of the most important oil crops in the world and provides a large proportion of the protein used by humans and animals (Kereszt et al. 2007). However, to date, few data (Hao et al. 2016) are available about proteins in the CBS-DUF subgroup in soybean. In this study, we took advantage of bioinformatics and publicly available data to identify and analyze soybean CBSDUF genes on a genome-wide scale. A total of 18 CBSDUFs were identified, and their phylogenetic relationships, gene structures, protein structures, conserved motifs, and expression patterns were analyzed in detail. Furthermore, the expression of CBS-DUFs in response to various abiotic stresses as well as low nitrogen treatments in a low N-tolerant soybean variety (Pohuang) was determined. Our results provide a basis for further investigation of the evolution and functions of CBSDUFs.

Identification and Phylogenetic Analysis of the Soybean DUF21and CBS-Domain-Containing Proteins
Eighteen putative GmCBSDUF members were found in the NCBI database and used as queries to conduct BLAST searches against the public genome database (https ://phyto zome.jgi.doe.gov/pz/porta l.html#). If more than one transcript existed, the primary transcript was selected as a representative. Using the same approach, 8,10,10,4,9,4, and 4 putative CBSDUF members were identified from common bean (Phaseolus vulgaris), M. truncatula, Lotus japonicus, sorghum, Arabidopsis, rice, and maize, respectively. Table 1 shows the information of CBSDUF genes. Based on available information in the Phytozome 12 database, functional annotations for soybean CBSDUFs were obtained. Less information about the functions of the CBSDUF genes was found. The main functional annotations showed that most of the CBSDUF genes were predicted to be ancient conserved domain proteinrelated, metal transporter CNNM, or hemolysin-related. The specific functions of these genes remain to be discovered.
A phylogenetic tree was built with 67 protein sequences from eight plant species to investigate the phylogenetic relationships among CBSDUFs from soybean, three other legumes, Arabidopsis, and three gramineous plants (Fig. 1). The soybean CBSDUFs were named GmCBSDUF1 to GmCBSDUF18 according to their chromosomal positions. The genes from the other plant species were named by the same method. Based on the results of phylogenetic tree analysis, we divided these CBS-DUFs into eight groups: Group A to Group H (Fig. 1). Group A included 21 members, and it covered eight species. All members of Group B and Group E were dicotyledonous plants. Group C was monocot-specific. Group D did not include legume members. Group F and Group G were legume-specific. The legume CBSDUFs show a very close evolutionary relationship, and the CBSDUFs from gramineous plants show a close evolutionary relationship. Compared to other species, the soybean CBSDUF gene family is extensively expanded. The number of soybean CBSDUFs was almost as many as those from rice, maize, sorghum, and Arabidopsis combined ( Table 1). The number of GmCBSDUF genes is approximately two times more than those of Arabidopsis, common bean, M. truncatula, or L. japonicus and four times more abundant than those of rice, maize, or sorghum. The reason for this increase may be the multiple whole-genome duplication events of the soybean genome (Schmutz et al. 2010). The number of CBSDUF genes in dicotyledonous plants is much greater than that in monocotyledonous plants. Therefore, we speculate that CBSDUF plays an important role in dicots than monocots. The phylogenetic relationships may reflect some distinction between legume plant CBSDUFs and the four nonlegume plant CBSDUFs and indicate that the potential biological functions of some CBSDUFs are specific to legume plants.

Gene Structure and Protein Structure of GmCBSDUFs
Exon-intron structural diversity often plays a key role in the evolution of gene families. To investigate the exon-intron organization of GmCBSDUFs, gene structures were mapped on the basis of the genomic and coding region sequences. The results showed that GmCBSDUFs have 8-15 exons and highly similar gene structures in the conserved region (Fig. 2). The size of GmCBSDUF genes is mainly affected by their intron size. GmCBSDUF12 is the largest gene and has the longest total intron length. The soybean genome has undergone significant changes in the long-term evolutionary process. Some CBSDUF proteins are highly homologous in the terminal nodes, suggesting that they are putative paralogous pairs. In the study, a total of seven putative paralogous pairs (4/6, 10/14, 11/13, 5/17, 2/3, 8/16, 1/12) were identified, with sequence identities ranging from 60.47 to 99.26%.
To some extent, functional information can be derived from structural similarity. Knowledge of the structure is often essential for interpreting functional data. GmCBSDUF protein structures are shown in Fig. S1. It is clear that GmCBSDUF proteins have a highly conserved hydrophobicity profile, with one hydrophobic segment located at the N terminus. SMART allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. The results are shown in Fig. 3. The major domains are the DUF21 and CBS domains. The DUF21 domain is found in the N terminus of each protein, adjacent to two intracellular CBS domains, and has no known function. In addition, most GmCBSDUF proteins possess 3-4 transmembrane helices except for GmCBSDUF10, GmCBS-DUF11, and GmCBSDUF13, which have 2, 1, and 5, respectively. Interestingly, all GmCBSDUFs transmembrane domains pass through the DUF21 domain. Therefore, we speculate that the domain of unknown function DUF21 may play a role in ion channel or signal transduction. In this study, the secondary and tertiary structures of GmCBSDUF proteins were predicted (Fig. 4). The structures were analyzed and compared to the results of Fig. 2. Proteins with high identities also have similar secondary structures, such as GmCBSDUF4/6, GmCBSDUF11/13, GmCBSDUF10/14, GmCBSDUF5/17, GmCBSDUF2/3, GmCBSDUF8/16, and GmCBSDUF1/12. Interaction with a ligand molecule is essential for many proteins to carry out their biological function. This interaction is generally specific, not only in terms of the molecules involved in the interaction but also in the location (i.e., the site of ligand binding) in which the interaction takes place. The results showed that although most GmCBSDUF proteins have similar structures, they have different binding sites, suggesting that they may display different functions.

Tissue-Specific Expression Profiling of GmCBSDUFs
Based on the publicly available soybean RNA-Seq data , the expression patterns of 18 GmCBSDUFs were investigated in various tissues, including (1) root hair cells isolated at 84 h after sowing (HAS), (2) root hair cells isolated at 120 HAS, (3) root tips, (4) roots, (5) mature nodules, (6) leaves, (7) shoot apical meristems, (8) flowers, and (9) green pods. An expression heat map was constructed (Fig. 5a). The results showed that (1) all GmCBSDUFs were expressed in at least one tissue; (2) GmCBSDUF2/3/5 were expressed in all tissues, and their expression levels were relatively high; (3) GmCBSDUF9 had the lowest expression under all conditions; (4) GmCBSDUF8 was expressed only in the underground tissues; and (5) GmCBSDUF9 was expressed only in one shoot apical meristem. In addition, GmCBSDUF1/12 as well as GmCBSDUF16/13 showed similar expression patterns. Moreover, based on the publicly available soybean RNA-Seq data , expression heat maps of 14 GmCBSDUFs (except GmCBSDUF7/11/13/16, which were not or barely expressed in roots) in root hairs harvested at 12, 24, and 48 h after Bradyrhizobium japonicum inoculation (HAI), in mock-inoculated root hairs at 24 HAI, and in stripped roots at 48 HAI were also constructed (Fig. 5b). Based on the rhizobial inoculation method according to Libault et al. (2010), a B. japonicum cell suspension or water (mock inoculation) was sprayed on soybean protein binding region, polynucelotide-binding region, helix, strand, disordered region, buried, exposed, helical transmembrane region. b The tertiary protein structures were predicted by using Phyre2 (Color figure online) seedlings growing on B&D agar medium. The results showed that inoculation with B. japonicum significantly increased the expression of GmCBSDUF8/9, but not other GmCBSDUFs, in root hairs. Therefore, we suspect that GmCBSDUF8/9 may be required for bacterial recognition, nodulation, and nitrogen fixation.
Furthermore, the soybean (Glycine max) genome database (Phytozome 12) provides high-resolution gene expression data for a diverse set of 17 soybean GeneAtlas tissue samples, such as flower (open and unopened), lateral root (standard), leaf (ammonia, nitrate, urea, standard and symbiotic condition), nodule (symbiotic condition), root tip (standard), root (ammonia, nitrate, urea, standard and symbiotic condition), shoot tip (standard), stem (standard), and 9 soybean normal tissue samples (flower, leaf, nodule, pod, root, root hair, seed, SAM, and stem). These data were also analyzed and represented as heat maps (Fig. S3). Expression analyses of all GmCBSDUF genes revealed that the different members have different tissue-specific expression. Among all 18 analyzed genes, GmCBSDUF5 showed the highest level of constitutive expression in all tissues, followed by GmCBSDUF3, GmCB-SDUF2, and GmCBSDUF12. This high level of constitutive expression indicates a significant role in all these soybean tissues (Fig. S3). A cluster of genes showed low levels of expression in all tissues. They are GmCBSDUF8/9/11/13. GmCBSDUF16 is highly expressed only in root nodules, but its expression is very low in symbiotic conditions. These results are basically consistent with the results in Fig. 5, which makes the analysis of tissue expression patterns of GmCBSDUF genes more sufficient and meaningful. Analysis of the expression patterns of these genes will be helpful to the study of their function. All these expression profiles suggest functional redundancy and divergence among the soybean GmCBSDUFs during plant growth and development.

Promoter Analysis
Based on the soybean genome database (https ://www.phyto zome.net/soybe an), the promoter regions located 2 kb upstream of the translation start codons of the GmCBSDUF genes were analyzed using the PlantCARE promoter analysis program (https ://bioin forma tics.psb.ugent .be/webto ols/plant care/html/). Multiple elements were identified, and the stress and hormone signaling-related sites are shown in Table 2. The table describes Table 2, ABRE, MBS, TCA element, GARE-motif, and HSE were all present in the promoters of most of the GmCBSDUF genes, while the WUN-motif was found only in GmCBSDUF2; P-box in GmCBSDUF10; CE1 in GmCBSDUF16; TATCbox in GmCBSDUF4/6; LTR in GmCBSDUF9/13/18; and GATA-motif in GmCB-SDUF5/6/10/13/15/17. The prediction of promoter elements provided some clues to the responses of GmCBSDUFs to various abiotic stresses.

Expression Profiles of GmCBSDUFs Under Low Nitrogen Stress Conditions
Our previous studies have shown that GmCBS21, which contains the DUF21 and CBS domains, can improve plant low nitrogen tolerance (Hao et al. 2016). To further understand the low nitrogen responses of GmCBSDUF genes, the transcript levels of these genes in soybean seedlings under low and normal nitrogen conditions were analyzed using real-time PCR. Figure 6a-c shows their expression in leaves, stems, and roots, respectively, at 0.5 h, 2 h, 6 h, and 12 h (short-term) and 3, 6, and 9 days (long-term) post-treatment. As shown in Figs. 6a and S4, 17 soybean GmCBSDUF genes were differentially expressed in the leaves of low-nitrogen-treated seedlings and untreated control seedlings. The expression patterns of the soybean GmCBSDUF genes in leaves were very different from those in stems and roots. As shown in Fig. 6a, (1) GmCBSDUF16 and GmCBSDUF17 were upregulated after low nitrogen treatment at all time points, (2) GmCBSDUF18 was downregulated after low nitrogen treatment at most of the time points, only 3 and 6 days were slightly increased (3) GmCBSDUF5 was upregulated after 0.5 h to 6 days of low nitrogen treatment but slightly downregulated after 9 days of low nitrogen treatment, and (4) GmCB-SDUF10/15 was downregulated after short-term treatment but upregulated after long-term treatment. These results may indicate that these genes play different roles in different time periods. Figures 6b and S4 show the expression of GmCBSDUF genes in stems. It was clear that 17 GmCBSDUF genes were differentially expressed in stems after low nitrogen treatment. Among them, the expression of GmCBSDUF5 and GmCBS-DUF11 was significantly upregulated at low nitrogen conditions at any given time point; the expression of GmCBSDUF6 and GmCBSDUF9 was significantly upregulated at most time points, and the difference was not significant only at the 6 h point. The expression of GmCBSDUF15 was significantly upregulated at 6 time points except at 0.5 h (downregulated). The expression levels of GmCBSDUF4 and GmCBSDUF12 were upregulated at most time points except 12 h (downregulated). The expression levels of 4 GmCBSDUF genes (GmCBSDUF5/7/8/11) were upregulated after short-term treatment, and 7 GmCBSDUF genes (GmCBS-DUF1/3/4/5/12/15/16) were upregulated after long-term treatment. Figures 6c and S4 show the expression of GmCBSDUF genes in roots. In detail, low nitrogen conditions significantly upregulated the expression of GmCB-SDUF2/8/11 but downregulated the expression of GmCBSDUF4/6/7/14. Moreover, GmCBSDUF3/10/15/18 increased after long-term treatment (6, 9 days) while expression of GmCBSDUF16 decreased.
The above results clearly showed that most GmCBSDUF genes were significantly induced in response to low nitrogen stress treatment. Therefore, we speculate that, in addition to the GmCBS21 gene, the other genes in the family are also associated with plant nitrogen utilization. We also found significant gene expression changes in leaves at the early time point (0.5 h) after stress treatment. This may indicate that these genes play a major role in nitrogen assimilation. Future studies are needed to demonstrate the functional roles of genes responsive to low N stress in relation to N metabolism.

Effect of Abiotic Stresses on the Expression of GmCBSDUFs
As described in Table 2, most soybean GmCBSDUF genes have stress and hormone signaling-related responsive elements. Some studies have also found a role for plant CDCPs in abiotic stress response ). To investigate whether GmCBSDUFs also have similar roles in soybean, the expression patterns of GmCB-SDUFs in response to cold, dehydration, H 2 O 2 , ABA, and salinity stress were examined. The raw expression values for the genes are shown in Table S2.
Two-week-old soybean seedlings were exposed to cold stress at 4 °C for 0, 0.5, 5, or 12 h, and the expression of GmCBSDUFs was detected. The results revealed that cold stress altered the expression of GmCBSDUFs, which could be grouped into 3 categories. As indicated in Fig. 7a, category I contained genes that showed increased transcript accumulation under stress, including GmCBSDUF7/8/11/13/16, and the expression of GmCBSDUF7/8/11 decreased slightly at 12 h. All four gene Fig. 7 Expression analysis of GmCBSDUF genes in response to abiotic stresses. Two-weekold soybean seedlings were exposed to stress treatments as indicated below. Gene expression analysis was conducted by qRT-PCR using gene-specific primers. a Cold stress, b dehydration stress, c H 2 O 2 stress, d ABA stress, e salinity stress. The transcript levels of GmCB-SDUF genes in plants at 0.5, 5, and 12 h poststress treatments were plotted as the relative expression (fold change) of the nonstressed control plants. The transcript level of actin was used as a reference family members were expressed to their highest level either at 5 or 12 h after cold stress. Category II contained genes (GmCBSDUF2/3/4/5/6/17/18) that showed a gradual decrease in transcript accumulation with prolonged cold treatment. In addition, the expression of GmCBSDUF10 reached its lowest level at 0.5 h, and GmCB-SDUF12 reached its highest at level 0.5 h. The expression levels of genes in category III (GmCBSDUF 1/9/14/15) showed no obvious change. Figure 7b shows the effects of dehydration treatment on the transcription of GmCBSDUFs in soybean seedlings. It is clear that (1) the transcript levels of 18 GmCBSDUFs gradually increased with prolonged stress. Among the 18 GmCBS-DUF genes, GmCBSDUF1/2/3/4/5/12/18 were only weakly upregulated (no more than threefold) under dehydration treatment. GmCBSDUF6/8/9 peaked at 5 h, and 2/5/10/17 decreased at 0.5 h. By comparison, GmCBSDUF7/8/9/10/11/13/14/16 showed notable changes. (2) The transcript levels of GmCBSDUF15 and GmCBS-DUF17 were slightly downregulated under dehydration treatment. These results further suggest that GmCBSDUF genes play a role in plant drought resistance. Figure 7c shows the effects of H 2 O 2 on the transcription of GmCBSDUFs in the roots of soybean seedlings. It is clear that H 2 O 2 treatment (1) increased the transcript levels of GmCBSDUF6/7/8/10/11/12/16, (2) decreased the transcript levels of GmCBSDUF2/3/14/15/17/18, and (3) did not change the transcript levels of other GmCBSDUFs. Figure 7d shows the time-course effects of 100 μM ABA on the transcription of GmCBSDUFs in soybean seedlings. The results show that (1) the expression levels of GmCBSDUF9 and GmCBSDUF11 were significantly increased by 100 μM ABA treatment at 0.5 h but gradually decreased with prolonged ABA, (2) the expression levels of GmCBSDUF2/4/10 were significantly increased with ABA treatment, and (3) the expression levels of GmCBSDUF15/17/18 were significantly decreased after ABA treatment. Figure 7e shows the time-course effects of salt stress on the transcription of GmCBSDUFs in soybean seedlings. The results show that (1) the expression levels of GmCBSDUF 1/2/3/4/8/9/10 gradually increased as the stress was prolonged, and GmCBSDUF8/9/10 reached their highest levels at 12 h of salt stress, while GmCB-SDUF1/2/3/4 reached their lowest levels at 0.5 h of salt stress; (2) the expression levels of GmCBSDUF7/11/13/14 increased considerably at one or more stress time points (0.5 h, 5 h, or 12 h), and (3) the expression levels of GmCBSDUF15/17/18 decreased compared to the 0 h treatment.

Phenotypes of GmCBSDUF3 Transgenic Arabidopsis
CBSDUFs may be involved in multiple stress responses in plants. As described above, when induced with some stresses, the expression of GmCBSDUF genes is significantly altered. Our previous study found that GmCBSDUF3 could improve plant nitrogen use efficiency. Therefore, we chose GmCBSDUF3 for further functional exploration. Two homozygous constitutively overexpressing Arabidopsis lines (GmCBSDUF3-1 and GmCBSDUF3-2) with higher GmCB-SDUF3 expression were selected for phenotypic analysis under NaCl, PEG, and 1 3 Biochemical Genetics (2021) 59:83-113  Fig. 8, on MS medium alone, no obvious difference was observed between the transgenic and wild-type (WT) seeds. However, when sown on MS medium containing 50 mM NaCl, WT seeds germinated much later than transgenic GmCBSDUF3 seeds. After sowing on MS medium containing 2% PEG for 5 days, transgenic plants grew better than WT plants and had well-developed root systems. The germination rate on MS medium containing 1.5 μM ABA was also analyzed. Treatment with ABA delayed the germination of both transgenic and WT seeds and led to no significant difference between the transgenic and WT plants. The transgenic plants and control plants were also sensitive to ABA stress. After 10 days of treatment, the growth status of GmCBSDUF3 transgenic Arabidopsis seedlings was also investigated. As shown in Fig. 9a, when the seedlings were grown on MS medium supplemented with 50 mM NaCl or 2% PEG, transgenic plant growth was superior to that of WT. Transgenic plants had well-developed root systems to absorb nutrients and water. Groups of ten seedlings per strain were used to measure the whole plant weight (fresh weight). The fresh weights of the transgenic seedlings were higher than those of WT (Fig. 9b). Because of the well-developed root system under NaCl or PEG conditions, the transgenic seedling weight is higher than under normal conditions. These results revealed that overexpressing GmCBSDUF3 in plants could increase tolerance under NaCl and PEG stress conditions.

Discussion
Although some CDCPs, such as IMPDH (Collart et al. 1996;Wang et al. 2004) and ClC (Hechenberger et al. 1996;Diédhiou and Golldack 2006;Lv et al. 2009), have been characterized in plant systems, the majority of members in this family remain uninvestigated, especially the CBSDUF subgroup. Many sequences related to CBSDUF genes have been uploaded in GenBank, but only a few of them have been well described in terms of their expression pattern, biochemical characteristics, subcellular locations, and particularly their biological functions. Transcriptomic and proteomic analyses of CDCPs have revealed differential expressional profiles in plants challenged with virus (Espinoza et al. 2007), fungi (Fabro et al. 2008), salinity stress (Kumari et al. 2009;Sahu and Shaw 2009), and oxalic acid treatment (Wang et al. 2009). All these data indicate that the members of this family in different plant species may play important roles in diverse developmental processes, including developmental programmed cell death, and responses to different biotic and abiotic stresses. These works present the necessity of extensively investigating CBSDUF genes in plants, especially in crops, with the expectation of improving crop yield and resistance. They have identified, classified, and suggested the nomenclature of CDCPs in Arabidopsis and rice and performed a brief analysis of expression patterns for CDCPs using the already existing transcriptome profiles and the MPSS database ). However, the detailed expression characteristics of CBSDUF subgroup genes in plants, especially in soybean, are still largely unknown. In this study, 18 CBSDUF genes were identified in the soybean genome through the public genome database. The characteristics of CBSDUF genes were analyzed in detail in our study.

Characteristics of CBSDUF Genes in Soybean
Bioinformatics analysis has become the first and most important method for the study of new gene functions. By bioinformatics analysis, researchers can often obtain important information about the functions of new genes and then make a plan for further experimental research. Therefore, we analyzed the structures and molecular evolution of GmCBSDUF genes as well as their coding products and structures. The relatively higher number of CBSDUF-family genes in soybean is consistent with the suggestion that gene duplication has been universal in the soybean genome during its evolution (Schmutz et al. 2010). By domain analysis, we found that a highly conserved DUF21 domain exists only with the CBS domain. This domain may be crucial for GmCBSDUF gene function. To carry out research on the functions of new genes, we must first clarify their regular gene expression patterns in vivo. Thus, the expression patterns of GmCBSDUF genes were analyzed in different developmental stages and tissues of soybean (Fig. 5). The results revealed the tissue-specific expression patterns of CBSDUF genes in soybean. Some GmCBSDUF genes were maintained at high expression levels in some plant tissues, followed by moderate expression levels in other tissues (Fig. 5a). For example, GmCBSDUF14 was highly expressed in the root tip, while GmCBSDUF2 was highly expressed in the root. In contrast, some GmCBSDUF genes, such as GmCBSDUF8, GmCBSDUF11, and GmCBSDUF16, showed low expression levels in only the underground tissues with no expression in other tissues. This implies that different GmCBSDUF genes may have different functions in different tissues. A M. truncatula CBSDUF protein, MtCBS1, was found to be required for rhizobial infection and symbiotic nitrogen fixation (Sinharoy and Liu 2016). GmCBSDUF8 is the closest homolog of MtCBS1 in soybean and is expressed in only roots. After inoculation of B. japonicum, its expression was induced in root hairs, suggesting a potential role of GmCBSDUF8 in symbiosomes capable of fixing nitrogen. We will further verify this function by experiment.

Potential Roles of CBSDUF Genes in Response to Different Stress Treatments
It is well known that plant responses and stress-activated signaling pathways are largely overlapping. Kushwaha et al. (2009) reported that some AtCBS genes, such as AtCBSX2, AtCBSX3, and AtCBSCBS1, were stably expressed under any stress conditions, while some, such as AtCBSX1 and 15, were more sensitive to all stress conditions in both roots and shoots, and some, such as AtCBSDUFCH2, AtCBS-DUF1, AtCBSDUF2, and AtCBSCBS2, were sensitive to stress conditions only in roots. In this study, the expression patterns of soybean CBSDUF genes under abiotic stresses were analyzed (Fig. 6). In contrast to other subgroup members, the results showed that GmCBSDUF7/8/11/16 was upregulated after exposure to cold, drought, salt, and H 2 O 2 , while GmCBSDUF17/18 was downregulated by cold, H 2 O 2 , salt and ABA, suggesting that these GmCBSDUF genes may play a role in crosstalk between signaling pathways responding to drought, H 2 O 2 , salinity, cold, and ABA. The results presented here will be helpful for future studies of the biological functions of GmCBSDUF proteins. Remarkably, we found that GmCBSDUF7/8/11/13/16 showed significant differences in expression under stress treatments. Therefore, we speculate that these genes are inducible and may play an important role in stress response. We will further examine this prospect in subsequent studies.
In conclusion, we performed a comprehensive bioinformatics analysis and provided detailed information on the soybean CBSDUF gene subgroup. Specifically, our results show that the soybean genome contains 18 CBSDUF genes, the largest subgroup among the identified CBSDUF gene subgroups in the study. Our analysis revealed the possible function of each GmCBSDUF gene in response to cold, salt, H 2 O 2 , ABA, dehydration, and low nitrogen, identified their potential clients and functional interactions, and revealed the specific responses of some GmCBSDUF genes to specific stresses. By interaction network prediction, some candidate interacting genes were found. At the same time, we preliminarily explored the function of GmCBSDUF3, which might improve the ability to resist abiotic stress in plants. This result provides an impetus for additional investigation of the biological roles and interacting proteins of the CBSDUF protein family in soybean, and a functional analysis of the genes in this family will be carried out systematically. In the future, we will use functional genomics in combination with a transgenic approach to verify the utility of those proteins with defined features as tools to improve stress tolerance in crop plants. Based on the present research and the characteristics of each family member, the research on functional analysis was classified and summarized. We will use gene knockout and transgenic technology to study the functions of the GmCB-SDUFs. At the same time, the functions of the two domains, CBS and DUF21, will be studied by site-directed mutagenesis. In addition, due to the lack of information about this family of proteins, the biological pathways involving these genes are still unknown. We will screen for interacting proteins with yeast two-hybrid technology and provide evidence for their mechanisms of action. We will also determine the expression of transgenic plants under specific conditions by high-throughput sequencing technology and infer the gene regulatory network. The ideas provided here would also have a way for expounding the definite role of CBSDUF proteins in plants.

Identification of DUF21 and CBS Domain-Containing Proteins in Soybean
The known DUF21 and CBS domain-containing protein sequences from soybean, Arabidopsis, common bean, M. truncatula, L. japonicus, rice, maize, and sorghum were obtained from the NCBI database and used as queries to conduct BLAST searches against the public genome database (https ://phyto zome.jgi.doe.gov/pz/ porta l.html#) and L. japonicus genome database (https ://www.kazus a.or.jp/lotus /). Sequences with an E value < 1.0 were selected for further analysis. A search with the keywords PF00571 for the CBS domain and PF01595 for the DUF21 domain was conducted for putative soybean CBSDUFs by searching ontologies against the Phytozome (v12.0) database (https ://www.phyto zome.net). If more than one transcript existed, the primary transcript was selected as a representative.

Phylogenetic, Gene, and Protein Structure Analyses
Multiple alignment analysis was performed with ClustalX 1.83 software (Thompson et al. 1997). Phylogenetic trees were generated by the neighbor-joining (NJ) method and bootstrap analysis (1000 replicates), and phylogenetic analysis was performed using MEGA6 software (Hall 2013). The exon/intron structures of the CBS genes were determined by comparing the coding sequences and corresponding genomic sequences in the gene structure display server (GSDS, https ://gsds.cbi.pku.edu. cn/) (Guo et al. 2007). The protein transmembrane topology was predicted using TMHMM Server v2.0, and tertiary protein structures were predicted using Phyre. Domain architecture was analyzed by SMART (a Simple Modular Architecture Research Tool).

Plant Materials and Treatments
For low nitrogen treatment, seeds of a low N-tolerant soybean variety (Pohuang) were germinated. After 7 days, the seedlings were grown hydroponically in halfstrength modified Hoagland solution until the first trifoliate leaf was fully developed and then grown in normal nitrogen solution (2 mM  O) at 25 °C in a chamber with a 12-h light and 12-h dark photoperiod. All treatments were performed over a continuous time course (0 h, 0.5 h, 2 h, 6 h, 12 h, and 3, 6, and 9 days). Roots, stems, and leaves from control and stress-treated plants (five plants were collected as mixed samples at each time point) were collected as samples in three biological replicates for RNA preparation, and the samples were quickly frozen in liquid nitrogen and stored at − 80 °C until use.
Soybean seeds were geminated in water at 25 °C in the dark under conditions of a 12--h light and 12-h dark photoperiod and 70% humidity. Salt, dehydration, cold, H 2 O 2 , and abscisic acid (ABA) stresses were applied to 2-week-old soybean seedlings. For salt stress, the roots of seedlings were dipped into solutions of 200 mM NaCl. For dehydration, the root systems of whole plants were placed onto filter paper with 70% humidity at room temperature for induction of a rapid drought treatment (Feng et al. 2015). For H 2 O 2 stress, the roots of seedlings were dipped into solutions of 25 mM H 2 O 2 . For ABA treatment, soybean seedlings were sprayed with 100 μM ABA. For cold treatment, soybean seedlings were subjected to 4 °C. All stress treatments lasted from 0 to 12 h. Each treatment contained three independent replicates. At 0, 0.5, 5, and 12 h after each treatment, soybean seedlings were harvested, and five plants were collected as mixed samples at each time point, frozen in liquid nitrogen, and stored at − 80 °C until extraction of total RNA for qRT-PCR assays.

Expression Analysis of GmCBSDUFs
Total RNA was isolated from soybean tissues using TRIzol reagent (Invitrogen) and treated with DNase I (Invitrogen) to avoid genomic DNA contamination. Firststrand cDNA was synthesized using Superscript II reverse transcriptase (Invitrogen). Gene-specific primers were designed according to gene sequences using Primer 5.0 software (Table S1). The quantitative RT-PCR was performed with a CFX96TM real-time system (Bio-Rad) in a 20 μl system containing 2 μl of a tenfold diluted cDNA, 10 μl of 2 × SYBR green real-time PCR master mix (Takara), and 1 μl each of 10 μM forward and reverse primers. β-actin was used as the internal control. Statistical analyses were performed using the t-test, and p < 0.05 and < 0.01 were considered significant and extremely significant differences, respectively.

Vector Construction, Arabidopsis Transformation, and Stress Treatment
The full-length coding sequence (the primers 5′ ATG GCG GCA GAG ATA CCG 3′ and 5′ CTA TTG ATT CCT TAG TGA CTC ACT 3′.) of GmCBSDUF3 was TA cloned into the plant expression vector pCXSN. The recombinant construct containing the 35S::GmCBSDUF3 (Fig. S2A) cassette was introduced into Agrobacterium tumefaciens strain GV3101 and then transformed into Arabidopsis (Columbia) via the floral dip method. The transgenic plants were screened on MS medium with 100 mg/L hygromycin and confirmed by PCR analyses. The expression levels of GmCBS-DUF3 in transgenic plants were determined by qPCR (Fig. S2B).
Seeds of transgenic overexpressing Arabidopsis and WT plants were grown on 10 × 10 cm MS agar plates. They were routinely kept for 2 days in darkness at 4 °C to break dormancy and transferred in a light growth chamber under a day/night 16/8 h cycle at 23 °C. For stress treatment, the seeds of transgenic lines or WT were kept on MS media supplemented with 50 mM NaCl, 2% PEG, or 1.5 μM ABA. Each treatment contained three independent replicates.