Introduction

Solanum nigrum is a wild black nightshade species belonging to the Solanaceae family, native to Eurasia, and introduced to America, Australasia, and South Africa1,2. S. nigrum is a common perennial plant found in roadsides, wooded areas, and disturbed habitats. Both ripe fruits and leaves of S. nigrum have been used for culinary and traditional medicine purposes in many countries1. Previous studies have reported the presence of many beneficial compounds, such as anthocyanidins, glycoproteins, glycoalkaloids, and polyphenolics in S. nigrum3,4,5. S. nigrum is also rich in amino acids such as arginine, aspartic acid, alanine, isoleucine, L-proline, serine, and valine6. Therefore, S. nigrum has great potential to be used as a beneficial food source. However, solanine, a toxic steroidal glycoalkaloid (SGA), is found in many parts of S. nigrum7. The concentration of this alkaloid is the highest in young leaves and green unripe fruits, and the levels decline with maturation2,8. Only the ripe fruits or cooked leaves of S. nigrum are consumed to avoid toxicity.

Two major classes of fruit secondary metabolites commonly found in the Solanaceae family are carotenoids and flavonoids. Carotenoids are red/yellow pigments that play important roles in photosynthesis and photoprotection, attraction of pollinators and seed dispersers, and biosynthesis of plant hormones such as abscisic acid (ABA) and strigolactones9. The metabolic pathways of carotenoids are highly conserved in many plant species and have been extensively studied in tomato (Solanum lycopersicum) fruit. During tomato fruit ripening, enzyme-coding genes, including geranylgeranyl pyrophosphate synthase (SlGGPS), phytoene synthase (SlPSY), phytoene desaturase (SlPDS), zeta-carotene desaturase (SlZDS), and carotene isomerase (SlCRTISO), are upregulated and are primarily involved in the accumulation of lycopene9,10,11,12. The expression of these enzyme genes is controlled by environmental (e.g., light, temperature) and internal (e.g., hormones) regulators. Some MADS-box ripening regulators, such as TOMATO AGAMOUS-LIKE1 (TAGL1), Ripening Inhibitor (RIN), FRUITFULL1 (FUL1), and FUL2, and other types of transcription factors (TFs) are involved in this process13. Unlike in tomato, little is known about carotenoid metabolism in the fruits of other wild Solanaceae species, including S. nigrum.

Flavonoids are important molecules responsible for the color of flowers that attract pollinators. Anthocyanins are important flavonoids that play multiple roles in plant development, including protection against biotic and abiotic stresses. The metabolic pathways of anthocyanins are highly conserved in plants, and they are synthesized by a series of enzymes involved in the phenylpropanoid pathway14. These biosynthetic enzyme genes are subdivided into two groups: early biosynthetic genes (EBGs: CHALCONE SYNTHASE (CHS), CHALCONE ISOMERASE (CHI), and FLAVANONE 3-HYDROXYLASE (F3H)) and late biosynthetic genes (LBGs: FLAVONOID 3′-HYDROXYLASE (F3′H), FLAVONOID 3′5′-HYDROXYLASE (F3′5′H), DIHYDROFLAVONOL 4-REDUCTASE (DFR), ANTHOCYANIN SYNTHASE (ANS), and UDP-GLUCOSE FLAVONOID-3-O-GLUCOSYLTRANSFERASE (UFGT))14. In many Solanaceous vegetables, the expression levels of LBGs and anthocyanin content are reported to be positively correlated14,15,16,17,18. The expression of anthocyanin biosynthetic genes is regulated mainly by the MYB-bHLH-WD40 (MBW) transcription factor complex. In S. nigrum, anthocyanin accumulates in significant quantities only in fully ripened purple fruits, and not in leaves, stems, or green unripe fruit19.

Currently, genomes of many members of the Solanaceae species, such as tomato, potato, pepper, and eggplant, have been sequenced, and metabolic enzyme gene expression regulation has been reported to be directly associated with the production of beneficial metabolites. For example, a rare allele in the TomLoxC promoter was identified in the tomato pan genome and was selected during domestication. Quantitative trait locus (QTL) mapping and analysis of transgenic plants revealed a role for TomLoxC in apocarotenoid production, which contributes to tomato flavor20. Furthermore, genome-wide analysis in potato identified 77 genomic loci encoding enzymes involved in starch metabolism, including starch biosynthesis and degradation21. Moreover, the chromosome-scale reference genome of black pepper provided insights into piperine biosynthesis, and comparative genomic analyses further revealed specific gene expansions in the glycosyltransferase, cytochrome P450, shikimate hydroxycinnamoyl transferase, lysine decarboxylase, and acyltransferase gene families22. Additionally, 121 basic helix–loop–helix (bHLH) transcription factors that are related to anthocyanin biosynthesis in eggplant were identified in the recently released eggplant genome23. Unfortunately, only limited genetic resources, such as genome and transcriptome, are available for S. nigrum, and studies on metabolic pathways have rarely been conducted.

Here, we profiled the S. nigrum transcriptome from mature leaves, reproductive shoot apices, and ripe fruits using the Illumina paired-end platform. The sequencing reads were assembled to create reference unigenes of S. nigrum, and we explored the phenotypic differences between S. nigrum and S. lycopersicum using expression analyses of the unigenes. Moreover, we identified and characterized differentially expressed genes (DEGs) and differentially expressed TFs among samples. The results provide an understanding of molecular variations in the metabolic pathways of S. nigrum and S. lycopersicum and could assist further molecular research on S. nigrum.

Results

Tissue-specific gene expression profiles of unigenes in S. nigrum

To develop a transcriptome of S. nigrum, we performed RNA sequencing (RNA-seq) using three different tissue samples: mature leaves, reproductive shoot apices, and mature black fruits (Fig. 1a and Supplementary Table S1), with three biological replicates of each tissue. We primarily focused on the fruit of S. nigrum, due to its potential to be used as food. We investigated the shoot apex and the mature fruit which are important for reproductive transition and used leaf, which is the central photosynthetic tissue, as the control. A total of 47,470 unigenes were identified with a transcripts per million (TPM) value greater than 0.3 (Supplementary Table S2). Data quality was validated by correlation assays (Supplementary Fig. S1), and the unigenes were assessed using BUSCO24 (Supplementary Fig. S2). Subsequently, 37,223 (78.4%) unigenes were functionally annotated using BLASTP25 (Supplementary Tables S3 and S4). The workflow of the entire procedure is summarized in Supplementary Fig. S3. We also noticed that the GC content of the transcripts of S. nigrum was 42–43% (Supplementary Table S1), which is in a range similar to that of other GC-poor dicots Arabidopsis and tomato26. Compared to the GC-rich monocots such as rice (45–50% of GC in transcriptome), S. nigrum showed lower GC level, which might imply that S. nigrum did not experience any extreme cold or drought conditions during evolution, owing to the low thermal stability27.
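The TPM-based expression filter described above can be illustrated with a minimal sketch. The unigene names, read counts, and transcript lengths below are hypothetical, and applying the 0.3 cut-off to a single sample is an assumption; the text does not state whether the threshold was applied per sample or to an average across samples.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Reads per kilobase of transcript length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Scale so that all values in the sample sum to one million.
    scale = sum(rpk.values()) / 1_000_000.0
    return {gene: value / scale for gene, value in rpk.items()}


def expressed(tpm_values, threshold=0.3):
    """Return the unigenes whose TPM exceeds the threshold (0.3 in the text)."""
    return sorted(gene for gene, value in tpm_values.items() if value > threshold)


# Hypothetical toy data, not taken from the study.
counts = {"unigene_a": 500, "unigene_b": 0, "unigene_c": 1500}
lengths = {"unigene_a": 1000, "unigene_b": 2000, "unigene_c": 3000}

values = tpm(counts, lengths)
kept = expressed(values)
print(kept)
```

Because TPM corrects for transcript length before scaling, unigene_a and unigene_c receive the same value here even though their raw counts differ threefold.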

Figure 1

Unigene expression dynamics and enriched GO terms in S. nigrum DEGs. (a) Representative images for RNA-seq samples in S. nigrum. Red arrowheads indicate the collected positions. Scale bars, 6 cm. (b) Nine clusters (C1–9) of S. nigrum DEGs according to the expression patterns. The numbers in parentheses represent the number of DEGs. The X-axis indicates each sample (L, leaf; SA, shoot apex; BF, black fruit) and the Y-axis indicates standardized TPM values. (c) Enriched GO terms in each cluster. The heat-map color scale represents −log10(p value) for each GO term.

Based on the normalized read counts of unigenes in all tissues, we identified a total of 18,860 DEGs across the tissue samples using DESeq228 with the following cut-off criteria: log2-fold change ≥ 2, false discovery rate (FDR) < 0.05, and TPM value ≥ 3. The DEGs were grouped into nine clusters (C1–9) according to their expression dynamics in the three tissues (Fig. 1b, see “Methods”). Clusters 1–4 (7620 genes) were grouped as a leaf meta-cluster (L) containing genes mainly expressed in the leaf tissue. Genes in clusters 5–7 (6386 genes), grouped as a shoot apex meta-cluster (SA), were highly expressed in the shoot apex. Clusters 8–9 (4854 genes) were grouped as a black fruit meta-cluster (BF), in which gene expression peaked in the black fruit (Fig. 1b and Supplementary Table S5).
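The three cut-off criteria and the standardized TPM values used for clustering can be sketched as follows. This is an illustration only: taking the absolute value of the log2-fold change, requiring the TPM threshold in at least one tissue, and standardizing as a z-score across tissues are assumptions, since the text does not spell out these details. The function names and example values are hypothetical.

```python
import statistics


def is_deg(log2_fc, fdr, tpm_by_tissue, min_lfc=2.0, max_fdr=0.05, min_tpm=3.0):
    """Apply the three cut-offs from the text to one pairwise comparison."""
    return (
        abs(log2_fc) >= min_lfc                       # assumed: either direction
        and fdr < max_fdr
        and max(tpm_by_tissue.values()) >= min_tpm    # assumed: in any tissue
    )


def standardize(tpm_by_tissue):
    """Z-score a gene's TPM values across tissues (assumed standardization)."""
    values = list(tpm_by_tissue.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {t: ((v - mean) / sd if sd else 0.0) for t, v in tpm_by_tissue.items()}


# Hypothetical gene with fruit-biased expression.
profile = {"L": 1.0, "SA": 2.0, "BF": 30.0}
print(is_deg(3.9, 0.001, profile))
print(standardize(profile))
```

Standardizing each gene before clustering groups genes by the shape of their expression profile rather than by absolute abundance.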

To functionally categorize each cluster, we performed a GO enrichment analysis using topGO29. Photosynthesis-related GO terms were highly enriched in the leaf meta-cluster (L, C1–4), consistent with leaf tissue function. Genes related to cell proliferation such as microtubule-based movement and translation were enriched in the shoot apex meta-cluster (SA, C5–7), markedly in C5, and GO terms, including catalytic activity, oxidoreductase activity, and DNA-binding transcription factor activity were highly enriched in the black fruit meta-cluster (BF, C8–9), reflecting tissue-specific functions (Fig. 1c). These data confirmed that gene expression is tightly controlled in a tissue-specific manner in S. nigrum.

Tissue-specific functions of differentially expressed transcription factors

To explore the transcriptional regulation that causes differential gene expression profiles in each tissue at the transcriptomic level, we analyzed transcription factors (TFs) in DEGs using the Plant Transcription Factor Database (PlnTFDB)30. A total of 1,323 TFs were identified in the DEG set; of these, 554 (41.9%), 456 (34.5%), and 313 (23.6%) were included in the leaf meta-cluster (L), shoot apex meta-cluster (SA), and black fruit meta-cluster (BF), respectively (Fig. 2a). To ascertain whether specific types of transcription factors play major roles in specific tissues, we categorized all the TF DEGs based on the protein families of the PlnTFDB classification (Supplementary Table S6). Compared with the distribution of total TF DEGs, EIL (Ethylene-Insensitive 3-Like), C2H2-type Zinc finger, and WRKY types of TFs were highly enriched in L; C2C2-Dof (C2C2-type Zinc finger-DNA binding with one finger), TUB (TUBBY), and SNF2 (Sucrose Non Fermenting 2) types were mostly enriched in SA; and HSF (Heat Stress Transcription factor), Trihelix, and LOB (Lateral Organ Boundaries) types were enriched in BF (Fig. 2a). These data suggest that tissue-specific control of certain types of transcription factors induces differential expression patterns in downstream networks.
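The family enrichment relative to the overall TF distribution was assessed with a hypergeometric test (see the Fig. 2 legend). A minimal sketch of a one-sided test of this kind is given below. The totals of 1,323 TF DEGs and 313 TFs in the BF meta-cluster come from the text; the family size of 20 with 12 members in BF is a hypothetical example, and the function name is illustrative.

```python
from math import comb


def hypergeom_enrichment_p(total, family_total, cluster_total, family_in_cluster):
    """One-sided hypergeometric p value: P(X >= family_in_cluster).

    total             -- all TF DEGs
    family_total      -- TF DEGs belonging to the family
    cluster_total     -- TF DEGs in the meta-cluster
    family_in_cluster -- family members observed in the meta-cluster
    """
    upper = min(family_total, cluster_total)
    favourable = sum(
        comb(family_total, k) * comb(total - family_total, cluster_total - k)
        for k in range(family_in_cluster, upper + 1)
    )
    return favourable / comb(total, cluster_total)


# 1,323 TF DEGs and 313 in BF are from the text; 20 and 12 are hypothetical.
p_value = hypergeom_enrichment_p(1323, 20, 313, 12)
print(p_value)
```

With these numbers about 4.7 family members would be expected in BF by chance, so observing 12 gives a small p value.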

Figure 2

Transcription factor analysis in S. nigrum DEGs. (a) Top-three enriched transcription factor families in each meta-cluster (L, leaf; SA, shoot apex; BF, black fruit). The gene groups containing a minimum of 10 TFs were tested, and three highest enriched groups are presented for each meta-cluster. All the groups showed hypergeometric p value < 0.05. (b) qRT-PCR validation of TF DEGs. Expression levels of each gene were normalized with the value of shoot apex and UBIQUITIN was used as the endogenous control. Data are shown as mean ± standard deviation: n = 3, biological replicates. N.D, not detected.

We then validated the expression profiles of some TF DEGs in planta using qRT-PCR. SnTCP4 and SnHB8 were newly identified S. nigrum genes homologous to the Arabidopsis TCP FAMILY TRANSCRIPTION FACTOR 4 (TCP4) and HOMEOBOX-LEUCINE ZIPPER PROTEIN 8 (HB8), respectively, showing leaf tissue-specific expression enrichment. TCP4 regulates leaf cell proliferation31 and HB8 functions in leaf vascular formation in Arabidopsis32. Consistent with the transcriptome data, these two genes were highly upregulated in leaves compared with other tissues, and SnHB8 showed moderate expression levels in the shoot apex (Fig. 2b). We identified two shoot apex-specific TFs, SnARF4 and SnKNAT2, homologs of Arabidopsis AUXIN RESPONSE FACTOR 4 (ARF4) and HOMEOBOX PROTEIN KNOTTED-1-LIKE 2 (KNAT2), respectively. ARF4 is an auxin signaling component that regulates leaf polarity33,34 and promotes flower initiation35, showing high expression levels in both the leaf and shoot apex. KNAT2, together with KNAT6, plays an important role in meristem activity and maintenance in Arabidopsis36,37. The expression pattern of SnARF4 showed enrichment in the leaf and shoot apex, and SnKNAT2 was specifically expressed in the shoot apex (Fig. 2b). Furthermore, we tested two black fruit-specific TFs, SnAP2 and SnAN2, homologs of Arabidopsis APETALA 2 (AP2) and tomato ANTHOCYANIN 2 (SlAN2), respectively. AP2 plays a central role in the specification of floral organ identity and development of the floral meristem and seeds38,39, and SnAP2 expression was enriched in both meristem and fruit tissues (Fig. 2b). SlAN2 is a key regulator of anthocyanin biosynthesis, and tomato fruit turned purple when it was ectopically expressed40; SnAN2 was expressed mainly in the black fruit of S. nigrum (Fig. 2b). Taken together, these data suggest that the functions of well-known transcription factors identified in model organisms are probably conserved in S. nigrum and that transcriptional regulation of these transcription factors possibly causes tissue-specific gene expression profiles.
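The qRT-PCR values in this study are expressed relative to an endogenous control (UBIQUITIN) and a calibrator tissue. The quantification formula is not stated in this section; the sketch below assumes the widely used 2^(−ΔΔCt) method with 100% amplification efficiency, and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^(-ddCt) method (assumed, not stated in the text).

    ct_target / ct_reference         -- Ct of the gene of interest and of the
                                        endogenous control in the test sample
    ct_target_cal / ct_reference_cal -- the same two Ct values in the calibrator
                                        sample (for example, shoot apex)
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the target amplifies three cycles earlier in the
# test tissue than in the calibrator, with identical control Ct values.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold)  # 8.0
```

By construction the calibrator sample itself has a relative expression of 1, which is why one tissue in each panel serves as the baseline.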

Comparison of S. nigrum with S. lycopersicum

As there are limited genetic or genomic resources for the study of S. nigrum, we performed a comparative analysis with a closely related species. To identify the plant evolutionarily closest to S. nigrum, we constructed a phylogenetic tree with the five most representative Solanaceae species. We used the complete chloroplast protein sequences of tomato (Solanum lycopersicum), potato (Solanum tuberosum), eggplant (Solanum melongena), pepper (Capsicum annuum), and tobacco (Nicotiana tabacum) obtained from GenBank and added the chloroplast protein sequences of Arabidopsis thaliana as a reference for the outgroup. We also used two more outgroup controls, Oryza sativa, a monocot, and Selaginella moellendorffii, a lycophyte, to confirm the evolution of the tracheophytes (Supplementary Table S7). As shown in Fig. 3a, S. lycopersicum appeared to be the closest relative of S. nigrum. Therefore, S. lycopersicum, an extensively studied domesticated fruit crop, is a good standard for comparative studies of S. nigrum.

Figure 3

Evolutionary conservation and variation in S. nigrum and S. lycopersicum. (a) Phylogenetic tree constructed using chloroplast proteins of S. nigrum with Solanaceae, Arabidopsis, rice, and primitive species. Distances in the tree represent percent accepted mutation (PAM) units. Red box, dicot; blue box, monocot; green box, primitive species. (b) Comparison of sympodial shoot structure. Red arrowheads indicate inflorescences. Brackets and numbers represent the number of leaves (L) between inflorescences. ID, indeterminate growth. Scale bars, 5 cm. (c) Comparison of ripe fruits with inflorescence between S. nigrum and S. lycopersicum. Scale bars, 1 cm. (d) Venn diagram showing orthologs between S. nigrum unigenes and S. lycopersicum-expressed genes based on hierarchical orthologous groups analysis. Sn, S. nigrum; Sl, S. lycopersicum.

In aerial organs, S. nigrum and S. lycopersicum showed similar indeterminate growth with different sympodial indices (SPIs); two in S. nigrum and three in S. lycopersicum (cv. M82) (Fig. 3b). Although they showed similar inflorescence structure, the fruit size of S. nigrum is much smaller than that of S. lycopersicum and is comparable to the fruit of S. pimpinellifolium, a wild tomato species41,42 (Fig. 3c). The most conspicuous difference between the fruits is their color upon maturation; S. nigrum was black, whereas S. lycopersicum was red (Fig. 3c). This suggests the accumulation of different metabolites in the fruits of S. lycopersicum and S. nigrum, possibly due to domestication of S. lycopersicum and natural selection in S. nigrum. Accordingly, in spite of evolutionary closeness, S. nigrum and S. lycopersicum show clear differences in morphology, which suggest a significant transcriptomic change between the two species.

To investigate transcriptomic differences, we obtained RNA-seq read data for S. lycopersicum from previous studies43,44. We then determined the hierarchical orthologous groups (orthogroups) between 47,470 S. nigrum unigenes and 25,477 S. lycopersicum-expressed genes using OMA standalone45 (see “Methods”). A total of 14,871 S. nigrum genes and 15,316 S. lycopersicum genes were assigned to orthogroups, which accounted for 31.3% and 60.1% of their total genes, respectively (Fig. 3d). In spite of evolutionary closeness, more than half (68.7%) of the S. nigrum unigenes were identified as unlikely to be orthologous to any of the expressed genes in S. lycopersicum. This might imply that after divergence, large genomic changes, such as insertion and deletion events, occurred during evolution and domestication, which led to phenotypic variations. Genes included in the orthogroups were further annotated with KEGG Orthology46 for the analysis of metabolic pathways (Supplementary Table S8).
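The orthogroup shares follow directly from the gene counts reported above. A small arithmetic sketch using only those four counts (the variable names are illustrative):

```python
# Counts taken from the text.
sn_unigenes, sl_genes = 47_470, 25_477
sn_in_orthogroups, sl_in_orthogroups = 14_871, 15_316


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


sn_share = pct(sn_in_orthogroups, sn_unigenes)
sl_share = pct(sl_in_orthogroups, sl_genes)
sn_unassigned = pct(sn_unigenes - sn_in_orthogroups, sn_unigenes)

print(sn_share, sl_share, sn_unassigned)  # 31.3 60.1 68.7
```

The S. nigrum share of 31.3% and the 68.7% of unigenes without an assigned ortholog sum to 100%, as expected.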

Carotenoid biosynthesis in mature fruit

We determined the carotenoid content in the ripe fruits of S. nigrum and S. lycopersicum by high-performance liquid chromatography (HPLC) analysis. In addition to lycopenes and carotenes, the most abundant carotenoids in tomato, we also detected phytoene, phytofluene, and lutein in S. lycopersicum (Fig. 4a). However, most of the carotenoids tested were not detected in S. nigrum, and only β-carotene and lutein were detected. Interestingly, β-carotene and lutein contents were 2.2-fold and 7.2-fold higher, respectively, in S. nigrum than in S. lycopersicum. These data indicate that enzyme activities of the carotenoid biosynthesis pathway differ between the two species, resulting in a difference in carotenoid content. It is also possible that the expression of carotenoid biosynthetic enzyme genes is mostly repressed in S. nigrum, except for enzymes involved in β-carotene and lutein accumulation.

Figure 4

Comparison of metabolites and expression profiles in carotenoid biosynthesis pathway. (a) Carotenoid contents in mature fruits of S. nigrum and S. lycopersicum. Data are shown as mean ± standard deviation: n = 5; five technical replicates, a minimum of 50 fruits were pooled. N.D, not detected. (b) Carotenoid biosynthesis pathway and expression profiles of carotenoid biosynthetic genes. Dotted arrows indicate the condensed pathway. S. nigrum unigenes with asterisk represent genes with highest homology. The expression is normalized by log10(TPM + 1). (c) qRT-PCR results for PDS and CRTISO between the two species. Expression levels of each gene were normalized with the value of leaf and UBIQUITIN was used as the endogenous control. Data are shown as mean ± standard deviation: n = 2, pooled samples and two technical replicates. L, leaf; SA, shoot apex; RF, red fruit; BF, black fruit.

We revised the carotenoid biosynthesis pathway in tomatoes based on the KEGG pathway (sly00906) and data from previous studies47,48, and then tested the expression patterns of 20 genes encoding enzymes in the pathway (Fig. 4b). Of the 20 genes, 15 were identified in the orthogroups, and an additional five genes showing the highest homology to S. lycopersicum genes were selected by BLASTP. As it is not feasible to directly compare gene expression between two different species, we also prepared the expression profiles of S. lycopersicum in three different tissues, as we did for S. nigrum (Supplementary Table S9). The expression of SlGGPS2, SlPSY1, SlPSY2, SlPDS, SlZDS, and SlCRTISO was highly enriched in the red fruit of S. lycopersicum, whereas the expression of the corresponding orthologs was not specifically enriched in the black fruit of S. nigrum. This suggests that the carotenoid biosynthetic process is relatively more active in S. lycopersicum than in S. nigrum. For example, CRTISO expression is highly enriched in the red fruit of S. lycopersicum, but not in the black fruit of S. nigrum, resulting in high accumulation of lycopene only in S. lycopersicum. Intriguingly, the expression of BETA-CAROTENE HYDROXYLASE 1 (CRTR-B1) was highly enriched in the black fruit of S. nigrum compared with the red fruit of S. lycopersicum, which might have caused the elevated levels of lutein in S. nigrum (Fig. 4a). PDS and CRTISO gene expression was validated by qRT-PCR in both species, and expression enrichment was observed only in the red fruit of S. lycopersicum, consistent with the in silico data (Fig. 4c).
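Because absolute expression cannot be compared directly between species, the comparison rests on within-species tissue profiles displayed as log10(TPM + 1). A minimal sketch of that transformation and of reading off the tissue with peak expression is shown below; the TPM values are hypothetical and the helper names are illustrative.

```python
import math


def log_expression(tpm):
    """log10(TPM + 1), the scale used for the expression heat maps."""
    return math.log10(tpm + 1.0)


def peak_tissue(tpm_by_tissue):
    """Tissue with the highest transformed expression within one species."""
    transformed = {t: log_expression(v) for t, v in tpm_by_tissue.items()}
    return max(transformed, key=transformed.get), transformed


# Hypothetical profiles for one orthologous gene pair (not data from the study).
tomato = {"L": 9.0, "SA": 0.0, "RF": 999.0}
nightshade = {"L": 99.0, "SA": 9.0, "BF": 0.0}

print(peak_tissue(tomato))
print(peak_tissue(nightshade))
```

The added pseudocount of 1 keeps undetected genes (TPM = 0) at exactly 0 on the log scale instead of producing an undefined value.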

To further investigate the molecular regulation of carotenoid biosynthesis, we investigated expression patterns of orthologous genes of well-known MADS-box ripening regulators in tomato, RIN, FUL1, FUL2, and TAGL1, which are activators of carotenoid biosynthetic genes13. Interestingly, SnRIN, SnFUL2 and SnTAGL1 were as highly enriched in S. nigrum fruit as in tomato and thus were included in the BF cluster (Supplementary Table S10). This finding suggests that there might be an antagonistic regulation controlling activators of carotenoid biosynthetic genes in S. nigrum, possibly through other BF-enriched transcription regulators. Based on gene expression profiles and HPLC results, we proposed a hypothetical model for the molecular regulation of carotenoid biosynthesis in two species (Supplementary Fig. S4).

Anthocyanin biosynthesis in mature fruit

Subsequently, we measured the flavonoid content in the ripe fruits of S. nigrum and S. lycopersicum by HPLC. Three types of delphinidin-derived flavonoids, delphinidin, petunidin, and malvidin, were detected in S. nigrum, but not in S. lycopersicum (Fig. 5a). This result suggests that the black color of the fruit of S. nigrum is mainly due to the accumulation of flavonoid pigments, consistent with a previous report showing anthocyanin accumulation in S. nigrum fruit49.

Figure 5

Comparison of metabolites and expression profiles in flavonoid biosynthesis pathway. (a) Flavonoid contents in mature fruits of S. nigrum and S. lycopersicum. Data are shown as mean ± standard deviation: n = 5; five technical replicates, a minimum of 50 fruits were pooled. N.D, not detected. (b) Flavonoid biosynthesis pathway and expression profiles of flavonoid biosynthetic genes. Dotted arrow indicates the condensed pathway. S. nigrum unigenes with asterisk represent genes with highest homology. The expression is normalized by log10(TPM + 1). (c) qRT-PCR results for CHS and UFGT between the two species. Expression levels of each gene were normalized with the value of shoot apex and UBIQUITIN was used as the endogenous control. Data are shown as mean ± standard deviation: n = 2, pooled samples and two technical replicates. L, leaf; SA, shoot apex; RF, red fruit; BF, black fruit.

The flavonoid biosynthesis pathway was redrawn based on the KEGG pathway (sly00941) of tomatoes and information from previous studies50,51, and expression patterns of 14 enzyme genes were examined. Regarding enzyme genes in S. nigrum, 7 of the 14 genes were identified in the orthogroups, and an additional seven genes with the highest homology to S. lycopersicum genes were selected (Fig. 5b). Although the expression patterns of three enzyme genes involved in phenylpropanoid biosynthesis were comparable in both species, the expression of flavonoid biosynthetic genes was clearly higher in the black fruit of S. nigrum than in the red fruit of S. lycopersicum. For example, expression of F3′5′H, DFR, and ANS was not considerably enriched in the red fruit of S. lycopersicum, which reflects non-detectable anthocyanin levels in the ripe fruits. On the other hand, high enrichment of flavonoid biosynthetic gene expression in the black fruit of S. nigrum might have caused the accumulation of flavonoid pigments. We could not detect the other kinds of flavonoids, pelargonidins and cyanidins, possibly due to the low sensitivity of our method; alternatively, these pathways could be inactive even in S. nigrum. qRT-PCR validation showed that the expression of SlCHS and SlUFGT was not enriched in the red fruit of S. lycopersicum, whereas SnCHS and SnUFGT expression was highly enriched in the black fruit of S. nigrum, consistent with the in silico data (Fig. 5c).

Identification of the key transcription factor for anthocyanin biosynthesis in the fruit of S. nigrum

We noticed that the expression of the SnAN2 gene, an ortholog of SlAN2, was significantly enriched in the black fruit of S. nigrum (Fig. 2b). SlAN2 encodes an R2R3-MYB transcription factor, which is sufficient for anthocyanin accumulation when it is ectopically expressed in tomatoes40. This prompted us to investigate whether AN2 gene expression regulation determines fruit color differences between the two species. RNA-seq results showed that, while SnAN2 gene expression was highly enriched in the black fruit of S. nigrum, SlAN2 expression was not enriched in the red fruit of S. lycopersicum (Fig. 6a). Thus, we hypothesized that SnAN2 plays a major role in anthocyanin biosynthesis in the black fruit of S. nigrum. To verify this, we created SnAN2 knock-out mutants of S. nigrum using the CRISPR-Cas9 system. Based on the RNA-seq data, we obtained a full-length genomic sequence of the SnAN2 gene by PCR and Sanger sequencing, and we designed four single guide RNAs (sgRNAs) targeting the 5′ region of the gene. Two independent T1 transgenic plants were isolated and genotyped, both of which had a large deletion between targets 3 and 4, and one of them had a 1-base pair insertion in the target 1 region (Fig. 6c). Both mutations resulted in premature stop codons and consequent truncated SnAN2 proteins, the MYB domains of which were fully disrupted, implying possible null mutants (Supplementary Fig. S5). As a result, both mutant plants failed to properly synthesize anthocyanin; thus, mature fruits turned yellow/green in color (Fig. 6b). Expression of anthocyanin biosynthetic genes was tested by qRT-PCR. SnCHS, SnF3′5′H, SnDFR, and SnUFGT expression was decreased in the two mutant lines compared with the wild type, whereas SnF3′H expression was not affected (Fig. 6d). These data suggest that SnAN2 is mainly required for the expression of genes encoding anthocyanin biosynthesis enzymes and that transcriptional induction of SnAN2 is essential for anthocyanin production during ripening of fruits in S. nigrum (Fig. 6e).
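How a 1-base pair insertion near the 5′ end of a coding sequence can produce a premature stop codon and a truncated protein may be illustrated with a toy example. The 24-nucleotide sequence below is hypothetical and is not the SnAN2 sequence; only the standard genetic code is assumed.

```python
# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        index = (16 * BASES.index(codon[0])
                 + 4 * BASES.index(codon[1])
                 + BASES.index(codon[2]))
        amino_acid = AMINO_ACIDS[index]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical wild-type coding sequence (not SnAN2).
wild_type = "ATGGCTGAACGTTTGACCGGATAA"
# Insert a single T after the start codon to shift the reading frame.
mutant = wild_type[:3] + "T" + wild_type[3:]

print(translate(wild_type))  # MAERLTG
print(translate(mutant))     # MC
```

In this toy case the frameshift creates a TGA stop codon at the third codon, so only two residues are translated; a comparable early truncation upstream of or within the MYB domain would be expected to abolish DNA-binding function.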

Figure 6

SnAN2 transcription factor activates flavonoid biosynthetic genes. (a) Comparison of SnAN2 and SlAN2 expression. L, leaf; SA, shoot apex; MF, mature fruit. Data are shown as mean ± standard deviation. N.D, not detected. (b) Representative image of wild-type (WT) and two independent snan2-cr mutants for T1 generation. Scale bar, 1 cm. (c) Gene structure of WT and alleles of the snan2-cr mutants. Red and three bold characters, each target (sgRNA) and protospacer adjacent motif (PAM) site. Blue characters, each mutation. (d) qRT-PCR results of downstream genes regulated by SnAN2. Expression levels of each gene were normalized with the value of WT and UBIQUITIN was used as the endogenous control. Data are shown as mean ± standard deviation: n = 4, biological replicates. N.D, not detected. (e) Proposed model for the molecular regulation of flavonoid biosynthesis in the fruits of S. nigrum. Red arrows (left) indicate the positive regulation shown in this study. Red dashed-arrow and blue dashed-line (right) represent predicted positive and negative regulations. TF, transcription factor. C8/9, Cluster 8 and 9 defined in Fig. 1.

In addition to AN2, a number of transcription regulators were characterized in plant anthocyanin biosynthesis pathways52. We found that the expressions of orthologous genes of Production of Anthocyanin Pigment 1 (PAP1), another MYB transcription factor known as an activator for anthocyanin biosynthetic genes in Arabidopsis, and Transparent Testa 8 (TT8), a bHLH transcription activator for anthocyanin biosynthetic genes in tobacco, were enriched in the BF-cluster in S. nigrum. Alternatively, an ortholog of a homeodomain-leucine zipper transcription factor, GLABRA2, a potential repressor for anthocyanin production, was also enriched in the fruit of S. nigrum (Supplementary Table S11). This implies that the orchestrated functions of enriched TFs finely regulate anthocyanin biosynthetic gene expressions in the fruits of S. nigrum (Fig. 6e).

Sugar contents in mature fruit

In addition to the pigment contents, we also measured the levels of carbohydrates, including fructose, glucose, sucrose, maltose, and lactose, which are primary metabolites. In ripe fruits, only the monosaccharides fructose and glucose were detected in both species, and the levels were 3.9- and 4.4-fold higher, respectively, in S. nigrum than in S. lycopersicum (Supplementary Fig. S6). The sugar metabolism pathway of S. nigrum was drawn based on the KEGG pathway (sly00500) of tomatoes and information from previous studies53,54 (Supplementary Fig. S6). A total of 29 sugar metabolic genes of S. nigrum were identified in the orthogroups and 23 best-hit homologs were also found using BLASTP. Some genes encoding Sucrose Synthase (Susy) and some genes encoding cell wall invertases showed high enrichment of expression in the black fruit of S. nigrum and only moderate enrichment in the red fruit of S. lycopersicum (Supplementary Fig. S6). This might have caused the difference in sugar levels between the two species.

Discussion

Many wild crop species are utilized as food sources and in medicinal applications worldwide. Although domestication and molecular breeding of these wild plants are important for improving crop yield and usage, these are not easily achieved due to lack of genetic information. Therefore, the first step would be to obtain genetic resources for the domestication of wild species. S. nigrum has great potential as a medicinal plant and is used in many countries1,2. In this study, we identified 47,470 unigenes in S. nigrum by de novo transcriptome assembly from three tissue samples. In total, 78.4% of the unigenes were functionally annotated and DEGs in the tissue samples were classified by expression dynamics (Fig. 1). These data could be used as valuable genetic information resources for S. nigrum. We also performed a comparative analysis using S. lycopersicum, a widely used domesticated crop. This information might help in the de novo domestication of wild black nightshade species. For example, tomato domestication genes, such as SELF PRUNING (SP), which is important for the development of shoot architecture55, and CLAVATA3 (CLV3), which is a main regulator of tomato fruit size56, were found in the orthologous gene groups of S. nigrum, and the expression regulation of these genes in S. nigrum was similar to that in S. lycopersicum (Supplementary Tables S3, S8, and S9). Using CRISPR-mediated editing of these domestication genes, crop yield and usage of S. nigrum could be enhanced.

Solanum lycopersicum and S. nigrum are mostly similar in terms of shoot architecture. However, one notable difference in the aerial organs is the shape of the leaves: S. lycopersicum has compound leaves, whereas S. nigrum has simple leaves (Fig. 3b). Many factors determining leaf architecture were found among the orthologs in the unigene set (Supplementary Table S8). For example, an ortholog of Class I KNOX (KNOXI), knotted1-like homeobox transcription factors, may be required for the initiation of compound leaf development36,57. In addition, an ortholog of LANCEOLATE (LA), the CINCINNATA (CIN)-like TCP transcription factor, may regulate the activity of the leaf marginal blastozone58,59,60, and NO APICAL MERISTEM (NAM)/CUP-SHAPED COTYLEDON (CUC) proteins, which control the organ boundary, may also play a role in leaf development by suppressing auxin signaling between laminar regions61. The difference in leaf structure between S. nigrum and S. lycopersicum may also provide a clue about evolutionary divergence, and further analyses of the expression patterns of relevant genes and their evolutionary conservation in these species are required.

Comparative transcriptome analysis of S. nigrum and S. lycopersicum was performed using tissue-level expression profiling because direct comparison of expression levels might be misleading. We used three representative tissues from both species and compared the expression patterns of genes encoding enzymes involved in metabolic pathways. We systematically defined the gene expression profiles of enzymes involved in carotenoid biosynthesis (Fig. 4), anthocyanin biosynthesis (Fig. 5), and sugar metabolism (Supplementary Fig. S6) in S. nigrum and S. lycopersicum and found key enzyme genes that showed differential expression patterns, which possibly result in phenotype differences. This suggested that comparative analysis using a tissue-level transcriptome assay could successfully reflect the phenotypic variations between two different species. However, there might be some limitations, such as missing DEGs and homologs, because of the lack of whole genome information. To compare gene diversification and variations in gene expression regulation more precisely in two closely related species, genomic comparison at the whole-genome level should be performed.

We explored the differences in metabolite contents in ripe fruits of S. nigrum and S. lycopersicum by HPLC analysis and comparative expression profiling of enzyme genes. We found that the fruits of S. nigrum contain higher levels of many metabolites beneficial for human health, such as β-carotene, lutein, and anthocyanin antioxidants, when compared with tomato fruits. Therefore, the fruits of S. nigrum could be utilized as dietary supplements or as edible fruits like tomatoes. To achieve this, toxic compounds, such as α-solanine and α-chaconine, need to be removed. Although it is known that these compounds are not detectable in fully ripe fruit of S. nigrum, some of the maturing fruits can contain them7. Therefore, we briefly investigated the steroidal glycoalkaloid (SGA) biosynthesis pathway in S. nigrum. Based on the KEGG pathway (map01066) and information from a previous study62, a total of 14 SGA biosynthesis genes of S. nigrum identified in the orthogroups and best-hit homologs were examined (Supplementary Fig. S7). Interestingly, the expression patterns of SGA biosynthesis genes were mostly similar in S. nigrum and S. lycopersicum. STEROL ALKALOID GLYCOSYLTRANSFERASE (SGT) family genes, which encode enzymes that produce α-solanine and α-chaconine, were weakly expressed in the fruits of S. nigrum and S. lycopersicum, possibly indicating that the fruits of S. nigrum have relatively lower toxic SGA contents than other black nightshade species7. Detailed analysis of SGA synthesis in S. nigrum is required. For example, tomato GAME4 (GLYCOALKALOID METABOLISM 4) has been reported to play a key role in the biosynthesis of SGA63. Therefore, the enzyme activity of SnGAME4 could be modified to effectively reduce SGA levels. We also noted that SGA levels decline as fruits mature, and controlling fruit ripening could be another strategy for reducing them. In tomato, a self-pruning (sp) mutant showed determinate shoot growth, and this mutation can be used for uniform fruit maturation55,64. This suggests that modifying SnSP gene activity could facilitate the synchronization of fruit maturation and the simultaneous harvest of ripe fruits. The plant hormone ethylene plays a key role in fruit ripening65, and molecular control of ethylene biosynthesis and signaling could also facilitate the control of fruit maturation.

Sugar content was higher in the fruits of S. nigrum than in S. lycopersicum (Supplementary Fig. S6). Lycopersicum Invertase5 (LIN5), a cell wall invertase gene, has been reported to be the key enzyme influencing sugar uptake in tomato fruit. LIN5-RNAi knockdown transformants were characterized by reduced transpirational water loss in mature fruits accompanied by thickened cuticles66. Therefore, upregulated cell wall invertases presumably help in the uptake of more sugars into the S. nigrum fruit than in the fruit of S. lycopersicum. Further analyses are required.

In conclusion, we successfully generated transcriptomic information and data about the unigenes of S. nigrum for extensive molecular studies in the future. Through comparative analysis with tomato, which is one of the best characterized Solanaceae species at the genomic and molecular level, we were able to identify numerous important factors regulating the growth and development of S. nigrum and useful primary and secondary metabolites produced in the fruits of S. nigrum. Further, we tried to edit a gene involved in anthocyanin biosynthesis based on transcriptomic information, through which anthocyanin accumulation in the fruits was controlled. This implies that we could rapidly domesticate S. nigrum by editing evolutionarily conserved genes related to plant development and production of useful metabolites.

Methods

Permission

No specific permits were required for growing S. nigrum plants in the greenhouse at Wonkwang University, Iksan, Republic of Korea. Transgenic plants and SnAN2-edited mutants were grown in an LMO growth room (LML16-1201) permitted by the National Research Safety Headquarters of the Republic of Korea. All the methods complied with relevant institutional, national, and international guidelines and legislation for scientific research.

Plant materials and growth conditions

S. nigrum seeds (NIBRGR0000189638) were collected and provided by NIBR, Incheon, Republic of Korea. Plants were grown in a greenhouse under long-day conditions (16 h light, 26–28 °C/8 h dark, 18–20 °C; 40–60% relative humidity) supplemented with artificial light from 200 W halogen lamps at Wonkwang University, Iksan, Republic of Korea. Seeds were directly sown on the soil in 96-cell plastic flats, and seedlings were grown for four weeks on the flats. For harvesting fruits, some of the seedlings were transplanted to pots in the greenhouse. All the plants were grown under drip irrigation and standard fertilizer regimes.

RNA sequencing

Mature leaves (the fourth leaf from the bottom, not counting the cotyledons) and shoot apices (containing one leaf primordium) at the reproductive stage were harvested 30 days after sowing. A minimum of eight shoot apex samples were pooled. Black fruits were harvested when the fruits were the most mature. A minimum of 50 black fruit samples were pooled. All samples were harvested with three biological replicates between 10 and 11 a.m. The samples were immediately frozen in liquid nitrogen and stored at −80 °C.

Total RNA of the samples was extracted using the RNeasy® Plant Mini Kit (QIAGEN, Valencia, CA, USA) for leaf and shoot apex, and the Ribospin™ Seed/Fruit Kit (GeneAll Biotechnologies, Republic of Korea) for black fruit, including on-column DNase treatment using the RNase-Free DNase set kit (QIAGEN), according to the manufacturers’ instructions. The extracted total RNA samples were analyzed for concentration and quality using the ND-1000 system (NanoDrop Technologies, Wilmington, DE, USA) and the 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). A total of 1 µg of RNA was used for library construction, with the NEBNext® mRNA Library Prep Master Mix for Illumina® Kit (New England Biolabs, Beverly, MA, USA) for leaf and shoot apex and the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, CA, USA) for black fruit, according to the manufacturers’ instructions. Libraries of 70–370 bp (mean 160 bp) insert size were constructed and sequenced using the Illumina HiSeq 2500 (leaf and shoot apex) and the NovaSeq 6000 (black fruit) to generate 101-bp paired-end reads.

De novo transcriptome assembly and functional annotation

The raw reads were checked for quality using FastQC v0.11.7 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and preprocessed to remove adaptor sequences and low-quality reads using Trimmomatic v0.3667 with the following parameters: ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10, LEADING:20, TRAILING:20, MINLEN:25, and phred33. To build a suitable set of reference contigs, a total of 586,099,338 clean reads were pooled and assembled using Trinity v2.4.068 with the following parameter: min_contig_length 300. Further clustering was then performed using CD-HIT-EST v4.669 with a 95% similarity parameter to obtain non-redundant transcripts. To identify coding regions within transcripts, the longest open reading frames were predicted using TransDecoder v3.0.1 (https://github.com/TransDecoder). To obtain gene expression profiles, the clean reads were aligned to coding sequences using Bowtie v2.2.670, and the abundance of each transcript was estimated and normalized to transcripts per million (TPM) values using RSEM v1.2.3171. Genes with TPM values below 0.3 were removed, and the remaining sequences were defined as S. nigrum unigenes. To validate the expression profiles, correlation analysis was performed using the corrplot R package v0.84 (https://github.com/taiyun/corrplot), and the unigenes were assessed using BUSCO v3.1.024 with the embryophyta (odb10) lineage dataset (Supplementary Figs. S1 and S2).
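
The TPM normalization and the 0.3 TPM cutoff described above can be illustrated with a minimal Python sketch. It is not the RSEM implementation: the function names, the use of effective transcript lengths, and the rule that a gene is kept when it reaches the cutoff in at least one sample are assumptions made for illustration only.

```python
def tpm(counts, eff_lengths):
    """Convert read counts to TPM using effective transcript lengths (bp)."""
    # Length-normalized read rate for each transcript
    rates = [c / length for c, length in zip(counts, eff_lengths)]
    total = sum(rates)
    # Scale so that the values of one sample sum to one million
    return [r / total * 1e6 for r in rates]


def filter_unigenes(tpm_by_gene, min_tpm=0.3):
    """Keep genes whose TPM reaches min_tpm in at least one sample.

    tpm_by_gene maps a gene ID to a list of TPM values, one per sample.
    The "at least one sample" rule is an assumption, not taken from the text.
    """
    return {
        gene: values
        for gene, values in tpm_by_gene.items()
        if max(values) >= min_tpm
    }
```

For instance, a gene with TPM values of 0.1, 0.2, and 0.5 across three samples would be retained under this rule, whereas a gene that never exceeds 0.2 would be dropped.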

To predict the functions of the unigenes, gene functions were annotated using BLASTP v2.9.0 search25 based on Araport11, TrEMBL (Ensembl Plants), and Swiss-Prot (Ensembl Plants) with the following parameters: e-value 1e-10, outfmt 6, num_alignments 1, and max_hsps 1. Gene functions were also annotated with GO and Pfam using InterProScan v5.31–70.072. Moreover, KEGG Orthology was annotated using GHOSTZ search and single-directional best hit (SBH) method with the S. lycopersicum gene set in the KAAS v2.1 web tool (https://www.genome.jp/kegg/kaas/).

DEG and transcription factor analysis

To identify differentially expressed genes (DEGs) among leaves, shoot apices, and black fruits, the expression profiles were filtered using DESeq2 v1.26.028 with the following criteria: log2-fold change ≥ 2, FDR < 0.05, and TPM values ≥ 3. The DEGs were clustered based on a fuzzy c-means algorithm using Mfuzz R package v2.44.073. To decipher the biological functions of each cluster, GO enrichment analysis was performed using topGO R package v2.36.029 with the weight01 algorithm and Fisher's exact test. Enriched GO terms with a p value < 0.01 were selected.
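
As a rough illustration of how the three DEG criteria combine, the following Python sketch applies them to one pairwise comparison result. The record type and field names are hypothetical. Taking the absolute value of the log2-fold change (so that both up- and downregulated genes pass) and applying the TPM threshold to the higher of the two tissues are assumptions not stated in the text.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """One gene in one pairwise tissue comparison (hypothetical record)."""
    gene_id: str
    log2_fold_change: float
    fdr: float
    max_tpm: float  # higher mean TPM of the two tissues (assumption)


def is_deg(result, min_lfc=2.0, max_fdr=0.05, min_tpm=3.0):
    """Return True if the gene passes all three thresholds."""
    return (
        abs(result.log2_fold_change) >= min_lfc  # absolute value is an assumption
        and result.fdr < max_fdr
        and result.max_tpm >= min_tpm
    )


def select_degs(results):
    """Return the IDs of genes that pass the DEG criteria."""
    return [r.gene_id for r in results if is_deg(r)]
```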

To identify differentially expressed transcription factors (TFs) and determine their roles, plant-specific TFs were obtained from PlnTFDB (http://plntfdb.bio.uni-potsdam.de/v3.0/) and a BLASTP search was performed with the following parameters: e-value 1e-10, outfmt 6, num_alignments 1, and max_hsps 1. Putative TFs were retained if they showed ≥ 50% identity to a PlnTFDB entry and contained Pfam domains.
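
The filtering step above can be sketched in Python as follows. The sketch assumes the default 12-column BLAST tabular layout (outfmt 6), in which the first three columns are the query ID, the subject ID, and the percent identity. The function name and the `genes_with_pfam` set (IDs of unigenes with at least one Pfam annotation) are hypothetical.

```python
def filter_putative_tfs(blast_lines, genes_with_pfam, min_identity=50.0):
    """Map each retained query gene to its best-hit TF.

    blast_lines: iterable of tab-separated BLAST outfmt 6 lines.
    genes_with_pfam: set of query IDs carrying a Pfam domain (hypothetical input).
    """
    kept = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, pident = fields[0], fields[1], float(fields[2])
        # Keep hits with sufficient identity whose query has a Pfam domain
        if pident >= min_identity and query in genes_with_pfam:
            kept[query] = subject
    return kept
```

Because the search was run with num_alignments 1 and max_hsps 1, each query contributes at most one line, so no additional best-hit selection is needed in this sketch.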

Ortholog analysis

To identify the plant evolutionarily closest to S. nigrum, ortholog analysis was performed using OMA standalone v2.2.045 with chloroplast protein sequences, and a phylogenetic tree was constructed using MEGA X74. To investigate transcriptomic differences, we obtained RNA-seq read data of S. lycopersicum from the NCBI Sequence Read Archive (SRP01077543, leaf and red fruit; PRJNA34367744, TM, FM, SIM, and SYM, ftp://ftp.solgenomics.net/transcript_sequences/by_species/Solanum_lycopersicum/libraries/illumina/LippmanZ). Raw reads were preprocessed, and then the S. lycopersicum-expressed genes were defined using the same process utilized in S. nigrum. To identify hierarchical orthologous groups between S. nigrum and S. lycopersicum, OMA standalone was run with protein sequences of S. nigrum unigenes and S. lycopersicum-expressed genes.

qRT-PCR validation

To determine the reliability of the RNA-seq data, qRT-PCR was performed on the same RNA pools used for RNA-seq. A total of 1 µg of RNA was used for cDNA synthesis using the ReverTra Ace® -α- Kit (TOYOBO, Osaka, Japan), according to the manufacturer’s instructions. qRT-PCR was performed using the StepOnePlus™ Real-Time PCR System (Thermo Fisher, Waltham, MA, USA) with iQ™ SYBR® Green Supermix (Bio-Rad, Hercules, CA, USA). The PCR reaction conditions were: 95 °C for 3 min, followed by 40 cycles of 95 °C for 15 s, 58 °C for 30 s, and 72 °C for 30 s; melt curve stage: 95 °C for 15 s, 55 °C for 15 s, and then an increase to 95 °C in 1.0 °C increments. Relative gene expression was calculated based on the 2^−∆∆CT method75. The primer sequences used are listed in Supplementary Table S12.
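
For readers unfamiliar with the relative quantification used here, a minimal Python sketch of the calculation is given below. The function and argument names are hypothetical, and the choice of reference gene and calibrator sample is not specified in this section, so both are left as inputs.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression of a target gene by the comparative CT calculation.

    Each argument is a mean threshold cycle (CT) value:
    target or reference gene, in the test sample or the calibrator sample.
    """
    # Normalize the target gene to the reference gene within each sample
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Difference between the test sample and the calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)
```

With CT values of 22 and 18 (target and reference) in the test sample and 24 and 18 in the calibrator, the difference of differences is −2, which corresponds to a fourfold higher relative expression in the test sample.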

HPLC analysis

To determine the anthocyanin content, anthocyanins were extracted from 0.2 g of finely ground black and red fruits of S. nigrum and S. lycopersicum, respectively. Experiments were performed as previously described with minor modifications76. Briefly, lyophilized samples were extracted with 1 ml of acidic methanol containing 1% HCl (v/v) for 18 h at room temperature (25 ± 2 °C) with moderate shaking. Subsequently, 500 μl of the supernatant was mixed with 500 μl of HPLC-grade H2O and 300 μl of chloroform to remove carotenoids. The water–methanol phase extracts (100 μl) were hydrolyzed. The samples were added to 900 μl of solvent [95:5 (v/v), n-butanol (100%):HCl (36%)], and the mixture was boiled for 2 h to release the core anthocyanidins. Then, the samples were dried in a speed vacuum at room temperature, and the residues were dissolved in 100 μl of 0.1% HCl–methanol solvent. The core anthocyanidins were identified in the supernatant by HPLC analysis using an Agilent 1260 Infinity II system (Agilent Technologies, Santa Clara, CA, USA) with a Gemini column (5 µm C18 110A, 120 × 4.6 mm) sourced from Phenomenex (Torrance, CA, USA). All chromatograms were recorded at 520 nm. Pelargonin, delphinidin, cyanidin, petunidin, peonidin-3-O-glucoside (hydrolyzed), and malvidin (Sigma-Aldrich, USA) were used as standards for identification.

To determine the carotenoid content, approximately 0.1 g of frozen pericarp powder from ripe S. nigrum and S. lycopersicum fruits was used for carotenoid extraction, as previously described77. Extracted carotenoids were analyzed using a 1260 Infinity HPLC system (Agilent Technologies, Inc., Santa Clara, CA, USA) equipped with a YMC Carotenoid C30 S-5 column (4.6 × 250 mm). Each carotenoid was identified based on the absorption maxima and spectrum78.

To determine the sugar content, sugars were extracted from 0.5 g of finely ground black and red fruits of S. nigrum and S. lycopersicum, according to the Korean Food Standards Codex method (http://www.foodsafetykorea.go.kr/foodcode). Briefly, lyophilized samples were extracted with 30 ml of ethanol and mixed well using a reciprocating shaker for 15 min at room temperature at 200 rpm. Subsequently, the mixtures were sonicated in a water bath at 80 °C for 25 min. After cooling at room temperature, the mixtures were filtered using 0.2-μm syringe filters. The sugars in the filtered mixtures were identified by HPLC analysis using an Agilent 1260 Infinity II system (Agilent Technologies, Santa Clara, CA, USA) with an Imtakt Unison UK-Amino column (3 µm, 250 × 3.0 mm). Fructose, glucose, sucrose, maltose, and lactose (Sigma-Aldrich, USA) were used as standards for identification.

CRISPR-Cas9 mutagenesis and plant transformation

CRISPR-Cas9 mutagenesis of S. nigrum was performed as described previously79. Briefly, gRNAs were designed using the CRISPRdirect web tool (https://crispr.dbcls.jp/), and binary vectors were built through golden gate cloning as described80. The final binary plasmids were introduced into S. nigrum cotyledons by Agrobacterium tumefaciens-mediated transformation as described previously81. Transplantation of transgenic plants and genotyping of CRISPR-generated mutations were performed as previously described79. The gRNA and primer sequences used are listed in Supplementary Table S12.