Introduction

Flavonoids are polyphenolic secondary metabolites with C6–C3–C6 carbon backbones that are biosynthesized via the phenylpropanoid pathway (Winkel-Shirley 2001). There are approximately 10,000 known flavonoids in plants; these are classified into different subgroups depending on the organization and modifications of the three-ring structure (Harborne and Williams 2000; Williams and Grayer 2004). The major types of flavonoids in plants include flavonols, flavanones, flavones, anthocyanins, and condensed tannins (proanthocyanidins) (Routaboul et al. 2012). Flavonoids have antioxidant, anti-inflammatory, and anti-proliferative properties that can contribute to protecting humans from cancers and cardiovascular diseases (Geleijnse and Hollman 2008; Kale et al. 2008; Kaur et al. 2008; Kang et al. 2009; van Dam et al. 2013; Maggioni et al. 2014). Flavonoids are also known to participate in the regulation of the growth, development, and physiology of Arabidopsis and many other plants. For example, flavonols are involved in the modulation of the polar transport of auxin (Kuhn et al. 2011), flower color (Gronquist et al. 2001), pollen fertility (Mo et al. 1992), ethylene signaling (Lewis et al. 2011), and UV-B protection (Kusano et al. 2011). Thus, understanding flavonoid biosynthesis, and ultimately being able to engineer the synthesis of flavonoids, especially flavonols, is of great biological and medicinal significance.

The biosynthesis pathways of flavonoids in various plant species have been characterized through the isolation and characterization of numerous biosynthetic enzymes (Fig. 1) (Winkel-Shirley 2001; Williams and Grayer 2004; Kuhn et al. 2011; Routaboul et al. 2012; Cheng et al. 2014). In flavonoid biosynthesis, naringenin is hydroxylated to dihydroflavonol by flavanone-3-hydroxylase (F3H). Dihydroflavonol can subsequently be oxidized to flavonol by flavonol synthase (FLS) (Chua et al. 2008; Owens et al. 2008). Alternatively, dihydroflavonol-4-reductase (DFR) reduces the dihydroflavonol to the leucoanthocyanidin, and then anthocyanidin synthase (ANS, also known as leucoanthocyanidin dioxygenase, LDOX) produces an anthocyanidin from the leucoanthocyanidin (Davies et al. 2003; Turnbull et al. 2004; Wellmann et al. 2006). F3H, FLS, and ANS all belong to the 2-oxoglutarate-dependent dioxygenase (2ODD) family and are widely distributed among plant species. Plant 2ODD oxygenases utilize molecular oxygen as their co-substrate, and are distinguished by their differing co-factor requirements, including, for example, ascorbate, 2-oxoglutarate, and/or Fe2+ (Chua et al. 2008). These enzymes are involved in hypoxic signaling, DNA repair, and various metabolic pathways, including the biosynthesis of phytohormones, amino acids, signaling molecules, and various secondary metabolites (Prescott and Lloyd 2000; Ivan et al. 2001; Falnes et al. 2002; Trewick et al. 2002).

Fig. 1

Sketch map of the flavonoid biosynthetic pathway. CHS chalcone synthase, CHI chalcone isomerase, F3H flavanone-3-hydroxylase, FNS I flavone synthase I, FNS II flavone synthase II, F3′H flavonoid-3′-hydroxylase, FLS flavonol synthase, GT flavonol-3-glucosyltransferase, RT rhamnosyltransferase, DFR dihydroflavonol-4-reductase, ANS anthocyanidin synthase

The first full-length FLS gene was cloned from a petunia petal cDNA library. Its activity was confirmed by antisense expression in petunia, which led to significant reductions of flavonol synthesis in petals (Holton et al. 1993). Subsequent studies have characterized FLS genes from a range of plant species, including Arabidopsis thaliana (Chua et al. 2008; Owens et al. 2008; Preuss et al. 2009; Stracke et al. 2009), Vitis vinifera (Fujita et al. 2006), Camellia sinensis (Lin et al. 2007), and Zea mays (Falcone Ferreyra et al. 2010). The first ANS gene was isolated from maize, and its biochemical functions were further investigated in Perilla frutescens (Menssen et al. 1990; Saito et al. 1999). The crystal structure of ANS was determined and has been widely used as a template for building in silico models of other plant 2ODD oxygenases (Welford et al. 2005). Such models are useful for the identification of conserved residues that are required for oxygenase activity (Welford et al. 2001). The F3H gene was first described in Matthiola incana (Forkmann et al. 1980), and subsequently cloned from Petunia hybrida (Britsch et al. 1992), Hordeum vulgare (Meldgaard 1992), Malus (Davies 1993), Medicago sativa (Charrier et al. 1995), Zea mays (Deboo et al. 1995), Arabidopsis thaliana (Pelletier and Shirley 1996), Triticum aestivum (Himi et al. 2011), and soybean (Cheng et al. 2013). Recombinant domain-swapping experiments found that the C-terminal region of F3H contributed to its activity but did not affect its selectivity (Lee et al. 2001; Wellmann et al. 2004). Although the 2ODD oxygenases have been extensively studied in a few plant species, such as Arabidopsis, maize, and grape, the 2ODD genes in tobacco have not been characterized.

In the current study, we identified and characterized the 2ODD genes involved in flavonoid biosynthesis (hereafter 2ODD-IFB genes) from the allotetraploid N. tabacum and from two diploid tobacco species (N. tomentosiformis and N. sylvestris). Our analysis included the identification of tobacco 2ODD-IFB genes from genomic data, analysis of their structures and conserved motifs, phylogenetic analysis, evaluation of their N-/C-terminal structures, evaluation of their tissue- and development-dependent expression patterns, and analysis of correlations between 2ODD-IFB gene expression patterns and flavonoid content. Our results provide a foundation for the further investigation and manipulation of tobacco 2ODD-IFB genes, which should lead to a more complete understanding of the roles of these genes in the regulation of plant growth and may enable the intentional alteration of flavonoid composition and/or content according to human requirements.

Materials and methods

Plant materials

The Honghua Dajinyuan cultivar (Hongda, N. tabacum) was used in the expression profile analysis for the 2ODD-IFB genes. Tobacco seeds maintained at our laboratory were germinated and grown in pots under typical conditions (daily 28 °C for 16 h under light, 23 °C for 8 h of darkness) until the flowering stage. Total RNA was extracted from roots, leaves, stems, stem nodes, buds, sepals, stamens, pistils, and axillary buds.

For the stress treatments, tobacco seeds were soaked, sterilized, and then germinated on 1/2 MS medium in darkness. Young seedlings were then grown under typical culture conditions in vermiculite until the six-leaf stage. Phytophthora parasitica var. nicotianae (P. parasitica var. nicotianae) and potato virus Y (PVY) were inoculated onto the sixth leaves of tobacco seedlings for different durations (1 h, 2 h, and 4 h for P. parasitica var. nicotianae inoculation; 3 d, 6 d, 9 d, and 14 d for PVY inoculation). For the drought treatment, seedlings were initially transferred to a 1/3 strength Hoagland solution for 1 week prior to use in the experiment, and then cultivated in a solution containing 20% (w/v) PEG6000 for 1 d, 2 d, 4 d, and 8 d. Plant materials from the treatments were collected, immediately frozen in liquid nitrogen, and stored at −80 °C prior to RNA extraction.

Phylogenetic and gene structure analyses

The 2ODD-IFB sequences involved in flavonoid biosynthesis in other plant species were collected from NCBI GenBank. The sequences used in this study are listed in Suppl. Table S1.

Multiple alignments of the 2ODD-IFB sequences were performed using Clustal X (version 1.83) and DNAMAN (version 6.0) with default gap penalties and the Blosum protein weight matrix (Jeanmougin et al. 1998). MEGA 5.0 was used to construct the phylogenetic tree of the 2ODD-IFB amino acid sequences using the neighbor-joining algorithm with the bootstrap method (1000 bootstrap replicates), the Poisson model, and complete deletion of gaps/missing data (Tamura et al. 2011).
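As an illustration of the distance settings named above, the following minimal Python sketch computes Poisson-corrected pairwise distances after complete deletion of gapped columns. It is not the MEGA implementation; the function names, the choice of "-" and "?" as gap/missing symbols, and the toy sequences are assumptions made for this example only. The Poisson correction used is d = −ln(1 − p), where p is the proportion of differing sites.

```python
import math

# Assumption for this sketch: "-" and "?" mark gaps / missing data.
GAP_CHARS = {"-", "?"}

def complete_deletion(aligned):
    """Remove every alignment column in which any sequence has a gap."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length)
            if all(seq[i] not in GAP_CHARS for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]

def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p) between two gap-free sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    if p >= 1.0:
        raise ValueError("distance undefined when all sites differ")
    return -math.log(1.0 - p)

def distance_matrix(aligned):
    """Pairwise Poisson distances computed on the gap-free columns only."""
    trimmed = complete_deletion(aligned)
    n = len(trimmed)
    return [[0.0 if i == j else poisson_distance(trimmed[i], trimmed[j])
             for j in range(n)] for i in range(n)]

# Toy (hypothetical) alignment; column 4 is dropped by complete deletion.
toy = ["MKV-LA", "MKVALA", "MRV-LG"]
matrix = distance_matrix(toy)
```

A matrix of this kind is the input to neighbor-joining; bootstrap support would be obtained by resampling the retained columns with replacement and rebuilding the tree for each replicate.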

Each tobacco 2ODD-IFB coding sequence was aligned with its corresponding genomic DNA sequence to identify the exact intron/exon locations. The Multiple Em for Motif Elicitation program (MEME, http://meme.nbcr.net/meme3/mme.html) was used to predict the conserved motifs in the tobacco 2ODD-IFB protein sequences (Bailey et al. 2006). The putative motifs found by MEME were further queried in the InterPro database (http://www.ebi.ac.uk/interpro/;jsessionid=F412A1E32D81ECA4EBDB9A250D55D32E) to explore their possible biological functions (Finn et al. 2017).

cDNA sample preparation and gene transcription analysis

A SuperPure Plantpoly RNA Kit (Gene Answer, Beijing, China) was used to extract total RNA from the plant samples. DNA contamination was removed by digestion with RNase-free DNase I (Gene Answer). Reverse Transcriptase M-MLV (Takara Biomedical Technology, Beijing, China) and random primers were used to synthesize first-strand cDNA using 1 μg of total RNA as a template. The cDNA concentrations were evaluated with a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA) and then diluted to a 100 ng/µl concentration.

The gene-specific primers used in the PCR experiment are listed in Suppl. Table S2. RT-PCR was performed using Takara Taq polymerase in a Biometra thermal cycler. The PCR program was as follows: 95 °C for 5 min, 25–29 cycles of 94 °C for 30 s, 55 °C or 60 °C for 30 s, and 72 °C for 30 s. qPCR amplification reactions were performed using a LightCycler® 96 SW 1.1 cycler (Roche, Laval, QC, Canada) and a SYBR Green kit (Roche). The PCR program was as follows: 95 °C for 5 min, 40 cycles of 94 °C for 30 s, 60 °C for 30 s, signal acquisition, and then a final melting curve of 65–95 °C. The expression levels of the target genes were standardized to the expression level of the NtGAPDH gene using the 2^−ΔΔCt method. Three independent biological replicates were performed for each gene.
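The 2^−ΔΔCt normalization can be sketched in a few lines of Python. This is an illustrative example rather than the analysis script used in the study: the function name and the Ct values are hypothetical, and the calculation assumes roughly 100% amplification efficiency for both the target and the reference gene (NtGAPDH here).

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference), computed per condition
    ddCt = dCt(sample) - dCt(calibrator)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)

# Hypothetical Ct values: target vs. reference in a treated sample and in
# the control (calibrator) sample.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```

With these made-up values the target reaches threshold two cycles earlier in the sample than in the calibrator after normalization, which corresponds to a fourfold higher relative expression.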

Estimation of Ka/Ks ratios and protein structure modeling

To estimate Ka (nonsynonymous substitutions per site), Ks (synonymous substitutions per site), and their ratios, the amino acid sequences of each gene family were first analyzed with the MEME program. The motif amino acid sequences found by the program were aligned, and the alignment was subsequently transferred to the original cDNA sequences. Both Ka and Ks values were then estimated from the aligned cDNA sequences using the yn00 program of the PAML4b package.
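Transferring a protein alignment onto the corresponding cDNA sequence amounts to replacing each aligned residue with its codon and each gap with three gap characters. The sketch below shows one way this step could be done; the function name and the toy sequences are hypothetical, and it assumes the cDNA fragment covers exactly the aligned residues (no stop codon, no frameshift).

```python
def thread_codons(aligned_protein, cds):
    """Map a gapped protein sequence onto its coding sequence.

    Each residue is replaced by its codon; each "-" becomes "---".
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if residues != len(codons):
        raise ValueError("protein and CDS lengths disagree")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter)
                   for aa in aligned_protein)

# Toy (hypothetical) example: one gap in the protein alignment.
codon_alignment = thread_codons("MK-V", "ATGAAAGTT")
```

The resulting codon alignment keeps the reading frame intact, which is what programs that estimate Ka and Ks from pairwise codon comparisons, such as yn00, require as input.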

The homology models of the tobacco 2ODD-IFB proteins were generated with tools available via the SWISS-MODEL website (http://swissmodel.expasy.org/interactive), as described previously (Guex et al. 2009). Each target protein sequence was aligned with those of model proteins to select the most suitable template. Based on various factors (insertions/deletions, ligands, oligomeric state) for which information was available on the SWISS-MODEL website, the crystal structure of Arabidopsis LDOX (2brt.1A) was chosen as the template from which to build models for the tobacco 2ODD-IFB proteins.

Flavonoid staining

Tobacco seedlings (Hongda) grown in vermiculite medium were stained for 3 h in a solution containing saturated (0.25%, w/v) diphenylborinic anhydride (DPBA) and 0.2% (v/v) Triton X-100. The stained roots were visualized with an epifluorescence microscope (Olympus, Tokyo, Japan) equipped with an FITC filter (excitation 450–490 nm, suppression LP 515 nm).

Virus-induced gene silencing (VIGS) of 2ODD-IFB in tobacco

For construction of the VIGS vectors, fragments of about 500 bp from the tobacco 2ODD-IFB genes were selected and amplified from cDNA samples using the gene-specific primers shown in Suppl. Table S2. The fragments were digested with BamH I and Kpn I, and subsequently cloned into the pTRV2 vector. After sequencing, the correct pTRV2-FLS, pTRV2-ANS, and pTRV2-F3H vectors were transformed into A. tumefaciens strain GV3101 by the heat shock method. The Agrobacterium suspensions were incubated at room temperature for at least 3 h without shaking, and subsequently injected into the leaves of 4-week-old N. benthamiana plants using a needleless syringe. Agrobacterium carrying the pTRV2 or pTRV2-PDS vector served as the negative and positive control, respectively. Positive plants were identified by RT-qPCR 2 weeks after injection.

Analysis of flavonoid contents in various tobacco tissues

The powder prepared from about 150 mg of fresh tobacco tissue was mixed with 1.5 ml of 80% ethanol (containing 0.012 g/L vitexin as an internal standard) and then extracted for 30 min with ultrasonication. The mixture was centrifuged at 12,000 × g for 10 min at room temperature. The supernatant was then filtered through a 0.22 μm membrane before being analyzed for flavonoid contents by HPLC–MS. The analytical method used was as follows: chromatographic column, ACQUITY UPLC®BEH Phenyl (1.7 μm 2.1 × 150 mm, Waters); gas temperature, 350 °C; injection volume, 1 μL; flow, 0.3 mL/min; wavelength, 230, 260, 360, and 570 nm; mobile phase, A 100% water, B 100% acetonitrile; gradient elution, 8 min 85% A + 15% B, 5 min 58% A + 42% B, 0.01 min 100% B, 3 min 95% A + 5% B, 18 min stop.

Results

Identification and sequence analysis of tobacco 2ODD-IFB genes

We used the terms ‘flavonol synthase’, ‘anthocyanidin synthase’, and ‘flavanone 3-hydroxylase’ as query key words for searching the N. tabacum genome in the China tobacco genome database v4.0 (data not shown) and obtained 7, 8, and 7 hits for these terms, respectively. We then performed several blastp searches in the database using the amino acid sequences of Arabidopsis FLS, ANS, and F3H proteins as the query sequences. We aligned the nucleotide and amino acid sequences of each putative tobacco 2ODD-IFB homolog with those of the Arabidopsis FLS, ANS, and F3H genes. We excluded the tobacco genes that had low similarity and identity with Arabidopsis homologs and thus identified two FLS, two ANS, and four F3H genes in the N. tabacum genome (Table 1). Similarly, we identified and characterized one FLS, one ANS, and one F3H gene in the N. tomentosiformis genome as well as one FLS, one ANS, and one F3H gene in the N. sylvestris genome (Table 1). All the 2ODD sequences identified in our database were further validated in the public tobacco genome databases (Sierro et al. 2013, 2014; Edwards et al. 2017), and then submitted to the GenBank database. The sequence information and GenBank accession numbers are shown in Suppl. Table S3.

Table 1 Characteristics of tobacco 2ODD genes involved in flavonoid biosynthesis

In brief, we identified four FLS genes, four ANS genes, and six F3H genes from the three tobacco species. The identities among the four FLS amino acid sequences were between 66.04 and 94.51%. However, the identities between the sequences of the four FLS and four ANS proteins ranged from 24.05 to 32.14%, and those between the four FLS and the six F3H proteins ranged from 17.19 to 25.4% (Table 2). The four ANS homologs from the three tobacco species shared identities between 63.01 and 100%, and the six F3H homologs shared identities between 96.21 and 100%. The identities between ANS and F3H were very low (15.18–24.82%). Alignments of the coding sequences of the 2ODD-IFB genes gave results consistent with those for the amino acid sequences (Table 2). It is worth noting that the NtF3H genes Ntab 0444890, Ntab 0908200, and Ntab 0908150 in the N. tabacum genome showed 100% identity in both their amino acid sequences and their CDS sequences, but they were located either on different scaffolds or at different positions on the same scaffold (Table 1). We thus considered these to be three independent NtF3H genes, which might have been generated by duplication after the formation of tetraploid tobacco.

Table 2 Identity matrix of tobacco 2ODD-IFB amino acid sequences and their coding sequences
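Pairwise identities of the kind summarized in Table 2 can be computed from aligned sequences along the following lines. This is a minimal sketch, not the procedure implemented in DNAMAN: identity definitions differ between tools (notably in how gapped positions are counted), and here the assumption is that columns with a gap in either sequence are excluded from the denominator. The function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Assumption: positions where either sequence has a gap ("-") are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

def identity_matrix(aligned):
    """Symmetric matrix of pairwise percent identities."""
    return [[percent_identity(s, t) for t in aligned] for s in aligned]

# Toy (hypothetical) aligned sequences.
toy = ["MKV-LA", "MRVALA", "MKV-LA"]
table = identity_matrix(toy)
```

Applying the same function to aligned coding sequences rather than protein sequences gives the nucleotide-level half of such a matrix.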

Exon/intron organization of the 2ODD-IFB genes

Characterization of the exon/intron structure of genes can be useful for the evaluation of the evolution of a gene family. As shown in Fig. 2a, all of the tobacco FLS genes contained two introns. Most of the known FLS genes from other eudicot plant species also contained two introns (Suppl. Fig. S1). Among all of the FLS genes examined in this study, the length of exon one varied to a greater extent (ranging from 380 to 518 bp) than did that of exon two (from 327 to 333 bp) or exon three (from 211 to 229 bp). Interestingly, the length of FLS intron one varied in a manner consistent with the divergence time between certain plant species and Arabidopsis (Fig. 3). For instance, previous studies have shown that the divergence of the Solanaceae and Arabidopsis occurred about 112–156 Mya, earlier than the divergence of grape (Vitis vinifera, 101–112 Mya), cotton (Gossypium arboreum, 83–86 Mya), and Brassica napus (20–24 Mya) (Bowers et al. 2003; Moore et al. 2010). The FLS intron one in Solanaceae plants varies in length from 1.78 to 4.12 kb, lengths that are all much larger than those of intron one in grape (0.64 kb), cotton (0.31 kb), Brassica napus (0.25 kb), and Arabidopsis (0.21 kb). Although Brachypodium, maize, rice, and sorghum diverged from Arabidopsis even earlier (about 200 Mya), the FLS genes in these plants contained no or only very short introns at this position in the gene’s structure (0.07 kb in OsFLS, and 0.09 kb in SbFLS2). We aligned the nucleotide sequences of FLS intron one from various plant species, including tobacco, potato, tomato, grape, and Arabidopsis. As shown in Suppl. Fig. S2, there were three conserved fragments among the different FLS intron one sequences. Fragment one, shown in a black frame, was conserved among all the sequences examined, while fragment two, shown in a red frame, was conserved in the Solanaceae FLS genes. Conserved fragment three, shown in a purple frame, appeared only in the tobacco FLS genes.
Similarly, intron two of the Solanaceae FLS genes was larger than those in other plant species, especially in the StFLS and SolyFLS genes (Fig. 3). Alignment among the intron two sequences from various plant species revealed two conserved fragments (Suppl. Fig. S3). The conserved fragment one existed in all the species examined, while the fragment two only occurred in the StFLS and SolyFLS genes.

Fig. 2

Exon/intron structural organization of tobacco FLS (a), ANS (b), and F3H (c) genes. Exons are denoted by black boxes, and the connecting lines represent introns. Numbers in boxes or above the lines represent the sizes (in bp) of the corresponding exons or introns, respectively. The 5′ and 3′ untranslated regions (UTRs) are represented by blank boxes

Fig. 3

Intron length and Ka/Ks values of 2ODD-IFB genes. a Intron length of the FLS genes from various plant species. The arrow below shows the predicted divergence time between the indicated plant species and Arabidopsis. b Values of Ka, Ks, and Ka/Ks in monocot and eudicot plants. c Values of Ka/Ks in Nicotiana and other eudicot plants. The monocot and eudicot plant species are as shown in Supplementary Table S6

We also analyzed the structures of the ANS and F3H genes from various plant species. As shown in Figs. 2a, b and S4, all the ANS genes from the eudicot plants examined, including tobacco, potato, tomato, Arabidopsis, Brassica napus, soybean, Medicago truncatula, and grape, contained one intron and two exons. There was no intron present in the ANS genes of the monocots rice, maize, sorghum, and Setaria italica. The size of exon one of the eudicot ANS genes varied from 497 to 533 bp, while that of exon two varied from 562 to 860 bp, suggesting a higher degree of conservation of exon one during the evolution of the ANS gene. All the F3H genes examined in the present study contained two introns, though the intron size varied among the different plant species (Suppl. Fig. S4). Moreover, the sizes of the three exons were highly conserved, especially exon two, which was 431 bp in the tobacco F3H genes, and 429 bp in the other F3H genes. This indicated a relatively evolutionarily stable gene structure for F3H.

Conserved residues in tobacco 2ODD-IFB enzymes

To evaluate if the genes identified in tobacco were likely to encode functional 2ODD-IFB enzymes, we examined their conserved catalytic residues and motifs (Cheng et al. 2014). As shown in Fig. 4, there were five specific residues (in a red frame) that were highly conserved across all of the tobacco 2ODD-IFB proteins, including two histidines (His224, His280; numbers refer to the Ntab 0125670 sequence), one aspartic acid (Asp226), one arginine (Arg290), and one serine (Ser292). Four other residues, Gly70, His77, Gly240, and Pro210 (shown in a green box) that are likely involved in ensuring the correct folding of 2ODD proteins, were also found to be conserved in the selected tobacco 2ODD-IFB proteins. We searched all of the putative motifs of the tobacco 2ODD-IFB proteins on the MEME website and found a total of four conserved motifs (Fig. 4, Suppl. Table S4). These motifs were all either annotated as possessing oxidoreductase activity or as being involved in anthocyanidin biosynthesis. Taken together, all of the tobacco 2ODD-IFB identified from the database contained the conserved residues and motifs that are considered characteristic of 2ODD proteins.

Fig. 4

Alignment of the predicted amino acid sequences of the tobacco 2ODD-IFB proteins. The red boxes indicate conserved catalytic residues in these 2ODD-IFB sequences, while the green boxes indicate an additional four conserved residues that are probably required to ensure correct folding. The black boxes indicate the location of the putative motifs shown in Suppl. Table S4

To identify the residues in the termini of the tobacco 2ODD-IFB proteins that are most likely relevant to enzymatic activity, we constructed homology models using the crystal structure of AtLDOX (2brt.1.A) as a template. The tobacco NtFLS (Ntab 0125670), NtANS (Ntab 0694550), and NtF3H (Ntab 0333640) proteins exhibited 43.38%, 74.71%, and 30.40% amino acid identity with AtLDOX, respectively. The largest apparent differences among the three protein structures were near the N terminus (Fig. 5a). There were at least two α-helices at the N terminus of all three proteins. There was one additional four-residue α-helix at the N terminus of the NtANS model, while the NtFLS model contained half of an α-helix at a similar position. This helix was not present in the NtF3H model. We also used the homology models to identify the conserved residues that might be required for the formation of the N-terminal helices. As shown in Fig. 5b, the residues highlighted in gray are predicted to constitute the helices at the N terminus of the tobacco 2ODD-IFB proteins. The residues presented in red did not differ from those of the template and may occupy core positions in the N-terminal helices.

Fig. 5

Structural analysis of the tobacco 2ODD-IFB proteins. a Homology models of NtFLS (Ntab 0125670), NtANS (Ntab 0694550), and NtF3H (Ntab 0333640) generated based on the crystal structure of AtANS (At4g22880). The arrows indicate the different helix motifs present in the N terminus of the tobacco 2ODD-IFB proteins. b Alignment of the 2ODD-IFB N-terminal sequences. The regions highlighted in gray are predicted to form the helix motifs at the N terminus. The residues shown in red are strictly conserved between the tobacco sequences and the AtANS sequence

Phylogenetic analysis of tobacco 2ODD-IFB genes

To better understand the evolutionary relationships among the 2ODD-IFB genes of tobacco and those of other plant species, we aligned 75 2ODD-IFB amino acid sequences from 26 species with the ClustalX program and then constructed an unrooted tree with MEGA 5 using the neighbor-joining method. As shown in Fig. 6, plant 2ODD-IFB genes could be generally divided into two major groups. One group contained two branches, namely an FLS group and an ANS group, while the other group was divided into an F3H group and an FNS I group. The FLS group could further be divided into three subgroups, which we named the FLS monocot subgroup, the FLS eudicot I subgroup, and the FLS eudicot II subgroup. There were 24 genes from 11 plant species in the FLS eudicot I subgroup, but only three genes (GbFLS, VvFLS5, and CmFLS) in the FLS eudicot II subgroup. Moreover, in both the ANS group and the F3H group, the monocot genes were clearly separated from the eudicot genes. Interestingly, we found that PhANS (BAA89008), BnFLS3 (CDX91472), and BnANS (CDX68770) were grouped into the FNS I group, which was confined to rice and Apiaceae species in previous studies (Britsch et al. 1981; Prescott et al. 2002). This might indicate a broader distribution of FNS I genes in plants.

Fig. 6

Phylogenetic tree of plant 2ODD-IFB homologs. Unrooted phylogenetic tree of plant 2ODD-IFB proteins constructed using the neighbor-joining method with MEGA 5.0 program. Isozymes and corresponding plant species are shown in Table S7

Because the phylogenetic tree indicated distinct evolutionary histories for the plant 2ODD-IFB families, we investigated whether they were under different selection pressures. We used the domain sequences to calculate Ka, Ks, and the Ka/Ks ratio. As shown in Fig. 3b, there was no significant difference between eudicot and monocot plants in the average Ka/Ks value for the F3H family. For the FLS family, the average Ka/Ks of eudicot plants was 0.42, significantly higher than that of monocot plants (0.29). However, the average Ka/Ks ratio of the ANS family was much higher in monocot plants (0.54) than in eudicot plants (0.1). These data suggested that the FLS family evolved faster in eudicot plants than in monocot plants, whereas the ANS family evolved faster in monocot plants. We then compared the Ka and Ks values separately to determine why the FLS and ANS families exhibited different evolutionary rates in eudicot and monocot plants. For the FLS family, the Ka value in eudicots was similar to that in monocots, but the Ks value in monocots was almost twice that in eudicots. The Ka value for the ANS family in monocots was eight times that in eudicots, while the Ks value in monocots was only twice that in eudicots. Therefore, the higher Ka/Ks value of the FLS family in eudicots resulted from a lower Ks value, whereas the higher Ka/Ks value of the ANS family in monocots resulted from a higher Ka value. We also found that the Ka/Ks values of the FLS and ANS families in Nicotiana were significantly higher than those in other eudicot plants (Fig. 3c), suggesting faster evolution of these two gene families in Nicotiana.

Correlation between 2ODD-IFB expression levels and flavonoid accumulation patterns

To better understand the possible functions of the 2ODD-IFB genes in tobacco, we initially examined their tissue-specific expression patterns in roots, leaves, stems, stem nodes, axillary buds, buds, sepals, stamens (before and after pollination), and pistils (before and after pollination). As shown in Fig. 7a, the expression of NtFLS (Ntab 0125670 and Ntab 0215410) was detected in leaves, buds, sepals, stamens, and pistils. The NtANS gene (Ntab 0256350 and Ntab 0694550) was mainly expressed in buds and sepals, while expression of NtF3H (Ntab 0333640 and Ntab 0444890) was detected in flower tissues, including buds, sepals, stamens, and pistils (Fig. 7a). Interestingly, the expression level of NtFLS changed markedly in pistils before and after pollination. The expression level of NtFLS could hardly be detected in the pistils before pollination, but increased substantially after pollination. We then measured the content of kaempferol, kaempferol 3-glucoside (K-3G), quercetin, and quercetin 3-glucoside (Q-3G), four products of the FLS enzyme, in the stamens and pistils. As shown in Fig. 7b, pollination could lead to a strong decrease in the kaempferol and K-3G content in the stamens, but led to markedly increased kaempferol and K-3G content in the pistils. Moreover, the quercetin and Q-3G content in the stamens increased significantly after pollination, while the content of Q-3G was reduced in the pistils after pollination. Pollination did not alter the content of cyanidin chloride or delphinidin 3-O-β-d-glucoside, two important products of NtANS, in the stamens or pistils. Thus, the variation in 2ODD-IFB gene expression in the stamens and pistils appears to be related to corresponding changes in flavonol content, which might further affect the development of stamens and pistils.

Fig. 7

Expression levels of the 2ODD-IFB genes and flavonoids content in different organs of N. tabacum. a RT-PCR analysis of tobacco 2ODD-IFB genes. RNA samples were collected from roots (R), leaves (L), stems (S), stem nodes (SN), axillary buds (AB), buds (B), sepals (Se), stamen before pollination (SBP), stamen after pollination (SAP), pistils before pollination (PBP), and pistils after pollination (PAP). The PCR amplifications for each gene were performed with the same program, and the gels were run under the same experimental conditions. The full-length gels are shown as Suppl. Fig. S7. b Contents of seven flavonoids in tobacco flowers before and after pollination. BP, before pollination; AP, after pollination. c DPBA staining of flavonoids in root tips. Left, stained with water; right, stained with DPBA. d Contents of 12 flavonoids in different organs of N. tabacum

We observed the distribution of flavonoids in tobacco roots via DPBA staining. As shown in Fig. 7c, tobacco root tips stained deep yellow, indicating an abundance of naringenin chalcone or quercetin in these regions. In contrast, there was weak yellow staining in the elongation zone (not shown), indicating an asymmetrical distribution of flavonoids in tobacco roots. We analyzed the accumulation of nine flavonoids in various tobacco tissues (Fig. 7d) and found that almost all of the flavonoids examined could be detected in all of the tissues, a finding that was not consistent with the expression patterns of the corresponding 2ODD-IFB genes. For instance, transcription of the 2ODD-IFB genes could hardly be detected in tobacco roots, but the accumulation of various flavonoids was found in roots via both DPBA staining and HPLC–UV analysis. The accumulation levels of most of the flavonoids we assessed were quite low (less than 5 μg/g fresh weight), possibly because most of the substances we examined are intermediate products of flavonoid metabolism.

Quantitative real-time RT-PCR was performed to further evaluate the relative expression levels of the 2ODD-IFB genes in tobacco leaves at different developmental stages (Fig. 8). Similar to the results of the semi-quantitative RT-PCR analysis, the expression of NtANS was barely detectable at any of the stages. NtF3H was expressed in leaves at all of the stages, but at varying levels (Fig. 8a). The NtFLS gene was clearly expressed at all of the leaf stages examined. The expression level of NtFLS followed a ‘U’-shaped pattern across the whole developmental period (Fig. 8b), suggesting positive roles for this gene in both young and senescent leaves. It is worth pointing out that topping, an important agronomic practice in tobacco production, strongly induced the transcription of the NtFLS and NtF3H genes. We analyzed the flavonoid content in tobacco leaves at various developmental stages using HPLC–UV, and found that all of the leaves examined contained high levels of rutin (quercetin rutinoside) (Fig. 8c). The variation of the rutin content in leaves from different developmental stages also showed the aforementioned ‘U’ pattern (Fig. 8d), which was the same as the expression pattern of the NtFLS gene.
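One simple way to quantify how closely an expression profile tracks a metabolite profile across developmental stages is the Pearson correlation coefficient. The sketch below is illustrative only: the study does not state which statistic, if any, was applied to these profiles, and the function name and the stage-wise values are hypothetical placeholders rather than measured data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 'U'-shaped profiles over five leaf stages (not measured data).
expression = [3.0, 1.5, 1.0, 1.6, 3.2]   # relative transcript level
rutin = [9.0, 5.0, 4.0, 5.5, 9.5]        # metabolite content, arbitrary units
r = pearson_r(expression, rutin)
```

A coefficient close to 1 would indicate that the two profiles rise and fall together, as described for NtFLS expression and rutin content; with only a handful of stages, such a value should be read as descriptive rather than as a formal test.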

Fig. 8

Transcription levels of 2ODD-IFB genes and flavonoids contents in N. tabacum leaves at various developmental stages. a Relative expression levels of 2ODD-IFB genes in N. tabacum leaves. b The ‘U’ model expression pattern of the Ntab 0125670 gene during the course of development. The expression levels of the 2ODD-IFB genes were determined relative to the internal control gene, NtGAPDH. Values are mean ± SD of three independent biological replicates. c Flavonoid contents in N. tabacum leaves at various developmental stages. d The variation curve of rutin content in N. tabacum leaves at various developmental stages. Values are mean ± SD of three independent biological replicates

The tobacco 2ODD-IFB genes were silenced via the VIGS method to explore the relationship between flavonoid contents and 2ODD-IFB gene expression levels. As shown in Fig. 9a, b, the expression levels of the target genes were clearly down-regulated, while the phenotype of the VIGS plants was the same as that of the wild-type (WT) plants. The contents of anthocyanidins, including catechin, delphinidin, and procyanidine, were much higher in FLS–VIGS leaves than in WT leaves, whereas the contents of flavonols, including quercetin, quercetin-3-glucose, kaempferol, and kaempferol-3-glucose, were lower in FLS–VIGS leaves than in WT leaves (Fig. 9c). Conversely, compared with WT leaves, the anthocyanidin contents in the ANS–VIGS leaves decreased, but the flavonol contents significantly increased. The contents of flavonols and anthocyanidins both decreased in F3H–VIGS leaves. These data suggest that the flavonoid contents in tobacco leaves were consistent with the expression levels of the 2ODD-IFB genes.

Fig. 9

VIGS of 2ODD-IFB genes altered the flavonoid composition in tobacco leaves. a Phenotype of the VIGS tobacco plants. b Expression levels of tobacco 2ODD-IFB genes in VIGS plants. c The flavonoid contents in VIGS leaves. Values are mean ± SD of three independent biological replicates

Differential expression patterns of tobacco 2ODD-IFB genes under drought and pathogen inoculation treatments

To examine any possible roles for the 2ODD-IFB genes in plant stress resistance, we measured their expression levels under drought and pathogen inoculation treatments by qPCR (Fig. 10). Generally, the expression levels of the tobacco 2ODD-IFB genes were down-regulated, to varying extents, in leaves inoculated with Phytophthora parasitica var. nicotianae (P. parasitica var. nicotianae) for 1 h, 2 h, and 4 h (Fig. 10a). The expression levels of the NtFLS, NtANS, and NtF3H genes followed similar trends in leaves inoculated with potato virus Y (PVY). As shown in Fig. 10b, compared with the expression levels in control leaves, the expression levels of the tobacco 2ODD-IFB genes were almost unchanged at 3 days after inoculation (except for a 3.5-fold increase in expression of the Ntab 0256350 gene), clearly suppressed at 6 days, significantly up-regulated at 9 days, and finally decreased to various levels at 14 days. Under drought stress, the expression patterns of the tobacco 2ODD-IFB genes fell into two types (Fig. 10c). The expression levels of Ntab 0215410 (NtFLS) and Ntab 0256350 (NtANS) increased significantly as the treatment time was extended, whereas those of the other 2ODD-IFB genes were affected to a lesser extent throughout the treatment. Taken together, the expression levels of tobacco 2ODD-IFB genes under biotic stresses (P. parasitica var. nicotianae or PVY) fluctuated drastically over the treatment time, while those under drought treatment changed in a regular manner, suggesting that the regulatory network of tobacco 2ODD-IFB genes is more complex under biotic stresses than under drought.

Fig. 10

Relative expression levels of the 2ODD-IFB genes in response to P. parasitica var. nicotianae inoculation a, potato virus Y (PVY) inoculation b, and drought treatment c. Values are mean ± SD of three independent biological replicates

Discussion

The 2ODD enzymes constitute the second largest protein superfamily in plants and participate in various oxygenation/hydroxylation reactions. Here, we identified four FLS genes, four ANS genes, and six F3H genes from the N. tabacum, N. sylvestris, and N. tomentosiformis genomes. We further analyzed their conserved residues, gene structures, evolutionary relationships, and expression patterns in various tobacco plant materials. This work therefore provides an important step toward a comprehensive understanding of the phylogenesis and putative functions of these enzymes in various growth and developmental processes in tobacco.

Evolution and diversity of tobacco 2ODD-IFB genes

Synthesizing the structural information we have obtained from tobacco and other plant species, we propose a model for the changes that occurred in the structures of 2ODD-IFB genes during evolution (Fig. 11). Initially, the ancestral gene of 2ODD-IFB contained three exons and two introns. Before the split of gymnosperms and angiosperms, this ancestral gene gave rise to the three progenitors of the FLS, ANS, and F3H genes via mutation or duplication. After the separation of these two groups of plants, the angiosperm ANS genes lost their second intron, which led to the fusion of exon two and exon three (Suppl. Fig. S4). Subsequently, after the divergence of monocots and eudicots, the monocot ANS genes further lost their first intron, leaving these genes with no introns. Similarly, the first intron was also lost during the evolution of monocot FLS genes after the separation of monocots and eudicots (Suppl. Fig. S1). By contrast, the gene structures of F3H were quite stable among the various plant species examined, suggesting relatively little pressure on the structure of this gene during evolution.

Fig. 11

Gene structure evolution of plant 2ODD-IFB genes
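The intron-loss events proposed above can be summarized as a small data model. The sketch below is only an illustration of the proposed scenario, not an analysis tool: the class, the table of loss events, and the lineage labels are hypothetical names, and it assumes that lineages for which no loss is described (for example F3H in all groups) retain the ancestral three-exon, two-intron structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneStructure:
    exons: int
    introns: int

    def lose_intron(self):
        """Losing one intron fuses its two flanking exons into one."""
        if self.introns == 0:
            raise ValueError("no intron left to lose")
        return GeneStructure(self.exons - 1, self.introns - 1)


# Ancestral 2ODD-IFB gene as proposed in the text: three exons, two introns.
ANCESTOR = GeneStructure(exons=3, introns=2)

# Number of intron-loss events per (gene family, lineage) in the proposed model.
# Combinations not listed are assumed to have had no loss (an assumption).
LOSS_EVENTS = {
    ("ANS", "eudicot"): 1,   # second intron lost in angiosperms
    ("ANS", "monocot"): 2,   # second intron, then first intron lost
    ("FLS", "monocot"): 1,   # first intron lost in monocots
}


def predicted_structure(family, lineage):
    """Apply the proposed loss events to the ancestral structure."""
    structure = ANCESTOR
    for _ in range(LOSS_EVENTS.get((family, lineage), 0)):
        structure = structure.lose_intron()
    return structure


for family in ("FLS", "ANS", "F3H"):
    for lineage in ("gymnosperm", "eudicot", "monocot"):
        s = predicted_structure(family, lineage)
        print(f"{family:4s}{lineage:11s} exons={s.exons} introns={s.introns}")
```

Under these assumptions, monocot ANS is the only case that ends up intronless, which mirrors the description in the text.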

Besides the inconsistent numbers of introns and exons in plant 2ODD-IFB genes, we also found that the intron length of FLS genes varied greatly among plant species (Fig. 3). Based on the alignment of intron sequences from different plants, we speculate that a fragment was inserted (1st insertion) into intron one of the Solanaceae FLS gene after the divergence of this family from other plant species, but before differentiation within the Solanaceae. Moreover, another fragment insertion (2nd insertion) occurred in the tobacco FLS intron one, which gave rise to the largest intron one examined so far. We also aligned the FLS intron two sequences from different plant species, and found that a fragment insertion (3rd insertion) had occurred in the tomato and potato FLS genes (Suppl. Fig. S3). It has been hypothesized that intron size is constrained by energy use in transcription, since it might require more energy to transcribe and splice larger introns (Castillo-Davis et al. 2002). Selection therefore usually acts against excessively large introns, but there is also a selective preference for larger introns in certain genes, because they carry more regulatory elements and allow better control of gene transcription (Marais et al. 2005). The insertion of transposable elements (TEs) into introns could result in significant genetic and phenotypic changes by altering the spatial and temporal expression patterns of specific genes, and such changes could also be preserved under natural selection (Lempe et al. 2005).

We performed several BLAST searches of the sequences inserted in the FLS introns against the NCBI database, but failed to find highly similar sequences. Many predicted TEs were present in the inserted fragments (Suppl. Table S5–7), indicating a more complex regulation system for the FLS genes harboring larger introns. The expression patterns of FLS genes vary among plant species. For instance, the highest transcript levels of the two FLS genes in maize were found in young leaves (Falcone Ferreyra et al. 2012), but in G. biloba the highest GbFLS mRNA level was observed in mature leaves and was significantly higher than in young leaves (Xu et al. 2012). In Arabidopsis, the highest transcript levels of the AtFLS gene were detected in the reproductive tissues, including developing inflorescences, buds, flowers, and siliques (Owens et al. 2008). We found that the tobacco NtFLS gene was mainly expressed in flowers and leaves. The differing intron lengths might contribute to the distinct expression patterns of FLS genes in different plant species, a possibility that needs further confirmation.

The expansion of the 2ODD family occurred by large-scale duplication after the split from the common ancestor of land plants, which gave rise to three classes (DOXA, DOXB, and DOXC) based on amino acid sequence similarity. All the 2ODD-IFB genes belong to the DOXC class, and FLS and ANS are likely to have evolved after the emergence of F3H during seed plant evolution (Kawai et al. 2014). The 2ODD-IFB genes of Ginkgo biloba (GbFLS, GbANS, and GbF3H) fell into the FLS, ANS, and F3H groups, respectively, instead of clustering together. This implies that the 2ODD-IFB genes might have split apart from each other before the divergence of gymnosperms and angiosperms, which is consistent with our inference above from the analysis of gene structures. Moreover, we found that the FLS group could be further divided into three subgroups, one of which contained FLS genes from monocots, while genes in the other two subgroups were both from eudicots (Fig. 6). This suggests that the ancestral gene might have been duplicated, giving rise to two progenitors of FLS genes, before the divergence of monocots and eudicots. Subsequently, the two progenitors evolved independently: one generated the FLS eudicot I subgroup and the FLS monocot subgroup, while the other generated the FLS eudicot II subgroup. If so, there should also be an FLS monocot II subgroup. We cannot rule out this possibility, as relatively few FLS genes have been reported in monocot plants so far.

Consistent 2ODD-IFB gene expression levels and flavonoid contents in tobacco

In the present study, we measured the transcription levels of tobacco 2ODD-IFB genes in different tissues at the flowering stage and found that most tobacco 2ODD-IFB genes were exclusively expressed in the reproductive tissues, including buds, sepals, stamens, and pistils. However, the deposition pattern of flavonoids among tobacco tissues did not appear consistent with the expression patterns of the 2ODD-IFB genes. For instance, we found flavonoid accumulation in tobacco roots by both DPBA staining and HPLC–UV analysis (Fig. 7c, d), but did not detect obvious transcripts of tobacco 2ODD-IFB genes in this tissue (Fig. 7). This raises the possibility that the flavonoids deposited in roots are synthesized in other plant tissues, such as leaves, and then transported into the roots.

It is worth noting that in the tobacco tissues where flavonoids were biosynthesized, the accumulation patterns of flavonoids were consistent with the expression patterns of the 2ODD-IFB genes. For instance, pollination did not change the expression level of the NtANS gene in the pistils, but significantly raised the transcription level of the NtFLS gene (Fig. 7a). This led to a significant increase in flavonol content (especially kaempferol) in pistils after pollination (Fig. 7b), which also points to a positive role of flavonols in promoting pollen germination or pollen tube elongation. In tobacco leaves, both the expression pattern of the NtFLS gene and the accumulation pattern of rutin followed a ‘U’-shaped curve during leaf development (Fig. 8b, d). Moreover, when NtFLS was silenced in tobacco leaves by VIGS, flavonoid biosynthesis was diverted toward the anthocyanidin pathway (Fig. 9c). Conversely, more flavonols accumulated in tobacco leaves in which the NtANS gene was silenced. Taken together, these results indicate that the expression patterns of tobacco 2ODD-IFB genes did not determine the deposition patterns of flavonoids among tobacco tissues, but strongly affected the flavonoid contents in the tissues where they were biosynthesized.

Distinct expression profiles of tobacco 2ODD-IFB genes under stresses

Flavonoids have been shown to participate in plant resistance to various biotic and abiotic stresses (Treutter 2005). In the present study, we examined the expression patterns of tobacco 2ODD-IFB genes under various stresses, and noted two interesting features. First, the expression patterns of these genes varied under different biotic stresses. These distinct expression patterns suggest that 2ODD-IFB genes or flavonoid compounds play different roles in plant resistance to different biotic stresses. Second, the expression patterns of up-regulated 2ODD-IFB genes differed between plants under biotic and abiotic treatments. The up-regulation of 2ODD-IFB genes was transient under biotic stresses, but lasted longer under abiotic stress. We are therefore inclined to believe that the expression of tobacco 2ODD-IFB genes is regulated in different ways under the two kinds of stress, with regulation under biotic stresses being the more complicated.

Author contribution statement

JY conceived and supervised this study. ZW carried out most of the work. ZFL and SSW carried out the bioinformatics analysis. PPL determined the contents of flavonoids in tobacco tissues. QSC performed the microscopic examination and photography of flavonoid staining. MZW and FL examined the gene expression levels and carried out the stress and VIGS treatments. JY and ZW summarized the results and wrote the manuscript. AGY advised on and revised the manuscript. All authors reviewed the manuscript.