Evolutionary and functional analyses of the 2-oxoglutarate-dependent dioxygenase genes involved in the flavonoid biosynthesis pathway in tobacco

This study illustrates the differences in the gene structures of the 2-oxoglutarate-dependent oxygenases involved in flavonoid biosynthesis (2ODD-IFB) and their potential roles in regulating tobacco flavonoid biosynthesis and plant growth. Flavonol synthase (FLS), anthocyanidin synthase (ANS), and flavanone 3β-hydroxylase (F3H) belong to the 2-oxoglutarate-dependent (2ODD) oxygenase family, and each performs crucial functions in the biosynthesis of flavonoids. We identified two NtFLS genes, two NtANS genes, and four NtF3H genes in the Nicotiana tabacum genome, as well as their homologous genes in the N. sylvestris and N. tomentosiformis genomes. Our phylogenetic analysis indicated that these three types of genes split from each other before the divergence of gymnosperms and angiosperms. FLS evolved faster in eudicot plants, whereas ANS evolved faster in monocot plants. Gene structure analysis revealed that two fragment insertions occurred at different times in intron one of the tobacco FLS genes. Homology modeling revealed distinct structures at the N terminus of the tobacco 2ODD oxygenases. We found that the expression patterns of the tobacco 2ODD-IFB genes did not determine the accumulation patterns of flavonoids among various tobacco tissues, but strongly affected the concentration of flavonoids in the tissues where they were biosynthesized. More carbon flowed into flavonol biosynthesis when the NtANS gene was silenced, whereas more anthocyanidin accumulated when the NtFLS gene was repressed. This study illustrates the evolution of 2ODD-IFB gene structure and the differences among the protein structures, and provides a foundation for regulating plant development and altering flavonoid content and/or composition through the manipulation of plant 2ODD-IFB genes.


Introduction
Flavonoids are polyphenolic secondary metabolites with C6-C3-C6 carbon backbones that are biosynthesized via the phenylpropanoid pathway (Winkel-Shirley 2001). There are approximately 10,000 known flavonoids in plants; these are classified into different subgroups depending on the organization and modifications of the three-ring structure (Harborne and Williams 2000; Williams and Grayer 2004).

The major types of flavonoids in plants include flavonols, flavanones, flavones, anthocyanins, and condensed tannins (proanthocyanidins) (Routaboul et al. 2012). Flavonoids have antioxidant, anti-inflammatory, and anti-proliferative properties that can contribute to protecting humans from cancers and cardiovascular diseases (Geleijnse and Hollman 2008; Kale et al. 2008; Kaur et al. 2008; Kang et al. 2009; van Dam et al. 2013; Maggioni et al. 2014). Flavonoids are also known to participate in the regulation of the growth, development, and physiology of Arabidopsis and many other plants. For example, flavonols are involved in the modulation of the polar transport of auxin (Kuhn et al. 2011), flower color (Gronquist et al. 2001), pollen fertility (Mo et al. 1992), ethylene signaling (Lewis et al. 2011), and UV-B protection (Kusano et al. 2011). Thus, understanding the synthesis of flavonoids, especially flavonols, and being able to engineer it is of great biological and medicinal significance.
The biosynthesis pathways of flavonoids in various plant species have been characterized through the isolation and characterization of numerous biosynthetic enzymes (Fig. 1) (Winkel-Shirley 2001; Williams and Grayer 2004; Kuhn et al. 2011; Routaboul et al. 2012; Cheng et al. 2014). In flavonoid biosynthesis, naringenin is hydroxylated to dihydroflavonol by F3H. Dihydroflavonol can subsequently be oxidized to flavonol by FLS (Chua et al. 2008; Owens et al. 2008). Alternatively, dihydroflavonol 4-reductase (DFR) reduces the dihydroflavonol to the leucoanthocyanidin, and then ANS (also known as LDOX, leucoanthocyanidin dioxygenase) produces an anthocyanidin from the leucoanthocyanidin (Davies et al. 2003; Turnbull et al. 2004; Wellmann et al. 2006). F3H, FLS, and ANS all belong to the 2ODD oxygenase family and are widely distributed among various plant species. Plant 2ODD oxygenases utilize molecular oxygen as their co-substrate, and are distinguished by their differing co-factor requirements, including, for example, ascorbate, 2-oxoglutarate, and/or Fe2+ (Chua et al. 2008). These enzymes are involved in hypoxic signaling, DNA repair, and various metabolic pathways, including the biosynthesis of phytohormones, amino acids, signaling molecules, and various secondary metabolites (Prescott and Lloyd 2000; Ivan et al. 2001; Falnes et al. 2002; Trewick et al. 2002).
The first full-length FLS gene was cloned from a petunia petal cDNA library. Its activity was confirmed by antisense expression in petunia, which led to significant reductions of flavonol synthesis in petals (Holton et al. 1993). Subsequent studies have characterized FLS genes from a range of plant species, including Arabidopsis thaliana (Chua et al. 2008; Owens et al. 2008; Preuss et al. 2009; Stracke et al. 2009), Vitis vinifera (Fujita et al. 2006), Camellia sinensis (Lin et al. 2007), and Zea mays (Falcone Ferreyra et al. 2010). The first ANS gene was isolated from maize, and a further investigation of its biochemical functions was performed in Perilla frutescens (Menssen et al. 1990; Saito et al. 1999). The crystal structure of ANS was determined and has been widely used as a template for building in silico models of other plant 2ODD oxygenases (Welford et al. 2005). Such models are useful for the identification of conserved residues that are required for oxygenase activity (Welford et al. 2001). The F3H gene was first described in Matthiola incana (Forkmann et al. 1980), and subsequently cloned from Petunia hybrida (Britsch et al. 1992), Hordeum vulgare (Meldgaard 1992), Malus (Davies 1993), Medicago sativa (Charrier  (Deboo et al. 1995), Arabidopsis thaliana (Pelletier and Shirley 1996), Triticum aestivum (Himi et al. 2011), and soybean (Cheng et al. 2013). Domain-swapping experiments with recombinant enzymes found that the C-terminal region of F3H contributed to its activity but did not affect its selectivity (Lee et al. 2001; Wellmann et al. 2004). Although the 2ODD oxygenases have been extensively studied in a few plant species, such as Arabidopsis, maize, and grape, the 2ODD genes in tobacco have not been characterized.
In the current study, we identified and characterized the 2ODD genes involved in flavonoid biosynthesis from the allotetraploid N. tabacum and from two diploid tobacco species (N. tomentosiformis and N. sylvestris). Our analysis included the identification of tobacco 2ODD-IFB genes from genomic data, analysis of their structures and conserved motifs, phylogenetic analysis, evaluation of their N-/C-terminal structures, evaluation of their tissue- and development-dependent expression patterns, and correlations between 2ODD-IFB gene expression patterns and flavonoid content. Our results provide a foundation for the further investigation and manipulation of tobacco 2ODD-IFB genes, which should lead to a more complete understanding of the roles of these genes in the regulation of plant growth and may enable the intentional alteration of flavonoid composition and/or content according to human requirements.

Plant materials
The Honghua Dajinyuan cultivar (Hongda, N. tabacum) was used in the expression profile analysis for the 2ODD-IFB genes. Tobacco seeds maintained at our laboratory were germinated and grown in pots under typical conditions (daily 28 °C for 16 h under light, 23 °C for 8 h of darkness) until the flowering stage. Total RNA was extracted from roots, leaves, stems, stem nodes, buds, sepals, stamens, pistils, and axillary buds.
For the stress treatments, tobacco seeds were soaked, sterilized, and then germinated on 1/2 MS medium in darkness. Young seedlings were then grown under typical culture conditions in vermiculite until the six-leaf stage. Phytophthora parasitica var. nicotianae and potato virus Y (PVY) were inoculated onto the sixth leaves of tobacco seedlings for different durations (1 h, 2 h, and 4 h for P. parasitica var. nicotianae; 3 d, 6 d, 9 d, and 14 d for PVY). For the drought treatment, seedlings were first transferred to 1/3-strength Hoagland solution for 1 week, and then cultivated in a solution containing 20% (w/v) PEG6000 for 1 d, 2 d, 4 d, and 8 d. Plant materials from the treatments were collected, immediately frozen in liquid nitrogen, and stored at −80 °C prior to RNA extraction.

Phylogenetic and gene structure analyses
The 2ODD-IFB sequences from other plant species were collected from NCBI GenBank. The sequences used in this study are listed in Suppl. Table S1.
Multiple alignments of the 2ODD-IFB sequences were performed using Clustal X (version 1.83) and DNAMAN (version 6.0) with default gap penalties and the Blosum protein weight matrix (Jeanmougin et al. 1998). MEGA 5.0 was used to construct the phylogenetic tree of the 2ODD-IFB amino acid sequences using the neighbor-joining algorithm with the bootstrap method (1000 replications), the Poisson model, and complete deletion of gaps/missing data (Tamura et al. 2011).
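The distance step behind such a tree can be illustrated with a minimal Python sketch. It assumes the standard Poisson correction, d = −ln(1 − p), applied after removing every alignment column that contains a gap in any sequence (complete deletion). The function names and the toy alignment are hypothetical and are not taken from MEGA; tree building and bootstrapping are not shown.

```python
import math

GAP_CHARS = set("-?")  # assumption: '-' and '?' mark gaps/missing data


def complete_deletion(alignment):
    """Drop every column that has a gap in at least one sequence."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all sequences must have the same aligned length")
    keep = [i for i in range(length)
            if all(alignment[n][i] not in GAP_CHARS for n in names)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two gap-free aligned proteins."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance undefined when all sites differ")
    return -math.log(1.0 - p)


def distance_matrix(alignment):
    """Pairwise Poisson distances after complete deletion."""
    trimmed = complete_deletion(alignment)
    return {(m, n): poisson_distance(trimmed[m], trimmed[n])
            for m in trimmed for n in trimmed}


# Hypothetical toy alignment, for illustration only.
toy = {"seqA": "MKT-LV", "seqB": "MKTALV", "seqC": "MRTALI"}
matrix = distance_matrix(toy)
```

A matrix of this kind is the input to neighbor joining; with complete deletion, every pairwise distance is computed over the same set of columns.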
Each tobacco 2ODD-IFB coding sequence was aligned with its corresponding genomic DNA sequence to identify the exact intron/exon locations. The Multiple Em for Motif Elicitation program (MEME, http://meme.nbcr.net/meme3/mme.html) was used to predict the conserved motifs in the tobacco 2ODD-IFB protein sequences (Bailey et al. 2006). The putative motifs found by MEME were further queried in the InterPro database (http://www.ebi.ac.uk/interpro/) to explore their possible biological functions (Finn et al. 2017).

cDNA sample preparation and gene transcription analysis
A SuperPure Plantpoly RNA Kit (Gene Answer, Beijing, China) was used to extract total RNA from the plant samples. DNA contamination was removed by digestion with RNase-free DNase I (Gene Answer). Reverse Transcriptase M-MLV (Takara Biomedical Technology, Beijing, China) and random primers were used to synthesize first-strand cDNA using 1 μg of total RNA as a template. The cDNA concentrations were evaluated with a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA) and then diluted to a 100 ng/µl concentration.
The gene-specific primers used in the PCR experiment are listed in Suppl. Table S2. RT-PCR was performed using Takara Taq polymerase in a Biometra thermal cycler. The PCR program was as follows: 95 °C for 5 min, 25-29 cycles of 94 °C for 30 s, 55 °C or 60 °C for 30 s, and 72 °C for 30 s. qPCR amplification reactions were performed using a LightCycler® 96 SW 1.1 cycler (Roche, Laval, QC, Canada) and an SYBR Green kit (Roche). The PCR program was as follows: 95 °C for 5 min, 40 cycles of 94 °C for 30 s, 60 °C for 30 s, signal acquisition, and then a final melting curve of 65-95 °C. The expression levels of the target genes were standardized to the expression level of the NtGAPDH gene using the 2^(−ΔΔCt) method. Three independent biological replicates were performed for each gene.
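The normalization described here can be written out as a short calculation. The Python sketch below assumes the usual form of the method, in which the reference gene (here NtGAPDH) is subtracted first and the control sample second, and in which amplification efficiency is close to 100%. The function name and the Ct values in the usage line are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), per sample
    ddCt = dCt(treated sample) - dCt(control sample)
    Assumes roughly 100% amplification efficiency for both genes.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: target reaches threshold 2 cycles earlier
# (relative to the reference gene) in the treated sample.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

With biological replicates, the fold change would typically be computed per replicate and then summarized as mean ± SD.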

Estimation of Ka/Ks ratios and protein structure modeling
To estimate Ka (nonsynonymous substitutions per site), Ks (synonymous substitutions per site), and their ratios, the amino acid sequences of each gene family were first analyzed with the MEME program. The motif amino acid sequences found by the program were aligned and subsequently transferred to the original cDNA sequences. Both Ka and Ks values were then estimated from the aligned cDNA sequences using the yn00 program of the PAML4b package.
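Transferring a protein alignment back onto the cDNA amounts to replacing each residue with its codon and each gap with three gap characters. The Python sketch below illustrates that step only; the function name and toy sequences are hypothetical, the code does not check that each codon actually translates to the aligned residue, and the yn00 estimation itself is not reproduced.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an aligned protein sequence onto its coding sequence.

    Each residue consumes the next codon of `cds`; each gap becomes
    three gap characters, so codon columns stay in frame.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than the protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Hypothetical toy example: a two-residue protein aligned with one gap.
codon_alignment = thread_codons("M-K", "ATGAAA")
```

The resulting in-frame codon alignment is the kind of input that Ka/Ks estimation programs expect.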
The homology models of the tobacco 2ODD-IFB proteins were generated with tools available via the SWISS-MODEL website (http://swissmodel.expasy.org/interactive), as described previously (Guex et al. 2009). Each target protein sequence was aligned with those of model proteins to select the most suitable template. Based on various factors (insertions/deletions, ligands, oligomeric state) for which information was available on the SWISS-MODEL website, the crystal structure of Arabidopsis LDOX (2brt.1A) was chosen as the template from which to build models for the tobacco 2ODD-IFB proteins.

Virus-induced gene silencing (VIGS) of 2ODD-IFB in tobacco
For construction of the VIGS vectors, fragments of about 500 bp from the tobacco 2ODD-IFB genes were selected and amplified from cDNA samples using the gene-specific primers shown in Suppl. Table S2. The fragments were digested with BamH I and Kpn I and subsequently cloned into the pTRV2 vector. After sequencing, the correct pTRV2-FLS, pTRV2-ANS, and pTRV2-F3H vectors were transformed into A. tumefaciens strain GV3101 by the heat shock method. The Agrobacterium suspensions were incubated at room temperature for at least 3 h without shaking, and subsequently injected into the leaves of 4-week-old N. benthamiana plants using a needleless syringe. Agrobacterium carrying the pTRV2 and pTRV2-PDS vectors served as the negative and positive controls, respectively. Positive plants were identified by RT-qPCR 2 weeks after injection.

Identification and sequence analysis of tobacco 2ODD-IFB genes
We used the terms 'flavonol synthase', 'anthocyanidin synthase', and 'flavanone 3-hydroxylase' as query key words for searching the N. tabacum genome in the China tobacco genome database v4.0 (data not shown) and obtained 7, 8, and 7 hits for these terms, respectively. We then performed several blastp searches in the database using the amino acid sequences of Arabidopsis FLS, ANS, and F3H proteins as the query sequences. We aligned the nucleotide and amino acid sequences of each putative tobacco 2ODD-IFB homolog with those of the Arabidopsis FLS, ANS, and F3H genes. We excluded the tobacco genes that had low similarity and identity with Arabidopsis homologs and thus identified two FLS, two ANS, and four F3H genes in the N. tabacum genome (Table 1). Similarly, we identified and characterized one FLS, one ANS, and one F3H gene in the N. tomentosiformis genome as well as one FLS, one ANS, and one F3H gene in the N. sylvestris genome (Table 1). All the 2ODD sequences identified in our database were further validated in the public tobacco genome databases (Sierro et al. 2013, 2014; Edwards et al. 2017), and then submitted to the GenBank database. The sequence information and GenBank accession numbers are shown in Suppl. Table S3.
In brief, we identified four FLS genes, four ANS genes, and six F3H genes from the three tobacco species. The identities of the four FLS amino acid sequences were between 66.04 and 94.51%. However, the identities between the sequences of the four FLS and four ANS ranged from 24.05 to 32.14%, and those of the four FLS and the six F3H ranged from 17.19 to 25.4% (Table 2). The four ANS homologs from the three tobacco species shared identities between 63.01 and 100%, and the six F3H homologs shared identities between 96.21 and 100%. The identities between ANS and F3H were very low (15.18-24.82%). Alignments between the coding sequences of the 2ODD-IFB genes gave results consistent with those for the amino acid sequences (Table 2). It is worth noting that the NtF3H genes Ntab 0444890, Ntab 0908200, and Ntab 0908150 in the N. tabacum genome showed 100% identity in both their amino acid sequences and CDS sequences, but they were found to be located either on different scaffolds or at different positions on the same scaffold (Table 1). We thus considered these as three independent NtF3H genes, which might have been generated by duplication after the formation of tetraploid tobacco.
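For readers who want to reproduce identity values of this kind, the Python sketch below shows one common way of computing percent identity from a pairwise alignment. The choice of denominator (columns where neither sequence has a gap) is an assumption; alignment programs such as DNAMAN may use a different convention, so values can differ slightly from those in Table 2. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment: six comparable columns, five identical.
identity = percent_identity("MKTAL-V", "MRTALIV")
```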

Exon/intron organization of the 2ODD-IFB genes
Characterization of the exon/intron structure of genes can be useful for the evaluation of the evolution of a gene family. As shown in Fig. 2a, all of the tobacco FLS genes contained two introns. Most of the known FLS genes from other eudicot plant species also contained two introns (Suppl. Fig. S1). Among all of the FLS genes examined in this study, the length of exon one changed to a greater extent (ranging from 380 to 518 bp) than did exon two (from 327 to 333 bp) or exon three (from 211 to 229 bp). It is interesting that the FLS intron one length changed in a manner consistent with the divergence time between certain plant species and Arabidopsis (Fig. 3) (Bowers et al. 2003; Moore et al. 2010). The FLS intron one in Solanaceae plants varies in length from 1.78 to 4.12 kb, lengths that are all much larger than those of intron one in grape (0.64 kb), cotton (0.31 kb), Brassica napus (0.25 kb), and Arabidopsis (0.21 kb). Although Brachypodium, maize, rice, and sorghum diverged from Arabidopsis even earlier (about 200 Mya), the FLS genes in these plants contained no or only very short introns at this position in the gene's structure (0.07 kb in OsFLS, and 0.09 kb in SbFLS2). We aligned the nucleotide sequences of FLS intron one from various plant species, including tobacco, potato, tomato, grape, and Arabidopsis. As shown in Suppl. Fig. S2, there were three conserved fragments among the different FLS intron one sequences. Fragment one, shown in a black frame, was conserved among all the sequences examined, while fragment two, shown in a red frame, was conserved in the Solanaceae FLS genes. The conserved fragment three, in a purple frame, only showed up in tobacco FLS genes. Similarly, intron two of the Solanaceae FLS genes was larger than those in other plant species, especially in the StFLS and SolyFLS genes (Fig. 3). Alignment among the intron two sequences from various plant species revealed two conserved fragments (Suppl. Fig. S3). The conserved fragment one existed in all the species examined, while fragment two only occurred in the StFLS and SolyFLS genes.
We also analyzed the structures of the ANS and F3H genes from various plant species. As shown in Figs. 2a, b and S4, all the ANS genes from eudicot plants examined, including tobacco, potato, tomato, Arabidopsis, Brassica napus, soybean, Medicago truncatula, and grape, contained one intron and two exons. There was no intron present in the ANS genes of monocots including rice, maize, sorghum, or Setaria italica. The exon 1 size of the eudicot ANS genes varied from 497 to 533 bp, while those of exon two varied from 562 to 860 bp in length, suggesting a higher degree of conservation of exon one during the evolution of the ANS gene. All the F3H genes examined in the present study contained two introns, though the intron size varied among the different plant species (Suppl. Fig. S4). Moreover, the sizes of the three exons were highly conserved, especially exon two, which was 431 bp in the tobacco F3H genes, and 429 bp in the other F3H genes. This indicated a relatively evolutionarily stable gene structure for F3H.

Conserved residues in tobacco 2ODD-IFB enzymes
To evaluate if the genes identified in tobacco were likely to encode functional 2ODD-IFB enzymes, we examined their conserved catalytic residues and motifs (Cheng et al. 2014).
As shown in Fig. 4, there were five specific residues (in a red frame) that were highly conserved across all of the tobacco 2ODD-IFB proteins, including two histidines (His224, His280; numbers refer to the Ntab 0125670 sequence), one aspartic acid (Asp226), one arginine (Arg290), and one serine (Ser292). Four other residues, Gly70, His77, Gly240, and Pro210 (shown in a green box) that are likely involved in ensuring the correct folding of 2ODD proteins, were also found to be conserved in the selected tobacco 2ODD-IFB proteins. We searched all of the putative motifs of the tobacco 2ODD-IFB proteins on the MEME website and found a total of four conserved motifs (Fig. 4, Suppl. Table S4). These motifs were all either annotated as possessing oxidoreductase activity or as being involved in anthocyanidin biosynthesis. Taken together, all of the tobacco 2ODD-IFB identified from the database contained the conserved residues and motifs that are considered characteristic of 2ODD proteins.
To identify the most likely enzymatic activity-relevant residues in the termini of the tobacco 2ODD-IFB proteins, we constructed homology models based on the crystal structure of AtLDOX (2brt.1A) [...] 30.40% amino acid identity with AtLDOX, respectively. The largest apparent differences among the three protein structures were near the N terminus (Fig. 5a). There were at least two α-helices at the N terminus of all three proteins. There was one more four-residue α-helix at the N terminus of the NtANS model, while the NtFLS model contained half of an α-helix at a similar position. This helix was not present in the NtF3H model. We also used the homology models to identify the conserved residues that might be required for the formation of the N-terminal helices. As shown in Fig. 5b, residues highlighted in gray were supposed to constitute the helices at the N terminus of the tobacco 2ODD-IFB proteins. The residues presented in red did not differ from those of the template and might be the core position for N-terminal helices.

Fig. 4 Alignment of the predicted amino acid sequences of the tobacco 2ODD-IFB proteins. The red boxes indicate conserved catalytic residues in these 2ODD-IFB sequences, while the green boxes indicate an additional four conserved residues that are probably required to ensure correct folding. The black boxes indicate the location of the putative motifs shown in Table S2

Phylogenetic analysis of tobacco 2ODD-IFB genes
To better understand the evolutionary relationships among the 2ODD-IFB genes of tobacco and those of other plant species, we aligned 75 2ODD-IFB amino acid sequences from 26 species with the ClustalX program and then constructed an unrooted tree with MEGA 5 using the neighbor-joining method. As shown in Fig. 6, plant 2ODD-IFB genes could be generally divided into two major groups. One group contained two branches, namely an FLS group and an ANS group, while the other group was divided into an F3H group and an FNS I group. The FLS group could further be divided into three subgroups, which we named the FLS monocot subgroup, the FLS eudicot I subgroup, and the FLS eudicot II subgroup. There were 24 genes from 11 plant species in the FLS eudicot I subgroup, but only three genes (GbFLS, VvFLS5, and CmFLS) in the FLS eudicot II subgroup. Moreover, in both the ANS group and the F3H group, the monocot genes were clearly separate from the eudicot genes. Interestingly, we found that PhANS (BAA89008), BnFLS3 (CDX91472), and BnANS (CDX68770) were grouped into the FNS I group, which was confined to rice and Apiaceae species in previous studies (Britsch et al. 1981; Prescott et al. 2002). This might indicate that FNS I genes are more widely distributed in plants than previously reported.

Since the distinct evolution of plant 2ODD-IFB was observed in the phylogenetic tree, we investigated whether they were under different selection forces. We used the domain sequences to calculate their Ka, Ks, and Ka/Ks ratio. As shown in Fig. 3b, there was no significant difference between eudicot and monocot plants in the average Ka/Ks value for the F3H family. For the FLS family, the average Ka/Ks of eudicot plants was 0.42, significantly higher than that in monocot plants (0.29). However, the average Ka/Ks ratio of the ANS family was much higher in monocot plants (0.54) than in eudicot plants (0.1). These data suggested that the FLS family evolved faster in eudicot plants than in monocot plants, whereas the ANS family evolved faster in monocot plants. We then compared Ka and Ks values separately to further analyze the reason why the FLS and ANS families exhibited different evolutionary rates in eudicot and monocot plants. For the FLS family, the Ka value in eudicots is similar to the value in monocots, but the Ks value in monocots is almost twice the value in eudicots. The Ka value for the ANS family in monocots is eight times the value in eudicots, while the Ks value in monocots is only twice the value in eudicots. Therefore, the higher Ka/Ks value of the FLS family in eudicots is caused by a lower Ks value, but the higher Ka/Ks value of the ANS family in monocots is caused by a higher Ka value. We also found that the Ka/Ks values of the FLS and ANS families in Nicotiana were significantly higher than those in other eudicot plants (Fig. 3c), suggesting a faster evolution for these two genes in Nicotiana.

Correlation between 2ODD-IFB expression levels and flavonoids accumulation patterns
To better understand the possible functions of the 2ODD-IFB genes in tobacco, we initially examined their tissue-specific expression patterns in roots, leaves, stems, stem nodes, axillary buds, buds, sepals, stamens (before and after pollination), and pistils (before and after pollination). As shown in Fig. 7a, the expression of NtFLS (Ntab 0125670 and Ntab 0215410) was detected in leaves, buds, sepals, stamens, and pistils. The NtANS gene (Ntab 0256350 and Ntab 0694550) was mainly expressed in buds and sepals, while expression of NtF3H (Ntab 0333640 and Ntab 0444890) was detected in flower tissues, including buds, sepals, stamens, and pistils (Fig. 7a). Interestingly, the expression level of NtFLS changed markedly in pistils before and after pollination. The expression level of NtFLS could hardly be detected in the pistils before pollination, but increased substantially after pollination. We then measured the content of kaempferol, kaempferol 3-glucoside (K-3G), quercetin, and quercetin 3-glucoside (Q-3G), four products of the FLS enzyme, in the stamens and pistils. As shown in Fig. 7b, pollination could lead to a strong decrease in the kaempferol and K-3G content in the stamens, but led to markedly increased kaempferol and K-3G content in the pistils. Moreover, the quercetin and Q-3G content in the stamens increased significantly after pollination, while the content of Q-3G was reduced in the pistils after pollination. Pollination did not alter the content of cyanidin chloride or delphinidin 3-O-β-d-glucoside, two important products of NtANS, in the stamens or pistils. Thus, the variation in 2ODD-IFB gene expression in the stamens and pistils appears to be related to corresponding changes in flavonol content, which might further affect the development of stamens and pistils.
We observed the distribution of flavonoids in tobacco roots via DPBA staining. As shown in Fig. 7c, tobacco root tips were stained deep yellow, indicating an abundance of naringenin chalcone or quercetin in these regions. In contrast, there was weak yellow staining in the elongation zone (not shown), indicating an asymmetrical distribution of flavonoids in tobacco roots. We analyzed the accumulation of nine flavonoids in various tobacco tissues (Fig. 7d) and found that almost all of the flavonoids examined could be detected in all of the tissues, a finding that was not consistent with the expression patterns of the corresponding 2ODD-IFB genes. For instance, the transcription of 2ODD-IFB genes could hardly be detected in tobacco roots, but the accumulation of various flavonoids was found in roots via both DPBA staining and HPLC-UV analysis. The accumulation levels of most flavonoids we assessed were quite low (less than 5 μg/g fresh weight), which might be because most of the substances we examined were intermediate products of flavonoid metabolism.
Quantitative real-time RT-PCR was performed to further evaluate the relative expression levels of the 2ODD-IFB genes in tobacco leaves at different developmental stages (Fig. 8). Similar to the results of the semi-quantitative RT-PCR analysis, the expression of NtANS was hardly detected in any of the samples. NtF3H was expressed in leaves at all of the stages, but at varying levels (Fig. 8a). The NtFLS gene was apparently expressed in all of the leaf stages examined. The expression level of NtFLS followed a 'U' pattern across the whole developmental period (Fig. 8b), suggesting positive roles for this gene in both young and senescent leaves. It is worth pointing out that topping, an important agronomical practice in tobacco production, strongly induced the transcription of the NtFLS and NtF3H genes. We analyzed the flavonoid content in tobacco leaves at various developmental stages using HPLC-UV, and found that all of the leaves examined contained high levels of rutin (quercetin rutinoside) (Fig. 8c). The variation of the rutin content in leaves from different developmental stages also showed the aforementioned 'U' pattern (Fig. 8d), which matched the expression pattern of the NtFLS gene.

Fig. 8 [...] The 'U' model expression pattern of the Ntab 0125670 gene during the course of development. The expression levels of the 2ODD-IFB genes were determined relative to the internal control gene, NtGAPDH. Values are mean ± SD of three independent biological replicates. c Flavonoid contents in N. tabacum leaves at various developmental stages. d The variation curve of rutin content in N. tabacum leaves at various developmental stages. Values are mean ± SD of three independent biological replicates
The tobacco 2ODD-IFB genes were silenced via the VIGS method to explore the relationship between flavonoid contents and 2ODD-IFB gene expression levels. As shown in Fig. 9a, b, the expression levels of the target genes were clearly down-regulated, while the phenotype of the VIGS plants was the same as that of the wild-type plants.
The contents of anthocyanidins including catechin, delphinidin, and procyanidin in FLS-VIGS leaves were much higher than those in wild-type (WT) leaves, but the contents of flavonols including quercetin, quercetin 3-glucoside, kaempferol, and kaempferol 3-glucoside in FLS-VIGS leaves were lower than those in WT leaves (Fig. 9c). In contrast, compared to those in WT leaves, the anthocyanidin contents in the ANS-VIGS leaves decreased, but the flavonol contents significantly increased. The contents of flavonols and anthocyanidins both decreased in F3H-VIGS leaves. These data suggested that the contents of flavonoids in tobacco leaves were consistent with the expression levels of 2ODD-IFB genes.

Differential expression patterns of tobacco 2ODD-IFB genes under drought and virus inoculation treatments
To examine any possible roles for the 2ODD-IFB genes in plant stress resistance, we measured their expression levels under drought and pathogen inoculation treatments by qPCR (Fig. 10). Generally, the expression levels of the tobacco 2ODD-IFB genes were down-regulated, to varying extents, in leaves inoculated with Phytophthora parasitica var. nicotianae (P. parasitica var. nicotianae) for 1 h, 2 h, and 4 h (Fig. 10a). The variation trends of the expression levels of NtFLS, NtANS, and NtF3H genes were similar to each other. As shown in Fig. 10b, compared with the expression levels in control leaves, the expression levels of the tobacco 2ODD-IFB genes were almost unchanged in leaves inoculated for 3 days (except a 3.5-fold increase in expression for the Ntab 0256350 gene), clearly suppressed in 6d leaves, then significantly up-regulated in 9d leaves, and finally decreased to various levels in 14d leaves. Under drought stress, the expression patterns of the tobacco 2ODD-IFB genes were divided into two types (Fig. 10c). The expression levels of Ntab 0215410 (NtFLS) and Ntab 0256350 (NtANS) significantly increased along with the extension of the treatment time, but those of the other 2ODD-IFB genes were affected to a lesser extent throughout the treatment. Taken together, the expression levels of tobacco 2ODD-IFB genes under biotic stresses (P. parasitica var. nicotianae or PVY) fluctuated drastically along with the treatment time, while those under drought treatment showed apparent regularity, suggesting a more complex regulation network of tobacco 2ODD-IFB genes under biotic stresses relative to the regulation network for drought.

Discussion
The 2ODD enzymes constitute the second largest protein superfamily in plants and participate in various oxygenation/hydroxylation reactions. Here, we identified four FLS genes, four ANS genes, and six F3H genes from the N. tabacum, N. sylvestris, and N. tomentosiformis genomes. We further analyzed their conserved residues, gene structures, evolutionary relationships, and expression patterns in various tobacco plant materials. This work therefore provides an important step toward a comprehensive understanding of the phylogenesis and putative functions of these enzymes in various growth and developmental processes in tobacco.

Evolution and diversity of tobacco 2ODD-IFB genes
By combining the structural information obtained from tobacco and other plant species, we propose a series of events that shaped the structures of 2ODD-IFB genes during evolution (Fig. 11). Initially, the ancestral 2ODD-IFB gene contained three exons and two introns. Before the split of gymnosperms and angiosperms, this ancestral gene gave rise to the three progenitors of the FLS, ANS, and F3H genes via mutation or duplication. After the separation of gymnosperms and angiosperms, the angiosperm ANS genes lost their second intron, which led to the fusion of exon two and exon three (Suppl. Fig. S4). Subsequently, after the divergence of monocots and eudicots, the monocot ANS genes also lost their first intron, leaving these genes without introns. Similarly, the first intron was lost during the evolution of monocot FLS genes after the separation of monocots and eudicots (Suppl. Fig. S1). By contrast, the gene structures of F3H were quite stable among the plant species examined, suggesting relatively weak selective pressure on the structure of this gene during evolution.
Besides the inconsistent numbers of introns and exons in plant 2ODD-IFB genes, we also found that the intron length of FLS genes varied greatly among plant species (Fig. 3). The alignment of intron sequences from different plants suggests that a fragment insertion (1st insertion) occurred in intron one of Solanaceae FLS genes after the divergence of this family from other plant species, but before differentiation within the Solanaceae. Moreover, another fragment insertion (2nd insertion) occurred in intron one of tobacco FLS, giving rise to the largest intron one examined so far. We also aligned the FLS intron two sequences from different plant species and found that a fragment insertion (3rd insertion) occurred in the tomato and potato FLS genes (Suppl. Fig. S3). It has been hypothesized that intron size is constrained by the energy cost of transcription, since transcribing and splicing larger introns might require more energy (Castillo-Davis et al. 2002). Selection therefore usually acts against excessively large introns, but there is also a selective preference for larger introns in certain genes, because they carry more regulatory elements and allow better control of gene transcription (Marais et al. 2005). The insertion of transposable elements (TEs) in introns can result in significant genetic and phenotypic changes by altering the spatial and temporal expression patterns of specific genes, and such insertions can be preserved under natural selection (Lempe et al. 2005).
We performed several BLAST searches of the sequences inserted in the FLS introns against the NCBI database, but did not find highly similar sequences. Many predicted TEs were present in the inserted fragments (Suppl. Table S5-7), suggesting a more complex regulation system for the FLS genes harboring larger introns. The expression patterns of FLS genes vary among plant species. For instance, the highest transcript levels of the two FLS genes in maize were found in young leaves (Falcone Ferreyra et al. 2012), whereas in G. biloba the highest GbFLS mRNA level was observed in mature leaves and was significantly higher than that in young leaves (Xu et al. 2012). In Arabidopsis, the highest transcript levels of the AtFLS gene were detected in reproductive tissues, including developing inflorescences, buds, flowers, and siliques (Owens et al. 2008). We found that the tobacco NtFLS gene was mainly expressed in flowers and leaves. The differences in intron length might contribute to the distinct expression patterns of FLS genes in different plant species, but this needs to be confirmed.
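The intron-length comparison discussed above can be reproduced from gene annotations. The sketch below assumes that each gene is described by 1-based, inclusive exon coordinates on the same strand; the gene labels and coordinates are hypothetical placeholders, not values from this study.

```python
def intron_lengths(exons):
    """Return intron lengths for a gene given (start, end) exon coordinates.

    Coordinates are assumed to be 1-based and inclusive, so the intron
    between two exons spans prev_end + 1 .. next_start - 1.
    """
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    # Hypothetical three-exon gene models, for illustration only.
    genes = {
        "FLS_species_A": [(1, 100), (301, 400), (451, 600)],
        "FLS_species_B": [(1, 100), (2101, 2200), (2251, 2400)],
    }
    for name, exons in genes.items():
        print(name, intron_lengths(exons))
```

Comparing the first element of each list across species gives the intron one lengths that would reveal a large insertion such as those proposed for the Solanaceae FLS genes.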
The expansion of the 2ODD family occurred by large-scale duplication after the split from the common ancestor of land plants, which gave rise to three classes (DOXA, DOXB, and DOXC) based on amino acid sequence similarity. All the 2ODD-IFB genes belong to the DOXC class, and the evolution of FLS and ANS is likely to have occurred after the emergence of F3H during seed plant evolution (Kawai et al. 2014). The 2ODD-IFB genes of Ginkgo biloba (GbFLS, GbANS, and GnF3H) fell into the FLS, ANS, and F3H groups, respectively, instead of clustering together. This implies that the 2ODD-IFB genes might have split from each other before the divergence of gymnosperms and angiosperms, which is consistent with the inference drawn above from the analysis of gene structures. Moreover, we found that the FLS group could be further divided into three subgroups, one of which contained FLS genes from monocots, while the genes in the other two subgroups were from eudicots (Fig. 6). This suggests that a duplication of the ancestral gene might have given rise to two progenitors of the FLS genes before the divergence of monocots and eudicots. Subsequently, the two progenitors evolved independently: one progenitor generated the FLS eudicot I subgroup and the FLS monocot subgroup, while the other generated the FLS eudicot II subgroup. If so, there should also be an FLS monocot II subgroup. We cannot rule out this possibility, as relatively few FLS genes have been reported in monocot plants so far.

Fig. 11 Gene structure evolution of plant 2ODD-IFB genes

Consistent 2ODD-IFB gene expression levels and flavonoids content in tobacco
In the present study, we measured the transcription levels of tobacco 2ODD-IFB genes in different tissues at the flowering stage and found that most tobacco 2ODD-IFB genes were exclusively expressed in the reproductive tissues, including buds, sepals, stamens, and pistils. However, the deposition pattern of flavonoids among the tobacco tissues did not appear to be consistent with the expression patterns of the 2ODD-IFB genes. For instance, we found flavonoid accumulation in tobacco roots by both DPBA staining and HPLC-UV analysis (Fig. 7c, d), but did not detect obvious transcripts of tobacco 2ODD-IFB genes in this tissue (Fig. 7). This raises the possibility that the flavonoids deposited in roots are synthesized in other plant tissues, such as leaves, and then transported into the roots. It is worth noting that in the tobacco tissues where flavonoids were biosynthesized, the accumulation patterns of flavonoids were consistent with the expression patterns of the 2ODD-IFB genes. For instance, pollination did not change the expression level of the NtANS gene in the pistils, but significantly raised the transcription level of the NtFLS gene (Fig. 7a). This led to a significant increase in flavonol content (especially kaempferol) in pistils after pollination (Fig. 7b), which also points to a positive role of flavonols in promoting pollen germination or pollen tube elongation. In tobacco leaves, both the expression pattern of the NtFLS gene and the accumulation pattern of rutin followed a U-shaped curve during leaf development (Fig. 8b, 9d). Moreover, when NtFLS was silenced in tobacco leaves by VIGS, flavonoid biosynthesis was redirected toward the anthocyanidin pathway (Fig. 9c). Conversely, more flavonols accumulated in tobacco leaves in which the NtANS gene was silenced.
Taken together, we conclude that the expression patterns of the tobacco 2ODD-IFB genes did not determine the deposition patterns of flavonoids among the various tobacco tissues, but strongly affected the flavonoid contents in the tissues where they were biosynthesized.

Distinct expression profiles of tobacco 2ODD-IFB genes under stresses
Flavonoids have been shown to participate in plant resistance to various biotic and abiotic stresses (Treutter 2005). In the present study, we examined the expression patterns of tobacco 2ODD-IFB genes under various stresses and made two observations. Firstly, the expression patterns of these genes varied under different biotic stresses. These distinct expression patterns suggest that the 2ODD-IFB genes or flavonoid compounds play different roles in plant resistance to different biotic stresses. Secondly, the expression patterns of the up-regulated 2ODD-IFB genes differed between plants under biotic and abiotic treatments. The up-regulation of 2ODD-IFB genes was transient under biotic stresses, whereas it lasted longer under abiotic stress. We therefore suggest that the expression of the tobacco 2ODD-IFB genes might be regulated in different ways under the two kinds of stress, with the regulation under biotic stresses being more complicated.
Author contribution statement JY conceived and supervised this study. ZW carried out most of the work. ZFL and SSW carried out the bioinformatics analysis. PPL determined the contents of flavonoids in tobacco tissues. QSC performed the microscopic examination and photography of flavonoid staining. MZW and FL examined the gene expression levels and carried out the stress and VIGS treatments. JY and ZW summarized the results and wrote the manuscript. AGY advised on and revised the manuscript. All authors reviewed the manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.