Introduction

Rice, one of the most economically important cereal crops in the world, is a significant staple for feeding much of the world's population. Although white rice is most commonly consumed, several rice cultivars contain color pigments, such as black and red rice. Black rice has a high anthocyanin content located in the pericarp layers, which gives it a dark purple color (Ryu et al. 1998; Takashi et al. 2001).

Anthocyanins, a group of reddish-purple water-soluble flavonoids, are the primary pigments in red and black grains, and are responsible for the attractive red, purple, and blue colors of many flowers, fruits, and vegetables. Based on several reviews, it is estimated that more than 400 naturally occurring anthocyanins have been found (Kong et al. 2003). These pigments constitute a group of natural colorants belonging to the flavonoid family and are produced by tissues in response to developmental and environmental signals (Quattrocchio et al. 1993). The regulatory elements that confer tissue-specific accumulation of anthocyanin have been characterized in several plant species (Holton and Cornish 1995). The main anthocyanin pigments of dark purple rice are cyanidin (cyanidin-3-O-β-glucoside) and peonidin (peonidin-3-O-β-glucoside) (Ryu et al. 1998; Abdel-Aal et al. 2006).

Recently, anthocyanins have been recognized as health-promoting food ingredients due to their antioxidant activity (Nam et al. 2006; Philpott et al. 2006), and anticancer (Hyun and Chung 2004), hypoglycemic (Tsuda et al. 2003), and anti-inflammatory effects (Tsuda et al. 2002). For example, pigment-supplemented diets of black rice reduced oxidative stress in mice (Xia et al. 2003) and its pigment fraction may have antiatherogenic activity (Xia et al. 2006). A recent report showed that black rice is a good source of fiber, minerals, and several important amino acids (Zhang et al. 2005), and there is increased interest in alternative sources of anthocyanins due to a rising demand for economical sources of natural and stable pigments (Hu et al. 2003; Zhang et al. 2004).

The examined cv. Heugjinju has the richest anthocyanin content among dark purple rice cultivars, with the highest cyanidin-3-glucoside (C3G) pigment content of these cultivars (Ryu et al. 2000; Kim et al. 2007). C3G is known to have high antioxidative function and is the most powerful among the 14 main anthocyanins (Ding et al. 2006).

Orthology assignment in pigmentation biosynthesis is a critical prerequisite of numerous comparative genomics procedures (Dessimoz et al. 2006). Clusters of orthologous groups (COGs) are delineated by comparing protein sequences encoded in complete genomes that represent major phylogenetic lineages, and the COGs methodology is an important basis for the identification of orthologs (Remm et al. 2001). The Gene Ontology (GO) is a major bioinformatics initiative with the aim of standardizing the representation of genes involved in the identification of orthologs (Ashburner et al. 2000). Recently, a number of studies have demonstrated that microarray analyses can be used to examine changes in genome-wide gene expression. Microarray experiments have been used to analyze gene expression changes in a number of crop species, including rice (Kim et al. 2009; Kim et al. 2010; Troester et al. 2009).

Here, we investigated the expression of anthocyanin-related genes in black rice to gain insight into the causes of pigment production. Our results should facilitate future breeding of hybrid rice varieties rich in anthocyanin.

Materials and methods

Rice

The white rice used was cv. Dongjin. The black rice cultivars were Heugjinju and Heugseol. The three rice cultivars were grown under controlled conditions in an experimental field of the National Academy of Agricultural Science (NAAS). Heugjinju was the first black rice cultivar bred in Korea (Moon et al. 1998). Heugseol is a new black-pericarp, high-anthocyanin cultivar developed by the National Institute of Crop Science in 2008 (Choi et al. 2008). It was derived from a Seolgaeng/Heugjinju cross.

Experimental design

This experiment was designed to assess three factors (i.e., one white and two black rice cultivars) and three treatments (i.e., three seed developmental stages, heading + 7 days, + 14 days, and + 21 days), in triplicate. The samples were harvested from research plots in the experimental fields in 2009. They were manually hulled and ground to obtain a fine powder using a cyclone mixer mill (HMF-590; Hanil, Seoul, Korea) and a mortar and pestle. The milled rice powders were kept at –80°C before RNA extraction.

The results of anthocyanin analysis showed total anthocyanin content of Heugjinju cultivar to be 295 mg/100 g and Heugseol to be 417 mg/100 g. The major components for both consisted of cyanidin-3-O-glucoside (C3G) and peonidin-3-O-glucoside. The distribution ratio of individual anthocyanins showed Heugjinju to have 95.4% C3G and Heugseol, 94.7% C3G. In contrast, the Dongjin cultivar (white seed) showed no detectable anthocyanins.

Therefore, the experiments were performed with Dongjin (white color seed of no anthocyanin), Heugjinju (black color seed of lower anthocyanin) and Heugseol (black color seed of higher anthocyanin). We assumed that two black cultivars would be useful for comparing the gene expression of anthocyanins because they have a similar genetic background but very different anthocyanin content.

RNA extraction

Total RNA was extracted from seeds of the two black rice cultivars, Heugjinju and Heugseol, and the wild-type Dongjin cultivar. We performed at least three replicates for each treatment. Frozen samples were homogenized with a mortar and pestle in liquid N2. The ground powder was transferred to empty Falcon tubes kept on liquid N2 until the homogenization procedure was ready to be performed, and 0.5 mL of RLC buffer (Qiagen, Hilden, Germany) was added. Because of the high starch content of rice seed, the homogenates were vortexed for only 10 s, and then plant debris was pelleted by centrifugation. Supernatants were extracted using an RNeasy Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. The RNA samples were further purified using phenol-chloroform-isoamyl alcohol (25:24:1) and an RNeasy mini plant kit (Qiagen). The quantity of total RNA was determined by measuring absorbance at 260 nm and 280 nm using a Nanodrop ND-1000 spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA). In addition, the level of protein contamination in the RNA was determined based on the A260/A280 ratio. Only RNA samples with ratios of 2.0-2.2 were used for these experiments.
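The purity criterion described above can be expressed as a simple filter. The following is a minimal sketch, not the authors' procedure: the function name, the sample labels, and the absorbance readings are hypothetical, and only the 2.0-2.2 acceptance window comes from the text.

```python
def passes_purity_check(a260, a280, low=2.0, high=2.2):
    """Return True when the A260/A280 ratio lies within [low, high]."""
    if a280 <= 0:
        # A non-positive A280 reading cannot give a meaningful ratio.
        return False
    return low <= a260 / a280 <= high


# Hypothetical readings: sample label -> (A260, A280).
readings = {
    "sample_a": (1.05, 0.50),  # ratio 2.1, inside the window
    "sample_b": (0.90, 0.50),  # ratio 1.8, below the window
    "sample_c": (1.15, 0.50),  # ratio 2.3, above the window
}

accepted = [
    name
    for name, (a260, a280) in readings.items()
    if passes_purity_check(a260, a280)
]
print(accepted)
```

With these illustrative readings, only the first sample would be carried forward to the microarray and RT-PCR steps.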

Anthocyanin analysis

The aleurone layer, about 10% of the outer layer of black rice, was extracted with 10 mL of 1% HCl in methanol for 24 h in the dark. Quantification of anthocyanin was performed using a Waters CapLC XE pump (Milford, MA, USA) with a Waters 2996 photodiode array detection system. A Symmetry C18, 5 μm (0.32 × 150 mm) column from Waters, operated at 25°C, was used for separation. The mobile phase consisted of 0.1% trifluoroacetic acid (TFA) in water (eluent A) and 0.1% TFA in 95% acetonitrile (eluent B). The injection volume for all samples was 0.2 μl. Spectra were recorded from 200 to 600 nm at a flow rate of 0.5 μl min⁻¹. Electrospray ionization mass spectrometry (ESI-MS) was carried out with a Micromass electrospray interface ZMD 4000 (Micromass, Manchester, UK). Nitrogen was used as the nebulizing gas. Source block and desolvation temperatures were 110°C and 20°C, respectively. Prior to analysis, all samples were filtered through a 0.45-μm membrane filter. The standard anthocyanins (i.e., cyanidin-3-O-glucoside and peonidin-3-O-glucoside) were obtained from Extrasynthese (R&D Chemicals, Genay, France), and apigenin, kaempferol, and quercetin were purchased from Sigma Chemical Co. (St. Louis, MO, USA).

Microarray analysis

Microarray analysis was performed using newly designed 135 K Oryza sativa microarrays. These microarrays contained probes to assess 31,439 genes deposited at the International Rice Genome Sequencing Project (IRGSP, http://rgp.dna.affrc.go.jp/E/IRGSP/) and the Rice Annotation Project version 2 (RAP2, http://rapdb.dna.affrc.go.jp/). Four 60-nt probes were designed from each gene, starting at 60 bp upstream of the stop codon and with a 30-bp shift, such that the four probes covered a 150-bp region at the 3' end of the gene. In addition, 200 genes were included, comprising 123 chloroplast genes, 74 mitochondrial genes, and 3 selection markers. Therefore, the microarray had a total of 125,956 probes. The microarray was scanned using the GenePix 4000B (Molecular Devices, Inc., Sunnyvale, CA, USA), and signals were digitized and analyzed by Nimblescan (Nimblegen, Madison, WI, USA). To improve the sensitivity and reproducibility of microarray analysis (Irizarry et al. 2003), normalization was performed using the cubic spline normalization method (Workman et al. 2002), and probe-level summarization was performed by Robust Multichip Analysis (RMA) using a median polish algorithm (Sadlier et al. 2004).
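The probe layout and the probe total can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, the choice to tile successive probes further upstream of the stop codon is an assumption about the direction of the 30-bp shift, and the assumption that each of the 200 additional genes contributes a single probe is inferred solely from the reported total of 125,956.

```python
def probe_windows(n_probes=4, probe_len=60, shift=30):
    """Return half-open (start, end) offsets in bp relative to the stop codon.

    Offset 0 is the stop codon; negative offsets lie upstream of it.
    Assumption: each successive probe is shifted `shift` bp further upstream.
    """
    windows = []
    for i in range(n_probes):
        end = -i * shift
        start = end - probe_len
        windows.append((start, end))
    return windows


windows = probe_windows()
# Length of the region spanned by all probes together.
covered_bp = max(end for _, end in windows) - min(start for start, _ in windows)

# Probe total, assuming four probes per rice gene and one per additional gene.
rice_genes = 31_439
additional_genes = 123 + 74 + 3
total_probes = rice_genes * 4 + additional_genes

print(windows)
print(covered_bp, total_probes)
```

Under these assumptions the four overlapping 60-nt probes span 150 bp, and the arithmetic reproduces the reported 125,956 probes.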

RT-PCR analysis

Reverse transcription-polymerase chain reaction (RT-PCR) analysis was carried out to verify the microarray hybridization data. To reduce experimental variation, a total of 27 RNA samples were extracted from the Dongjin, Heugjinju, and Heugseol cultivars at 7, 14, and 21 days after the heading stage, in triplicate. Each RNA sample was isolated independently for RT-PCR, separately from those used for the microarray analysis.
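The count of 27 RNA samples follows from crossing three cultivars, three developmental stages, and three replicates. A minimal sketch of that enumeration is given below; the sample label format is hypothetical and chosen only for illustration.

```python
from itertools import product

cultivars = ["Dongjin", "Heugjinju", "Heugseol"]
days_after_heading = [7, 14, 21]
replicates = [1, 2, 3]

# One label per cultivar x stage x replicate combination (hypothetical naming).
samples = [
    f"{cultivar}_H+{day}d_rep{rep}"
    for cultivar, day, rep in product(cultivars, days_after_heading, replicates)
]

print(len(samples))
print(samples[:3])
```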

After treatment with RNase-free DNase followed by phenol extraction and ethanol precipitation, 5 μg of total RNA from each sample was used to synthesize each pool of cDNA using the SuperScript III First-Strand Synthesis System (Invitrogen, Carlsbad, CA, USA). For PCR amplification, 1 μl of the resulting cDNA reaction was used as a template. The PCR reactions were carried out in 50-μl volumes with 1.25 units of Taq DNA polymerase and 20 pmol of each primer. The primers were designed based on sequence information available from the Rice Annotation Project Database (RAP, http://rapdb.dna.affrc.go.jp) and NCBI Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome). The PCR program was as follows: 3 min at 94°C; 24 cycles of 30 s at 94°C, 30 s at the optimal annealing temperature (58 to 65°C), and 1 min at 72°C; followed by 5 min at 72°C. To validate our RT-PCR results, we performed each experiment three times. The products of the 24-cycle PCR reactions were analyzed by gel electrophoresis. Actin mRNA was used as a loading control.

Data collection and statistical analysis

All experiments were performed using only Cy3 to eliminate dye-swap error. Data-based background subtraction using a local background estimator was performed to improve fold change estimates on arrays with high background. Spot intensity was calculated as the median value of the spot compared to the background median value. The functional classification categories were assessed by COGs analysis using the NCBI/COGs database (http://www.ncbi.nlm.nih.gov/COG/). To determine the processes affected significantly by anthocyanin production, GO analysis was performed using GoMiner software (Zeeberg et al. 2003). The statistical significance of these matches was assessed by calculating p-values using the one-sided Fisher exact test for the number of categorized GO terms in the total analysis. Only results with a false discovery rate (FDR) below 0.05 were retained. The selected genes were analyzed by average linkage clustering to group genes with similar function using SAS/STAT (SAS Institute Inc., Cary, NC, USA, http://www.sas.com). Transcription factors were evaluated and clustered using the hyper-geometric analysis method. Candidate genes in the hyper-geometric analyses were defined as those with p-values less than 0.05 according to ANOVA assessment using GeneSpring GX 11 software (Agilent Technologies Inc., Santa Clara, CA, USA, http://www.chem.agilent.com/).
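For an enrichment question, the one-sided Fisher exact test is equivalent to the upper tail of the hyper-geometric distribution, so both tests named above can be illustrated with the same calculation. The sketch below is not the GoMiner or GeneSpring implementation: the function name and all gene counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hyper-geometric draw.

    N: genes in the background set
    K: background genes annotated with the category of interest
    n: genes in the selected list
    k: selected genes annotated with the category
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated genes.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical example: 1,000 background genes, 50 in the category,
# 100 selected genes, 12 of which fall in the category.
p_value = hypergeom_upper_tail(k=12, K=50, n=100, N=1000)
print(p_value)
```

A small p-value indicates that the category is over-represented in the selected list relative to the background; in a full analysis the p-values across all categories would then be adjusted to control the FDR.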

Results and Discussion

To evaluate anthocyanin biosynthesis in rice cultivars, we performed a four-stage screen. First, we tested whether the pairwise cultivar comparisons (Dongjin/Heugjinju, Dongjin/Heugseol, and Heugjinju/Heugseol) shared commonly expressed genes at three time points following the heading stage. Second, COG and GO analyses were performed to classify the genes functionally and identify orthologous genes involved in anthocyanin biosynthesis. Third, transcription factors involved in anthocyanin pigmentation were evaluated and clustered by the hyper-geometric distribution analysis method. Finally, selected unknown and predicted genes related to anthocyanin biosynthesis were verified by reverse transcription-polymerase chain reaction (RT-PCR).

Data collection and p-value analysis

Microarray analysis was performed to assess three factors (i.e., a white cultivar and two black rice cultivars) and three treatments (i.e., seed developmental stages) in triplicate. This experiment yielded a total of 852,876 intensity values. We screened 12,673 genes to identify those with greater than 2-fold up- or down-regulation in each combination of the three rice cultivars and three seed developmental stages. The number of genes significantly upregulated in each cultivar combination during the three seed developmental stages ranged from 528 to 3,695. The number of downregulated genes ranged between 295 and 1,782. At each developmental stage, the number of significantly upregulated genes was higher than that of downregulated genes. In the Dongjin/Heugjinju combination, 1,089 genes were significantly upregulated at the heading + 7 days stage, and this number increased to 3,695 genes at the heading + 21 days stage. In the Dongjin/Heugseol combination, the number of upregulated genes increased from 928 to 2,305 at the heading + 21 days stage. In the Heugjinju/Heugseol combination, the number of upregulated genes increased from 528 to 1,682 at the heading + 21 days stage.
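The 2-fold screening criterion can be sketched as a simple classification of intensity ratios. This is an illustrative example rather than the authors' pipeline: the function name, the choice of which cultivar serves as the reference, and the intensity values are assumptions.

```python
def classify_fold_change(test_intensity, reference_intensity, threshold=2.0):
    """Label a gene as 'up', 'down', or 'unchanged' from two intensities.

    A gene is 'up' when test/reference exceeds `threshold` and 'down' when
    the ratio falls below 1/threshold (i.e., more than a 2-fold change).
    """
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    ratio = test_intensity / reference_intensity
    if ratio > threshold:
        return "up"
    if ratio < 1.0 / threshold:
        return "down"
    return "unchanged"


# Hypothetical normalized intensities: gene label -> (black rice, white rice).
intensities = {
    "gene_1": (900.0, 300.0),  # 3-fold higher in black rice
    "gene_2": (120.0, 480.0),  # 4-fold lower in black rice
    "gene_3": (310.0, 300.0),  # essentially unchanged
}

calls = {
    gene: classify_fold_change(black, white)
    for gene, (black, white) in intensities.items()
}
print(calls)
```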

Finally, 365 genes were upregulated in the two black rice cultivars at all developmental stages and showed a significant correlation (r = 0.6503, p = 0.0325), suggesting that these genes may be associated with pigmentation metabolism. We also identified 57 genes that demonstrated downregulation and showed a significant correlation (r = 0.4391, p = 0.0453). We assumed these genes were related to inhibition of pigmentation production or anthocyanin metabolism.
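Selecting genes that are regulated in the same direction in both black cultivars at every developmental stage amounts to intersecting the per-comparison gene lists. The sketch below illustrates the idea with hypothetical gene labels and a hypothetical helper function; it does not reproduce the actual 365-gene or 57-gene sets.

```python
def common_genes(gene_lists):
    """Return the genes present in every list (empty set if no lists given)."""
    sets = [set(genes) for genes in gene_lists]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical upregulated gene lists keyed by (comparison, days after heading).
upregulated = {
    ("Dongjin/Heugjinju", 7): ["g1", "g2", "g3", "g5"],
    ("Dongjin/Heugjinju", 14): ["g1", "g2", "g4"],
    ("Dongjin/Heugjinju", 21): ["g1", "g2", "g3"],
    ("Dongjin/Heugseol", 7): ["g1", "g2", "g6"],
    ("Dongjin/Heugseol", 14): ["g1", "g2", "g3"],
    ("Dongjin/Heugseol", 21): ["g1", "g2", "g7"],
}

shared = sorted(common_genes(upregulated.values()))
print(shared)
```

Only genes that pass the fold-change screen in every comparison and stage survive the intersection, which is why the shared set is much smaller than any single list.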

Clusters of orthologous groups (COGs) analysis

COGs analysis was performed on the 12,673 candidate genes using the NCBI/COGs database (http://www.ncbi.nlm.nih.gov/COG/). The upregulated genes were classified as general function (21.6%), signal transduction (12.3%), and posttranslational modification, protein turnover, and chaperones (8.2%). Of the regulated genes, 3,271 genes (25.8%) were poorly characterized, and 617 of those genes (4.8%) had an unknown function. The COGs related to anthocyanin function were inferred from the presence of highly conserved genes in two specific categories: posttranslational modification, protein turnover, and chaperones (8.2%) and carbohydrate transport and metabolism (4.7%).

GO analysis associated with anthocyanin production

The processes that were most affected during anthocyanin production were determined by GO analysis. First, GoMiner categorizes each gene according to its GO terms and mode of gene expression. Modes of expression are denoted as under, over, and change to represent genes that were downregulated, upregulated, or both, respectively. With this analysis, 1,558 upregulated and 295 downregulated genes were identified as orthologous to Arabidopsis genes. GoMiner analysis demonstrated that these genes were involved in 25 biological processes, 19 cellular components, and 12 molecular functions. Figure 1 shows the composition ratio of genes in each category.

Fig. 1 GO analysis of anthocyanin biosynthesis. The three GO categories (biological process, cellular component, and molecular function) show the composition ratio of genes

Candidate genes associated with anthocyanin pigmentation

Genes identified through COGs and GO analysis were compared to the rice genome database at http://rgp.dna.affrc.go.jp/E/IRGSP/ and the rice genome system supported by NAAS website http://nabic.naas.go.kr/. Comparison with these databases enabled elimination of genes related to pigments other than anthocyanin from the candidate gene list. Ultimately, 1,289 candidate genes were predicted to be related to anthocyanin biosynthesis and production.

Transcription factors associated with anthocyanin

We characterized the transcription factors associated with anthocyanin pigmentation using the hyper-geometric analysis method. Table 1 shows the transcription factor groups predicted from the 1,289 candidate genes.

Table 1 Transcription factor (TF) groups predicted by the cumulative hyper-geometric distribution analysis method

We identified 10 groups that exhibit functionally diverse transcription factor activity involved in anthocyanin biosynthesis. Of the 1,289 candidate genes, 137 (10.6%) were identified as putative transcription factors. The MYB and GT transcription factor families showed the highest expression levels, with these groups accounting for approximately 37% of the predicted transcription factors. Other putative groups associated with anthocyanin production included the NAC, basic helix-loop-helix (bHLH), and peroxisome proliferator-activated receptor binding protein (PBP) families. Involvement of the MYB, GT, NAC, bHLH, and PBP transcription factors in colored tissues has been reported previously (Martin and Paz-Ares 1997; Yujie et al. 2010; Reddy et al. 1998; Borevitz et al. 2000; Sheng et al. 2005). These results illustrate the functional diversity of the transcription factor families and suggest that these factors may be highly activated during anthocyanin pigmentation.

MYB family members activate flavonoid pigmentation, are involved in specific steps of the anthocyanin pathway, and have evolved specific functions with different biochemical properties. The GT factors, GT-1 and GT-2, have been shown to interact with multiple sequences within the promoter of the Tdc gene and to act as trans-acting factors that recognize a GT-motif. GT-1-containing genes encode factors such as a light-responsive nuclear protein in rice. NAC factors play an important role in regulating the expression of flavonoid biosynthesis-related genes (Morishita et al. 2009). The bHLH factors function in determining red seed color in anthocyanin biosynthesis (Gonzalez et al. 2008). PBP factors may be closely related to the flower-specific Myb305 factor and MYB-like protein (Reddy et al. 1998). The WRKY factors include a gene involved in seed coat development in Arabidopsis (Johnson et al. 2002), and several MADS box transcription factors are expressed preferentially in flowers and cause early flowering when ectopically expressed in rice plants (Jeon et al. 2000). Prolamin-box binding factor (PBF) regulates expression of cereal storage proteins and interacts with Opaque-2, whereas Related to ABI3/VP1 (RAV) factors act as negative regulators of growth. APETALA2 (AP2) is related to seed development, ethylene-responsive expression, and floral morphogenesis (Jofuku et al. 1994; Shukla et al. 2006).

Unknown genes involved in anthocyanin biosynthesis

Among the 137 transcription factor genes identified above, 24 unknown and hypothetical genes were expressed at each seed developmental stage. Among these, 14 showed a pattern of upregulation, and 10 genes were downregulated. Therefore, this method is able to efficiently screen for unknown genes associated with a specific biological process. Finally, 17 unknown and hypothetical genes differed between the Dongjin cultivar and the two black cultivars at all seed developmental stages. Of the 17 candidate genes, 10 showed a pattern of upregulation and 7 appeared to be downregulated. These genes most likely either play a regulatory role in the anthocyanin production process or are related to anthocyanin metabolism during flavonoid biosynthesis. Additional functional studies of these genes are necessary to further characterize possible regulatory factors of the anthocyanin pathway and pigmentation metabolism. Nevertheless, these 17 transcription factor genes play a potential role in anthocyanin production. This study provides valuable insight into anthocyanin pigmentation production and will greatly facilitate the future breeding of anthocyanin-rich hybrid rice varieties.

RT-PCR analysis of selected genes

RT-PCR analysis was performed on the 17 unknown and hypothetical genes using 27 RNA samples representing each cultivar. Among the 17 genes, nine showed an upregulated pattern compared with white rice at all seed developmental stages tested, while six genes exhibited a downregulated pattern. The remaining two genes were both up- and downregulated, and were omitted because of their small differences from the control rice (Fig. 2).

Fig. 2 RT-PCR analysis of 15 candidate genes, representing nine up-regulated and six down-regulated genes. A total of 27 RNA samples were isolated from three cultivars and three developmental stages, in triplicate. Actin was used as a control. DAF = days after heading, WT = Dongjin (white seed, no anthocyanin), B1 = Heugjinju (black seed, lower anthocyanin), B2 = Heugseol (black seed, higher anthocyanin)

Most of the up- and down-regulated genes appear to be related to major biological changes induced in the anthocyanin biosynthesis pathway. The results of RT-PCR analysis coincided largely with the expression profiles obtained through the microarray hybridization experiments. Interestingly, four unknown genes (Os07g0184633, Os03g0247300, Os11g0539600, and Os07g0486400) were highly induced during the early heading stage in the two black rice cultivars. This result suggests that these four unknown genes may play a role in anthocyanin production in the early rice heading stage. In contrast, the Os01g0780900 gene was highly induced during the late heading stage. Gene Os12g0425800 differed in expression between Heugjinju and Heugseol. This gene was not expressed in cv. Dongjin, which lacks anthocyanin, was weakly induced in the low-anthocyanin Heugjinju cultivar, and was highly expressed in the high-anthocyanin Heugseol cultivar. Therefore, this gene may play a role in controlling anthocyanin levels.

It was assumed that the down-regulated genes were related to inhibition of anthocyanin metabolism or had a negative effect on the anthocyanin biosynthesis pathway. Among the six down-regulated genes, Os06g0170500 was expressed at all developmental stages in the Dongjin cultivar. Table 2 lists the primers used to amplify the 15 candidate genes; two genes were omitted because of their small differences from the control rice.

Table 2 RT-PCR primer list of the 15 candidate genes consisting of nine up-regulated and six down-regulated genes

Conclusions

There has been increased interest in the use of black rice due to the numerous health benefits associated with the anthocyanin pigment. In this study, we identified putative transcription factors involved in anthocyanin biosynthesis in black rice cultivars using a newly designed 135 K Oryza sativa microarray. We performed a four-stage analysis to evaluate anthocyanin biosynthesis in black rice: first, we tested for the presence of commonly expressed genes between different rice cultivars; second, functional classification categories were identified by COGs and GO analysis; third, transcription factors were identified by the hyper-geometric analysis method; and finally, selected unknown and hypothetical genes were verified by RT-PCR. Among the 137 transcription factor genes identified, 24 were unknown and hypothetical genes. Among these, 17 genes differed between the white cultivar and the two black rice cultivars at all seed developmental stages. Of the 17 candidate genes, 10 were up-regulated and 7 were down-regulated. These genes may play a regulatory role in anthocyanin production or may be related to anthocyanin metabolism during flavonoid biosynthesis.

Results of RT-PCR analysis largely coincided with the expression profiles obtained through the microarray experiments. While these genes still require further investigation and validation, these results demonstrate the potential of this method using newly designed microarrays. This study provides valuable insight into pigmentation production and will greatly facilitate future breeding of anthocyanin-rich hybrid varieties of rice.