Introduction

Metabolomics is one of the “omics” studies used for acquiring biological information in a comprehensive manner. In the narrow sense, metabolomics implies a qualitative and quantitative approach for all metabolites present in organisms; contrastingly, metabolite profiling or metabolic profiling is a term used to describe the quantitative and qualitative analysis of only the identified metabolites [9, 29, 35, 41, 80]. In this review, we use the term “metabolomics” for comprehensive metabolite analysis, irrespective of whether the detected metabolites have been identified or not.

The basic motivation for rice studies is crop breeding to acquire valuable traits. Forward and reverse genetics have been utilized to select better rice cultivars among natural varieties and artificial mutants [5, 27, 47, 105]. Conventional genetic methods have revealed the biological functions of unknown genes related to valuable qualities, such as rice grain production [6] and tomato yield [34], leading to the generation of new genetically modified (GM) crops. However, it is difficult to clarify the functions of all genes by genetic techniques alone because the estimation of functions from genome sequences has limitations. Other omics technologies may therefore help to understand the functions of unknown genes and to narrow down the candidate genes responsible for valuable traits. Metabolomics offers the possibility of clarifying gene functions directly connected to rice quality, since the metabolome, i.e., the varieties and amounts of all metabolites in an organism, is closely related to important traits of rice such as yield, nutrient content, and defense mechanisms [27, 105]. Therefore, metabolomics is thought to be applicable to rice breeding in combination with other omics approaches.

Among the omics studies in plant science, genomics was the first to emerge and is the most advanced; it has uncovered the genome sequences of several organisms including Arabidopsis [3] and rice [37, 49, 82, 107]. There is no doubt that the development of an analytical instrument, i.e., the DNA sequencer, has led to the current progress of genomic studies. Other omics studies have also advanced as a result of technical innovations. Two-dimensional electrophoresis and mass spectrometry (MS) have become universal methods for proteomic studies. Furthermore, the development of the DNA microarray has led to the emergence of a new omics area, namely, transcriptomics, which comprehensively analyzes messenger RNA (mRNA) expression in biological samples. Similarly, MS and nuclear magnetic resonance (NMR) spectrometry have been facilitating metabolomic studies. However, metabolites have diverse chemical properties, particularly plant metabolites, whose estimated number is up to 200,000 [20, 29]. Besides, most plant metabolites are still unknown. For these reasons, no unified analytical instrument for metabolomics exists thus far. On the other hand, metabolomic techniques have been recognized as useful for basic research in plant physiology and biology [9]. Additionally, they have been found to be applicable to the safety assessment of GM crops. Consequently, many studies on plant metabolomics have been performed, and many recent review articles on plant metabolomics can be consulted for general statements [9, 28, 31, 35, 41, 80, 90], methodology [13, 18, 30, 46, 69, 81, 89, 99, 101], and applications [42, 67, 76, 79, 84, 97].

In this review, we describe the current technological developments in plant metabolomics, including sample preparations, analytical methods, and informatics techniques. Furthermore, we show several examples and possibilities of the application of the techniques of metabolomics to the study of rice.

Technological developments in metabolomics

Metabolomics is a complex field combining analytical chemistry and bioinformatics, in which advanced techniques for determining the levels of a wide range of metabolites are employed in a series of procedures: sample extraction and preparation, metabolite detection using analytical instruments, and data processing and mining by means of bioinformatics techniques. Each step offers alternatives; for example, 100% or 80% aqueous methanol for extraction, and MS or NMR for metabolite analysis. Depending on the combination of options chosen at each step, subtly or completely different results can be obtained from the same sample. Hence, a proper method should be carefully selected for each study. In this section, we describe the metabolomic methods that can be used in each step.

Sample extraction and preparation

To encompass as many plant metabolites as possible, it is important to carefully perform the operations in the sample extraction and preparation steps before metabolomic analysis. First, we need to select an optimal buffer for sample extraction because the extraction efficiency of metabolites is dependent on the type of the extraction buffer used due to the various chemical characters of plant metabolites. The most frequently used buffer is 80% or 100% aqueous methanol, which enables the extraction of a relatively wide range of metabolites from plant tissues. Extraction performed twice, i.e., once each with polar and non-polar buffers, for one sample is surely considered to be more effective to encompass many metabolites, as compared to extraction at one time with one buffer, although the time and labor for the analysis of each sample are doubled. In this case, hydrophilic and hydrophobic buffers such as 50% methanol and 100% acetonitrile are used [1]. Second, the sample preparation step must be taken into account. This step is conducted to remove the unnecessary compounds and residues that interfere with the separation of metabolites and decrease the sensitivity of metabolite detection. Filtration, ultrafiltration, liquid–liquid extraction, and solid-phase extraction are the main techniques used for sample preparation in plant metabolomics. Depending on the instruments used and the samples, the methods involved in sample preparation have slight variations; for example, when capillary electrophoresis (CE)–MS is used for metabolomic analysis, ultrafiltration and liquid–liquid extraction are employed to remove high-molecular-weight compounds such as proteins and hydrophobic compounds such as chlorophyll, respectively, which adversely affect metabolite separation by CE [88]. In case of sample preparation for gas chromatography (GC)–MS analysis, chemical derivatization should be employed to improve the sensitivity of detection of hydrophilic metabolites. 
In all cases, monitoring the recovery rate during sample preparation with suitable internal standards is essential for precise quantification of each detected metabolite.
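
The role of the internal standard can be illustrated with a minimal sketch. It assumes that a known amount of a single internal standard is spiked into the sample before preparation and that all metabolites are lost in the same proportion as the standard, which is a simplification. The function names and the peak areas are hypothetical and are not taken from any of the cited studies.

```python
def recovery_rate(measured_is_area, expected_is_area):
    """Fraction of the internal standard that survived sample preparation."""
    if expected_is_area <= 0:
        raise ValueError("expected internal-standard area must be positive")
    return measured_is_area / expected_is_area


def correct_for_recovery(peak_areas, measured_is_area, expected_is_area):
    """Scale raw peak areas by the recovery rate of the internal standard.

    Assumption: every metabolite is recovered at the same rate as the standard.
    """
    rate = recovery_rate(measured_is_area, expected_is_area)
    if rate <= 0:
        raise ValueError("internal standard was not detected")
    return {name: area / rate for name, area in peak_areas.items()}


# Hypothetical raw peak areas for two metabolites in one sample
raw_areas = {"alanine": 800.0, "citrate": 400.0}

# The standard gave 80 area units although 100 were expected (80% recovery)
corrected_areas = correct_for_recovery(raw_areas, measured_is_area=80.0,
                                       expected_is_area=100.0)
```

In practice, several internal standards with different chemical properties are often used, because recovery can differ between compound classes.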

Analytical technologies

For detecting as many metabolites as possible, we should consider the physicochemical properties of metabolites, i.e., their hydrophobicity, volatility, and acidity. Therefore, various separation and detection techniques, such as GC–MS [31, 65], liquid chromatography (LC)–MS [62, 70], CE–MS [83, 92], Fourier transform ion cyclotron resonance (FT-ICR) MS [1, 73, 75] and NMR [61, 85] have been applied to metabolomics. Among them, GC–MS is the most popular method in current metabolomic studies [31]. In addition to volatile compounds, GC–MS can detect hydrophilic metabolites, such as amino acids, organic acids, and sugars, by chemical derivatization of these metabolites. Moreover, in GC–MS analysis using the electron-impact ionization method, it is possible to detect many fragment peaks derived from each compound; these fragment peaks provide information of the structure of the metabolites and thereby help to identify the detected peaks as known metabolites. LC–MS is a powerful tool for profiling of hydrophobic secondary metabolites including alkaloids, flavonoids, and phenylpropanoids due to its high chromatographic performance for separation of these compounds. CE–MS can separate and detect ionic metabolites including amino acids, organic acids, nucleotides, and sugar phosphates without any experiments for chemical derivatization. Most of these metabolites belong to a central metabolic pathway such as the glycolytic pathway and the tricarboxylic acid cycle. Consequently, CE–MS analysis leads to the acquisition of a considerable amount of information on the metabolites of a central metabolic pathway. FT-ICR MS has an ultra-high performance, particularly in terms of mass resolution and sensitivity, leading to the detection of a huge number of peaks without the need for any steps for the separation of metabolites from a mixture by chromatography or electrophoresis. 
Additionally, FT-ICR MS needs only a few seconds for one scan, which enables high-throughput and high-resolution analysis. NMR is also a useful technique for metabolomics, since it provides structural information at the atomic level and enables the identification of metabolites that are otherwise unidentifiable by MS analysis. Among the various pulse sequence programs, 1H-NMR has been routinely employed for high-throughput metabolomic studies due to its relatively short acquisition time per analysis. In addition, several conventional as well as innovative analytical instruments, such as Fourier transform infrared spectrometry [51], direct-infusion time-of-flight (TOF) MS [24, 77], ion mobility MS [25], and LC–FT-ICR MS [48, 91], have recently been applied to metabolomic studies.

Meanwhile, there have thus far been no breakthroughs in the detection of unstable metabolites or of metabolites present in low concentrations. Unstable compounds require careful sample handling; for example, exposure to strong light, high temperature, and oxygen should be avoided during sample extraction. As for low-abundance metabolites, the narrow dynamic range of the currently available analytical instruments makes it difficult to simultaneously detect metabolites that are present in high and low concentrations. Considering the natural dynamic range of plant metabolites, the existing analytical instruments are inadequate to encompass all plant metabolites in a single analysis. Analyzing diluted or concentrated samples in several batches besides the original one might compensate for the narrow dynamic range of the instruments and enable the detection of metabolites independent of their concentrations, but this approach is laborious and time consuming. Thus, the development of an innovative instrument with a wide dynamic range and high sensitivity is awaited.

Informatics techniques

We encounter a large amount of data after instrumental analysis, particularly with high-performance instruments that can detect tiny peaks with high resolution and thus increase the number of signals detected. Since these large datasets cannot be handled manually, we need automatic, high-throughput, and precise software that is capable of picking peaks from mass or NMR spectra, aligning the peaks among the samples, and identifying or estimating and quantifying each metabolite. Therefore, informatics is also an essential tool for dealing with large metabolomic datasets [4]. Recently, a number of software packages have become available for download from websites [15]; for example, “DrDMassPlus” ([75], http://kanaya.naist.jp/DrDMASSplus/) can be used to process metabolome data from FT-ICR MS and includes functions for peak picking, peak alignment, and some statistical analyses. Even if a metabolomic dataset contains data derived from thousands of detected peaks, DrDMassPlus can process it at high throughput. In addition to such data-processing software, we should take advantage of compound databases in metabolomic studies. Recently, some databases have been constructed for metabolomics. KNApSAcK ([87], http://kanaya.naist.jp/KNApSAcK/) is one of the comprehensive metabolite databases; it contains information on 21,599 metabolites (as of March 21, 2008), including their names, chemical formulae, molecular weights, CAS numbers, the organisms from which they are derived, and the research papers that have reported them. Furthermore, we can search the KNApSAcK database with the detected m/z values and thereby easily list the candidate metabolites for each detected peak.
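
The idea of listing candidate metabolites for a detected m/z value can be sketched as a mass lookup with a tolerance window. The sketch below does not use the KNApSAcK interface; it assumes a small, hypothetical in-memory table of monoisotopic masses, a protonated ion ([M+H]+), and a tolerance expressed in ppm. The function name and the default tolerance are illustrative choices.

```python
PROTON_MASS = 1.007276  # Da, mass added to a neutral molecule in [M+H]+

# Hypothetical mini-table standing in for a compound database export:
# (name, formula, monoisotopic mass in Da)
COMPOUNDS = [
    ("tryptophan", "C11H12N2O2", 204.0899),
    ("citric acid", "C6H8O7", 192.0270),
    ("glutathione", "C10H17N3O6S", 307.0838),
]


def candidates_for_mz(mz, tolerance_ppm=5.0, adduct_mass=PROTON_MASS):
    """Return compounds whose neutral mass matches the detected m/z.

    Assumes a singly charged ion formed by adding `adduct_mass`.
    Results are sorted by absolute mass error (best match first).
    """
    neutral_mass = mz - adduct_mass
    hits = []
    for name, formula, mono_mass in COMPOUNDS:
        error_ppm = (neutral_mass - mono_mass) / mono_mass * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, formula, error_ppm))
    return sorted(hits, key=lambda hit: abs(hit[2]))


# A peak detected at m/z 205.0972 in positive ion mode
tryptophan_hits = candidates_for_mz(205.0972)
# A peak with no counterpart in the mini-table
no_hits = candidates_for_mz(150.0000)
```

With a real database, a single m/z value often matches several isomers, so the result is a candidate list and not an identification.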

In metabolomic studies, statistical analysis is often employed to understand the comprehensive differences in the detected metabolites among various samples [29, 35, 41]. Various statistical methods used in conventional genetic studies are applicable to metabolomic data by considering the amount of each metabolite as a trait value. Principal component analysis (PCA), one method of multivariate analyses, is commonly used in metabolomic studies. There have been many reports on the application of PCA to metabolomic data [7, 12, 19, 54, 65, 71, 75, 93, 94, 95]. In addition, several statistical analytical methods have been used for analysis of metabolomic datasets; for example, hierarchical cluster analysis (HCA) [39, 78], partial least squares discriminant analysis (PLS-DA) [52, 65], and batch-learning self-organizing map (BL-SOM) [43, 44, 54]. Depending on the objective of each study, the most appropriate statistical analytical method should be exploited to analyze the metabolomic data.
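
As an illustration of how PCA summarizes a metabolite table, the following sketch computes only the first principal component of a mean-centered data matrix by power iteration, using the standard library alone. Rows are samples and columns are metabolites. The data values are invented, no scaling is applied, and an established statistics package would normally be used for real datasets.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) of the first principal component.

    rows: list of samples, each a list of metabolite levels.
    The covariance matrix of the mean-centered data is built explicitly and
    its leading eigenvector is found by power iteration.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    vector = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        product = [sum(cov[i][j] * vector[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("data have no variance")
        vector = [x / norm for x in product]

    scores = [sum(c[j] * vector[j] for j in range(p)) for c in centered]
    return vector, scores


# Invented levels of three metabolites in four samples (two groups of two)
samples = [
    [10.0, 2.0, 5.0],
    [11.0, 2.1, 5.2],
    [20.0, 1.9, 5.1],
    [21.0, 2.0, 4.9],
]
loadings, scores = first_principal_component(samples)
```

Here the first metabolite dominates the loadings, and the scores separate the first two samples from the last two; metabolites with large absolute loadings are the ones that drive the separation along that axis.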

Application of metabolomics to rice study

Since rice is one of the most important crops in the world, there is a constant challenge to generate better rice cultivars with valuable traits, a task that has so far been pursued using traditional and molecular breeding techniques [105]. Recently, metabolomic methods have been found to help in generating new rice cultivars through the identification of valuable gene functions and the safety assessment of GM crops [7, 12, 58, 105]. In this section, we present examples of applications of metabolomic techniques to rice studies from two viewpoints: the application of metabolomics to the study of rice biology, including the detection of metabolic changes associated with natural genetic variation in rice plants, and the application of metabolomics to the production of GM rice, including the evaluation of substantial equivalence.

Metabolomics for rice biology

Analysis of the natural variations in rice using metabolomic techniques is thought to be useful not only for understanding the biological traits of rice, such as yield and defense responses, but also for improving rice quality, including its taste and nutritive value [27]. Kusano et al. [65] used a metabolomic method combining one-dimensional (1D) and two-dimensional (2D) GC-TOF MS for metabolic phenotyping of the 68 rice varieties in the world rice core collection (WRC) and two other varieties. The WRC is a representative set of cultivated rice (Oryza sativa L.) and covers approximately 90% of the DNA polymorphisms detected in the original population [57]. In this study, in order to characterize the metabolic phenotypes of the WRC, metabolomic analysis of the extracts of brown rice seeds was conducted as follows. First, 1D-GC-TOF MS analysis in combination with multivariate statistical analysis, i.e., PLS-DA, was performed to narrow down the representative varieties, which were selected on the basis of good separation from the other varieties in PLS-DA. Second, the three selected varieties were subjected to a detailed 2D-GC-TOF MS analysis for high-resolution metabolite profiling. Finally, we statistically compared the ten metabolites whose concentrations in the three selected cultivars were different from those in a control rice cultivar (Nipponbare). Through these experiments, the selected varieties were successfully characterized in terms of the differences in their metabolite compositions. For instance, although the selected variety Urasan 1 is considered to be a Japonica variety on the basis of its DNA polymorphisms, it showed a unique metabolic profile when compared to other Japonica varieties. This result suggests that the metabolomic approach can evaluate the quality of rice cultivars according to their metabolite compositions.

Changes in the concentrations of metabolites according to the developmental or diurnal periods have also been investigated using metabolomic techniques. A combination of GC–MS and statistical analyses was exploited to discover biomarker metabolites that occur in rice plants during the developmental period [94]. In this study, rice samples at various developmental stages were first extracted using chloroform and water, and both the hydrophobic and hydrophilic extracts were analyzed by GC–MS. Next, the metabolome data were subjected to PCA. The principal component representing the differences in the developmental period of rice was determined, and the metabolites with high loadings on this axis were regarded as candidates for biomarker metabolites related to the developmental period of rice. Finally, the physiological roles of each biomarker candidate were discussed. This workflow from sample extraction to statistical analysis appears to be typical of current metabolomic studies. Moreover, Sato et al. [83] applied CE–MS and CE–DAD (DAD, diode array detector) technology to detect water-soluble metabolites in rice leaves during the day and night. They measured the levels of 88 metabolites, including amino acids, organic acids, nucleotides, and sugars, whose detection limits ranged from 0.3 to 71 μmol·l−1. In this study, metabolite fluctuations within a single day were discussed further by comparison with the results reported in earlier investigations. Furthermore, there have been reports on the comprehensive metabolite analysis of root exudates [26] and flavor volatiles [36] of rice plants. Thus, there are now various examples of applications of metabolomic techniques to rice plants, indicating that metabolomics is accepted as a potent tool for the study of rice biology.

Metabolomics for GM rice

Molecular breeding of GM rice has been attempted in order to improve its nutritional quality [17, 105]. One of the manipulation targets is the engineering of a metabolic pathway to produce rice cultivars with a high content of valuable metabolites, such as β-carotene in “golden rice” [104] and tryptophan. High tryptophan rice is a prospective GM crop because tryptophan is an essential amino acid for mammals and a growth-promoting fodder supplement for livestock. Because tryptophan can presently be produced industrially by fermentation, high tryptophan GM rice may serve as a functional food or fodder with a health- or growth-promoting benefit. Normally, a low level of tryptophan is maintained in plants as a result of the feedback inhibition of anthranilate synthase (AS) by tryptophan [10, 66]. Hence, AS has been considered to be the prime target for increasing the amount of tryptophan [14, 108]. Tozawa et al. [96] cloned two AS α-subunit genes (OASA1, OASA2) from rice (Oryza sativa cv. Nipponbare) and produced transgenic rice expressing a feedback-insensitive OASA1 gene (OASA1D) in which aspartate-323 is replaced by asparagine. Detailed evaluation of OASA1D transgenic rice revealed that the levels of free tryptophan in the calli, seedlings, and seeds were increased 35- to 300-fold [96, 100], while the other agronomic traits in the field-grown transgenics were essentially identical to those of the untransformed variety [100]. Besides the visible phenotypes, it might be expected that the recombination of tryptophan biosynthetic genes affects other invisible phenotypes, such as the concentrations of metabolites that are related or unrelated to the tryptophan biosynthetic pathway [50]. Indeed, it has been reported that a change in the pool size of tryptophan in potato activated the biosynthesis of phenylpropanoid compounds [103]. 
In addition, since tryptophan is a precursor for the biosynthesis of various plant metabolites, including the auxin indole-3-acetic acid [8] and indole alkaloids [98], some of which are toxic or bitter for mammals, the overaccumulation of tryptophan is likely to be associated with the production of these by-products. However, prediction of the side effects has been difficult due to our incomplete understanding of the regulation of metabolism in plants [50]. Thus, the use of metabolomic techniques as well as other omics tools to survey a wide range of metabolites is important for evaluating the substantial equivalence of GM crops [7, 12, 19]. Microarray and targeted metabolite analyses of OASA1D rice seedlings showed that the effect of the transgene was very limited, probably due to the low activity of the tryptophan-utilizing pathways in rice seedlings [23]. Profiling analyses of UV-active metabolites in the rice calli and seeds expressing the OASA1D transgene also demonstrated that drastic accumulation of tryptophan had little effect on the profiles of other UV-active metabolites, except for the accumulation of a few minor indole alkaloids in these tissues [72, 100]. These results suggest that the unchanged metabolic profiles are a consequence of the low secondary metabolic activity in rice, and that transformation with OASA1D might prove effective in the breeding of crops with an increased tryptophan content.

In addition to the studies mentioned above, there have been a number of reports regarding the application of metabolomics to the production of GM rice. For example, to understand the correlation between gene overexpression and the concentrations of metabolites, GM rice overexpressing the YK1 gene, which is known as a homologue of the HC-toxin reductase gene and has dihydroflavonol-4-reductase activity, was subjected to metabolomic analysis using FT-ICR MS and CE–MS [92, 93]. Slight changes were detected in the amounts of amino acids and organic acids in the leaves, roots, seeds, and calli of this GM rice. At the same time, an increase in the level of glutathione derivatives was found in the calli. The relationship between the overexpression of the YK1 gene and the accumulation of glutathione derivatives was hence summarized in this report. Moreover, to characterize mutants with a low content of phytic acid, which is considered to be an antinutritional factor in rice grains, several types of low phytic acid (lpa) mutants were analyzed by means of metabolite profiling using GC [33]. The metabolic changes in the mutant rice grown in different fields were also investigated. Using these metabolomic data, the effects of the growing fields and of the mutations on the concentrations of various metabolites were discussed. As is clear from these studies, metabolomic approaches may enable the understanding of the influences of genetic modifications on plant metabolite concentrations.

Molecular breeding of GM rice by introducing various transgenes is expected to be a promising approach for the production of rice cultivars with preferred agronomical traits [17, 105]. However, before GM rice enters the market, its potential risks need to be assessed for ensuring safety [63, 64]. Metabolomics combined with other allergenicity and toxicity tests [22, 38] plays an important role in risk management [19]. However, it should be noted that although the survey of the complete metabolome is required for detailed scrutiny of risks, the number of detectable metabolites by currently available metabolomic techniques is limited. Likewise, since the scientific procedures or guidelines for assessment of possible risks using metabolome data have not been established yet [21], ensuring substantial equivalence tends to require considerable work and sometimes might be impossible [38]. These drawbacks indicate that further improvement in metabolomics is essential for more comprehensive profiling and interpretation of rice metabolites.

Future prospects

Plant metabolomics can potentially be applied to various purposes in addition to the studies exemplified above. Recently, several Arabidopsis studies have demonstrated that integrated analysis of transcriptomic and metabolomic datasets is a promising strategy for the functional identification of metabolism-related genes; for example, Hirai et al. [43, 44] processed the transcriptomic and metabolomic data of Arabidopsis plants under sulfur- and nitrogen-deficient conditions. In these studies, BL-SOM analysis clearly revealed the coordinated changes in the mRNA and metabolite levels in the stressed plants, thereby enabling the identification of the functions of several unknown genes [43]. Considerable transcriptomic data on Arabidopsis can be easily acquired from the AtGenExpress web site (http://www.arabidopsis.org/info/expression/ATGenExpress.jsp), and expression correlation data are now available in other databases such as ATTED-II [74]. The data in these databases make it possible to identify candidate genes for unknown biological processes by evaluating the correlation of the expression of these genes with that of functionally identified genes [41, 76, 81]; these databases have facilitated the hunting of secondary metabolism-related genes of Arabidopsis [106]. In the case of rice, the Rice Expression Database (http://red.dna.affrc.go.jp/RED/) has already been released for the processing and mining of rice DNA microarray data. The integration of transcriptomic and metabolomic data should be a robust tool in rice studies in the foreseeable future.
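
The principle of linking transcripts to metabolites through coordinated changes can be sketched with a plain Pearson correlation across samples. This is a deliberately simpler technique than the BL-SOM analysis used in the studies cited above, and the gene names and values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def rank_genes_by_correlation(expression, metabolite_levels):
    """Rank genes by |r| between their expression and a metabolite profile."""
    scored = [(gene, pearson(profile, metabolite_levels))
              for gene, profile in expression.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression profiles over five samples (e.g., a stress time course)
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneC": [2.0, 2.0, 2.0, 2.0, 2.0],
}
# Hypothetical levels of one metabolite in the same five samples
metabolite = [2.1, 3.9, 6.2, 8.0, 9.9]

ranking = rank_genes_by_correlation(expression, metabolite)
```

Genes at the top of such a ranking are only candidates; a high correlation does not by itself establish that a gene acts in the biosynthesis of the metabolite.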

In modern Arabidopsis research, loss-of-function and gain-of-function mutant lines play a very important role in functional genomics. A combination of the metabolomic approach and these bioresources has been demonstrated to be a smart strategy for uncovering the role of a functionally unknown gene. Using GC–MS and CE–MS, Watanabe et al. [102] obtained a metabolome dataset of Arabidopsis seedlings of a wild-type plant and of several T-DNA insertion mutants having disrupted bsas (β-substituted alanine synthase) genes. Statistical analyses revealed that one unknown metabolite was not accumulated in one of the bsas mutants, namely, bsas3;1, whereas this metabolite was detectable in sufficient quantities in the wild-type plant and the other mutants. Next, they identified the unknown metabolite as a unique dipeptide based on a comparison with an enzymatically synthesized dipeptide. The bsas3;1 gene was thus inferred to be involved in the biosynthesis of this dipeptide. Correspondingly, by exploiting the large bioresources of rice mutants [2, 45, 47], the metabolomic approach might be able to reveal the functions of various rice genes.

Furthermore, in addition to the clarification of the functions of single genes, metabolomics has the potential to determine the location of quantitative trait loci (QTL). QTL have traditionally been analyzed statistically by comparing the genotypes of marker genes with the occurrence of the trait among naturally occurring genetic variants [5, 11, 27, 59]. Recently, transcriptomic information has been used for QTL analysis, in which case the loci are referred to as expression QTL (eQTL), thereby clarifying the genes directly related to metabolism, such as those coding for enzymes and transporters, and regulator genes such as those coding for transcription factors [16, 40, 53, 55, 56, 86]. In addition to transcriptomic data, metabolomic data may be applicable to QTL analysis. Very recently, Lisec et al. [68] reported the results of metabolic QTL analysis of Arabidopsis. In this study, biomass and metabolic QTL were compared using recombinant inbred lines and introgression lines of Arabidopsis. It was suggested that the amounts of several metabolites were strongly linked to Arabidopsis biomass. Because the variety and amounts of metabolites are directly connected to important traits of plants such as biomass, nutrition, and taste, QTL analysis combined with metabolomics should be a promising method.
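
Treating a metabolite level as the trait value, the simplest form of QTL analysis is a single-marker scan: at each marker, the lines are split by genotype and the two groups are compared. The sketch below uses Welch's t statistic for that comparison. It is a simplification of the interval-mapping methods used in actual QTL studies, and the marker names, genotype calls, and metabolite levels are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two groups."""
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def single_marker_scan(genotypes, trait):
    """Rank markers by |t| for association with a metabolite level.

    genotypes: marker name -> string of 'A'/'B' calls, one per line.
    trait: metabolite level of each line, in the same order.
    Markers with fewer than two lines in either class are skipped.
    """
    results = {}
    for marker, calls in genotypes.items():
        group_a = [t for g, t in zip(calls, trait) if g == "A"]
        group_b = [t for g, t in zip(calls, trait) if g == "B"]
        if len(group_a) < 2 or len(group_b) < 2:
            continue
        results[marker] = welch_t(group_a, group_b)
    return sorted(results.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical metabolite levels in eight recombinant inbred lines
metabolite_level = [5.1, 4.9, 5.3, 5.0, 8.2, 7.9, 8.1, 8.4]
# Hypothetical genotype calls at two markers for the same eight lines
marker_genotypes = {
    "m1": "AAAABBBB",  # genotype tracks the metabolite level
    "m2": "ABABABAB",  # genotype unrelated to the metabolite level
}

scan = single_marker_scan(marker_genotypes, metabolite_level)
```

In a genome-wide scan, the statistic would be computed for many markers and many metabolites, so a correction for multiple testing would be needed before any locus is called a metabolic QTL.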

Figure 1 summarizes the possible role of metabolomics in a sequence of investigations for rice breeding. Conventional forward and reverse genetics, such as QTL mapping derived from the phenotypes of natural rice variants or target gene mapping of spontaneously or artificially occurring mutants, have revealed the functions of valuable genes [11, 47]. Since metabolomics is a potent tool for the identification of valuable gene functions, the role of beneficial genes that can be utilized to generate a useful rice cultivar might be also elucidated by metabolomic approaches. Meanwhile, crossbreeding based on QTL pyramiding or genetic modification of target genes are the current techniques that are used to develop improved cultivars. Affirmation of substantial equivalence is essential for safety assessment of the generated rice, particularly GM rice. Metabolomic techniques might be able to play a role in ascertaining whether unexpected and undesirable compounds are accumulated in GM rice.

Fig. 1 The sequence of steps in rice breeding. Metabolomic techniques are applicable to the development of useful rice cultivars having superior qualities, particularly to the clarification of the functions of valuable genes and the safety assessment of GM rice.

The methodology in metabolomics tends to converge on several analytical approaches, such as directly infused high-performance MS, hyphenated MS, and NMR. However, the development of methods with higher throughput and higher sensitivity is still awaited. Along with the innovations in analytical technologies and methods, more useful and high-throughput data processing software is also expected. Furthermore, besides the existing metabolomic databases that can be used as encyclopedias of metabolites, such as KNApSAcK, the Golm Metabolite Database ([60], http://csbdb.mpimp-golm.mpg.de/csbdb/gmd/gmd.html), and MoToDB ([70], http://appliedbioinformatics.wur.nl/moto/), more metabolomic databases of other plants including rice should hopefully become available. On the other hand, the necessity of quality control in plant metabolomics has recently been proposed [32]. Standardization of metabolomic experiments from the biological study design to data analysis will not only facilitate metabolomic studies but will also allow the existing metabolomic data to be exploited. These technical improvements in metabolomic studies will contribute to the progress of basic and applied studies on rice.