1 Introduction

Most domesticated crop plants were developed within a much more abbreviated time-scale than is often perceived, many only in the past few hundred years (Gepts 2002). Yet the genetic changes induced by selective breeding are such that major domesticated crops are typically represented by hundreds, even thousands, of unique cultivars specialized for production in a wide variety of geographic and societal settings. Whilst there are concerns regarding potential loss of genetic diversity through domestication (Hoisington et al. 1999; Lee 1998; Reif et al. 2005) it is clear that the hyper-mutable and genetically fluid character of plant life has facilitated the development of an extraordinary range of genetically distinct and metabolically diverse populations for any given crop.

Extensive variation in DNA sequence is typical within plant species. Transpositions and rearrangements are also common. These natural phenomena typically arise by non-programmed sequence-independent insertions and deletions and other modifications. Transposition activity can also be induced by stresses such as hydrostatic pressure (Long et al. 2006) or through conventional breeding (Shan et al. 2005). Individuals within a species can, through natural processes, contain different numbers of genes (Bradford et al. 2005a, b; Cellini et al. 2004). Contributing to further genetic plasticity, DNA methylation represents a layer of regulatory information beyond that encoded in the basic structure of the plant genome. This epigenetic process can greatly influence levels of gene expression. Recent studies in rice highlighted extensive changes in DNA methylation patterns (as well as gene copy number and sequence) following the introduction, through conventional breeding, of a chromosomal fragment from a related wild species (Liu et al. 2004; Long et al. 2006; Chen et al. 2006). These naturally occurring transformations and their induction through stresses and conventional breeding have impacted and will clearly continue to impact the genetic diversity of plants including domesticated crops and, potentially, the composition of foodstuffs derived from such plants.

Quality traits in important crops and ornamental plants are related to metabolic composition (Memelink 2004) whilst metabolic changes underpin plant development and responses to applied stresses. Furthermore, metabolic information is often viewed as reflecting biological endpoints more accurately than transcript or protein analysis. Indeed, the plant metabolome has been described as bridging the “genotype-phenotype gap” (Fiehn 2002; Hall 2006). As such, the science of metabolomics has proven to be of increased popularity and value in assessing genotypic and phenotypic diversity, in defining biochemical changes associated with developmental changes and organ differentiation during plant growth, and, increasingly, in comparative compositional studies. A brief illustration of such concepts is provided in the following examples prior to more extensive discussion.

Non-domesticated relatives of crop plants may retain stress and disease resistance traits that are compromised in modern cultivars developed through selective breeding for properties more directly related to yield performance and to consumer preferences. As such, Schauer et al. (2005, 2006) recently conducted a GC-MS-based profiling survey to investigate the relationship between metabolite content and stress tolerance in a number of wild species of tomato and in introgression lines. A high degree of variation in the levels of metabolites in selected wild tomato species and introgression lines was recorded, an observation that may imply a significant genetic resource for eliciting stress tolerance improvements in domesticated crops. This exemplifies the potential of metabolomics to associate metabolic diversity with differences in desirable agronomic traits, as will be discussed in more detail later.

The metabolic profiling technologies that can be applied to investigating metabolic changes between different plant species and cultivars can also be applied to understanding changes within plants as they mature. Targeted analyses of carbohydrates (Obiadalla-Ali et al. 2004), amino acids (Boggio et al. 2000), carotenoids and isoprenoids (Fraser et al. 1994) or steryl lipids (Whitaker 1988) on ripening tomato have been conducted whilst non-targeted evaluation of developing rice (Tarpley et al. 2005) has also proven highly informative. Pursuing a parallel analysis on both flesh and seed tissues of the same tomato fruits, Mounet et al. (2007) recently highlighted compositional similarities between the different tissues at early developmental stages and a metabolic trajectory that served to increasingly differentiate the tissues during development and at maturity. Understanding patterns of metabolic change during development should inform breeding strategies, and a role for metabolic profiling is clearly warranted.

Current profiling technologies have been shown to be effective in identifying and describing differences and similarities in the compositions of different plant cultivars and species. Significantly, recent reports of omics studies on biotechnology-derived plants indicate that these are typically more closely related to the isogenic parental strain used in their development than the parental strain is to other cultivars of the same species. This effect has been observed at the metabolome level for wheat (Baker et al. 2006) and potato (Defernez et al. 2004; Catchpole et al. 2005). Analogous results have been observed at the transcriptome level for wheat (Baudo et al. 2006) and grape (Franks et al. 2006) and at the proteome level for Arabidopsis thaliana (Ruebelt et al. 2006a, b, c), potato (Lehesranta et al. 2005), tomato (Corpillo et al. 2004) and soybean (Wei et al. 2006).

The omics sciences continue to be applied to diverse areas of plant research including, for example, transcriptomic evaluations of heterosis or hybrid vigor in maize (Auger et al. 2005; Guo et al. 2006; Swanson-Wagner et al. 2006), and proteomic evaluations of the impact of different agricultural systems on potato tubers (Lehesranta et al. 2007). These studies, as with those described above, are aimed primarily at identifying and selecting genetic and metabolic features that associate with sought-after agronomic improvements. We now describe in more detail the potential of one of the omics sciences, metabolomics, in evaluating metabolic diversity in plants, and in supporting breeding programmes based on conventional and emerging paradigms such as the use of RNAi silencing.

2 Characteristics of metabolomics

The diverse physicochemical nature of metabolites, together with differences in their abundances and stabilities, requires a range of technologies for metabolome data acquisition. Options principally centre on different variants of nuclear magnetic resonance spectroscopy (NMR) or mass spectrometry (MS) (see Dunn and Ellis 2005; Goodacre et al. 2004; Harrigan and Goodacre 2003; Vaidyanathan et al. 2005 for reviews). Metabolomics strategies exploiting such technologies are characterized by two different conceptual approaches (Goodacre et al. 2004) defined herein as “targeted” and “non-targeted”. Non-targeted approaches aim to provide a hypothesis-free global overview of readily detectable metabolites. The range and biochemical identity of metabolites measured in a given non-targeted metabolic profiling experiment are typically dependent on the data acquisition technology adopted. Consequently, the hypothesis-generating value of non-targeted approaches can be somewhat compromised by bias and selective reporting.

Targeted approaches employ optimized measurements designed for specific classes of metabolites or pathways. As such, they offer advantages in terms of improved quantitation and interpretability of acquired data. Strategies can be developed for assessing discrete classes of metabolites including, but not restricted to, organic acids, sugars, free amino acids and lipids. Indeed, metabolomic “sub-disciplines” are now developing as technologies for assessing different metabolite classes improve. For example, lipidomics, profiling technologies focused on lipids, represents an increasingly well-developed and influential aspect of metabolomics (Mutch et al. 2006; http://www.lipidmaps.org). Analytical platforms relevant to free amino acid analysis can be designed to measure the immediate biosynthetic precursors of target amino acids as well as associated anabolites and catabolites. Thus, Dubouzet et al. (2007) recently reported methodology to assess levels of anthranilate, tryptamine, serotonin and gramine in rice producing high levels of tryptophan, which could be integrated with information on all the major free amino acids (Wakasa et al. 2006). Studies on changes induced in free amino acid metabolism in lysine-producing maize can include measurements of lysine catabolites such as saccharopine and α-aminoadipic acid (Huang et al. 2005).

Targeted approaches can also be considered to include the use of flux-based methodology (Ratcliffe and Shachar-Hill 2006). Here, specifically designed tracer-labelled substrates can be administered to a biological test system and the fate of the substrate monitored as it is metabolized and distributed throughout the test system. At present, this approach remains somewhat underrepresented in metabolomics but clearly has great promise in associating metabolic changes with phenotypic diversity (see Last et al. 2007 for a comprehensive review).

Many metabolic profiling studies adopt a two-tiered hierarchical approach that incorporates both non-targeted and targeted elements (Goodacre et al. 2004; Catchpole et al. 2005). Thus, for example, a rudimentary assessment of composition can be provided through “fingerprinting” analyses such as a rapid flow-injection MS (Vaidyanathan et al. 2002) as a preliminary to more targeted and quantitative measurements using chromatographic and higher resolution methods. Such hierarchical approaches have been applied to “differentiation” studies, for example in microbial classification (Vaidyanathan et al. 2002) or in compositional comparisons of closely related crops (Catchpole et al. 2005; Enot et al. 2007). One attractive feature of high-throughput “fingerprinting” measurements is that their short analysis time (typically < 1 min) can allow a greater number of technical and biological replicates to be incorporated into an experiment, thus enhancing the statistical validity of generated results (Enot and Draper 2007).

Cellular and sub-cellular compartmentalization of different biochemical processes is an important aspect of metabolic regulation (Bowsher and Tobin 2001; Lunn 2007). Thus, a deeper understanding of plant metabolism requires a qualitative and quantitative description of the metabolome within different compartments of the cell. One promising approach is the use of non-aqueous fractionation (NAQF) to separate, through ultra-centrifugation, sub-cellular compartments under biochemically and enzymatically inactive conditions (Benkeblia et al. 2007) followed by metabolic profiling of the compartmental fractions. This approach has not been extensively used, but aspects of spatial resolution in metabolomic studies could prove an important component in studies of developmental changes within a growing plant tissue.

Metabolomics, which shares the same etymological root as metabolism (Gk. metabolē, change), ultimately seeks to understand changes in biological systems in terms of changes in metabolism. To fully realize such a goal, an ideal metabolomics methodology would allow a quantitative global analysis of metabolite content and of metabolic flux, including information on spatial resolution. At present, however, available metabolic profiling methods (Dunn and Ellis 2005; Goodacre et al. 2004; Harrigan and Goodacre 2003; Vaidyanathan et al. 2005) allow identification of only a subset of previously described metabolites and, due to technical limitations and the diverse physicochemical properties and relative abundances of metabolite analytes, accurate quantitation is not always achievable. However, the principle that the metabolome more accurately reflects the phenotype (Fiehn 2002) [an altogether different assertion from saying that current metabolic profiling methods actually capture the (metabolome) phenotype] continues to drive improvements in analytical technologies, informatics, and standardized approaches to data reporting and dissemination (Fiehn et al. 2005; Jenkins et al. 2004).

Despite the ever-expanding range of technological options, analytical sophistication, and informatics capabilities, there remain fundamental concerns related to the acquisition, interpretation and statistical treatment of omics data, particularly from non-targeted approaches (Broadhurst and Kell 2006; Lay et al. 2006). No discussion on the characteristics and applications of metabolomics would be complete without acknowledging this. Dangers include (but are not restricted to) (i) bias, unintentional or otherwise, (ii) inadequate sample size relative to the large number of measured variables in a typical non-targeted omics experiment [see Jorstad et al. (2007) for an informative discussion on sample requirements for plant transcriptomics experiments], (iii) over-fitting when utilizing supervised machine learning, (iv) excessive false discovery rates due to multiple hypothesis testing, (v) artifacts and drift inherent in data acquisition technology, and (vi) artifacts introduced through improper or variable sample preparation. The reader is referred to Broadhurst and Kell (2006) and Lay et al. (2006) for highly illuminating discussions.
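To make danger (iv) concrete, the following is a minimal sketch of one widely used correction for multiple hypothesis testing, the Benjamini-Hochberg step-up procedure. It is offered as an illustration only and is not taken from the works cited above; the p-values are hypothetical and stand in for per-metabolite tests in a small profiling experiment.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    ranked = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m   # rank-dependent cut-offs
    below = ranked <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank meeting its cut-off
        rejected[order[:k + 1]] = True             # reject every hypothesis up to that rank
    return rejected

# Hypothetical p-values, one per measured metabolite (not real data).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive_hits = np.asarray(p_vals) < 0.05             # uncorrected per-test threshold
fdr_hits = benjamini_hochberg(p_vals, alpha=0.05)  # FDR-controlled selection

print("uncorrected discoveries:", int(naive_hits.sum()))
print("FDR-controlled discoveries:", int(fdr_hits.sum()))
```

With these illustrative values, the uncorrected threshold flags more metabolites than the FDR-controlled procedure. In a non-targeted experiment with hundreds or thousands of variables, the gap can be considerably wider.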

3 Metabolomics applications

The scope of potential applications of plant metabolomics is clearly extensive. The examples now discussed in more detail are not presented as a comprehensive survey but are intended to highlight the value of metabolomics in assessments of biological diversity, biochemical changes during development, metabolic engineering as a component of enhancing germplasm diversity, and in comparative compositional studies.

3.1 Assessment of biological diversity through population screening

As indicated earlier, the genetic and phenotypic diversity of wild relatives of important crops may prove to be an important resource in eliciting new traits of value to the consumer and environment alike. Using tomato (Solanum lycopersicum) as a model system, Schauer et al. (2005) sought to evaluate the contribution of metabolic diversity to differences in the trait attributes of non-domesticated species. The goal was essentially to identify biochemical markers that could be directly associated with desired traits and applied to direct progeny selection when crossed with domesticated counterparts. Using a GC-MS methodology that allowed quantitation of over 90 metabolites including organic acids, sugars, polyols, amino acids and a limited number of secondary metabolites, profiles were generated that differentiated the fruit and leaves of tested wild species from those of a modern domesticated cultivar. A conclusion from this study was that “...the wide metabolic variance of primary metabolites in fruits of the wild species suggests that ... boosting the levels of nutritionally important metabolites such as lysine, methionine, ascorbate and tocopherol will stand a high chance of success”. The higher levels of secondary metabolites in the wild species also suggested a valuable resource for flavor and color compounds such as volatiles and carotenoids.

In an analogous study designed to correlate metabolic traits with abiotic stress resistance, Semel et al. (2007) assessed the impact of water restriction on yield performance and metabolite content of the domesticated tomato variety, M82, and the F1 hybrid of its cross with the wild tomato, S. pennellii. The fruit yield and brix (soluble organic material) content of the F1 hybrid were less affected by water restriction than those of M82. This was also true of the generated metabolic profiles; whilst drought-induced changes in the F1 hybrid were modest, implying tight homeostatic control, water restriction induced profound metabolic changes in M82. These changes included dramatic elevations in the levels of amino acids, TCA cycle intermediates, sugars and polyols. Many of these changes were consistent with current knowledge about plant osmoprotectants and compatible solutes. Thus, increases in levels of proline, along with its biosynthetic precursor, glutamate, were consistent with its postulated stress-protective role whereas the increase in levels of glycine most probably reflected accumulation of glycine betaine, a well-known compatible solute. Other less obvious findings, such as increases in the levels of branched-chain amino acids, certain TCA metabolites, and gentiobiose, may represent new opportunities for utilizing metabolite-based research to guide breeding improvements in abiotic stress tolerance. The relatively high levels of most major metabolites measured in the drought-tolerant F1 hybrid fruit may imply some form of “metabolic priming” designed to withstand abiotic stress. This intriguing hypothesis is consistent with the observation that the metabolite composition of the irrigated F1 hybrid fruit is similar to that of the M82 variety grown under water restriction.

In summary, these results may identify opportunities to improve drought tolerance in a valuable crop such as tomato, and provide new metabolite biomarkers to support breeding efforts in this area. Whilst the biochemical basis of complex traits such as stress tolerance is clearly multi-factorial, the importance of molecules such as proline and glycine betaine has been identified in other major crops. Studies associating stress tolerance with proline accumulation in maize seedlings (Raymond and Smirnoff 2002) and roots (Verslues and Sharp 1999) have been reported, for example. Levels of glycine betaine in maize leaf are known to increase in growing seasons that experience drought (Saneoka et al. 1995; Yang et al. 1995).

Levels of metabolites can also be monitored in introgression lines carrying wild species alleles and metabolic differences mapped to the introgressed region and its associated phenotype. Overy et al. (2005) reported one of the first metabolite profiling studies on introgression lines. They selected six plants (based on fruit characteristics) from a library of 76 tomato introgression lines comprising wild S. pennellii alleles incorporated into the domesticated S. lycopersicum. Analysis of polar extracts of ripe S. lycopersicum and S. pennellii fruit through non-targeted direct-injection-MS indicated differences in the relative levels of numerous metabolites, primarily organic acids and sugars. Subtle differences in the metabolism of the introgression lines when compared to the parent lines were also observed but not further defined in this study.

The experiment by Overy et al. (2005) was subsequently followed by a more rigorous evaluation of the entire library of 76 introgression lines using a GC-MS-based profiling platform that allowed quantitation of 74 metabolites (Schauer et al. 2006). Since levels of the majority of these metabolites were known to be higher in S. pennellii fruit relative to their levels in the domesticated tomato, the finding that most metabolites in the fruit of the introgression lines were also elevated was unsurprising. However, it was also shown that many metabolites, for example, certain TCA cycle intermediates, were uniquely elevated in the introgression lines with levels greater than those observed in the S. pennellii and S. lycopersicum parents. Whilst of potential relevance to breeding for improved traits, the mechanism underlying the events that induce these metabolic features is not immediately apparent.

Assessments of the relationship between the 74 quantified metabolites and yield-associated morphological traits were conducted. Correlation analysis revealed three major network modules, as well as a few minor “loosely connected” modules, characterized principally by their degree of interconnectivity with other module co-members. One of the major modules represented phosphorylated intermediates and the yield-associated morphological traits, the second included amino acids, and the third contained organic acids and sugars. In addition to the high interconnectivity within the three major modules, many metabolites also exhibited high interconnectivity between modules. Some of these observations might be anticipated for signalling molecules such as γ-aminobutyric acid and inositol-1-phosphate; however, reasons for the high interconnectivity of metabolites such as glycerol-1-phosphate, for example, appear less obvious. Further analysis demonstrated that free amino acids and sugar phosphates were “morphologically associated”, that is, changes in their levels were associated with changes in at least one yield-associated morphological trait, whereas other metabolites such as a number of fatty acids and minor sugars were “morphologically independent”. A conclusion from the study was that “morphologically associated” metabolites generally comprised central pathways of intermediary metabolism, an observation that did not appear to apply to “morphologically independent” metabolites. With the caveat that the metabolite coverage in this experiment extended to only 74 analytes, Schauer et al. (2006) suggested that manipulation of less central metabolism may prove to have less impact on plant morphology than manipulation of more central pathways. The application of such an observation may depend primarily on how pathway modification is achieved, including factors such as tissue specificity.
For example, increases in lycopene and β-carotene levels in fruits of tomato modified through incorporation of Arabidopsis DET1, a nuclear protein implicated in regulating light transduction, were accompanied by pronounced yield decreases (Davuluri et al. 2004, 2005). Association with a fruit-specific promoter, however, might be reasonably predicted to increase levels of lycopene and β-carotene with less impact on overall plant development.
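The general idea of a correlation network, in which metabolites and morphological traits are nodes and strong pairwise correlations are edges, can be sketched as follows. This is an illustrative simplification and not the procedure of Schauer et al. (2006); the variable names, values and the correlation cut-off of 0.8 are all hypothetical assumptions chosen for the example.

```python
import numpy as np

def correlation_network(data, names, threshold=0.8):
    """Link variables whose absolute Pearson correlation meets the threshold.

    data: array of shape (samples, variables); names: one label per column.
    Returns a dict mapping each name to the list of names it is linked to.
    """
    r = np.corrcoef(data, rowvar=False)        # variables are in columns
    edges = {name: [] for name in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                edges[names[i]].append(names[j])
                edges[names[j]].append(names[i])
    return edges

# Hypothetical measurements across 12 lines (not real data).
t = np.arange(12, dtype=float)
glucose_6_phosphate = t
glutamate = 2.0 * t + np.tile([0.5, -0.5], 6)
fruit_weight = -t + np.tile([0.3, 0.0, -0.3], 4)
palmitate = np.tile([1.0, -1.0, -1.0, 1.0], 3)   # varies independently of the others

names = ["glucose_6_phosphate", "glutamate", "fruit_weight", "palmitate"]
data = np.column_stack([glucose_6_phosphate, glutamate, fruit_weight, palmitate])

edges = correlation_network(data, names, threshold=0.8)
for name in names:
    print(name, "degree:", len(edges[name]))
```

In this toy data set the fatty acid has no links to the trait, loosely mirroring the “morphologically independent” category, whereas the sugar phosphate and amino acid are linked both to each other and to the trait. Real analyses would also need to address the statistical significance of each correlation.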

Ultimately, as suggested by Giovannoni (2006), the metabolic genomics approach described above may “...lead to modified crop breeding strategies and to a clearer understanding of the basic plant biology underlying important fruit quality traits”.

3.2 Metabolic changes during plant development

Developmental changes in the synthesis, transport, accumulation and breakdown of metabolites influence the compositional and nutritional qualities of foodstuffs. Understanding these metabolic transitions should lead to improved approaches designed to enhance the nutritional quality of major crops. Adopting a variant of hierarchical metabolic profiling in a study on early development in rice, Tarpley et al. (2005) utilized a non-targeted GC-MS based platform that allowed identification of 155 metabolites. Analysis of profiles generated on seedlings grown over 10 days allowed identification of 21 metabolites that accounted for 83% of metabolite variance associated with developmental changes. It was suggested by the authors that future developmental studies could reasonably be conducted based on assessments of only these 21 metabolites. Comprising primarily organic acids, several sugars and amino acids, this “biomarker set” essentially provides the advantages of a targeted approach whilst still representing extensive metabolite coverage.
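The notion of a reduced “biomarker set” that captures most of the variance in a larger profile can be illustrated with a deliberately naive sketch. It ranks metabolites by their share of total variance and keeps the smallest set reaching a target fraction. This is an assumption made for illustration and is not the statistical procedure of Tarpley et al. (2005); the metabolite names and values are hypothetical, and real data would normally be scaled before any such comparison.

```python
import numpy as np

def smallest_set_for_variance(data, names, target=0.83):
    """Pick the fewest variables whose summed variance reaches the target share.

    data: array of shape (samples, variables); names: one label per column.
    Returns (selected names, cumulative variance share of the selection).
    """
    variances = np.var(data, axis=0, ddof=1)
    share = variances / variances.sum()
    order = np.argsort(share)[::-1]                  # largest share first
    cumulative = np.cumsum(share[order])
    n_keep = int(np.searchsorted(cumulative, target)) + 1
    n_keep = min(n_keep, len(names))                 # guard against rounding at 1.0
    selected = [names[i] for i in order[:n_keep]]
    return selected, float(cumulative[n_keep - 1])

# Hypothetical levels of four metabolites at four time points (not real data).
names = ["citrate", "sucrose", "alanine", "sitosterol"]
data = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [10.0, 4.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [10.0, 4.0, 2.0, 0.0],
])

selected, explained = smallest_set_for_variance(data, names, target=0.9)
print(selected, round(explained, 3))
```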

Targeted analyses have been conducted on many metabolites during fruit development, particularly in tomato. Analytes evaluated include carbohydrates (Obiadalla-Ali et al. 2004), amino acids (Boggio et al. 2000), carotenoids and isoprenoids (Fraser et al. 1994) and steryl lipids (Whitaker 1988). However, as explained by Mounet et al. (2007), corresponding analyses of seed and flesh tissues from the same biological material may provide novel insights into the relationships between metabolic changes and developmental patterns in seed and fruit. The use of a non-targeted approach utilizing nuclear magnetic resonance spectroscopy (NMR) in conjunction with a targeted approach utilizing HPLC-photodiode array spectroscopy (PDA) and GC-flame-ionization detection (FID) allowed quantitation of 50 metabolites, of which 44 were detected throughout a 45-day development period in both flesh and seed samples. As expected, there were clear differences between the metabolic composition of seeds and flesh at all developmental stages. Differences which became more pronounced during development from 8 to 45 days post-anthesis included changes in glucose and fructose, free amino acids, and isoprenoids, which greatly increased in flesh, paralleled by a concomitant decrease in seed. Most intriguingly, however, a high degree of compositional similarity was found at early stages of development. Thus, at 8 days post-anthesis, concentrations of mannose, choline, N-methylnicotinamide and the major fatty acids were similar in flesh and seeds. The high levels of choline, a precursor for phosphatidylcholine, an important membrane component, and N-methylnicotinamide, a small molecule associated with cell cycle regulation, suggest that the observed compositional similarities are a function of the low differentiation and active cell proliferation of these tissues at early development stages.
Parallel trends in flesh and seeds were observed for several metabolites during development including mutual decreases in the levels of choline, chlorogenate, N-methylnicotinamide, and fatty acids. A parallel transitory increase in starch followed by a decline was also observed. The relevance of these to growth or differentiation in fleshy and seed tissues could provide further insights into how breeding approaches can be optimized. Indeed, in a general observation by Tarpley et al. (2005), progress in metabolic profiling in this area can address our “…lack of knowledge of the broad changes in metabolite patterns during development [that] limits our efficiency to manipulate the cellular or molecular aspects of plant development with intent to influence yield or sustainability of production”.

3.3 Metabolic changes and modification of regulatory genes

Many conventionally bred improvements in desirable agricultural traits have been associated with modulations of transcription factors and other regulatory proteins. The impact of alterations in levels of regulatory proteins has been well illustrated by the classic example of the TCP transcription factors encoded by the teosinte branched1 (tb1) locus that regulate the expression of morphological traits known to differentiate wild Mexican grass teosinte from its domesticated counterpart, maize (Cubas et al. 1999). Many mutations in transcription factors, as opposed to changes in their expressed levels, have also been associated with agriculturally and commercially significant trait enhancements. Many of the yield gains achieved in wheat during the “Green Revolution” are attributable to mutant alleles in the reduced height-1 loci (Peng et al. 1999) whereas mutation of the opaque 2 gene that encodes a bZIP transcription factor was pivotal to the development of Quality Protein Maize (QPM), the winner of the World Food Prize in 2000 (Prasanna et al. 2001; Vietmeyer 2000). A spontaneous mutation in the MADS-box transcription factor ripening-inhibitor (rin) has been widely exploited to yield “commercially ubiquitous” tomato plants that exhibit an extended shelf life (Vrebalov et al. 2002). Indeed, it could be argued that domestication, breeding, and diversification of modern crops have been driven, in large part, by positive selection in the regulatory machinery underpinning gene expression. Natural mutations of regulatory proteins can therefore be considered to have played a transforming role in plant biology and in agricultural and societal developments.

Eliciting trait improvement through modern biotechnological approaches to modify regulatory protein expression may prove to be an attractive and viable option, and is, in a sense, an approach that mimics natural selection. However, plants encode many nuclear protein families (Liu et al. 1999; Riechmann et al. 2000) and, to date, the function of only a small fraction of these has been determined. In attempting to provide a framework for functional classification of transcription factor families, Grotewold (2005) suggested that some families can be considered to be associated with “core” processes such as organ development, for example, whereas other families may regulate more “peripheral” processes such as plant form or secondary metabolism. Whether this binary classification of regulatory proteins will prove to be generally applicable (see earlier discussion of DET1 and its impact on both tomato plant development and secondary metabolism) remains to be determined, but clearly a fuller understanding of the structure, function, and distribution of such proteins is required to facilitate developments in the use of transcription factor biology as a tool in trait development.

Considerable attention has been paid to two related transcription factor families in maize, the MYB C1 family and the basic-helix-loop-helix (bHLH) R family. Both of these are associated with the “peripheral” process of secondary metabolism and more specifically the regulation of phenylpropanoid and flavonoid synthesis (Grotewold et al. 1998, 2000; Grotewold 2005). Expression of C1 and R appears to be required for complete anthocyanin synthesis. Another MYB protein, P1, is associated with the biosynthetically related phlobaphenes (Grotewold 2005).

Clearly, advantages in studying secondary metabolites centre on their potential agronomic and commercial significance as well as their amenability to investigation through targeted metabolic profiling. Maize transcription factor genes controlling flavonoid synthesis have now been expressed in various transgenic plant species and their activation generally monitored by resultant production of flavonoids and anthocyanin pigmentation. Heterologous expression of maize LC and C1, for example, can upregulate flavonol synthesis in tomato flesh, a tissue that does not normally accumulate these secondary metabolites (Bovy et al. 2002). At least one non-targeted metabolic profiling experiment on the transgenic expression of MYB transcription factors has been pursued. A Fourier-transform-MS analysis of Arabidopsis thaliana over-expressing the MYB transcription factor PAP1 (Tohge et al. 2005) allowed measurement of 1,800 putative metabolites, confirming accumulation of anthocyanins and quercetin-type flavonol derivatives including at least eight new metabolites. Little discussion on changes in metabolites not directly associated with flavonoid chemistry was provided, though it appeared that levels of immediate intermediates of the flavonoid biosynthetic pathways were markedly decreased. Transcriptomic analyses identified a total of 38 genes whose expression was modified by MYB PAP1 expression. Most of these had putative functions, such as glycosyltransferase activity, for example, that correlated with the detected changes in the metabolome. These results suggest minimal effects on the metabolome and transcriptome through transcription factor modification, and that the influence on flavonoid chemistry was highly specific.

3.4 Metabolic engineering

Metabolic engineering seeks to modify the tissue amounts or chemical structures of specific metabolites through changes in the levels or activities of biosynthetic or catabolic enzymes, or the introduction of novel enzyme activities (see Facchini et al. 2000; Hughes and Shanks 2002; Kutchan 2005; Larkin and Harrigan 2007; Sato et al. 2001; Trethewey 2004; Verpoorte and Memelink 2002, for reviews). Predicting which enzymatic activities of a complex metabolic pathway must be modified to successfully introduce desired phenotypic changes remains a challenge despite developments in availability of modelling tools such as Metabolic Control Analysis (MCA) (Cornish-Bowden 1995; Kacser and Burns 1973; Rees and Hill 1994) and Biochemical Systems Theory (BST) (Voit and Radivoyevitch 2000). As pointed out by Larkin and Harrigan (2007), however, transformation of most crops through transgenic technologies can usually be achieved, and current metabolic profiling technologies render it relatively straightforward to discover influential steps in a target pathway. Thus, for each enzymatic step where a corresponding gene is available, transgenic perturbations can be introduced, and metabolomics applied, to investigate control points of metabolite synthesis and accumulation. In other words, current applied transgenic and metabolomic technologies mitigate, at least partially, any current limitations in predictive analyses. This argument also implies that targeted metabolic profiling is an important tool in assessing the value and efficacy of transgenic approaches to broadening the genetic diversity of existing germplasm. Significantly, profiling analyses of metabolically engineered plants have highlighted that in some instances, pronounced secondary effects in the metabolome and transcriptome can occur in crops modified with even exquisite genetic specificity, whereas in others, the same degree of specificity effects negligible secondary changes. 
It has become increasingly apparent that the nature of any pathway or trait is more relevant to the potential for secondary effects than the means by which that trait is introduced. More specifically, the dynamic nature of a pathway, the likelihood of metabolon channelling, the localization of major intermediates and co-factors, and associated transport phenomena contribute more to the potential for secondary effects than the issue of whether genetic modification is pursued through modern biotechnology, conventional breeding or other approaches such as irradiation or mutagenesis.
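The idea of locating influential steps by perturbing one enzyme at a time can be illustrated with the flux control coefficient of MCA, C = (dJ/J)/(dE/E), where J is the steady-state pathway flux and E an enzyme level. The sketch below is a toy illustration only: the two-step pathway, its linear kinetics and every rate constant are hypothetical and are not taken from any study cited here. It estimates each coefficient by a small numerical perturbation, much as a transgenic perturbation would be followed by a flux or metabolite measurement.

```python
# Toy two-step pathway S -> X -> P. All kinetics and parameters are hypothetical.
# Step 1 is reversible, step 2 irreversible; both rates scale with enzyme level.

def steady_state_flux(e1, e2, s=1.0, k1=2.0, k1_rev=1.0, k2=3.0):
    """Steady-state flux J for v1 = e1*(k1*s - k1_rev*x) and v2 = e2*k2*x."""
    # Setting v1 == v2 gives the steady-state intermediate level x.
    x = e1 * k1 * s / (e1 * k1_rev + e2 * k2)
    return e2 * k2 * x


def flux_control_coefficient(flux, levels, index, rel_step=1e-6):
    """Estimate C = (dJ/J)/(dE/E) by a central finite difference on one enzyme level."""
    up = list(levels)
    down = list(levels)
    up[index] *= 1 + rel_step
    down[index] *= 1 - rel_step
    j0 = flux(*levels)
    return (flux(*up) - flux(*down)) / (2 * rel_step * j0)


levels = [1.0, 1.0]  # hypothetical relative enzyme levels
c1 = flux_control_coefficient(steady_state_flux, levels, 0)
c2 = flux_control_coefficient(steady_state_flux, levels, 1)
print(f"C1 = {c1:.3f}, C2 = {c2:.3f}, sum = {c1 + c2:.3f}")
# prints: C1 = 0.750, C2 = 0.250, sum = 1.000
```

In this toy case the first step holds most of the control, and the coefficients sum to one, as the summation theorem of MCA requires. Real pathways with channelling, compartmentation or feedback would not be captured by such a simple model.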

Recently, the entire biosynthetic pathway for dhurrin, a tyrosine-derived cyanogenic glucoside, was incorporated into A. thaliana (Kristensen et al. 2005; Jørgensen et al. 2005; Tattersall et al. 2001) (see Fig. 1). This required transgenic expression of two multifunctional cytochrome P450s and a glucosyltransferase from Sorghum bicolor. The observed dramatic increase in levels of the anti-insectant dhurrin was not accompanied by any impact on plant morphology. Focused and non-targeted profiling studies indicated only marginal effects on the metabolome and transcriptome relative to wild type. For example, the pool of free amino acids was unaffected by the channelling of tyrosine into dhurrin synthesis, and changes in measured flavonoid glycosides and glucosylated benzoic acid metabolites, as assessed by LC-MS, were minimal. The transcriptome was also unaffected, as assessed by microarray analyses. However, in fascinating contrast, partial expression of the dhurrin biosynthetic pathway, through incorporation of only the two S. bicolor multifunctional cytochrome P450s with no concomitant glucosyltransferase incorporation, led to alterations in the metabolome and transcriptome as well as stunted plant growth.

Fig. 1. Representation of the dhurrin biosynthetic pathway. Transgenic incorporation of the entire pathway or only the first catalytic step leads to minimal secondary effects. Incorporation of CYP79A1 and CYP71E1 but not UGT85B1 leads to dramatic changes in morphology and the metabolome as well as transcriptome alterations. These changes may be associated with increases in levels of p-hydroxymandelonitrile.

Metabolic profiling demonstrated that this observation coincided with accumulation of the dhurrin intermediate, p-hydroxymandelonitrile (Fig. 1). In S. bicolor, the two multifunctional cytochrome P450s and the glucosyltransferase of the dhurrin biosynthetic pathway may form a metabolon allowing channelling of all intermediates. The complete incorporation of the entire metabolon appears to be required to minimize inadvertent secondary effects, whereas incomplete incorporation results in accumulation of an intermediate that induces profound phenotypic effects.
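The reasoning about complete versus partial pathway incorporation can be made concrete with a deliberately simple sketch. It treats the pathway as an ordered list of steps and reports where the chain stops for a given set of introduced enzymes. The enzyme names and p-hydroxymandelonitrile come from the text and the legend of Fig. 1; the name of the first intermediate (p-hydroxyphenylacetaldoxime) is taken from the general literature on dhurrin biosynthesis and is an assumption here, as is the function name. The sketch ignores kinetics, channelling and any endogenous metabolism of the intermediates.

```python
# Dhurrin pathway steps as (enzyme, substrate, product).
# The oxime intermediate name is an assumption drawn from the general literature.
PATHWAY = [
    ("CYP79A1", "tyrosine", "p-hydroxyphenylacetaldoxime"),
    ("CYP71E1", "p-hydroxyphenylacetaldoxime", "p-hydroxymandelonitrile"),
    ("UGT85B1", "p-hydroxymandelonitrile", "dhurrin"),
]


def end_product(enzymes_present):
    """Follow the pathway from tyrosine until an enzyme is missing; return where it stops."""
    metabolite = "tyrosine"
    for enzyme, substrate, product in PATHWAY:
        if enzyme not in enzymes_present or substrate != metabolite:
            break
        metabolite = product
    return metabolite


print(end_product({"CYP79A1", "CYP71E1", "UGT85B1"}))  # dhurrin
print(end_product({"CYP79A1", "CYP71E1"}))             # p-hydroxymandelonitrile
```

The second call corresponds to the partial incorporation described above, in which the chain ends at the intermediate associated with the pronounced phenotypic effects.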

Significantly, Franks et al. (2006) introduced the S. bicolor-derived dhurrin pathway into grapevine (Vitis vinifera) and noted no difference in patterns of gene expression between dhurrin-positive and dhurrin-negative lines, consistent with the findings of Jørgensen et al. (2005), Kristensen et al. (2005) and Tattersall et al. (2001).

The Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia has performed some remarkable metabolic engineering in plants, including the first example of opium-free poppies (Allen et al. 2004; Millgate et al. 2004; Larkin et al. 2007). This programme has utilized transgenic expression and RNAi silencing of enzymes involved in the benzylisoquinoline pathway to modulate levels of morphine and of pharmaceutically important products such as thebaine, which serves as a precursor for the chemical synthesis of analgesics such as buprenorphine and oxycodone.

Metabolic profiling indicated that over-expression of codeinone reductase (COR), which catalyzes the penultimate step in morphine synthesis (see Fig. 2), resulted not only in the intended increase in morphine and codeine levels but also in an increase in levels of the upstream intermediate, thebaine. Details regarding regulation of the cellular and sub-cellular localization of the morphinan pathway still remain unclear, and no definite reasons for this secondary effect can be established at present. RNAi silencing of the same enzyme (Allen et al. 2004) was also characterized by unexpected changes in benzylisoquinoline metabolism. Whilst the intended decrease in morphine and codeine was observed, secondary accumulation of the biosynthetic precursor (S)-reticuline (as well as methylated derivatives), several enzymatic steps upstream of codeinone synthesis, was not predicted. This accumulation may indicate a feedback mechanism preventing intermediates from general benzylisoquinoline metabolism entering the morphine-specific branch. A testable hypothesis for the observed long-range effect is that COR comprises part of a metabolon that includes the enzyme that acts on (S)-reticuline, namely 1,2-dehydroreticulinium ion synthase; loss of COR would suppress the activity of this and other enzymes of the postulated metabolon.

Fig. 2. A partial representation of the benzylisoquinoline/morphinan pathway. Over-expression of codeinone reductase leads to increases in the levels of codeine and morphine as expected but is also accompanied by an increase in thebaine. RNAi silencing of codeinone reductase leads to accumulation of (S)-reticuline.

Increasing levels of protein and free amino acids in plants represents a significant approach to enhancing the nutritional quality of foodstuffs. This area not only offers opportunities for emerging technical approaches such as RNAi silencing (e.g., Huang et al. 2005; Segal et al. 2003) but also highlights the inappropriateness of selectively associating “unintended” effects with biotechnology-based approaches rather than with genetic modification through conventional breeding. The opaque 2 mutation, discussed earlier, which was critical to the development of QPM, is associated not only with the intended elevated levels of total lysine in grain, due to down-regulation in the expression of lysine-poor zein storage proteins, but also with changes in the levels of numerous free amino acids in the mature maize grain (Wang and Larkins 2001). The extent of this secondary effect appears to be germplasm dependent, and several factors such as transport, increased amino acid synthesis, or reduced rates of amino acid incorporation into protein can, at least in principle, contribute (Wang and Larkins 2001). Regardless of mechanism, genetic modification of zein content through conventional breeding as a means to increase lysine levels is associated with pronounced secondary effects in free amino acid metabolism.

In contrast, transgenic incorporation of a mutant tryptophan-insensitive anthranilate synthase gene (OSAID) into a common rice cultivar is associated with markedly elevated levels of free tryptophan and only relatively minor changes in other free amino acids (Wakasa et al. 2006). Significantly, levels of phenylalanine and tyrosine, which as with tryptophan are derived from the shikimate pathway, were also relatively unaffected by OSAID incorporation. Analysis was conducted to determine whether changes in the levels of tryptophan impacted levels of precursors such as anthranilic acid and downstream intermediates including tryptamine, serotonin, melatonin, indole and gramine, metabolites all known to be present in rice. Targeted analyses through LC-MS/MS showed no substantial differences (Dubouzet et al. 2007). Levels of tryptamine and serotonin were slightly increased, but fold changes were much smaller than those observed for tryptophan. The anthranilate levels were increased but remained at less than 0.1% of the levels of tryptophan, indicating that anthranilate is effectively processed. Dubouzet et al. (2007) also pursued microarray analysis to investigate why so little tryptophan is channelled into downstream products. A monofactorial analysis of variance revealed only 70 genes that showed differential expression at P < 0.01, the statistical threshold used in this study. The analysis further revealed that there were no significant changes in the transcription of genes encoding enzymes of phenylalanine and tyrosine metabolism or the shikimate pathway. A conclusion from this study was that the genes needed to convert tryptophan into other metabolites did not respond proportionately to the increased levels of substrate. The results may also imply that the sub-cellular and cellular distribution of intermediates, and the apparent lack of a metabolon of the type associated with dhurrin synthesis, contribute to the minimal effects on the metabolome and transcriptome in rice modified by transgenic incorporation of OSAID.
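For readers less familiar with the statistic behind such gene lists, a monofactorial (one-way) analysis of variance compares, gene by gene, the variation between groups with the variation among replicates within groups. The sketch below computes the F statistic for a single gene in pure Python. The expression values and the function name are hypothetical and are not data from Dubouzet et al. (2007); converting F to a P value additionally requires the F distribution, which a statistics library such as SciPy (scipy.stats.f_oneway) provides.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of replicate values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical log-scale expression values for one gene, three replicates per line.
control = [5.0, 5.2, 4.8]
transgenic = [6.0, 6.1, 5.9]
f_stat, df_b, df_w = one_way_anova_f([control, transgenic])
print(f"F = {f_stat:.1f} on {df_b} and {df_w} degrees of freedom")
```

Applied across a whole microarray, a test of this kind is run once per gene, so the choice of threshold (P < 0.01 in the study above) directly shapes how many genes are reported as differentially expressed.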

3.5 Compositional comparisons

Compositional studies spanning over a decade of research on new biotechnology-derived crops have highlighted consistently the similarities of these crops with their conventional counterparts. These studies include assessments of soy (Padgette et al. 1996; Taylor et al. 1999), corn (Ridley et al. 2002; Sidhu et al. 2000; Herman et al. 2004), cotton (Bertrand et al. 2005; Nida et al. 1996), rice (Oberdoerfer et al. 2005), wheat (Obert et al. 2004), and alfalfa (McCann et al. 2006). As mentioned earlier, reports of omics studies on biotechnology-derived plants indicate that these are typically more closely related to the isogenic parental strain used in their development than the parental strain is to other members of the same species (see earlier references).

As described earlier, profiling analyses of metabolically engineered plants have highlighted that in some instances, pronounced secondary effects on the metabolome and transcriptome can occur in crops modified with even exquisite genetic specificity, whereas in others, the same degree of specificity effects negligible secondary effects. It is most probable that the nature of any introduced trait is more relevant to the possibility of secondary effects than the means by which that trait is introduced. More specifically, the dynamic nature of a pathway, prospects for metabolon channelling, localization of major intermediates and co-factors, and associated transport phenomena contribute more to the potential for secondary effects than the issue of whether genetic modification is pursued through modern biotechnology, conventional breeding or other approaches such as irradiation and mutagenesis. It is clear that targeted profiling can contribute to discussions on mechanism and biochemical theory and prove extremely valuable in determining which traits and pathways are most resistant or labile to secondary effects. Thus, targeted metabolomics should prove of immense value in contributing to evaluation of new traits at the early stages of discovery and selection of prospective candidates prior to regulatory submission. It must be acknowledged, however, that other non-targeted metabolic profiling approaches to comparative compositional studies have proven quite informative. As an illustrative example, the hierarchical approach, described earlier, was recently applied to field-grown tubers from conventional potatoes and two biotechnology-derived varieties developed to yield high levels of inulin-type fructans (Catchpole et al. 2005). The first such variety was derived through incorporation of a gene encoding sucrose:sucrose 1-fructosyltransferase (SST), an enzyme that catalyzes the production of the trisaccharide 1-kestose, and oligofructans (Hellwege et al. 2000).
The second experimental variety was derived through incorporation of a gene encoding fructan:fructan 1-fructosyltransferase (FFT), the product of which utilizes 1-kestose (and other oligofructans) to build inulin polymers (Hellwege et al. 2000). Preliminary flow-injection MS analyses of extracts from these varieties and the conventional potato cultivars highlighted extensive metabolic variation in all plants tested. A subsequent higher-resolution gas chromatography-time-of-flight-MS approach allowed automated measurement of 252 analytes (90 positively identified, 89 assigned to a specific metabolite class, and 73 classified as unknowns) on over 2,200 tubers (∼160 per cultivar). These higher-resolution data confirmed the large variation in the metabolic profiles and composition of the conventional cultivars. In summary, the metabolite composition of field-grown biotechnology-derived potatoes was consistent with the natural metabolite range of conventional cultivars, and these lines differed from the progenitor line only in the intended elevation of levels of fructans and their derivatives.
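The notion of a biotechnology-derived line falling within the natural range of conventional cultivars can be illustrated with a simple per-metabolite check. The sketch below is a simplification and not the multivariate procedure of Catchpole et al. (2005): the function name, the min-max criterion and all metabolite values are hypothetical. It flags only those metabolites whose level in a candidate line lies outside the range spanned by the conventional cultivars.

```python
def outside_natural_range(conventional, candidate):
    """Return metabolites whose candidate level lies outside the conventional min-max range."""
    flagged = {}
    for metabolite, level in candidate.items():
        values = conventional[metabolite]
        low, high = min(values), max(values)
        if not low <= level <= high:
            flagged[metabolite] = (level, low, high)
    return flagged


# Hypothetical relative levels: one value per conventional cultivar.
conventional = {
    "sucrose": [10.0, 14.0, 12.5],
    "citrate": [3.0, 4.5, 3.8],
    "1-kestose": [0.0, 0.1, 0.05],
}
# Hypothetical levels in a fructan-accumulating line.
candidate_line = {"sucrose": 11.0, "citrate": 4.0, "1-kestose": 2.4}

print(outside_natural_range(conventional, candidate_line))
# {'1-kestose': (2.4, 0.0, 0.1)}

# The analyte classes reported above account for all 252 measured analytes.
assert 90 + 89 + 73 == 252
```

In this made-up example only the intended product is flagged, which mirrors the pattern reported for the field-grown potatoes. A real comparison would also need to account for replicate variability and environmental effects.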

4 Concluding remarks

Genetic diversity is required to elicit trait improvements in domesticated crops, yet concerns remain that crop domestication serves to deplete such diversity. Although plants demonstrate remarkable genetic plasticity, including random DNA rearrangements and transpositions, systematic approaches to enhancing agriculturally relevant diversity are required. Two major strategies were discussed in this review: the use of (i) non-domesticated sexually compatible relatives of major crops and (ii) biotechnology-based approaches including such emerging technologies as RNAi silencing. It was argued that metabolomics can contribute to both strategies and also play a major role in issues related to comparative compositional strategies. Notably, a previous document (ILSI 2004) reasons that application of metabolic profiling technologies to the comparative safety assessment process will require completion of efforts to standardize the reporting structure of such data and establishment of public repositories on baseline metabolomes of major crops, including data generated from samples from diverse genetic backgrounds and environments. Thus, metabolic profiling can be used to identify biochemical traits in wild relatives of major crops and in introgression lines, and correlation analyses can be further pursued to associate metabolic networks with morphological and other desired traits. The efficacy of transgene incorporation and the possibility of secondary effects can also be pursued through metabolomics, particularly targeted approaches.

Comparisons of results from biotechnology-based metabolic engineering with those of conventional breeding have established that the nature of any introduced trait or metabolic pathway is more relevant to the possibility of secondary effects than the breeding alternatives by which that trait is introduced. Furthermore, understanding pathway dynamics will prove critical not only to comparative compositional assessments but also to enhancing the efficacy of transgenic trait enhancement. This was highlighted by the fact that transgenic expression in A. thaliana of the entire S. bicolor dhurrin pathway induced not only its intended anti-herbivory effect but also markedly fewer secondary effects than incorporation of only a subset of pathway members. Thus, the greatest contribution of metabolomics may lie in supporting the development of biochemical theory, allowing predictive analyses to become a reality. It is worthwhile recognizing that the interaction of free amino acid metabolic networks with protein composition, an area one would expect to be thoroughly researched in the pursuit of nutritionally enhanced crops, remains only partially understood and is an active area of research (e.g., Galili et al. 2006). Hopefully, the integration of metabolomics with biotechnology-based metabolic engineering and with conventional breeding will bring us closer to fulfilling the promise of enhanced diversity and its implications for sustaining agricultural practices that benefit growers, consumers and the environment alike.