Introduction

The rising demand for energy, coupled with uncertain supplies of fossil fuels and concerns over the effects of increasing atmospheric carbon dioxide, has contributed to the search for alternative energy sources. In this context, the conversion of biomass derived from crop plants into biofuels has emerged as an attractive solution. Currently, sucrose from sugarcane and starch from corn are used as feedstocks for ethanol fermentation. However, there is increasing interest in using the bulk of plant biomass, in the form of cell walls [1],[2] or triacylglycerols [3], to meet future energy needs. Efforts to make lignocellulosic biofuel production viable include strategies to harness structural sugars from plant cell walls, such as prospecting for novel microbial enzymes and biomass-oriented plant breeding. To achieve this, however, it is necessary to understand the molecular mechanisms underlying plant biomass production [4] and microbial conversion pathways [5],[6]. These processes rely on complex signaling networks closely linked to metabolism. Therefore, understanding how plants and microorganisms grow in response to environmental stimuli and how they adjust their metabolism could also help simplify the conversion of biomass into biofuels [7].

Recent advances in high-throughput technologies and analytic methods, such as transcriptomics and metabolomics, have enabled the measurement of phenotypic variation at the molecular level. The metabolome, which represents the chemical composition of all small molecules in a cell or organism under given conditions, provides a global view of cellular and physiological functions. Metabolomics is cheaper per sample than transcriptomics and does not rely on the availability of genome sequences [8],[9]. It is therefore considered a powerful tool for the unbiased characterization of genotypes and phenotypes, with applications in multiple areas, such as evaluation of genetically modified organisms [10]–[12], functional genomics [13]–[17], responses to environmental factors [18]–[23], metabolic engineering [24],[25], and quantitative genetics [26]–[30]. Furthermore, metabolomics, together with multivariate and correlation analyses, is an excellent tool for the study of systems biology and is widely recognized as a cornerstone of this emerging area [22],[31]–[34]. In this review, we focus on how metabolomics approaches have been used to identify novel metabolic routes for microbial biomass conversion, and we highlight biochemical pathways important for plant biomass production. We also provide a brief overview of advances in the analytic techniques and data analysis methods commonly used in metabolomics studies.

Review

Overview of metabolomics approaches

Despite significant advances in analytic tools, complete coverage of the metabolome will always be constrained by the polarity, stability, dynamic range, and biological properties of metabolites [35]. Currently, no single technology can detect all metabolites present in an organism [36]–[38]. The optimal choice of analytic technology therefore depends largely on the goal of each study and is usually a compromise between selectivity and speed [36]. For this reason, numerous protocols for metabolite analysis have been developed to cover a broad range of compound classes; they generally follow the workflow summarized in Figure 1: (i) material of choice, (ii) sample preparation and extraction, (iii) analytical methods, (iv) data processing, and (v) data analysis and interpretation.

Figure 1

Material of choice (A), sample preparation and extraction (B), analytical methods (C), data processing (D), and data analysis and interpretation (E).

Sample preparation and extraction

The choice and optimization of sample preparation procedures are crucial steps in metabolomics analysis: the efficiency and balance with which compounds move from the biological sample into the extract determine the quality of the extract, which must represent the original material [39],[40]. At this stage, one must carefully select the sample material (e.g., cells, tissues, or whole organisms), the cultivation conditions, and the harvesting strategy, and ensure an adequate minimum number of replicates (Figure 1A).

Immediately after harvesting, metabolic processes must be inactivated to avoid the loss of metabolites with high turnover (Figure 1B). The most common method for this is rapid freezing in liquid nitrogen, although quenching and acid treatment are also appropriate alternatives [41]–[47].

Extraction protocols are tailored to particular compound classes (e.g., a methanol/water/chloroform mixture for the simultaneous extraction of hydrophilic and hydrophobic compounds, or solid phase extraction for volatile metabolites [40]), which limits coverage of the metabolome (Figure 1B). Because metabolites differ greatly in their concentrations, structures, and chemical behaviors, the extraction buffer composition, temperature, and duration of extraction must all be optimized. Additionally, the extraction process introduces bias (not all compounds are extracted with the same efficiency), which is reflected in the analytes detected and measured [35],[37],[40]. To assess metabolite recovery from the sample, internal standards are often spiked into the extraction buffer, where they also serve as a correction factor for quantification [37] (Figure 1B). Finally, depending on the analytic technology chosen, a derivatization step or resuspension in a solvent compatible with the chromatographic separation may be necessary.
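As a minimal illustration of how a spiked internal standard can act as a correction factor, the Python sketch below divides an analyte's peak area by the internal standard's peak area and by sample mass, and computes the standard's percent recovery. The function names, the single-standard scheme, and normalization to sample mass are assumptions made for illustration, and all peak areas are hypothetical; actual protocols differ in how standards are chosen and how responses are calibrated.

```python
def recovery_percent(is_area_sample: float, is_area_reference: float) -> float:
    """Percent recovery of the internal standard (IS).

    Assumes is_area_reference is the IS peak area measured without
    extraction losses (e.g., the same nominal amount injected directly).
    """
    if is_area_reference <= 0:
        raise ValueError("reference area must be positive")
    return 100.0 * is_area_sample / is_area_reference


def normalized_response(analyte_area: float, is_area: float, sample_mass_mg: float) -> float:
    """Analyte area divided by IS area and by sample mass (hypothetical scheme).

    Dividing by the IS area corrects for losses during extraction and
    injection that affect analyte and IS similarly; dividing by sample
    mass makes samples of different sizes comparable.
    """
    if is_area <= 0 or sample_mass_mg <= 0:
        raise ValueError("IS area and sample mass must be positive")
    return analyte_area / is_area / sample_mass_mg


# Hypothetical peak areas for one metabolite in two samples
samples = {
    "control": {"analyte": 12000.0, "is": 40000.0, "mass_mg": 50.0},
    "treated": {"analyte": 9000.0, "is": 20000.0, "mass_mg": 45.0},
}
for name, s in samples.items():
    print(name, normalized_response(s["analyte"], s["is"], s["mass_mg"]))
```

In this hypothetical example the treated sample has the smaller raw analyte area but the larger normalized response, because its internal standard was recovered less efficiently; without the correction, the comparison would be reversed.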

Separation and detection: analytic techniques

In recent years, many efforts have contributed to more comprehensive coverage of the metabolome, mainly using mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy (Figure 1C). Although NMR has many advantages, such as high selectivity, non-destructive analysis, relatively stable chemical shifts, and ease of quantification [48],[49], its low sensitivity makes MS the most frequently used technology in metabolomics studies. The main features of MS are high sensitivity, high resolution, wide dynamic range, robustness, and the ability to help elucidate the molecular weight and structure of unknown compounds. A mass spectrometer consists of three primary components, namely an ion source, a mass analyzer, and a detector, which together provide mass-to-charge ratio (m/z) information [36],[49]–[51]. A wide range of MS-based technologies is now available, differing in operational principles and performance [36],[51]–[58]. Great advances have been achieved by combining different ionization sources (e.g., electrospray ionization (ESI) and electron impact ionization (EI)) with mass analyzers of various resolving power (e.g., Fourier transform ion cyclotron resonance (FT-ICR-MS), Orbitrap, time of flight (TOF), and linear ion traps) [36],[51],[58]. Tables 1 and 2 provide an overview of the most common ionization techniques and mass spectrometer analyzers, respectively. Many reviews have focused on available analytic technologies, and more detailed information can be found in Lei et al. [36], Dass [51], Villas-Bôas et al. [59], and Saito et al. [60].
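Because every mass analyzer ultimately reports m/z values, a short Python sketch may help show what is being measured. It computes the monoisotopic mass of a neutral molecule from its elemental formula, the m/z of its singly charged [M+H]+ and [M-H]- ions (typical of ESI in positive and negative mode), and the mass error in parts per million (ppm), the usual way of expressing the mass accuracy of high-resolution analyzers. The parser handles only simple formulas such as C6H12O6, other adducts and multiply charged ions are ignored, and the function names are illustrative assumptions.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON = 1.007276466812  # proton mass (u)


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or charges)."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum of monoisotopic masses; every element must appear in MONOISOTOPIC."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def adduct_mz(neutral_mass: float, adduct: str) -> float:
    """m/z of a singly charged protonated or deprotonated ion."""
    if adduct == "[M+H]+":
        return neutral_mass + PROTON
    if adduct == "[M-H]-":
        return neutral_mass - PROTON
    raise ValueError(f"unsupported adduct: {adduct}")


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


m = monoisotopic_mass("C6H12O6")  # a hexose such as glucose
print(f"M = {m:.5f}")
print(f"[M+H]+ = {adduct_mz(m, '[M+H]+'):.5f}, [M-H]- = {adduct_mz(m, '[M-H]-'):.5f}")
```

Note that isomers such as glucose and fructose share the same formula and therefore the same m/z, which is one reason why MS is often coupled to a separation step.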

Table 1 Most common ionization techniques used for metabolomics applications
Table 2 Overview of the most frequently used mass spectrometer analyzers for metabolomics applications

Two MS strategies are currently used in metabolomics: direct-infusion MS and chromatography coupled to MS (Figure 1C, Table 3). The first approach uses soft ionization for mass peak assignment, which causes little or no fragmentation of fragile, thermolabile molecules, and provides a fast, high-throughput screening tool [59] suitable for studies of, among others, microbial diversity [62], lipidomics in plants and microorganisms [63],[64], and plant-microbe interactions [65]. In contrast, when MS is coupled to chromatographic separation, the complexity of the sample and of the resulting mass spectra is reduced, which improves the separation of isomers and compound quantification and, consequently, accuracy and sensitivity [36],[37]. The most widely used separation techniques in this context are gas chromatography (GC), liquid chromatography (LC), and capillary electrophoresis (CE). As for other analytic tools, the choice of method depends on the compound class of interest. For instance, GC-MS is extensively used for the analysis of semi-polar primary metabolites, which can be rendered volatile and sufficiently heat-stable by derivatization reactions [41], whereas LC-MS has been particularly employed for lipids and plant secondary metabolites [66]–[68]. In recent years, advances in high-performance separations (e.g., fast GC and ultra-high performance liquid chromatography (UHPLC)) and multidimensional separations (e.g., GC × GC and LC × LC) have increased resolution, providing faster analysis and better peak separation in complex biological mixtures [50].
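The gain from narrower chromatographic peaks can be quantified with the standard resolution equation, Rs = 2(t2 - t1)/(w1 + w2), where t1 and t2 are the retention times of two adjacent peaks and w1 and w2 their baseline widths. The sketch below applies it to a hypothetical pair of isomers; the retention times and widths are invented for illustration and are not meant to represent any particular GC or UHPLC method.

```python
def chromatographic_resolution(t1: float, t2: float, w1: float, w2: float) -> float:
    """Resolution Rs = 2 * (t2 - t1) / (w1 + w2) of two adjacent peaks.

    t1, t2 are retention times and w1, w2 baseline peak widths, all in
    the same time unit. An Rs of about 1.5 or more is usually taken as
    baseline separation for two Gaussian peaks of similar size.
    """
    if w1 <= 0 or w2 <= 0:
        raise ValueError("peak widths must be positive")
    return 2.0 * abs(t2 - t1) / (w1 + w2)


# Hypothetical pair of isomers eluting 0.3 min apart
broad = chromatographic_resolution(10.0, 10.3, 0.4, 0.4)     # wider peaks
narrow = chromatographic_resolution(10.0, 10.3, 0.15, 0.15)  # narrower peaks
print(f"Rs with broad peaks: {broad:.2f}; with narrow peaks: {narrow:.2f}")
```

With the same spacing between retention times, shrinking the peak widths raises Rs from below the baseline-separation threshold to well above it, which is why sharper peaks translate into better separation of complex mixtures.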

Table 3 Comparison of direct infusion and hyphenated techniques in mass spectrometry

Data processing, analysis, and interpretation

The ultimate goal of metabolomics experiments is a final data matrix that can be subjected to a range of statistical tools in order to link differences in metabolite levels to the phenotype [68]. However, because metabolomics is a high-throughput technology, its datasets are extremely large and require multiple tools for data and information management, raw analytical data processing, compound standardization and ontology, statistics, integration, visualization, mathematical modeling of metabolic networks, and interpretation (Figure 1D,E) [69]. Excellent reviews providing detailed descriptions of each step and of metabolomics databases can be found in Fukushima and Kusano [38], Kopka et al. [69], Redestig et al. [71], Boccard and Rudaz [72], Xia and Wishart [73], Fiehn et al. [74], and Redestig et al. [75].

The most challenging aspect of data analysis is raw data processing, which involves data conversion, baseline correction, spectrum deconvolution, peak detection and integration, chromatogram alignment, normalization, and compound identification and quantification (Figure 1D). A number of commercial and open-source programs perform these steps automatically and can be used effectively for a specific analytical platform (e.g., TargetSearch [76] or TagFinder [77] for GC-MS) or for several platforms (e.g., MetAlign [78] or XCMS [79] for GC-MS or LC-MS).
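To make peak detection and integration concrete, here is a deliberately simple Python sketch. It finds local maxima above a height threshold, bounds each peak where the signal stops decreasing, subtracts a straight-line baseline drawn between the peak boundaries, and integrates the remaining area with the trapezoidal rule. The function names, threshold, and chromatogram are hypothetical, and the sketch is not meant to reproduce the algorithms of the programs cited above, which add smoothing, noise estimation, deconvolution of co-eluting peaks, and alignment across samples.

```python
def _linear_baseline(t: float, t0: float, y0: float, t1: float, y1: float) -> float:
    """Value at time t of the straight line through (t0, y0) and (t1, y1)."""
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def detect_peaks(times: list[float], intensities: list[float], min_height: float):
    """Return (apex_time, area) for each local maximum >= min_height."""
    peaks = []
    n = len(intensities)
    for i in range(1, n - 1):
        y = intensities[i]
        # Local maximum: higher than the left neighbour, not lower than the right one
        if y < min_height or not (intensities[i - 1] < y >= intensities[i + 1]):
            continue
        # Walk downhill on both sides to find the peak boundaries
        left = i
        while left > 0 and intensities[left - 1] < intensities[left]:
            left -= 1
        right = i
        while right < n - 1 and intensities[right + 1] < intensities[right]:
            right += 1
        t0, y0 = times[left], intensities[left]
        t1, y1 = times[right], intensities[right]
        # Trapezoidal integration of the signal above the linear baseline
        area = 0.0
        for k in range(left, right):
            h0 = intensities[k] - _linear_baseline(times[k], t0, y0, t1, y1)
            h1 = intensities[k + 1] - _linear_baseline(times[k + 1], t0, y0, t1, y1)
            area += 0.5 * (h0 + h1) * (times[k + 1] - times[k])
        peaks.append((times[i], area))
    return peaks


# Hypothetical chromatogram: two well-separated peaks on a flat baseline
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
intensities = [0.0, 1.0, 5.0, 1.0, 0.0, 2.0, 8.0, 2.0, 0.0]
for apex, area in detect_peaks(times, intensities, min_height=3.0):
    print(f"apex at {apex:.1f} min, area {area:.2f}")
```

Because the baseline is drawn between the boundaries of each peak, adding a constant offset to the whole signal leaves the computed areas unchanged, a very simple form of baseline correction.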

For comparisons between biological groups (e.g., control and treated samples, or mutant and wild type), a wealth of statistical and machine learning algorithms, using unsupervised (e.g., hierarchical clustering and principal component analysis) or supervised (e.g., ANOVA and partial least squares) methods, enable the comprehensive identification of variables (metabolic features) that capture the variation across the entire dataset (Figure 1E) [69],[70],[74],[80],[81]. Data visualization tools then allow metabolic data to be simplified and mapped onto biochemical pathways, facilitating interpretation (Figure 1E). Several such tools have been developed, as highlighted in [38]; however, their use is still limited to annotated pathways.
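As an illustration of the unsupervised route, the sketch below autoscales a small samples-by-metabolites matrix (mean-centering each metabolite and scaling it to unit variance, a common choice in metabolomics) and extracts the first principal component by power iteration on the covariance matrix. It is a pure-Python sketch for readability under the assumption of a tiny hypothetical dataset; in practice one would use an established implementation such as scikit-learn's PCA or an equivalent R package.

```python
import math
import random


def autoscale(matrix: list[list[float]]) -> list[list[float]]:
    """Center each column (metabolite) to mean 0 and scale it to unit variance."""
    n = len(matrix)
    scaled_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        # A constant metabolite carries no variance; set it to zero.
        scaled_cols.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def first_principal_component(matrix: list[list[float]], iterations: int = 500, seed: int = 0):
    """Return (loadings, scores) of PC1 for column-centered data.

    Power iteration on the covariance matrix X^T X / (n - 1) converges to
    its leading eigenvector, i.e., the direction of maximal variance.
    """
    n, p = len(matrix), len(matrix[0])
    cov = [[sum(matrix[k][i] * matrix[k][j] for k in range(n)) / (n - 1)
            for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data have no variance")
        v = [x / norm for x in w]
    scores = [sum(row[j] * v[j] for j in range(p)) for row in matrix]
    return v, scores


# Hypothetical data: rows are samples, columns are metabolites A, B, C
data = [
    [1.0, 5.0, 3.0], [1.2, 5.1, 2.9], [0.9, 4.9, 3.1],  # control
    [3.0, 5.0, 1.0], [3.1, 4.9, 1.2], [2.8, 5.1, 0.9],  # treated
]
loadings, scores = first_principal_component(autoscale(data))
print("PC1 loadings:", [round(x, 2) for x in loadings])
print("PC1 scores:", [round(s, 2) for s in scores])
```

The scores place control and treated samples on opposite sides of zero, and the loadings show that metabolites A and C (changing in opposite directions) drive the separation while B contributes little. The sign of a principal component is arbitrary, so only the relative positions of the groups are meaningful.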

Metabolomics applications for bioenergy

Microbial metabolomics

Several microorganisms are naturally able to produce ethanol as well as other ‘advanced biofuels’, such as long-chain alcohols and isoprenoid- and fatty acid-based fuels [82]. Likewise, the myriad ecological niches occupied by microbes provide an opportunity to prospect for novel biochemical pathways, allowing for better conversion of residual biomass. Finally, microorganisms have great biotechnological potential for the engineering and/or incorporation of complete exogenous metabolic pathways for the production of value-added chemicals. Despite the vast capacity of microorganisms for biomass conversion and biofuel production, information about the metabolic networks underlying these processes is still lacking.

Yeast cells are constantly exposed to multiple stress conditions (e.g., high temperature and low pH) during industrial fermentation, and consequently, there is growing interest in identifying novel strains with better performance. The metabolic differences between diploid (α/a) and haploid (α, a) yeasts in response to ethanol stress were recently assessed by GC-TOF-MS [83]. The results indicated that the haploid genotype was more susceptible to ethanol stress than the diploid, which had a higher content of protective metabolites, including polyols [83]. These findings illustrate the power of metabolomics for selecting genotypes and identifying candidate genes or pathways for metabolic engineering.

Xylose is the second most abundant fermentable sugar in lignocellulosic feedstocks [84]; however, commercial yeast strains are unable to convert it into ethanol [85] (reviewed in [86]). For this reason, prospection of naturally xylose-fermenting yeast species (e.g., Scheffersomyces stipitis or Pachysolen tannophilus), comparative genomics, and evolutionary analysis have been used as strategies to determine the limiting steps in pentose metabolism [89],[90]. Although the identified functional enzymes were expressed in recombinant industrial strains of Saccharomyces cerevisiae, their efficiency in fermenting xylose was still quite low [87],[88], suggesting that additional modifications are required. Overexpression of genes encoding enzymes of the non-oxidative pentose phosphate pathway (PPP) [89], replacement of a small number of enzymes of xylose metabolism [90], and isolation of xylose transporters [91],[92] have been pointed out as crucial for the adequate function of this pathway. A more holistic approach, however, could also enable metabolic reconstructions that yield more biologically relevant and predictive models for improving the fermentative ability of S. cerevisiae [93]. Recently, dynamic metabolomics studies of two recombinant S. cerevisiae strains during anaerobic batch fermentation of a glucose/xylose mixture were conducted using LC-MS [22]. The results suggest that xylose can be used primarily as an energy source, as both strains maintained a high energy charge during the transition to xylose fermentation. However, xylose fermentation appears to uncouple energy and carbon metabolism [22]. These findings were uncovered solely through metabolomics analysis.
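The energy charge mentioned above is commonly quantified as Atkinson's adenylate energy charge, (ATP + 0.5 ADP)/(ATP + ADP + AMP), which ranges from 0 (all AMP) to 1 (all ATP). The sketch below assumes that definition, since the exact calculation used in [22] is not described here, and uses hypothetical nucleotide concentrations.

```python
def adenylate_energy_charge(atp: float, adp: float, amp: float) -> float:
    """Atkinson's adenylate energy charge, (ATP + 0.5*ADP) / (ATP + ADP + AMP).

    Inputs are concentrations (or amounts) in the same unit; the result
    ranges from 0 (all AMP) to 1 (all ATP).
    """
    total = atp + adp + amp
    if total <= 0:
        raise ValueError("total adenylate pool must be positive")
    return (atp + 0.5 * adp) / total


# Hypothetical intracellular concentrations (same unit for all three)
print(f"energy charge: {adenylate_energy_charge(3.0, 0.8, 0.2):.2f}")
```

Because it combines three measured metabolites into a single ratio, the energy charge is a convenient readout of cellular energy status in time-resolved metabolomics data.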

Biomass pretreatment for consolidated bioprocessing, which combines enzyme production, saccharification, and fermentation in a single step, leads to the production of toxic compounds (e.g., weak acids) that can inhibit yeast growth and, consequently, ethanol yield. Recently, the effect of acetic acid during xylose fermentation by a recombinant S. cerevisiae strain was analyzed by metabolomics using GC-MS and CE-MS [25],[34]. The results revealed a significant accumulation of PPP intermediates, indicating a slowdown of metabolic flux through this pathway [25]. Based on these findings, the authors generated a recombinant xylose-fermenting strain overexpressing the gene encoding the PPP enzyme transaldolase, which conferred increased ethanol productivity in the presence of acetic and formic acids [25]. This study demonstrates the strength of metabolomics in developing rational strategies to improve stress tolerance through genetic engineering.

One promising area in bioenergy is the development of microbial cell factories as platforms for producing advanced biofuels, such as 1-butanol or biodiesel [6]. In most of these cases, however, genetic engineering of entire exogenous pathways to yield these compounds does not take into account the metabolic flux and balance of the host organism, which can render such efforts ineffective. Therefore, an integrated systems approach based on fast screening, ‘omics’ tools, and mathematical modeling of metabolism could be useful for the design and optimization of metabolic pathways, resulting in more efficient conversion of low-cost materials into highly desired products [6],[24],[94]. In light of these observations, the combination of metabolomics, fluxomics, and synthetic biology is a powerful tool for prospecting novel metabolic routes, as well as for producing chemicals derived from biomass.

Plant metabolomics

Nowadays, bioethanol production relies almost entirely on sucrose and starch from crops. Major efforts have been made to understand and manipulate the pathways involved in carbohydrate storage and partitioning (for review, see [2]). In this context, a few descriptive metabolomics studies have attempted to unravel the correlation between metabolites, developmental stages, and sucrose accumulation patterns in sugarcane [95], or the impact of environmental stress on different species used for bioenergy [96],[97].

An obvious sustainable way to enhance biofuel production would be to increase yield per planted area. Interestingly, a number of important traits, such as stress resistance and postharvest processing, depend largely on metabolic content [8], implying a vast potential for manipulating metabolic phenotypes via classical breeding [26],[27],[29],[30],[98],[99]. This approach has several advantages over genetic markers, as it neither relies on the genome sequence nor depends on understanding the complex segregation patterns among progenies. Metabolomics has been successfully used to investigate the relationship between biomass yield and the metabolic composition of plants [26],[27],[29],[99],[100]. In summary, these data showed that a combination of metabolites, rather than a single compound, correlates with biomass, suggesting that metabolic signatures can be identified for complex traits [29]. These findings were extended by a more detailed analysis of recombinant inbred line (RIL) and near-isogenic line (NIL) populations in Arabidopsis, which revealed a few hotspots where yield quantitative trait loci (QTL) overlapped with a large number of metabolite accumulation QTL [27]. Similarly, a comparative analysis of the root metabolome of parental maize inbred lines and their corresponding hybrids showed that the metabolic profile of each hybrid is distinct from those of its parents [28]. Altogether, these results suggest that metabolomics-assisted breeding could accelerate the selection process and, in combination with other high-throughput technologies, will probably shorten the time required to produce elite lines [8].

An estimated 70% of the biomass produced by land plants is in the form of cell walls, which are considered a highly promising source for lignocellulosic ethanol [7],[101]. Lignocellulosic biomass includes materials such as agricultural residues (e.g., sugarcane bagasse and straw, and corn stover), forestry residues, and various industrial wastes. The major problems in converting structural polysaccharides into ethanol lie in the recalcitrant nature of cellulose and the complex matrix in which it is embedded, which hinders the access of hydrolytic enzymes, mainly due to the presence of lignin, a phenolic macromolecule [101]. To enable the successful production of lignocellulosic ethanol, it has been suggested that elucidating cell wall-related pathways could provide the means to engineer cell wall structure as well as to produce other bioproducts. Lignin in particular has garnered attention because of its potential value-added products, such as plastics and vanillin, and many efforts have therefore been made to improve gene annotation of the pathway [102]–[104], molecular phenotyping [105], and metabolic and integrative analyses of plants with altered lignin content [102],[106]–[108]. In an excellent study, 20 Arabidopsis mutants in genes encoding enzymes of the lignin pathway were comprehensively analyzed using a systems approach [107]. The authors identified over 560 compounds using GC-MS and UHPLC-MS and, by correlating these data with the transcriptome in a network, found genes with a putative role in phenolic metabolism, gained insight into the lignin regulatory network, and obtained a systems view of the plant response to perturbations in the pathway [107]. Interestingly, these mutants did not present a clear growth phenotype, and molecular characterization was crucial for understanding the compensation mechanisms in the lignin pathway. Such information may help in engineering plants with altered lignin content suitable for the bio-based economy.
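The integrative step described above, correlating metabolite data with transcript data in a network, can be sketched with a simple Pearson correlation threshold. This is an assumption made for illustration, not the method of [107]: the threshold, feature names, and profiles are hypothetical, and with few samples high correlations can arise by chance, so real analyses also assess significance and correct for multiple testing.

```python
import itertools
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has undefined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)


def correlation_network(profiles: dict[str, list[float]], threshold: float = 0.9):
    """Return edges (a, b, r) for feature pairs whose |r| >= threshold.

    profiles maps a feature name (transcript or metabolite) to its values
    across the same ordered set of samples (e.g., mutant lines).
    """
    edges = []
    for a, b in itertools.combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical profiles across five mutant lines (same sample order everywhere)
profiles = {
    "gene_X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metabolite_Y": [2.0, 4.0, 6.0, 8.0, 10.0],
    "metabolite_Z": [5.0, 3.0, 4.0, 1.0, 2.0],
}
print(correlation_network(profiles))  # [('gene_X', 'metabolite_Y', 1.0)]
```

Here metabolite_Z correlates with the other features at r = -0.8, below the chosen threshold, so it remains unconnected; lowering the threshold would add edges at the cost of more false positives.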

Another renewable fuel that can be produced from plants is biodiesel. Plant oils are composed mainly of triacylglycerols (TAGs), whose fatty acyl chains are chemically similar to the bulk of the molecules found in petroleum [3]. Despite the great potential of plant biodiesel, a number of factors restrict its production, such as cold-temperature properties, competition with other industrial sectors, and limited feedstock (for review, see [3]). Different strategies have therefore been applied to alter the fatty acid profile and to identify plant species able to produce these compounds on a larger scale. One example is Physaria fendleri, which synthesizes lesquerolic acid, a highly valued fatty acid with applications in the cosmetic, plastic, and biofuel industries. Metabolomic characterization of this species using GC-MS and UHPLC-MS has provided important insights into fatty acid synthesis and also points to the importance of annotating metabolites into pathways [109]. The emerging field of lipidomics has garnered much attention in the last few years because it enables large-scale profiling of lipid classes such as TAGs and glycerolipids, among others [110],[111]. This technology could also assist in the discovery of new feedstocks for biofuel production, as it already has in microalgae for strain selection and/or optimization of biomass growth [112],[113]. In higher plants, however, the use of lipidomics has so far been limited to pathway annotation [114]–[116] and responses to environmental stresses [117], with no direct application to biofuels.

Conclusions

We have highlighted the current status of metabolomics and its power in bioenergy-oriented studies. Although great advances in several analytic platforms allow hundreds of metabolites to be assessed in complex biological samples, compound identification, data analysis, and data integration remain bottlenecks that make this approach challenging. Efforts to better understand metabolic changes in microorganisms during harsh ethanolic fermentation, as well as the prospection of novel routes and strains for biomass conversion, indicate that metabolomics plays an indispensable role in diagnostics and metabolic engineering. In addition, metabolomics links phenotype to genotype and has great potential for application in various areas of plant science, such as metabolomics-assisted breeding and systems biology. For these reasons, metabolomics could help boost plant yield and microorganism performance for biotechnological purposes in bioenergy research.