1 Introduction

Metabolomics is a technique focused on the global analysis of hundreds of metabolites, characterizing their changes in a biological sample under different conditions. The aim is, on the one hand, to detect specific metabolic markers of the response to different stimuli and, on the other hand, to explore the overall patterns of metabolic change (Fiehn 2002; Holmes et al. 2008; Vinayavekhin et al. 2010; Wishart 2008). Metabolomics deals not only with the identification and evaluation of altered metabolic states but also with relating these findings to the biology and metabolism of living organisms. As a result, it facilitates the verification and discovery of new and unknown mechanisms involved in the development of observable physiological consequences (Schauer and Fernie 2006).

Gas chromatography coupled with mass spectrometry (GC-MS) is one of the most commonly used analytical techniques in metabolomics. This is due to its higher consistency and robustness as compared to liquid chromatography mass spectrometry (LC-MS) and its higher sensitivity as compared to nuclear magnetic resonance spectroscopy (NMR) (Lisec et al. 2006). The nature of the data produced in GC-MS-based metabolomics is complex and the amount of information is high-dimensional so that researchers are faced with several conceptual difficulties. They include revealing and representing internal structure of measurements in a comprehensive way and choosing an appropriate statistical strategy for data processing and evaluation in order to draw reliable and robust scientific conclusions (Gehlenborg et al. 2010; Thysell et al. 2007; Wiklund et al. 2008).

Several approaches have been successfully utilized for qualitative data acquisition and quantitative analysis of information contained in spectral data (Boccard et al. 2010). The data acquisition phase encompasses preprocessing steps including mass spectra generation, normalization procedures, and alignment and deconvolution of peaks across samples using multivariate curve resolution tools. The common statistical approach used in metabolomics data analysis is based on multivariate analysis (MVA), including methods such as principal component analysis (PCA) (Wold et al. 1987), partial least squares discriminant analysis (PLS-DA) and its extension, orthogonal projections to latent structures discriminant analysis (OPLS-DA) (Trygg and Wold 2002). However, although MVA is successful in processing, clustering and representing essential trends and modifications among large subsets of metabolites, its classifications can be of limited use for analyzing dynamics at the level of individual metabolites. As a result, the conclusions drawn regarding biochemical properties of individual metabolites might be vague or inconsistent. Hence, if such dynamics are of interest, multivariate methods should be complemented by testing relevant hypotheses for the individual metabolites of particular interest.

This paper discusses the metabolomic analysis of Siberian spruce (Picea obovata) needles during cold acclimation, which serves as an example of why conclusions obtained from multivariate methods need justification and complementary analysis. As shown, OPLS-DA can be successfully applied to separate the collected samples and metabolites into groups, clearly revealing four consecutive phases during the acclimation period from mid-August to January. However, the conclusions derived from the OPLS-DA-based separation go beyond the known results on the development of freezing tolerance in Arabidopsis (Kaplan et al. 2004), motivating complementary analysis. Indeed, studies of metabolome dynamics during cold acclimation in Arabidopsis showed no clear indications of coherent time changes in metabolites that can be separated into individual phases during the pre-acclimation period. Thus the four phases in the acclimation of Picea obovata inferred with the help of OPLS-DA require further justification and correct interpretation.

To this end, we performed analysis of dynamics of levels of each individual metabolite over the acclimation period with the help of an organized pair-wise multicomparison procedure in order to complement and correctly interpret the conclusions obtained by OPLS-DA. In addition, we observe that multicomparison tests can be valuable for grouping the data itself. The results of multicomparisons deliver individual information for each particular metabolite in the study that can be used further for identification and characterization of metabolites with closely connected functions in the cold acclimation process.

2 Material and methods

2.1 Original data

2.1.1 Sample description

Data from a detailed metabolomic study of acclimation of Siberian spruce to extreme freeze tolerance were used. Briefly, this study involved needle extracts from three Siberian spruce trees growing in the Ringve Botanical Garden in Trondheim, Norway (Strimbeck et al. 2008). According to previous findings, these trees develop extreme freezing tolerance even in a local climate that is relatively mild compared with the species’ natural range (Strimbeck et al. 2007). Needles were collected at nine time points, every two to four weeks, from August 2006 to January 2007. Samples were placed in 50 ml centrifuge tubes (Sarstedt) and frozen directly in liquid nitrogen. The metabolites were extracted from 9 to 12 mg of needles ground to powder in liquid nitrogen with normalized volumes of extraction mixture. The ratio between extraction mixture volume and sample weight was set to 1:12. Derivatized samples were analyzed according to Gullberg et al. (2004) using an Agilent 6890N gas chromatograph equipped with a 10 m × 0.18 mm ID fused silica capillary column with a chemically bonded 0.18 μm DB5-MS stationary phase. The samples were randomized to minimize the influence of systematic time drift. Each sample was injected in splitless and split (1/20) modes by a CTC Combi Pal autosampler (CTC Analytics AG, Zwingen, Switzerland) and analyzed in three batches. A series of n-alkanes (C12–C40) and blank control samples were run for each separate batch in order to calculate retention indexes.
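The text does not state which retention index definition was used. As an illustration only, the sketch below assumes the linear (van den Dool and Kratz) index commonly applied in temperature-programmed GC, in which a peak is interpolated between the two n-alkanes that bracket it. The function name, the input format and the retention times are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak (assumed definition, see lead-in).

    rt         -- retention time of the peak of interest
    alkane_rts -- dict {carbon number: retention time} for the n-alkane
                  ladder run in the same batch (hypothetical input format)
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("peak elutes outside the alkane ladder")
    # Position of the alkane eluting just before (or together with) the peak.
    k = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, t_n, t_next = carbons[k], times[k], times[k + 1]
    step = carbons[k + 1] - n        # the ladder may skip carbon numbers
    return 100.0 * (n + step * (rt - t_n) / (t_next - t_n))


# Illustrative retention times in seconds, not measured values.
ladder = {12: 300.0, 14: 400.0, 16: 500.0}
print(linear_retention_index(350.0, ladder))
```

A peak eluting halfway between C12 and C14 is assigned an index halfway between 1200 and 1400 under this definition.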

2.1.2 Data pre-processing

Split and splitless MS-files were pretreated separately (smoothing, baseline correction, chromatogram alignment and hierarchical multivariate curve resolution, H-MCR) according to Jonsson et al. (2004, 2005) and then combined. Multiple peaks corresponding to the same metabolites were recalculated manually for selected masses using custom scripts. PCA was used for data overview and to calculate internal standards scores, which subsequently were used as normalization factors. The data set in the study contained at least 431 putative metabolites. Of these, 115 metabolites were unambiguously annotated by comparing their mass spectra to spectral databases containing authentic standard compounds and maintained by Umeå Plant Science Center in Umeå, Sweden, and the Max Planck Institute in Golm, Germany.

2.2 Statistical analysis

2.2.1 Multivariate analysis: OPLS-DA

Orthogonal projections to latent structures discriminant analysis (OPLS-DA) was performed in SIMCA-P+ 12.0 (Umetrics AB, Umeå, Sweden) on mean centered and unit variance scaled metabolite responses from the original data set. The data set was examined by PCA followed by OPLS-DA to overview the relationships between extracted metabolites (variables) and samples (observations) as well as among the variables themselves and to reveal groups/trends and deviating behavior among the observations. Significant metabolites contributing to the class separation were determined using the loading column plots with confidence intervals calculated by jack-knifing. The validation was implemented as a seven-fold cross-validation procedure, in which 1/7 of the samples was left out of the modeling in turn and the data were remodeled seven times. This provided a predicted class value for each sample, from which the \( Q^2 \) value was calculated as a measure of the predictive robustness of the model.
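The scaling and the cross-validation bookkeeping described above can be sketched as follows. This is not the SIMCA-P+ implementation: a ridge regression on a dummy-coded class matrix stands in for the OPLS-DA model, the interleaved fold assignment is an assumption, and \( Q^2 \) is taken as \( 1 - \mathrm{PRESS}/\mathrm{TSS} \). All function names and the synthetic data are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                  # constant columns stay at zero
    return (X - mean) / std


def q2_cross_validated(X, Y, n_folds=7, ridge=1.0):
    """Q2 = 1 - PRESS / TSS from n_folds-fold cross-validation.

    Ridge regression is only a stand-in for the OPLS-DA model.
    """
    n, p = X.shape
    folds = np.arange(n) % n_folds       # assumed interleaved fold assignment
    press = 0.0
    for f in range(n_folds):
        test = folds == f
        Xtr, Ytr = X[~test], Y[~test]
        x_mean, y_mean = Xtr.mean(axis=0), Ytr.mean(axis=0)
        Xc = Xtr - x_mean
        # Ridge solution B = (Xc'Xc + ridge * I)^(-1) Xc'(Y - mean)
        B = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(p),
                            Xc.T @ (Ytr - y_mean))
        pred = (X[test] - x_mean) @ B + y_mean
        press += np.sum((Y[test] - pred) ** 2)   # prediction error, left-out part
    tss = np.sum((Y - Y.mean(axis=0)) ** 2)
    return 1.0 - press / tss


# Synthetic illustration: two classes and one informative variable.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 14)
X = rng.normal(size=(28, 5))
X[:, 0] += 6.0 * y
q2 = q2_cross_validated(autoscale(X), y[:, None])
print(round(q2, 3))
```

Each sample is predicted exactly once by a model that did not see it, so a \( Q^2 \) close to 1 indicates that the class separation is reproduced on left-out data.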

2.2.2 Univariate pair-wise multicomparison procedure

In order to validate the classification of the metabolites revealed by OPLS-DA, we considered each individual metabolite found in the study and attempted to identify its homogeneity intervals in the course of acclimation. This step was carried out with the univariate pair-wise multicomparison procedure, implemented in MATLAB R2007b (The MathWorks, MA). In particular, the mean values \( \bar{\mu}_i \) of metabolite contents collected at all nine time points

$$ \bar{\mu}_1, \bar{\mu}_2, \ldots,\bar{\mu}_9 $$

were tested to support or reject the corresponding pair-wise equalities

$$ \bar{\mu}_i=\bar{\mu}_j, \quad i,j=1,\ldots,9,\;\; i \ne j. $$
(1)

The results of these tests were then used to reconstruct the time intervals in which the metabolite content remained unchanged. Counting each unordered pair once, Eq. 1 constitutes 36 individual hypotheses, which were tested for each individual metabolite as follows:

  1. The samples within each of the nine groups of the metabolite were checked for normality by the Jarque-Bera test. To keep the family-wise error rate (FWER) for these nine parallel hypotheses below 0.05, the correction procedure due to Holm was used (Holm 1979; Lehmann and Romano 2005);

  2. The groups with normally distributed samples were checked in pairs for common variance by Bartlett’s test. The number of tests depended on the results of the previous step; to keep the FWER for these parallel hypotheses below 0.05, the correction procedure due to Holm was again used (Holm 1979; Lehmann and Romano 2005);

  3. Each of the 36 hypotheses in Eq. 1 was tested by a t-test or a Mann–Whitney U-test, depending on whether the samples in both groups were normally distributed and, if so, whether the variances in the groups were the same or different. The results of the previous steps determined which test to use. The significance thresholds for the P-values were chosen to keep the FWER below 0.05 following the Holm procedure (Holm 1979; Lehmann and Romano 2005).
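The three steps can be sketched for a single metabolite as follows. This is a Python illustration using SciPy, not the MATLAB implementation used in the study. It assumes that Welch's t-test is the variant applied when variances differ and that the Holm procedure is applied separately within each step. The function names and the synthetic data are hypothetical.

```python
from itertools import combinations
from scipy import stats


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[i] > alpha / (m - rank):
            break                        # stop at the first non-rejection
        reject[i] = True
    return reject


def pairwise_multicomparison(groups, alpha=0.05):
    """Return {(i, j): True if the means differ}, 1-based with i < j."""
    k = len(groups)
    pairs = list(combinations(range(k), 2))      # 36 pairs for k = 9

    # Step 1: Jarque-Bera test per group, Holm over the k tests.
    p_norm = [stats.jarque_bera(g)[1] for g in groups]
    normal = [not r for r in holm_reject(p_norm, alpha)]

    # Step 2: Bartlett's test for pairs of normal groups, Holm over them.
    normal_pairs = [(i, j) for i, j in pairs if normal[i] and normal[j]]
    p_var = [stats.bartlett(groups[i], groups[j])[1] for i, j in normal_pairs]
    unequal_var = dict(zip(normal_pairs, holm_reject(p_var, alpha)))

    # Step 3: choose the test for each pair, then Holm over all pairs.
    p_mean = []
    for i, j in pairs:
        if (i, j) in unequal_var:
            # Assumption: Welch's t-test when the variances differ.
            res = stats.ttest_ind(groups[i], groups[j],
                                  equal_var=not unequal_var[(i, j)])
        else:
            res = stats.mannwhitneyu(groups[i], groups[j],
                                     alternative="two-sided")
        p_mean.append(res[1])
    rejected = holm_reject(p_mean, alpha)
    return {(i + 1, j + 1): r for (i, j), r in zip(pairs, rejected)}


# Synthetic illustration: time point 1 differs, points 2..9 share one level.
base = [-1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5]
groups = [[v + 100.0 for v in base]]
groups += [[v + 0.01 * t for v in base] for t in range(2, 10)]
result = pairwise_multicomparison(groups)
print(sum(result.values()), "of", len(result), "pairs declared different")
```

Running the Holm procedure within each step keeps the three families of hypotheses (normality, common variance, equal means) separately controlled, which is how the steps above are worded.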

3 Results and discussion

3.1 Classification analysis by OPLS-DA

The OPLS-DA method was used to test the differences in metabolite alterations related to low-temperature acclimation, to assess the overall experimental variations and to determine time points of such variations. This revealed an evident and statistically significant separation (P < 0.05) between several acclimation phases (Fig. 1) defined by specific time points, where the metabolomic pattern was substantially changed as a whole. Namely, the division of the acclimation period for the Siberian spruce into four consecutive phases was classified by the time points of the nine available measurements:

  1. phase 1:

    [1] (around August 15; pre-acclimation phase)

  2. phase 2:

    [234] (from September 4 till October 8; early acclimation phase)

  3. phase 3:

    [5] (around October 23; late acclimation phase)

  4. phase 4:

    [6789] (from November 5 till January 2; full acclimation phase)

Fig. 1

OPLS-DA score scatter plot derived from GC-MS data of P. obovata needles. Each point represents a metabolic profile of a sample harvested during acquisition of freezing tolerance: samples collected in August are represented as diamonds, samples collected in September and early October as dots, samples collected in late October as squares and samples collected from November till January as triangles

The OPLS-DA fitted model resulted in three predictive and five orthogonal components. 66.6% of variation in the data set (\( R^2X_{\text{cum}} \)) was used to account for 96.6% of the variance in the class separation (\( R^2Y_{\text{cum}} \)); for additional information see Supplementary Table 1 in Appendix. The cross-validated predictive ability of the model was 92.9% (\( Q^2_{\text{cum}} \)), which is conclusive for supporting the presented separation into four phases.

3.2 Pair-wise multicomparison procedure

3.2.1 Inference on normality of measurements in groups and (in-)equalities of variances for pair-wise comparisons

Normality of the observations on each of the nine sample dates for each individual metabolite was decisive for choosing the appropriate strategy for testing the pair-wise comparisons in Eq. 1. The Jarque-Bera test, applied to the measurements in each group and corrected by the Holm procedure to keep the FWER below 0.05, revealed that for the original data only 157 groups out of the 431 × 9 = 3,879 possible have measurements deviating significantly from a normal distribution. The test was repeated on log10-transformed data with a similar outcome: 102 groups out of 3,879 have log10-transformed measurements deviating significantly from a normal distribution. In both cases, the numbers of such non-normal groups (157 and 102) were around 4% and 3% of the total number of possible groups, respectively, and can be considered reasonably small. The distribution of these groups over the whole observation period shows only marginal differences between the time points at which the measurements were collected, see Fig. 2.

Fig. 2

The numbers of groups, over all metabolites, that were statistically confirmed to have non-normally distributed measurements, plotted against the dates on which the measurements were taken. The plot on the left shows these numbers for the original data, the one on the right for the log10-transformed data

In the next step, the groups of normally distributed measurements for each metabolite were tested in pairs for common variance by Bartlett’s test, corrected by the Holm procedure to keep the FWER below 0.05. For 186 out of 431 metabolites in the original (non-scaled) data, at least two groups had statistically significant differences in variances. As expected, the log10 transformation of the original data stabilized the variances of the measurements: for the transformed data, only 108 out of 431 metabolites had at least two groups with statistically significant differences in variances. To conclude:

  • For the large portion of metabolites, measurements grouped by sample dates were normally distributed and variances of measurements done on different sample dates were equal;

  • However, the presence of non-normally distributed measurements and of measurements with different variances between groups, both for a substantial number of metabolites, makes questionable the use of standard statistical tools such as one-way ANOVA and its modifications in the further search for homogeneity intervals for each metabolite. The hypotheses in Eq. 1 were therefore tested using the pair-wise multicomparison procedure.

3.2.2 Inference on class separation based on results of pair-wise multiple comparisons

The results of the pair-wise comparisons in Eq. 1, run for each metabolite, allowed reconstruction of the subintervals of the observation period in which its content remained unchanged. For instance, the statistical analysis of the dynamics of myo-inositol-1-phosphate depicted in Fig. 3 reveals that there are four homogeneity subintervals

$$ [1],\;\; [234],\;\; [5],\;\; [6789] $$
(2)

representing the following grouping of its means

$$ \left\{\bar{\mu}_1\right\}, \;\; \left\{\bar{\mu}_2=\bar{\mu}_3=\bar{\mu}_4\right\}, \;\; \left\{\bar{\mu}_5\right\}, \;\; \left\{\bar{\mu}_6=\bar{\mu}_7=\bar{\mu}_8=\bar{\mu}_9\right\}. $$

Fig. 3

Development of myo-inositol-1-phosphate over the observation period from the mid-August 2006 to the beginning of January 2007
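The reconstruction of homogeneity intervals from the pair-wise test outcomes is not spelled out in detail. One simple reading, assumed here, is a left-to-right rule: a time point joins the current interval unless it differs significantly from some point already in it. The function name and the input format (a set of significantly different pairs) are hypothetical.

```python
def homogeneity_intervals(differs, n_points=9):
    """Split time points 1..n_points into consecutive homogeneity intervals.

    differs -- set of pairs (i, j), i < j, 1-based, whose means were found
               significantly different by the pair-wise tests
    """
    intervals, current = [], [1]
    for t in range(2, n_points + 1):
        if any((s, t) in differs for s in current):
            intervals.append(current)    # t starts a new interval
            current = [t]
        else:
            current.append(t)            # t is compatible with the interval
    intervals.append(current)
    return intervals


# Inequalities reported for myo-inositol-1-phosphate in the text.
myo = {(1, 2), (2, 5), (5, 6)}
print(homogeneity_intervals(myo))
```

Under this rule the three inequalities alone reproduce the split [1], [234], [5], [6789] of Eq. 2. Other reconstruction rules are conceivable and may give different intervals when the pair-wise results are not transitive.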

Running the procedure on the original (non-scaled) data for all detected metabolites led to the following conclusions:

  • 263 metabolites out of 431 have two or more different homogeneity intervals. In other words, 263 metabolites undergo significant changes during the observation period;

  • Only 25 metabolites out of these 263 have the exact split of the homogeneous intervals as predicted by OPLS-DA;

  • 156 metabolites out of these 263 have one or several homogeneous intervals as predicted by OPLS-DA.

These findings complement the conclusions of the multivariate test and allow us to make the correct interpretation in terms of properties of individual metabolites. In particular:

  1. One cannot definitively support the hypothesis that the majority of the metabolites pass through four phases during acclimation to cold stress under natural conditions. Indeed, less than 10% of the metabolites that showed statistically significant changes between mid-August and January are confirmed to have the same four-phase profile;

  2. However, for ≈60% of the metabolites that changed between mid-August and January, the time points proposed by the OPLS-DA analysis for splitting the period into individual phases are indeed points where one or several significant changes in metabolite levels occur. This implies that the results of the pair-wise multicomparison procedure can partly be seen as supporting the OPLS-DA analysis while providing further details. In particular, it was found that

    (a) 98 metabolites (≈37% of all changed metabolites) have the first measured point in mid-August as one of their homogeneity intervals, i.e. for these metabolites the levels on August 15 and at the next collecting point on September 4 are statistically different;

    (b) 33 metabolites (≈13% of all changed metabolites) possess the homogeneity interval from September 4 to October 8 (the collecting points 2–4);

    (c) 68 metabolites (≈26% of all changed metabolites) have the fifth measured point (October 23) as one of their homogeneity intervals, i.e. for these metabolites the levels at the previous (October 8) and the next (November 5) collecting points are statistically different from the level measured on October 23;

    (d) 102 metabolites (≈39% of all changed metabolites) possess the homogeneity interval from November 5 to January 2 (the collecting points 6–9).
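The comparison between a metabolite's homogeneity intervals and the OPLS-DA phases can be sketched as a simple classification. The labels "exact", "partial" and "none" and the function name are hypothetical; whether the 156 metabolites reported above include the 25 exact matches is not stated, so the sketch keeps the two categories separate.

```python
# The four phases proposed by OPLS-DA, as tuples of collecting points.
OPLS_PHASES = [(1,), (2, 3, 4), (5,), (6, 7, 8, 9)]


def agreement_with_phases(intervals, phases=OPLS_PHASES):
    """Classify one metabolite by how its intervals match the phases."""
    found = {tuple(iv) for iv in intervals}
    if found == set(phases):
        return "exact"                   # identical four-phase split
    return "partial" if found & set(phases) else "none"


# Intervals reported in the text for three metabolites.
myo = [[1], [2, 3, 4], [5], [6, 7, 8, 9]]
sucrose = [[1], [2, 3, 4], [5], [6, 7], [8, 9]]
fructose = [[1], [2], [3], [4], [5], [6, 7], [8], [9]]
for name, iv in [("myo-inositol-1-phosphate", myo),
                 ("sucrose", sucrose), ("fructose", fructose)]:
    print(name, agreement_with_phases(iv))
```

Counting the labels over all changed metabolites would give tallies of the kind listed above.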

3.2.3 Use of pair-wise comparisons for detecting metabolites with similar behaviors

The results of the pair-wise multicomparisons can be used for detecting metabolites that have similar dynamics over the whole period of measurements or over its subintervals. In this respect, the tests in Eq. 1 provide more detailed information and can be appropriate for validating hypotheses other than the one concerned with finding a metabolite’s homogeneity intervals.

To illustrate this point, let us consider again the time development of myo-inositol-1-phosphate depicted in Fig. 3 and the derived conclusion that it has four distinct phases of homogeneity, see Eq. 2. This conclusion rests on statistical inference supporting the following pair-wise relations:

  1. [1]:

    \( \bar{\mu}_1\ne\bar{\mu}_2 \)

  2. [234]:

    \( \bar{\mu}_2= \bar{\mu}_3,\;\;\bar{\mu}_2= \bar{\mu}_4,\;\;\bar{\mu}_3=\bar{\mu}_4 \)

  3. [5]:

    \( \bar{\mu}_5\ne \bar{\mu}_2,\;\bar{\mu}_5\ne \bar{\mu}_6 \)

  4. [6789]:

    \( \bar{\mu}_6= \bar{\mu}_7, \;\;\bar{\mu}_6= \bar{\mu}_8,\;\;\bar{\mu}_6=\bar{\mu}_9,\;\;\bar{\mu}_7= \bar{\mu}_8,\;\;\bar{\mu}_7= \bar{\mu}_9,\;\;\bar{\mu}_8= \bar{\mu}_9 \)

However, more results of pair-wise comparisons are available (e.g. the comparison of \( \bar{\mu}_1 \) with \( \bar{\mu}_7 \), of \( \bar{\mu}_3 \) with \( \bar{\mu}_9 \), etc.) that were not used or were redundant in the search for homogeneity intervals. This information can be regarded as descriptive of this metabolite and used to recognize its relatives, i.e. metabolites that show a similar pattern of pair-wise test results over the whole observation period or over subintervals.

To this end, it is natural to use the number of differing pair-wise comparison results to quantify the distance between a given metabolite and a particular reference metabolite on a chosen time interval. Figure 4 shows the numbers of metabolites that have from 1 to 8 differences from the results of the pair-wise comparison tests performed for myo-inositol-1-phosphate in the period between September 25 and November 20 (i.e. between measurements 3 and 7).

Fig. 4

Number of metabolites that have \( 1, 2, \ldots,8 \) differences with results of pair-wise comparison tests performed for myo-inositol-1-phosphate in the period between September 25 and November 20 (i.e. between measurements 3 and 7)
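The distance used here amounts to counting the pair-wise tests on which two metabolites disagree within the chosen window. A minimal sketch, assuming each metabolite is represented by the set of pairs found significantly different; the function name and the example sets are hypothetical, not results from the study.

```python
from itertools import combinations


def pairwise_distance(differs_a, differs_b, first=3, last=7):
    """Number of pair-wise test outcomes on which two metabolites disagree,
    restricted to time points first..last (inclusive, 1-based)."""
    pairs = combinations(range(first, last + 1), 2)
    return sum((p in differs_a) != (p in differs_b) for p in pairs)


# Hypothetical outcomes for a reference metabolite and a candidate relative.
reference = {(3, 5), (4, 5), (5, 6), (5, 7)}
candidate = {(3, 5), (4, 5), (5, 6)}
print(pairwise_distance(reference, candidate))
```

For measurements 3 to 7 there are 10 pair-wise tests, so the distance can range from 0 (identical outcomes) to 10.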

As seen, no metabolite matches myo-inositol-1-phosphate exactly in that period, even though 25 metabolites were found to have exactly the same split of the observation period into four distinct phases: [1], [234], [5], [6789]. However, a few metabolites (N = 6) differ in only one comparison, many more (N = 41) in two, and so on. Among the metabolites with one difference, the closest identified ones turned out to be sucrose and fructose, which is not apparent from their time evolution depicted in Fig. 5. Indeed, the intervals of homogeneity for these metabolites are

  • sucrose: [1], [234], [5], [67], [89]

  • fructose: [1], [2], [3], [4], [5], [67], [8], [9]

and differ from the ones found for myo-inositol-1-phosphate. All three metabolites (myo-inositol-1-phosphate, sucrose and fructose) are directly involved in carbohydrate metabolism leading to raffinose biosynthesis well known for establishing and maintaining freezing tolerance during the temperature stress (Guy et al. 2008). We leave the biochemical analysis of dynamics and roles of these metabolites in cold acclimation for future study. However, it is worth repeating that these responsive metabolites with coherent changes were found without any a priori information or analysis, but just by comparing the results of pair-wise tests for individual metabolites.

Fig. 5

Development of sucrose (left plot) and fructose (right plot) over the observation period from mid-August 2006 to the beginning of January 2007

4 Concluding remarks

The paper is devoted to an important question that frequently appears in metabolomics studies: statistical analysis of consistency of conclusions derived by multivariate methods for individual metabolites. One expects that classification of metabolites in groups done by multivariate methods can be linked to particular features of individual metabolites. However, the precise statement to what extent changes of the metabolome as a whole are followed by appropriate modifications of each individual metabolite is often left uncertain.

We explore this problem by considering the dynamics of metabolite patterns associated with acquired freezing tolerance of P. obovata. As shown, OPLS-DA indicates four distinct phases in metabolome changes for this interval of time. However, the analysis of homogeneity levels computed for each metabolite rejects the conclusion that the majority of metabolites follow this four-phase pattern. Rather, the pattern is followed in a weak sense: the majority of responsive metabolites change significantly at one or several of the time points proposed by OPLS-DA.

The discrepancies between the conclusions of the two methods emphasize the weaknesses of MVA and the conservatism of classical statistical procedures for grouping metabolites by their individual behaviors. The results clearly motivate the development of new techniques that merge different methods, leading to new knowledge and rigorous scientific interpretations.