# Pair-wise multicomparison and OPLS analyses of cold-acclimation phases in Siberian spruce


## Abstract

Analysis of metabolomics data often goes beyond the task of discovering biomarkers and can be aimed at recovering other important characteristics of observed metabolomic changes. In this paper we explore different methods to detect the presence of distinctive phases in seasonal-responsive changes of metabolomic patterns of Siberian spruce (*Picea obovata*) during cold acclimation, which occurred in the period from mid-August to January. Multivariate analysis, specifically orthogonal projection to latent structures discriminant analysis (OPLS-DA), identified time points where the metabolomic patterns underwent substantial modifications as a whole, revealing four distinctive phases during acclimation. This conclusion was re-examined by a univariate analysis consisting of multiple pair-wise comparisons to identify homogeneity intervals for each metabolite. These tests complemented OPLS-DA, clarifying the biological interpretation of the classification: about 60% of the metabolites found responsive to the cold stress indeed changed at one or more of the time points predicted by OPLS-DA. However, the univariate approach did not support the proposed division of the acclimation period into four phases: fewer than 10% of the metabolites altered during the acclimation had the homogeneous levels predicted by OPLS-DA. This demonstrates that coupling the classification found by OPLS-DA with the analysis of the dynamics of individual metabolites obtained by pair-wise multicomparisons yields a more accurate characterization of biochemical processes in freezing-tolerant trees and leads to interpretations that cannot be deduced by either method alone. The combined analysis can be used in other ‘omics’-studies, where response factors have a causal dependence (like the time in the present work) and pair-wise multicomparisons are not conservative.

### Keywords

Metabolomics, Multiple hypothesis test, Multivariate analysis, OPLS-DA, Siberian spruce, Cold-acclimation

## 1 Introduction

Metabolomics is a technique focused on the global analysis of hundreds of metabolites, characterizing their changes in a biological sample under different conditions. On the one hand the aim is to detect specific metabolic markers in response to different stimuli, and on the other hand to explore the overall patterns of metabolic changes (Fiehn 2002; Holmes et al. 2008; Vinayavekhin et al. 2010; Wishart 2008). Metabolomics deals not only with the identification and evaluation of altered metabolic states, but is also concerned with relating these findings to the biology and metabolism of living organisms. As a result, it facilitates the verification and discovery of new and unknown mechanisms involved in the development of observable physiological consequences (Schauer and Fernie 2006).

Gas chromatography coupled with mass spectrometry (GC-MS) is one of the most commonly used analytical techniques in metabolomics. This is due to its higher consistency and robustness as compared to liquid chromatography mass spectrometry (LC-MS) and its higher sensitivity as compared to nuclear magnetic resonance spectroscopy (NMR) (Lisec et al. 2006). The nature of the data produced in GC-MS-based metabolomics is complex and the amount of information is high-dimensional so that researchers are faced with several conceptual difficulties. They include revealing and representing internal structure of measurements in a comprehensive way and choosing an appropriate statistical strategy for data processing and evaluation in order to draw reliable and robust scientific conclusions (Gehlenborg et al. 2010; Thysell et al. 2007; Wiklund et al. 2008).

Several approaches have been successfully utilized for qualitative data acquisition and quantitative analysis of information contained in spectral data (Boccard et al. 2010). The data acquisition phase encompasses preprocessing steps including mass spectra generation, normalization procedures, and alignment and deconvolution of peaks across samples using multivariate curve resolution tools. The common statistical approach used in metabolomics data analysis is based on multivariate analysis (MVA) including methods such as principal component analysis (PCA) (Wold et al. 1987), partial least squares discriminant analysis (PLS-DA) and its extension orthogonal projections to latent structures discriminant analysis (OPLS-DA) (Trygg and Wold 2002). However, while successful in processing, clustering and representing essential trends and modifications among large subsets of metabolites, MVA classifications can be of limited use for analyzing the dynamics at the level of an individual metabolite. As a result, the conclusions drawn regarding biochemical properties of individual metabolites might be vague or inconsistent. Hence, if such dynamics are of interest, multivariate methods should be complemented by testing relevant hypotheses for the individual metabolites of particular interest.

This paper discusses the metabolomic analysis of Siberian spruce (*Picea obovata*) needles during cold acclimation, which can serve as an example of the importance of justifying conclusions obtained from multivariate methods and complementing them with further analysis. As shown, OPLS-DA can be successfully applied to separate the collected samples and metabolites into groups, clearly revealing four consecutive phases during the acclimation period from mid-August to January. Meanwhile, the conclusions derived from the OPLS-DA-based separation go beyond the known results on the development of freezing tolerance in *Arabidopsis* (Kaplan et al. 2004), motivating complementary analysis. Indeed, the studies on the dynamics of the metabolome during cold acclimation in *Arabidopsis* showed no clear indications of coherent time changes in metabolites that can be separated into individual phases during the pre-acclimation period. Thus the four phases in the acclimation of *Picea obovata* inferred with the help of OPLS-DA require further justification and correct interpretation.

To this end, we performed analysis of dynamics of levels of each individual metabolite over the acclimation period with the help of an organized pair-wise multicomparison procedure in order to complement and correctly interpret the conclusions obtained by OPLS-DA. In addition, we observe that multicomparison tests can be valuable for grouping the data itself. The results of multicomparisons deliver individual information for each particular metabolite in the study that can be used further for identification and characterization of metabolites with closely connected functions in the cold acclimation process.

## 2 Material and methods

### 2.1 Original data

#### 2.1.1 Sample description

Data from a detailed metabolomic study of acclimation of Siberian spruce to extreme freeze tolerance were used. Briefly, this study involved needle extracts from three Siberian spruce trees growing in The Ringve Botanical Garden in Trondheim, Norway (Strimbeck et al. 2008). According to previous findings these trees develop extreme freezing tolerance even in a local climate that is milder than that of the species’ natural range (Strimbeck et al. 2007). Needles were collected at nine different time points every two to four weeks from August 2006 to January 2007. Samples were placed in 50 ml centrifuge tubes (Sarstedt) and frozen directly in liquid nitrogen. The metabolites were extracted from 9 to 12 mg of needles ground to powder in liquid nitrogen with normalized volumes of extraction mixture. The ratio between extraction mixture volume and sample weight was set to 1:12. Derivatized samples were analyzed according to Gullberg et al. (2004) using an Agilent 6890N gas chromatograph equipped with a 10 m × 0.18 mm ID fused silica capillary column with a chemically bonded 0.18 µm DB5-MS stationary phase. The samples were randomized to minimize the influence of systematic time drift. Each sample was injected in splitless and split (1/20) modes by a CTC Combi Pal autosampler (CTC Analytics AG, Zwingen, Switzerland) and analyzed in three batches. A series of *n*-alkanes (C12–C40) and blank control samples were run for each separate batch in order to calculate retention indexes.

#### 2.1.2 Data pre-processing

Split and splitless MS-files were pretreated separately, including smoothing, base-line correction, chromatogram alignment and hierarchical multivariate curve resolution (H-MCR) according to Jonsson et al. (2004, 2005), and then combined. Multiple peaks corresponding to the same metabolites were recalculated manually for selected masses using custom scripts. PCA was used for data overview and to calculate internal standards scores, which were subsequently used as normalization factors. The data set in the study contained at least 431 putative metabolites. Of these, 115 metabolites were unambiguously annotated by comparing their mass spectra to spectral databases containing authentic standard compounds and maintained by Umeå Plant Science Center^{1} in Umeå, Sweden, and the Max Planck Institute^{2} in Golm, Germany.

### 2.2 Statistical analysis

#### 2.2.1 Multivariate analysis: OPLS-DA

Orthogonal projections to latent structures discriminant analysis (OPLS-DA) was performed in SIMCA-P+ 12.0 (Umetrics AB, Umeå, Sweden) on mean centered and unit variance scaled metabolite responses from the original data set. The data set was examined by PCA followed by OPLS-DA to overview the relationships between extracted metabolites (variables) and samples (observations) as well as among the variables themselves and to reveal groups/trends and deviating behavior among the observations. Significant metabolites contributing to the class separation were determined using the loading column plots with confidence intervals calculated by jack-knifing. The validation used here was implemented as a cross-validation procedure in which one seventh of the samples was left out of the modeling at a time and the model was refitted seven times. This provided a predicted classification value for each sample, and based on this the *Q*^{2} value was calculated as a measure of the predictive robustness of the model.
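The seven-fold cross-validation and the resulting *Q*^{2} can be illustrated with a minimal sketch. It is not the SIMCA-P+ implementation: an ordinary least-squares line stands in for the OPLS-DA model, the fold assignment (`i % k`) and all function names are assumptions made for illustration, and *Q*^{2} is taken here as 1 − PRESS/SS, where PRESS is the sum of squared prediction errors on held-out samples and SS is the total sum of squares of the mean-centered response.

```python
def kfold_indices(n_samples, k=7):
    """Assign sample i to fold i % k, so each fold holds about 1/k of the samples."""
    return [[i for i in range(n_samples) if i % k == fold] for fold in range(k)]


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (stand-in for the real model)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_q2(xs, ys, k=7):
    """Q2 = 1 - PRESS / SS, with predictions made only for held-out samples."""
    press = 0.0
    for held_out in kfold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        slope, intercept = fit_line(train_x, train_y)
        press += sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out)
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Toy data (hypothetical): a noise-free linear response is predicted perfectly.
xs = list(range(14))
ys = [2 * x + 1 for x in xs]
q2 = cross_validated_q2(xs, ys)
```

A value of *Q*^{2} close to 1 means that samples left out of the fit are still predicted well, which is why it is read as a measure of predictive robustness rather than of goodness of fit.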

#### 2.2.2 Univariate pair-wise multicomparison procedure

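The univariate analysis was carried out in MATLAB^{TM} R2007b (The MathWorks, MA). In particular, the mean values \( \bar{\mu}_i \) of metabolite contents collected at all nine time points^{3} were compared in pairs for each metabolite, which gives the 36 hypotheses

\( H_0^{ij}:\; \bar{\mu}_i = \bar{\mu}_j, \quad 1 \le i < j \le 9 \)  (1)

For each metabolite these hypotheses were tested as follows: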

1. The samples within each of the nine groups of the metabolite were checked for normality by the Jarque-Bera test. To keep the family-wise error rate (FWER) for these nine parallel hypotheses below 0.05, the correction procedure due to Holm was used (Holm 1979; Lehmann and Romano 2005);

2. The groups with normally distributed samples were checked in pairs for common variance by Bartlett’s test. The number of tests depended on the results of the previous step; to keep the FWER for these parallel hypotheses below 0.05, the correction procedure due to Holm was again used (Holm 1979; Lehmann and Romano 2005);

3. Each of the 36 hypotheses in Eq. 1 was tested by a *t*-test or a Mann–Whitney *U*-test, depending on whether the samples in both groups are normally distributed and, if so, whether the variances in the groups are the same or different. To decide which test to use, the results from the previous steps were taken into account. The significance thresholds for the corresponding *P*-values were chosen to control the FWER below 0.05 following the Holm procedure (Holm 1979; Lehmann and Romano 2005).
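The three steps above share two ingredients: a rule for choosing the two-sample test and the Holm step-down correction. The sketch below illustrates both in plain Python. It is not the authors' MATLAB code; the function names and the test labels are assumptions made for illustration, and the *P*-values in the example are invented.

```python
from itertools import combinations


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns one rejection flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P-value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all larger P-values are retained
    return reject


def choose_test(normal_a, normal_b, equal_variance):
    """Pick the two-sample test from the outcomes of the normality and variance checks."""
    if not (normal_a and normal_b):
        return "mann-whitney-u"
    return "t-test-equal-variance" if equal_variance else "t-test-unequal-variance"


# The nine time points give 9 * 8 / 2 = 36 pairs (i, j) with i < j.
pairs = list(combinations(range(1, 10), 2))

# Invented P-values for four hypotheses, only to show the step-down behavior.
flags = holm_reject([0.01, 0.04, 0.03, 0.005])
```

In this toy example the two smallest *P*-values (0.005 and 0.01) pass their thresholds of 0.0125 and about 0.0167, while 0.03 fails against 0.025, so the procedure stops and retains the remaining hypotheses.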

## 3 Results and discussion

### 3.1 Classification analysis by OPLS-DA

OPLS-DA showed a significant separation (*P* < 0.05) between several acclimation phases (Fig. 1) defined by specific time points at which the metabolomic pattern changed substantially as a whole. Namely, the acclimation period for the Siberian spruce was divided into four consecutive phases, given here by the indices of the nine available time points:

- phase 1: [1] (around August 15; pre-acclimation phase)
- phase 2: [234] (from September 4 till October 8; early acclimation phase)
- phase 3: [5] (around October 23; late acclimation phase)
- phase 4: [6789] (from November 5 till January 2; full acclimation phase)

The OPLS-DA fitted model resulted in three predictive and five orthogonal components. 66.6% of variation in the data set (*R*^{2}*X* cum) was used to account for 96.6% of the variance in the class separation (*R*^{2}*Y* cum); for additional information see Supplementary Table 1 in Appendix. The cross-validated predictive ability of the model was 92.9% (*Q*^{2} cum), which is conclusive for supporting the presented separation into four phases.

### 3.2 Pair-wise multicomparison procedure

#### 3.2.1 Inference on normality of measurements in groups and (in-)equalities of variances for pair-wise comparisons

For the original data, 157 groups out of 3,879 had measurements deviating significantly from a normal distribution. The test was repeated after log_{10} transformation with a similar outcome: 102 groups out of 3,879 have log_{10}-transformed measurements deviating significantly from a normal distribution. In both cases, the numbers of such groups (157 and 102) were around 4 and 3% of the total number of possible groups, respectively, and can be considered reasonably small. The distribution of these groups over the whole observation period shows only marginal differences between the time points at which the measurements were collected, see Fig. 2.

The log_{10}-transformation of the original data, as expected, improved and stabilized the variances of the measurements. For the transformed data, only 108 out of 431 metabolites had two or more groups of the same metabolite with statistically significant differences in variances. To conclude:

- For a large portion of the metabolites, measurements grouped by sample date were normally distributed and the variances of measurements made on different sample dates were equal;

- However, the presence of non-normally distributed measurements, as well as of measurements with different variances between groups, for a substantial number of metabolites makes questionable the use of standard statistical tools such as one-way ANOVA and its modifications for finding intervals of homogeneous levels for each metabolite. The hypotheses in Eq. 1 were therefore tested using the pair-wise multicomparison procedure.
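The normality check applied to each group can be sketched as follows. This is a simplified illustration rather than the routine used in the study: it computes the Jarque-Bera statistic from the sample skewness and kurtosis and converts it to a *P*-value with the asymptotic chi-square distribution with two degrees of freedom. For groups as small as those in this data set (8 to 15 measurements) the asymptotic approximation is rough, so the numbers it gives should be treated as indicative only. The function name and the example values are assumptions.

```python
import math


def jarque_bera(sample):
    """Jarque-Bera statistic and its asymptotic chi-square(2) P-value."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in sample) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in sample) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of the chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical group of five measurements, and the same group after log10 transformation.
raw = [1.0, 2.0, 3.0, 4.0, 5.0]
stat_raw, p_raw = jarque_bera(raw)
stat_log, p_log = jarque_bera([math.log10(x) for x in raw])
```

Running the check once on the raw values and once on the log_{10}-transformed values mirrors the comparison of the two cases made above.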

#### 3.2.2 Inference on class separation based on results of pair-wise multiple comparisons

For example, the time profile of *myo-inositol-1-phosphate* depicted in Fig. 3 reveals that there are four homogeneity subintervals:

[1], [234], [5], [6789]  (2)

Applied to all metabolites, the procedure shows that:

- 263 metabolites out of 431 have two or more different homogeneity intervals. In other words, 263 metabolites undergo significant changes during the observation period;

- Only 25 metabolites out of these 263 have the exact split of the homogeneous intervals as predicted by OPLS-DA;

- 156 metabolites out of these 263 have one or several homogeneous intervals as predicted by OPLS-DA.

These numbers lead to two conclusions:

1. One cannot definitely support the hypothesis that the majority of the metabolites have four phases in acclimation to cold stress developed under natural conditions. Indeed, fewer than 10% of the metabolites that showed statistically significant changes in the period between mid-August and January are confirmed to have the same four-phase profile;

2. However, for ≈60% of the metabolites that changed in the period between mid-August and January, the time points proposed by the OPLS-DA analysis for splitting the period into individual phases are indeed points where one or several significant changes in metabolite levels occur. This implies that the results of the pair-wise multicomparison procedure can partly be seen as supporting the OPLS-DA analysis and as providing further details. In particular, it was found that

   - (a) 98 metabolites (≈37% of all changed metabolites) have the first measured point in mid-August as one of their homogeneity intervals, i.e. for these metabolites the levels on August 15 and at the next collecting point on September 4 are statistically different;

   - (b) 33 metabolites (≈12% of all changed metabolites) possess the homogeneity interval from September 4 to October 8 (the collecting points 2–4);

   - (c) 68 metabolites (≈26% of all changed metabolites) have the fifth measured point (October 23) as one of their homogeneity intervals, i.e. for these metabolites the levels at the previous (October 8) and the next (November 5) collecting points are statistically different from the level measured on October 23;

   - (d) 102 metabolites (≈39% of all changed metabolites) possess the homogeneity interval from November 5 to January 2 (the collecting points 6–9).
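The shares quoted in this list follow directly from the reported counts. As a quick check, the short sketch below recomputes them relative to the 263 changed metabolites and rounds to one decimal place; the dictionary keys are labels chosen here for illustration only.

```python
# Counts reported in the text, all relative to the 263 metabolites that changed.
changed = 263
counts = {
    "exact_four_phase_split": 25,
    "at_least_one_predicted_interval": 156,
    "interval_1": 98,
    "interval_2_to_4": 33,
    "interval_5": 68,
    "interval_6_to_9": 102,
}

# Percentage of changed metabolites in each category, rounded to one decimal place.
shares = {name: round(100.0 * count / changed, 1) for name, count in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

The first two values, 9.5% and 59.3%, correspond to the "fewer than 10%" and "≈60%" statements on which the two conclusions rest.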

#### 3.2.3 Use of pair-wise comparisons for detecting metabolites with similar behaviors

The results of the pair-wise multi-comparisons can be used for detecting metabolites that have similar dynamics for the whole period of measurements or for its subintervals. In this respect, the tests in Eq. 1 provide more detailed information and can be appropriate for validating hypotheses other than the one focused on searching metabolite’s homogeneity intervals.

To illustrate this, consider again *myo-inositol-1-phosphate* depicted in Fig. 3 and the derived conclusion that it has four distinctive phases of homogeneity, see Eq. 2. This conclusion was deduced from statistical inference supporting the following pair-wise relations:

- [1]: \( \bar{\mu}_1\ne\bar{\mu}_2 \)
- [234]: \( \bar{\mu}_2= \bar{\mu}_3,\;\;\bar{\mu}_2= \bar{\mu}_4,\;\;\bar{\mu}_3=\bar{\mu}_4 \)
- [5]: \( \bar{\mu}_5\ne \bar{\mu}_2,\;\bar{\mu}_5\ne \bar{\mu}_6 \)
- [6789]: \( \bar{\mu}_6= \bar{\mu}_7, \;\;\bar{\mu}_6= \bar{\mu}_8,\;\;\bar{\mu}_6=\bar{\mu}_9,\;\;\bar{\mu}_7= \bar{\mu}_8,\;\;\bar{\mu}_7= \bar{\mu}_9,\;\;\bar{\mu}_8= \bar{\mu}_9 \)
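One way to turn such pair-wise relations into homogeneity intervals is a greedy left-to-right scan, sketched below. The text does not spell out the exact grouping rule, so this is an assumption: a time point joins the current interval only if it was not found significantly different from any point already in that interval, otherwise it starts a new interval. The function name and the input format are likewise hypothetical.

```python
def homogeneity_intervals(n_points, differing_pairs):
    """Greedy split of time points 1..n_points into consecutive homogeneous intervals.

    differing_pairs lists the pairs (i, j) whose means were found significantly
    different; every other pair is treated as "no significant difference".
    """
    differs = {frozenset(pair) for pair in differing_pairs}
    intervals = [[1]]
    for point in range(2, n_points + 1):
        current = intervals[-1]
        if any(frozenset((member, point)) in differs for member in current):
            intervals.append([point])  # a significant difference closes the interval
        else:
            current.append(point)      # still homogeneous with every member
    return intervals


# The three inequalities listed above for myo-inositol-1-phosphate.
split = homogeneity_intervals(9, [(1, 2), (2, 5), (5, 6)])
```

With the three inequalities listed above as input, the scan returns the split [1], [234], [5], [6789].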

However, more results of pair-wise comparisons are available (e.g. the comparisons of \( \bar{\mu}_1 \) versus \( \bar{\mu}_7 \), of \( \bar{\mu}_3 \) versus \( \bar{\mu}_9 \), etc.) that were not used or were redundant in searching for homogeneity intervals. This information can be considered descriptive of and representative for this metabolite and can be used to recognize its relatives, i.e. metabolites that have a similar pattern of pair-wise test results over the whole observation period or over its subintervals.

As an illustration, consider the pair-wise test results of *myo-inositol-1-phosphate* in the period between September 25 and November 20 (i.e. between measurements 3 and 7).

No metabolite has exactly the same pair-wise test results as *myo-inositol-1-phosphate* in that period, even though 25 metabolites were found to have exactly the same split of the observation period into four distinctive phases: [1], [234], [5], [6789]. However, a few metabolites (*N* = 6) have just one difference, many more (*N* = 41) have two differences, and so on. The closest identified metabolites from the family with one difference appear to be *sucrose* and *fructose*, which is not apparent from their time evolution depicted in Fig. 5. Indeed, the intervals of homogeneity for these metabolites are

- *sucrose:* [1], [234], [5], [67], [89]
- *fructose:* [1], [2], [3], [4], [5], [67], [8], [9]

and thus differ from those of *myo-inositol-1-phosphate*. All three metabolites (*myo-inositol-1-phosphate*, *sucrose* and *fructose*) are directly involved in carbohydrate metabolism leading to raffinose biosynthesis, which is well known for establishing and maintaining freezing tolerance during temperature stress (Guy et al. 2008). We leave the biochemical analysis of the dynamics and roles of these metabolites in cold acclimation for future study. However, it is worth repeating that these responsive metabolites with coherent changes were found without any a priori information or analysis, but just by comparing the results of pair-wise tests for individual metabolites.
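The notion of "number of differences" between two metabolites can be made concrete with a small sketch. It assumes that each metabolite is summarized by the set of time-point pairs found significantly different, and it counts the pairs within a chosen subinterval on which two such summaries disagree. The function names and both example patterns are hypothetical and are not taken from the data of the study.

```python
from itertools import combinations


def as_pattern(differing_pairs):
    """Store the significantly different pairs in an order-independent form."""
    return {frozenset(pair) for pair in differing_pairs}


def count_differences(pattern_a, pattern_b, points):
    """Number of pairs of time points on which two test-result patterns disagree."""
    return sum(
        1
        for pair in combinations(points, 2)
        if (frozenset(pair) in pattern_a) != (frozenset(pair) in pattern_b)
    )


# Hypothetical patterns restricted to measurements 3..7 (10 pairs in total).
reference = as_pattern([(3, 5), (4, 5), (5, 6), (5, 7)])
candidate = as_pattern([(3, 4), (3, 5), (4, 5), (5, 6), (5, 7)])

n_diff = count_differences(reference, candidate, range(3, 8))
```

Ranking all metabolites by this count relative to a reference metabolite gives families with zero, one, two or more differences, which is the kind of grouping described above.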

## 4 Concluding remarks

The paper is devoted to an important question that frequently appears in metabolomics studies: statistical analysis of consistency of conclusions derived by multivariate methods for individual metabolites. One expects that classification of metabolites in groups done by multivariate methods can be linked to particular features of individual metabolites. However, the precise statement to what extent changes of the metabolome as a whole are followed by appropriate modifications of each individual metabolite is often left uncertain.

We explore this problem by considering the dynamics of metabolite patterns associated with the acquired freezing tolerance of *P. obovata*. As shown, OPLS-DA indicates four distinctive phases in metabolome changes for this interval of time. However, the analysis of homogeneity levels computed for each metabolite rejects the OPLS-DA conclusion that the majority of metabolites follow this four-phase pattern. Rather, the pattern is followed in a weak sense: the majority of responsive metabolites have significant changes at one or several of the time points proposed by OPLS-DA.

The discrepancies between the conclusions of the two methods highlight the weaknesses of MVA and the conservatism of classical statistical procedures for grouping metabolites by their individual behavior. The results clearly motivate the development of new techniques that merge different methods, leading to new knowledge and rigorous scientific interpretations.

^{3} The samples were collected on August 15 (*n* = 8), September 4 (*n* = 15), September 25 (*n* = 9), October 8 (*n* = 15), October 23 (*n* = 15), November 5 (*n* = 8), November 20 (*n* = 8), December 4 (*n* = 15), January 2 (*n* = 9) with numbers of measurements given in brackets.

## Acknowledgments

The work of W.P. Schröder was supported in part by the *Swedish Research Council* (the grant 621-2008-3207). The work of A.S. Shiriaev was supported in part by the *Swedish Research Council* (the grant 2008-4369). We thank Dr. Trygve Kjellsen for providing plant material, Prof. Thomas Moritz, Dr. Krister Lundgren and Dr. Izabella Surowiec for help with GC-MS analysis and fruitful discussions, and Ogonna Obudulu for help with metabolite annotations. The first and last authors would also like to acknowledge the contribution of E.F. Diulgher in posing the problem and the influence on the correct interpretation of the results.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.