Discriminant Q2 (DQ2) for improved discrimination in PLSDA models
In this paper we introduce discriminant Q2 (DQ2) as an improvement on the Q2 value used to validate PLSDA models. Unlike Q2, DQ2 does not penalize class predictions that lie beyond the class label value. Using rigorous Monte Carlo simulations, we show that a smaller effect can be found statistically significant when DQ2 is used than when the standard Q2 is used.
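A minimal sketch of the idea, assuming the two classes are coded 0 and 1 (the coding is an assumption, not stated in this section). DQ2 reuses the Q2 formula, one minus PRESS over TSS, but clips each prediction at its own class label before computing the error. A class-0 sample predicted at -0.5 or a class-1 sample predicted at 1.5 therefore contributes no error to DQ2, whereas Q2 counts both as errors. The function names q2 and dq2 are illustrative.

```python
import numpy as np


def q2(y, y_hat):
    """Q2 = 1 - PRESS / TSS, with PRESS computed from (cross-validated) predictions."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    press = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return float(1.0 - press / tss)


def dq2(y, y_hat, low=0.0, high=1.0):
    """Discriminant Q2: predictions beyond the class label value are not penalized.

    Assumes exactly two classes coded `low` and `high` (0 and 1 by default).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # Clip each prediction at its own label: class-`low` predictions below `low`
    # and class-`high` predictions above `high` are treated as error-free.
    clipped = np.where(y == low, np.maximum(y_hat, low), np.minimum(y_hat, high))
    press_d = np.sum((y - clipped) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return float(1.0 - press_d / tss)


y = [0, 0, 1, 1]
y_hat = [-0.5, 0.2, 0.8, 1.5]  # the first and last predictions "overshoot" their labels
print(round(q2(y, y_hat), 2), round(dq2(y, y_hat), 2))  # → 0.42 0.92
```

Because TSS is identical for both statistics, DQ2 is never smaller than Q2 for the same predictions; the two coincide when no prediction lies beyond its class label.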
Keywords: PLSDA, Discrimination, Q2
Q2 is defined as one minus the ratio of the prediction error sum of squares (PRESS) to the total sum of squares (TSS) of the response vector y (Cruciani et al. 1992). When the PLS method was introduced for classification, the Q2 parameter survived as a measure of class prediction ability, and today it is regularly used to validate discrimination models such as PLSDA (Lutz et al. 2006; Wiklund et al. 2008). One problem with the Q2 parameter is that it is unclear which Q2 value corresponds to a good discrimination model. To address this, the Q2 value can be compared to a distribution of Q2 values obtained from models of the same data with randomly permuted class labels (Lindgren et al. 1996; Westerhuis et al. 2008). In this way, statistical significance (P-values) can be obtained for a given discrimination model.
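Written out, these quantities take the following form, where the symbols are introduced here for illustration: \hat{y}_i is the cross-validated prediction for sample i, \bar{y} is the mean of y, K is the number of permuted labelings (2,000 in the Monte Carlo study), and Q^2_k is the statistic obtained for the k-th permutation. The DQ2 line assumes 0/1 class coding, and the name PRESSD is used here only as a label for the clipped error sum.

```latex
Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}}, \qquad
\mathrm{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad
\mathrm{TSS} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2

\mathrm{DQ}^2 = 1 - \frac{\mathrm{PRESSD}}{\mathrm{TSS}}, \qquad
\mathrm{PRESSD} = \sum_{i:\, y_i = 0} \max(\hat{y}_i, 0)^2
               + \sum_{i:\, y_i = 1} \left( 1 - \min(\hat{y}_i, 1) \right)^2

P = \frac{\#\{\, k \in \{1, \dots, K\} : Q^2_k > Q^2_{\mathrm{obs}} \,\}}{K}
```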
1D ¹H NOESY NMR spectra were obtained from urine samples of 28 mildly hypertensive male and female human subjects aged 35–75 years (systolic blood pressure: 130–179 mmHg; diastolic blood pressure: <100 mmHg). Prior to Fourier transformation, an exponential window function with a line-broadening factor of 0.5 Hz was applied to the free induction decay (FID). The Fourier-transformed NMR data were manually phase- and baseline-corrected and calibrated against the TSP reference resonance at δ 0.0 ppm. The NMR spectra were subdivided into 550 discrete regions (‘buckets’) of equal width (0.02 ppm), whose integrals were determined using AMIX (Analysis of Mixtures, Bruker GmbH, Germany). The spectral region between δ 4.3 and 5.2 ppm was excluded from the data set to avoid interference from residual water. The urine profiles were normalized to the integral of the creatinine methyl peak between δ 3.05 and 3.10 ppm.
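A minimal NumPy sketch of the 0.5 Hz exponential window and of the bucketing, water-exclusion and creatinine-normalization steps, assuming a spectrum that is already on a ppm axis. AMIX's actual integration is not reproduced here. The rectangle-rule integration, the rule that a bucket is dropped when its center lies in 4.3–5.2 ppm, and taking the creatinine integral from the raw spectrum are all assumptions, as are the function names and the ppm_min/ppm_max limits (the text gives 550 buckets of 0.02 ppm, an 11 ppm span, but not the limits).

```python
import numpy as np


def exponential_window(fid, dwell_time_s, lb_hz=0.5):
    """Exponential apodization of an FID: multiply by exp(-pi * LB * t).

    Each Lorentzian line is broadened by lb_hz (full width at half height).
    """
    t = np.arange(len(fid)) * dwell_time_s
    return np.asarray(fid) * np.exp(-np.pi * lb_hz * t)


def bucket_spectrum(ppm, intensity, ppm_min, ppm_max, width=0.02,
                    exclude=(4.3, 5.2), creatinine=(3.05, 3.10)):
    """Equal-width bucketing, water-region exclusion and creatinine normalization.

    ppm_min and ppm_max are required because the bucketed range is assumed.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Point spacing in ppm; abs() because NMR axes are usually descending.
    step = np.abs(np.median(np.diff(ppm)))

    # Bucket edges every `width` ppm; integrate each bucket with a rectangle rule.
    n_buckets = int(round((ppm_max - ppm_min) / width))
    edges = ppm_min + width * np.arange(n_buckets + 1)
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    integrals = sums * step
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Drop buckets whose center falls inside the residual-water region.
    keep = (centers < exclude[0]) | (centers > exclude[1])

    # Normalize to the creatinine methyl peak integral, taken from the raw spectrum.
    in_creat = (ppm >= creatinine[0]) & (ppm <= creatinine[1])
    creat_integral = intensity[in_creat].sum() * step
    return centers[keep], integrals[keep] / creat_integral
```

For example, bucket_spectrum(ppm, spectrum, 0.0, 11.0) on a 0–11 ppm axis yields 550 buckets, of which the 45 whose centers lie between 4.3 and 5.2 ppm are removed; the 0–11 ppm limits in this example are purely illustrative.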
Monte Carlo simulations were performed by adding a predefined effect to the spectra of 14 randomly selected volunteers, while no effect was added for the other 14 individuals. PLSDA was used to discriminate between the two groups. Models were validated by cross model validation (Anderssen et al. 2006), sometimes called double cross validation (Smit et al. 2007), in which the samples were divided into seven groups; each double cross validation yields one (D)Q2 value. Because the (D)Q2 value depends strongly on how the samples are distributed over the seven groups, 25 double cross validations were performed, each with a different distribution of the samples, and the average of the resulting 25 (D)Q2 values was computed. Then 2,000 permutations were performed in which the class labels were randomly permuted, and for each permutation the average (D)Q2 was again computed in the same way. The P-value is the number of permutations whose average (D)Q2 is larger than the average (D)Q2 of the original labeling, divided by 2,000. Finally, we repeated the procedure five times, each time with a different selection of the 14 individuals receiving the treatment, which yielded five P-values. The average of these five P-values is used to compare Q2 and DQ2.
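The resampling scheme can be sketched as follows. This is a minimal NumPy sketch under several stated assumptions: a simple PLS1 regression on 0/1 class labels stands in for PLSDA; a fixed number of latent variables replaces the inner loop of double cross validation, so this is plain repeated seven-fold cross validation rather than cross model validation; and the effect is modeled as a vector added to the treated spectra. All function names (pls1_predict, average_cv_statistic, permutation_p_value, monte_carlo_p_value) are hypothetical.

```python
import numpy as np


def pls1_predict(X_train, y_train, X_test, n_components=2):
    """Minimal PLS1 regression (NIPALS), used here as a stand-in for PLSDA."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    X, y = X_train - x_mean, y_train - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing left to explain in y
            break
        w = w / norm
        t = X @ w
        tt = t @ t
        p, q = X.T @ t / tt, (y @ t) / tt
        X, y = X - np.outer(t, p), y - q * t  # deflate X and y
        W.append(w); P.append(p); Q.append(q)
    if not W:
        return np.full(len(X_test), y_mean)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # regression vector B = W (P'W)^-1 q
    return (X_test - x_mean) @ B + y_mean


def q2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def dq2(y, y_hat):
    # Assumes 0/1 coding: no penalty for predictions beyond the class label.
    return q2(y, np.where(y == 0, np.maximum(y_hat, 0.0), np.minimum(y_hat, 1.0)))


def average_cv_statistic(X, y, stat, rng, n_repeats=25, n_folds=7, n_components=2):
    """Average `stat` over repeated k-fold CV, with a new random fold split each time."""
    values = []
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        y_hat = np.empty(len(y))
        for test_idx in folds:
            train = np.setdiff1d(np.arange(len(y)), test_idx)
            y_hat[test_idx] = pls1_predict(X[train], y[train], X[test_idx], n_components)
        values.append(stat(y, y_hat))
    return float(np.mean(values))


def permutation_p_value(X, y, stat, rng, n_permutations=2000, **cv_kwargs):
    """Fraction of label permutations whose average statistic exceeds the observed one."""
    observed = average_cv_statistic(X, y, stat, rng, **cv_kwargs)
    larger = 0
    for _ in range(n_permutations):
        if average_cv_statistic(X, rng.permutation(y), stat, rng, **cv_kwargs) > observed:
            larger += 1
    return larger / n_permutations


def monte_carlo_p_value(X0, effect, stat, rng, n_treated=14, n_selections=5, **kwargs):
    """Average P-value over several random choices of the 'treated' subjects."""
    p_values = []
    for _ in range(n_selections):
        treated = rng.choice(len(X0), size=n_treated, replace=False)
        y = np.zeros(len(X0))
        y[treated] = 1.0
        X = X0 + np.outer(y, effect)  # add the effect to the treated spectra only
        p_values.append(permutation_p_value(X, y, stat, rng, **kwargs))
    return float(np.mean(p_values))
```

With the settings described in the text (25 repetitions, 2,000 permutations, five selections) the call would be monte_carlo_p_value(X0, effect, dq2, rng), with X0 the 28 preprocessed spectra and rng = np.random.default_rng(). That amounts to about 5 × 2,001 × 25 × 7 ≈ 1.75 million PLS fits, so small settings are advisable when experimenting. Passing q2 instead of dq2 gives the standard-Q2 comparison.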
When a significance level of α = 0.05 is used to reject the null hypothesis of no effect, an effect size of about 3.6 is needed for a significant discrimination model when Q2 is used, whereas an effect size of about 3.4 already suffices when DQ2 is used. For the multivariate effect (effect 2) in the bottom plot of Fig. 3, the difference between Q2 and DQ2 is even larger: a multivariate effect size of 2.8 gives a statistically significant model with DQ2, whereas Q2 requires an effect size of 3.2.
In this paper the discriminant Q2 (DQ2) statistic is introduced as a replacement for the traditionally used Q2 value as a measure of class prediction ability. Rigorous Monte Carlo simulations show that statistically significant discrimination models can be found for a smaller effect size when DQ2 is used than when the traditional Q2 is used. This is particularly beneficial in metabolomics-based discrimination problems, where biological responses can be subtle and highly variable among individuals.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.