
Multiblock principal component analysis: an efficient tool for analyzing metabolomics data which contain two influential factors

  • Original Article
  • Metabolomics

Abstract

Principal component analysis (PCA) is probably one of the most widely used methods for exploratory data analysis. However, it may not always be effective when the data are affected by multiple influential factors. In this paper, the use of multiblock PCA for analysing such data is demonstrated on a real metabolomics study and on a series of simulated data sets containing two underlying influential factors with different types of interaction, based on 2 × 2 experimental designs. The performance of multiblock PCA is compared with that of PCA and of ANOVA-PCA, another PCA extension developed to solve similar problems. The results demonstrate that multiblock PCA is highly efficient at analysing data containing multiple influential factors, and its models give a more comprehensive view of the data than the other two methods. The combination of super scores and block scores shows not only the general trends of change caused by each influential factor but also the subtle changes within each combination of factors and levels. Multiblock PCA is also highly resistant to the addition of ‘irrelevant’ competing information: its first PC remains the most discriminant one, which neither of the other two methods achieved. The reason for this property was demonstrated using a 2 × 3 experimental design. Finally, the validity of the multiblock PCA results was tested using permutation tests, and the results suggested that the inherent risk of over-fitting with this type of approach is low.
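The abstract does not spell out how super scores and block scores are computed. One widely used formulation of multiblock PCA is the NIPALS-style consensus PCA (CPCA-W) described by Westerhuis, Kourti and MacGregor (1998), and the sketch below assumes that formulation: all blocks share the same rows (samples), each block is mean-centred, and each block score is divided by the square root of the number of variables in its block so that large blocks do not dominate. How the blocks were formed from the experimental designs in this study is not stated here, so the block layout, the scaling and the function name `consensus_pca` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def consensus_pca(blocks, n_components=2, tol=1e-10, max_iter=500):
    """Hypothetical consensus (multiblock) PCA via a NIPALS-style iteration.

    blocks: list of 2-D arrays that share the same rows (samples).
    Returns super scores (n x A), block scores (list of n x A arrays)
    and block loadings (list of m_b x A arrays, unit-norm columns).
    """
    # Mean-centre each block column-wise (copies, so inputs are untouched).
    X = [np.asarray(b, dtype=float) - np.mean(b, axis=0) for b in blocks]
    n = X[0].shape[0]
    n_blocks = len(X)
    super_scores = np.zeros((n, n_components))
    block_scores = [np.zeros((n, n_components)) for _ in X]
    block_loadings = [np.zeros((b.shape[1], n_components)) for b in X]

    for a in range(n_components):
        # Start the super score from the highest-variance column of block 0.
        t = X[0][:, np.argmax(X[0].var(axis=0))].copy()
        for _ in range(max_iter):
            T = np.empty((n, n_blocks))
            P = []
            for k, Xb in enumerate(X):
                # Block loading: regress the block on the current super score.
                p = Xb.T @ t
                p /= np.linalg.norm(p)
                P.append(p)
                # Block score, scaled by sqrt(number of variables in block).
                T[:, k] = Xb @ p / np.sqrt(Xb.shape[1])
            # Super weights: how strongly each block score follows t.
            w = T.T @ t
            w /= np.linalg.norm(w)
            t_new = T @ w  # updated super (consensus) score
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        super_scores[:, a] = t
        for k in range(n_blocks):
            block_scores[k][:, a] = T[:, k]
            block_loadings[k][:, a] = P[k]
            # Deflate every block with the super score before the next PC.
            X[k] = X[k] - np.outer(t, t @ X[k]) / (t @ t)
    return super_scores, block_scores, block_loadings
```

With this block scaling the super scores coincide, up to sign and scale, with ordinary PCA scores of the concatenated, block-scaled data, while each block score shows how that block on its own follows the consensus trend. Comparing the two is what lets super scores reveal the overall effect of a factor and block scores reveal finer differences within blocks.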


Figures 1–7 (not reproduced here).



Acknowledgments

We want to thank Dr. Yankuba Kassama for providing his metabolomics data. We also want to thank the anonymous reviewers for many constructive suggestions. YX and RG acknowledge the Symbiosis-EU (www.symbiosis-eu.net) project (No. 211638) financed by the European Commission under the 7th Framework programme for RTD. The information in this document reflects only the authors’ views and the Community is not liable for any use that may be made of the information contained therein.

Author information

Corresponding author

Correspondence to Yun Xu.

Electronic supplementary material


Supplementary material 1 (DOC 1853 kb)


About this article

Cite this article

Xu, Y., Goodacre, R. Multiblock principal component analysis: an efficient tool for analyzing metabolomics data which contain two influential factors. Metabolomics 8 (Suppl 1), 37–51 (2012). https://doi.org/10.1007/s11306-011-0361-9


  • DOI: https://doi.org/10.1007/s11306-011-0361-9
