Abstract
Principal component analysis (PCA) is probably one of the most widely used methods for exploratory data analysis. However, it may not always be effective when the data are affected by multiple influential factors. In this paper, the use of multiblock PCA for analysing such data is demonstrated on a real metabolomics study and on a series of simulated data sets containing two underlying influential factors with different types of interaction, based on 2 × 2 experimental designs. The performance of multiblock PCA is compared with that of PCA and of ANOVA-PCA, another PCA extension developed to address similar problems. The results demonstrate that multiblock PCA is highly efficient at analysing data that contain multiple influential factors, and that its models give the most comprehensive view of the data of the three methods. The combination of super scores and block scores shows not only the general trends of change caused by each influential factor but also the subtle changes within each combination of factors and their levels. Multiblock PCA is also highly resistant to the addition of ‘irrelevant’ competing information: its first PC remains the most discriminant one, which neither of the other two methods achieved. The reason for this property was demonstrated using a 2 × 3 experimental design. Finally, the validity of the multiblock PCA results was tested using permutation tests, and the results suggested that the inherent risk of over-fitting of this type of approach is low.
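The relationship between super scores and block scores mentioned above can be illustrated with a minimal sketch of one common multiblock PCA formulation: consensus PCA with super-score deflation (often called CPCA-W, after Westerhuis, Kourti and MacGregor, 1998), computed by NIPALS in Python with NumPy. This is an illustration under stated assumptions, not the exact procedure used in this paper. The function name `multiblock_pca`, its parameters, the preprocessing (mean-centring each block, then scaling it to unit total sum of squares) and the assumption that the blocks are column-wise partitions of the same set of samples are all hypothetical choices made here; how the blocks were defined for the two-factor designs in this study is not reproduced.

```python
import numpy as np


def multiblock_pca(blocks, n_components=2, tol=1e-10, max_iter=1000):
    """Hypothetical consensus-PCA (CPCA-W style) sketch using NIPALS.

    blocks: list of 2-D arrays that share the same rows (samples).
    Returns (super_scores, block_scores, super_weights):
      super_scores  -- n_samples x n_components
      block_scores  -- list with one n_samples x n_components array per block
      super_weights -- n_blocks x n_components, weight of each block score
                       in each super score
    """
    # Assumed preprocessing: centre each block, then scale it to unit total
    # sum of squares so no block dominates just because it has more variables.
    Xs = []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        Xc = X - X.mean(axis=0)
        Xs.append(Xc / np.linalg.norm(Xc))

    n_samples, n_blocks = Xs[0].shape[0], len(Xs)
    super_scores = np.zeros((n_samples, n_components))
    block_scores = [np.zeros((n_samples, n_components)) for _ in Xs]
    super_weights = np.zeros((n_blocks, n_components))

    for a in range(n_components):
        # Initialise the super score with the highest-variance column overall.
        X_all = np.hstack(Xs)
        t = X_all[:, np.argmax(X_all.var(axis=0))].copy()
        for _ in range(max_iter):
            T = np.empty((n_samples, n_blocks))
            for b, Xb in enumerate(Xs):
                p_b = Xb.T @ t
                p_b /= np.linalg.norm(p_b)   # unit-length block loading
                T[:, b] = Xb @ p_b           # block score for block b
            w = T.T @ t
            w /= np.linalg.norm(w)           # unit-length super weights
            t_new = T @ w                    # super score combines block scores
            done = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if done:
                break
        super_scores[:, a] = t
        super_weights[:, a] = w
        for b in range(n_blocks):
            block_scores[b][:, a] = T[:, b]
        # Deflate every block with the super score (CPCA-W-style deflation).
        Xs = [Xb - np.outer(t, t @ Xb) / (t @ t) for Xb in Xs]

    return super_scores, block_scores, super_weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block_a = rng.normal(size=(20, 5))   # hypothetical block: 20 samples x 5 variables
    block_b = rng.normal(size=(20, 8))   # hypothetical block: same 20 samples x 8 variables
    sup, blk, w = multiblock_pca([block_a, block_b], n_components=2)
    print(sup.shape, len(blk), w.shape)  # (20, 2) 2 (2, 2)
```

In this particular formulation, the super scores coincide (up to sign) with ordinary PCA scores of the concatenated, block-scaled matrix, which makes the sketch easy to check. The block scaling, together with the block scores and super weights that show how strongly each block contributes to each component, is where it departs from plain PCA of the raw data.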
Acknowledgments
We want to thank Dr. Yankuba Kassama for providing his metabolomics data. We also want to thank the anonymous reviewers for many constructive suggestions. YX and RG acknowledge the Symbiosis-EU (www.symbiosis-eu.net) project (No. 211638) financed by the European Commission under the 7th Framework programme for RTD. The information in this document reflects only the authors’ views and the Community is not liable for any use that may be made of the information contained therein.
Cite this article
Xu, Y., Goodacre, R. Multiblock principal component analysis: an efficient tool for analyzing metabolomics data which contain two influential factors. Metabolomics 8 (Suppl 1), 37–51 (2012). https://doi.org/10.1007/s11306-011-0361-9
DOI: https://doi.org/10.1007/s11306-011-0361-9