Data Fusion in Metabolomics and Proteomics for Biomarker Discovery

  • Lionel Blanchet
  • Agnieszka Smolinska
Part of the Methods in Molecular Biology book series (MIMB, volume 1362)


Proteomics and metabolomics provide key insights into the status and dynamics of biological systems. These molecular studies help reveal the complex mechanisms involved in disease and aging processes. Valuable information can be obtained with analytical techniques such as nuclear magnetic resonance spectroscopy and liquid or gas chromatography coupled to mass spectrometry. Each method has inherent advantages and drawbacks, but the biological information they provide is complementary.

The fusion of different measurements is a complex topic. We describe here a framework for combining multiple data sets acquired on different analytical platforms. In the first step, the relevant information is extracted from each platform separately in the form of latent variables. These latent variables are then fused and analyzed jointly. Finally, the influence of the original variables is calculated back and interpreted.
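The three steps of the framework (per-platform extraction, fusion of latent variables, back-calculation to the original variables) can be illustrated with a minimal sketch. It is not the chapter's implementation: principal component analysis and an ordinary least-squares fit stand in for the PLS-DA and eCVA models named in the key words, and all function names, variable names, and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def block_scores(X, n_components):
    """Step 1: center one platform's data and extract latent variables.
    PCA via SVD is an assumed stand-in for the chapter's PLS-DA step."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T           # loadings: (n_variables, n_components)
    T = Xc @ P                        # scores:   (n_samples, n_components)
    return T, P

def fuse_and_backproject(blocks, y, n_components=2):
    """Steps 2 and 3: fuse the block scores, fit a linear model on them,
    and map the model weights back onto the original variables."""
    scores, loadings = [], []
    for X in blocks:
        T, P = block_scores(X, n_components)
        scores.append(T)
        loadings.append(P)

    # Step 2: fusion by concatenating the latent variables of all blocks.
    T_fused = np.hstack(scores)

    # Least-squares fit of the centered class labels on the fused scores
    # (an assumed simplification of a discriminant model).
    w, *_ = np.linalg.lstsq(T_fused, y - y.mean(), rcond=None)

    # Step 3: since T_b = Xc_b @ P_b, the contribution of each original
    # variable in block b is P_b @ w_b.
    coefs, start = [], 0
    for P in loadings:
        k = P.shape[1]
        coefs.append(P @ w[start:start + k])
        start += k
    return T_fused, w, coefs

# Simulated example (hypothetical data): 20 samples, two classes,
# one informative variable planted in each block.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 10)
X_metab = rng.normal(size=(20, 50))
X_metab[:, 3] += 3.0 * y
X_prot = rng.normal(size=(20, 30))
X_prot[:, 7] += 3.0 * y

T_fused, w, coefs = fuse_and_backproject([X_metab, X_prot], y)
```

In this sketch, `coefs[0]` and `coefs[1]` hold one coefficient per original variable of each block. Applying them to the centered blocks reproduces the prediction made from the fused scores exactly, which is what makes the back-calculated coefficients interpretable in terms of the original measurements. With real data, block scaling and cross-validated selection of the number of components would also need attention.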

Key words

Chemometrics, Discriminant analysis, PLS-DA, eCVA, Latent variable, Variable selection


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Analytical Chemistry – Chemometrics, Institute for Molecules and Materials, Radboud University Nijmegen, Nijmegen, The Netherlands
  2. Department of Biochemistry, Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
  3. Department of Toxicology, Nutrition and Toxicology Research Institute Maastricht (NUTRIM), Maastricht University, Maastricht, The Netherlands
