Abstract
In metabonomics it is difficult to tell which peak is which in datasets with many samples. This is known as the correspondence problem. Data from different samples are not synchronised, i.e., the peak from a given metabolite does not appear in exactly the same place in all samples. For datasets with many samples, this problem is nontrivial, because each sample contains hundreds to thousands of peaks that shift and are identified ambiguously. Statistical analysis of the data assumes that all peaks from one metabolite are found in one column of a data table. Every error in the data table reduces the power of the statistical analysis and increases the risk of missing a biomarker. It is therefore important to solve the correspondence problem by synchronising samples, yet no method solves it once and for all. In this review, we analyse the correspondence problem, discuss current state-of-the-art methods for synchronising samples, and predict the properties of future methods.
Acknowledgements
The authors are thankful to AstraZeneca for financing and for access to metabonomics data from LC–MS and NMR. Helena Idborg is acknowledged for supplying the data for Fig. 3.
Cite this article
Åberg, K.M., Alm, E. & Torgrip, R.J.O. The correspondence problem for metabonomics datasets. Anal Bioanal Chem 394, 151–162 (2009). https://doi.org/10.1007/s00216-009-2628-9
DOI: https://doi.org/10.1007/s00216-009-2628-9