
Model-driven data curation pipeline for LC–MS-based untargeted metabolomics

Metabolomics

A Correction to this article was published on 16 March 2023

This article has been updated



There is still no community consensus on strategies for data quality review in liquid chromatography–mass spectrometry (LC–MS)-based untargeted metabolomics. Assessing the analytical robustness of data, which is relevant for inter-laboratory comparisons and reproducibility, remains a challenge despite the wide variety of tools available for data processing.


The aim of this study was to provide a model to describe the sources of variation in LC–MS-based untargeted metabolomics measurements, to use it to build a comprehensive curation pipeline, and to provide quality assessment tools for data quality review.


Human serum samples (n = 392) were analyzed by ultra-performance liquid chromatography coupled to high-resolution mass spectrometry (UPLC-HRMS) using an untargeted metabolomics approach. The pipeline and tools used to process this dataset were implemented as part of TidyMS, an open-source, publicly available Python package.


The model was applied to understand data curation practices used by the metabolomics community. Sources of variation that are often overlooked in untargeted metabolomics studies were identified in the analysis. New tools were used to characterize certain types of variation.


The developed pipeline made it possible to confirm data robustness by comparing the experimental results with the values predicted by the model. New quality control practices were introduced to assess the analytical quality of data.
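As an illustration of the kind of data curation practice the abstract refers to, the following minimal sketch filters features by their coefficient of variation (CV) across repeated quality control (QC) injections. It is a generic example, not the authors' model and not the TidyMS API: the function names, the feature labels, the intensity values, and the 0.2 threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def qc_cv(intensities):
    """Coefficient of variation (sample std / mean) of one feature
    across QC injections. Needs at least two values."""
    m = mean(intensities)
    if m == 0:
        return float("inf")  # treat an all-zero feature as maximally variable
    return stdev(intensities) / m


def filter_features(qc_table, max_cv=0.2):
    """Return {feature: cv} for features whose QC CV is <= max_cv.

    qc_table maps a feature label to its intensities in the QC samples.
    max_cv is an illustrative threshold, not a value from the article.
    """
    kept = {}
    for name, values in qc_table.items():
        cv = qc_cv(values)
        if cv <= max_cv:
            kept[name] = cv
    return kept


# Hypothetical QC intensities for two features.
qc_table = {
    "FT001": [100.0, 102.0, 98.0, 101.0],  # stable across QC injections
    "FT002": [50.0, 90.0, 20.0, 70.0],     # highly variable
}
kept = filter_features(qc_table)
print(sorted(kept))
```

With these made-up values only the stable feature passes the filter. A real workflow would also have to decide how to handle missing values and batch effects, which this sketch ignores.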




MEM acknowledges Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET, Argentina, PUE 055 project) and the National Agency of Scientific and Technological Promotion (ANPCyT, PICT-2018-02137 and PICT-2020-01019 projects) for funding. MEM is a research staff member of CONICET. We would also like to thank Dr. Christoph Bueschl for helpful discussions.

Author information




The manuscript was conceived and written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Corresponding author

Correspondence to María Eugenia Monge.

Ethics declarations

Conflict of interest

The authors have no disclosures of potential conflicts of interest related to the presented work.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: the missing surname of the corresponding author has been added and the incorrect email address has been corrected.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 1318 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Riquelme, G., Bortolotto, E.E., Dombald, M. et al. Model-driven data curation pipeline for LC–MS-based untargeted metabolomics. Metabolomics 19, 15 (2023).
