One of the great unsolved problems of modern biology concerns the substrates of membrane transporters (Cesar-Razquin et al. 2018; Cesar-Razquin et al. 2015; Girardi et al. 2020; Superti-Furga et al. 2020), many of which remain ‘orphans’ (i.e. with unknown substrates), often despite the passage of decades since their identification in systematic genome-sequencing programmes (Ghatak et al. 2019).

‘Untargeted metabolomics’ is a term nowadays commonly used to describe methods that seek the reproducible detection (and sometimes quantification) of small molecules in biological matrices (Cho et al. 2014; Dunn et al. 2013; Garg et al. 2015; Martin et al. 2015; Tautenhahn et al. 2012; Treutler et al. 2016). It is most commonly performed using chromatography coupled to mass spectrometry (e.g. (Dunn et al. 2011; Dunn et al. 2007; Dunn et al. 2012)). A variety of methods, summarised in Table 1, have been developed for measuring the human serum metabolome, with both low- and high-resolution mass spectrometry. The studies highlighted in Table 1 are those we found where annotation levels were specified, and similar LC was performed. An ideal method would be rapid, reliable, and provide data on many (1000s of) metabolites simultaneously.

In a ground-breaking study using low-resolution LC-MS, Gründemann and colleagues (Gründemann et al. 2005) recognised that incubation of cells containing different levels of a transporter of interest with plasma (as a reasonably unbiased source of candidate metabolites) might allow the discovery of transporter substrates by assessing their differential uptake into cells. They thereby discovered the transporter for the important nutraceutical (Borodina et al. 2020) ergothioneine (Gründemann 2012; Gründemann et al. 2005).

The uptake and excretion of nutrients and natural products by cells is mediated by transporter proteins of two general types: solute carriers (SLCs) and ATP-binding cassette (ABC) transporters. Around 500 SLC transporters are known, which mediate the uptake of such compounds, whereas ABC transporters are involved in their efflux (Hediger et al. 2004). Much evidence shows that drugs hitchhike on transporters for natural molecules to which these drugs are structurally similar (Cesar-Razquin et al. 2018; Kell 2020; Kell and Oliver 2014; O’Hagan and Kell 2017; Superti-Furga et al. 2020). Moreover, recent work by Girardi et al. highlighted the key role of SLC transporters in resistance to 60 cytotoxic compounds representative of the chemical space covered by approved drugs (Girardi et al. 2020).

As part of a wide-ranging study into the nature of transporter substrates (e.g. (Dobson and Kell 2008; Jindal et al. 2019; Kell et al. 2013; Kell et al. 2011; Kell and Oliver 2014; Kell et al. 2018)), we recognised that the approach of Gründemann and colleagues (Gründemann et al. 2005) could be an ideal strategy for implementation using modern, high-resolution metabolomics methods. We have previously developed low-resolution methods for determining the human serum metabolome reliably (Broadhurst and Kell 2006) and over extended periods (Begley et al. 2009; Dunn et al. 2011; Kenny et al. 2010; Zelena et al. 2009), including the extensive use of QA/QC samples. Our first requirement was thus to develop a new and robust method for untargeted serum metabolomics using modern, high-resolution instrumentation. The present paper describes this method, and an initial application to a series of human cell lines.

Table 1 Selection of untargeted LC–MS studies of human serum


Cell culture

A549, K562, SAOS2 and U2OS cell lines were cultured in RPMI-1640 (Sigma, Cat No. R7509) culture medium supplemented with 10% fetal bovine serum (Sigma, Cat No. F4135) and 2 mM glutamine (Sigma, Cat No. G7513) without antibiotics. Cell cultures were maintained in T225 culture flasks (Starlab, CytoOne Cat No. CC7682-4225) kept in a 5% CO2 incubator at 37 °C until 70–80% confluent. The A549 cell line was purchased from the European Collection of Authenticated Cell Cultures (ECACC, Salisbury, UK), the K562 cell line was a kind gift from Dr Philip J. Day (The Manchester Institute of Biotechnology, The University of Manchester), and the SAOS2 and U2OS cell lines were a kind gift from Prof. Peter Gardner (The Manchester Institute of Biotechnology, The University of Manchester).

Serum incubation experiments

Harvesting cells for serum incubation experiments

Cells from adherent cell lines were harvested by removing growth media and washing twice with 5 mL of pre-warmed Dulbecco’s Phosphate Buffered Saline (PBS) without calcium or magnesium (Gibco, Cat No. 14190094), then incubated in 3 mL of Gibco™ TrypLE™ Express Enzyme (1X), no phenol red (Gibco Cat No. 12604013) for 2–5 min at 37 °C. Once cells appeared detached, they were resuspended in 5–7 mL of the respective media to dilute the TrypLE. The cell suspension was transferred to 50 mL centrifuge tubes and immediately centrifuged at 300 × g for 5 min. Suspension cell lines were centrifuged directly from cultures in 50 mL centrifuge tubes and washed with PBS as above. The cell pellets were resuspended in 10–15 mL media, and cell count and viability were determined using a Countess II FL Automated Cell Counter (ThermoFisher Scientific) set for the Trypan Blue membrane exclusion method. Cells with > 95% viability were used for serum incubation experiments.

Incubation of cells in serum

The procedure for incubation of cells in serum is described pictorially in Fig. 1. More detailed information is provided in the Supplementary Information file.

Internal standard solution mixture

An internal standard stock mixture was prepared using the following compounds and concentrations: citric acid-d4 (Cambridge Isotope Laboratories, DLM-3487), 225 µM; L-lysine-d4 (Cambridge Isotope Laboratories, DLM-2640), 112.5 µM; L-tryptophan-(indole-d5) (Cambridge Isotope Laboratories, DLM-1092), 5.625 µM; stearic acid-d35 (Cambridge Isotope Laboratories, DLM-379), 225 µM; succinic acid-d4 (Sigma, 293075), 112.5 µM; 13C6-carbamazepine (Sigma, C-136), 2.25 µM; leucine-d10 (Sigma, 492949), 22.5 µM and methionine-d4 (Cambridge Isotope Laboratories, DLM-2933), 22.5 µM.

Fig. 1

Incubation of cells in serum for metabolomics analysis to determine transporter substrates. Following incubation of cells in serum, spent serum is collected after centrifugation, followed by extraction using methanol. The remaining cell pellet is washed with PBS (at 37 °C), followed by quenching and extraction of intracellular metabolites using 80% methanol. The spent medium and intracellular extracts are subsequently lyophilised (with a mixture of internal standards spiked in prior to lyophilisation) and reconstituted in water ready for analysis by LC-HRMS/MS

Sample preparation for metabolomics analysis

Fresh or spent serum samples were thawed at room temperature and maintained on ice throughout the sample preparation process. Samples were prepared by addition of 100 µL sample to a 2 mL Eppendorf tube containing 330 µL methanol (LC-MS grade) and 20 µL of internal standards mix (ISTDs). The mixture of methanol and ISTDs was previously cooled at − 80 °C and maintained on dry ice when adding serum. The mixture of serum, methanol and ISTDs was vortexed vigorously, followed by centrifugation at 13,300 rpm for 15 min at 4 °C to pellet proteins. Multiple 75 µL aliquots (for extraction replicates) of the resulting supernatant were dried in a vacuum centrifuge (ScanVac MaxiVac Beta Vacuum Concentrator system, LaboGene ApS, Denmark) with no temperature application and stored at − 80 °C until required for LC-MS/MS analysis.
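
As a worked illustration of the dilution implied by this protocol, the following minimal Python sketch estimates the concentration of each internal standard in the extraction mixture. It assumes that the 100 µL sample, 330 µL methanol and 20 µL ISTD volumes are simply additive (450 µL in total); the function name is ours, and the resulting values are an inference from the stated volumes rather than figures reported in this paper.

```python
# Stock concentrations (uM) of the internal standards, as listed in the text.
ISTD_STOCK_UM = {
    "citric acid-d4": 225.0,
    "L-lysine-d4": 112.5,
    "L-tryptophan-(indole-d5)": 5.625,
    "stearic acid-d35": 225.0,
    "succinic acid-d4": 112.5,
    "13C6-carbamazepine": 2.25,
    "leucine-d10": 22.5,
    "methionine-d4": 22.5,
}

# Volumes (uL) added per sample, as described in the sample preparation.
SAMPLE_UL = 100.0
METHANOL_UL = 330.0
ISTD_UL = 20.0


def diluted_concentrations(stock_um, istd_ul, total_ul):
    """Concentration (uM) of each standard after dilution into total_ul."""
    factor = istd_ul / total_ul
    return {name: conc * factor for name, conc in stock_um.items()}


# Assumption: volumes are additive, so the mixture is 450 uL.
total_ul = SAMPLE_UL + METHANOL_UL + ISTD_UL
in_extract = diluted_concentrations(ISTD_STOCK_UM, ISTD_UL, total_ul)

for name, conc in in_extract.items():
    print(f"{name}: {conc:.3g} uM")
```

Under this assumption the dilution factor is 20/450, so the 225 µM citric acid-d4 stock, for example, corresponds to 10 µM in the extraction mixture.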

Quality controls (QC) and conditioning QC samples were also prepared in this way using pooled human serum.

Extraction blanks were prepared in the same way as spent serum samples replacing serum and internal standard mix with 120 µL of water (LC-MS grade). Evaluation/system suitability samples were also prepared by replacing serum with water.

Prior to analysis, samples were resuspended in 40 µL water (LC-MS grade), centrifuged at 13,300 rpm for 15 min at 4 °C to remove any particulates and transferred to glass sample vials.

HPLC-MS/MS analysis of spent serum samples

Untargeted HPLC-MS/MS data acquisition was performed following methodologies and guidelines in (Broadhurst et al. 2018; Broadhurst and Kell 2006; Brown et al. 2005; Dunn et al. 2011; Mullard et al. 2015). Data were acquired using a ThermoFisher Scientific Vanquish HPLC system coupled to a ThermoFisher Scientific Q-Exactive mass spectrometer (ThermoFisher Scientific, UK). A resolution of 70,000 was used for MS and 17,500 for ddMS (further details provided in Supplementary Information). Chromatographic separation was performed on a Hypersil Gold aQ column (C18 2.1 mm × 100 mm, 1.9 µm, ThermoFisher Scientific) operating at a column temperature of 50 °C. Elution was performed over 15 minutes at a flow rate of 0.4 mL/min using two solvents, 0.1% formic acid in water (solvent A) and 0.1% formic acid in methanol (solvent B), as described in Table 2 below. Column eluent was diverted to waste in the first 0.4 min and the last 0.1 min of the gradient. Sample vials were stored at 4 °C in the HPLC autosampler, with 5 µL injected for positive ionisation and 15 µL for negative ionisation. MS acquisition settings are described in supplementary information 1.

Table 2 HPLC gradient elution program applied for HPLC-MS/MS analysis for ESI+ and ESI− modes

Samples were analysed following guidelines set out in (Dunn et al. 2011) and (Broadhurst et al. 2018). Briefly, blank extraction samples were injected at the beginning and end of each batch to assess carry over and lack of contamination. Isotopically labelled internal standards were added to analytical and QC samples to assess system stability throughout the batch. QC samples, prepared with a standard reference material, pooled human serum, were also applied to condition the analytical platform, enable reproducibility measurements and to correct for systematic errors.

LC-MS/MS data preprocessing and analysis

Raw instrument data (.RAW) were exported to Compound Discoverer 3.1 (CD3.1) for deconvolution, alignment, and annotation (full workflow and settings are provided in supplementary information 2). Peak areas from CD3.1 were subsequently exported as a .csv file and QC-based LOESS signal correction was performed in R (version 4.0.2) as discussed in (Dunn et al. 2011) using the fANCOVA package. Strict QA criteria (minimum QC coverage of 80% and maximum QC CV of 30%) were applied to the resulting data.
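
The QA criteria above can be sketched in a few lines of Python. This is an illustrative sketch only, not the R code used in this study: the function names, the treatment of missing values (a peak area of None or zero counts as not detected) and the example peak areas are all assumptions made for illustration.

```python
from statistics import mean, stdev

MIN_QC_COVERAGE = 0.80  # feature must be detected in at least 80% of QC injections
MAX_QC_CV = 30.0        # maximum allowed QC coefficient of variation, in percent


def detected_areas(qc_areas):
    """Peak areas from QC injections in which the feature was detected."""
    return [a for a in qc_areas if a is not None and a > 0]


def qc_coverage(qc_areas):
    """Fraction of QC injections in which the feature was detected."""
    return len(detected_areas(qc_areas)) / len(qc_areas)


def qc_cv_percent(qc_areas):
    """Coefficient of variation (%) over the QC injections with a detection."""
    detected = detected_areas(qc_areas)
    if len(detected) < 2:
        return float("inf")  # CV is undefined for fewer than two values
    return 100.0 * stdev(detected) / mean(detected)


def passes_qa(qc_areas):
    """Apply both criteria: QC coverage and QC CV."""
    return (qc_coverage(qc_areas) >= MIN_QC_COVERAGE
            and qc_cv_percent(qc_areas) <= MAX_QC_CV)


# Hypothetical peak areas for three features across five QC injections.
features = {
    "stable": [1000.0, 1050.0, 980.0, 1010.0, 995.0],
    "noisy": [1000.0, 400.0, 1800.0, 250.0, 1500.0],
    "sparse": [1000.0, None, None, 1020.0, 990.0],
}
kept = [name for name, areas in features.items() if passes_qa(areas)]
print(kept)
```

In this toy example only the "stable" feature is retained: "noisy" fails the CV criterion and "sparse" is detected in only 3 of 5 QC injections.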

For analysis of data for serum compound uptake and excretion by cell lines, normalised peak areas were exported as an Excel file into a KNIME workflow developed in-house for data analysis and visualisation (available on request). Within this workflow, Principal Components Analysis was used to visualise trends. Subsequently, simple univariate statistical analyses were carried out on log2-transformed data using a paired t-test. Volcano plots were created using these data, with a threshold of P < 0.05 and absolute log2 fold change > 0.5 set for defining a notable change in compound abundance between the time points compared. UpSet plots (Lex et al. 2014) (Supplementary Fig. 3) were then used to find unique and shared consumed or secreted compounds between cell lines.
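
The univariate step can be illustrated with a minimal Python sketch; it is not the in-house KNIME workflow. The function names and the example peak areas are hypothetical. The P value is taken as an input here, since in practice it would be obtained from Student's t distribution with n − 1 degrees of freedom using a statistics package. Because the measurements are made on spent serum, a decrease over time is labelled as consumption and an increase as secretion.

```python
from math import log2, sqrt
from statistics import mean, stdev

P_THRESHOLD = 0.05
LOG2FC_THRESHOLD = 0.5


def paired_log2_change(areas_t0, areas_t20):
    """Mean paired log2 fold change (20 min vs 0 min) and paired t statistic."""
    diffs = [log2(b) - log2(a) for a, b in zip(areas_t0, areas_t20)]
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean_diff, mean_diff / standard_error


def classify(log2_fc, p_value):
    """Label a compound using the volcano plot thresholds given in the text."""
    if p_value < P_THRESHOLD and log2_fc > LOG2FC_THRESHOLD:
        return "secreted"
    if p_value < P_THRESHOLD and log2_fc < -LOG2FC_THRESHOLD:
        return "consumed"
    return "no notable change"


# Hypothetical paired peak areas for one compound in three replicates.
t0 = [100.0, 200.0, 400.0]
t20 = [50.0, 50.0, 200.0]
fold_change, t_stat = paired_log2_change(t0, t20)
print(round(fold_change, 3), round(t_stat, 3))

# Values reported in the Results for two compounds at 2 million cell density.
print(classify(-0.84, 0.0004))  # unknown C21H37N5O7 in SAOS2
print(classify(0.60, 0.0004))   # gamma-L-glutamyl-L-glutamic acid in A549
```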

For all data acquired, annotation and identification criteria were according to (Schymanski et al. 2014).


A RP-LC-ESI-MS/MS method capable of detecting a broad range of serum compounds

The optimisation of chromatographic (O’Hagan et al. 2005) and mass spectrometric (Vaidyanathan et al. 2003) methods typically requires trade-offs between multiple objectives. During the development of the LC-MS/MS method used here, our aim was both to maximise the number of serum compounds detected and to acquire sufficient fragmentation data for more confident identification, with both achieved within a reasonably short period. Table 3 shows a summary of the results obtained from CD3.1 using the data acquired with the LC-MS/MS method described, in both ESI+ and ESI− modes, after running a set of serum QC samples. The method we have developed enables the detection of a large number of metabolic features, of which around 70% can be attributed to sample-related compounds (after background subtraction and exclusion, removal of compounds not found in > 80% of QC samples, and exclusion of compounds with a QC CV > 30%, see supplementary information 2). A molecular formula could be attributed to around 80% of these metabolic features. Over 80% of these sample compounds had a QC CV < 15%, demonstrating a good level of reproducibility across injections. As can be seen in Supplementary Fig. 2, peak areas of detected spiked internal standards displayed high reproducibility (< 10% RSD) and excellent mass accuracy (< 1 ppm) across QC injections throughout the run.

To improve confidence in metabolite annotation we also performed data dependent MS/MS across a range of masses in a similar way to (Mullard et al. 2015). As can be seen in Table 3, around 70% of sample compounds had associated MS/MS spectra, with a large proportion of MS2 spectra corresponding to a preferred adduct ion ([M+H]+).

As is commonly the case in untargeted metabolomics, the level of annotation and identification confidence was quite varied. In our analyses, annotation and identification ranged from identification levels 5 to 2 (Schymanski et al. 2014) (Table 3). We do find a small number of level 1 identifications in our study; however, our in-house mass and spectral libraries are small and still under development. For this reason, level 1 identifications will not be discussed. Results in terms of annotation and identification at various levels from both ESI+ and ESI− were comparable (Table 3); we will describe these for ESI+.

In ESI+ 3016 sample compounds had a full match to a proposed molecular formula (level 4). Of these, 1156 matched mass libraries in ChemSpider (level 3) on our searches against BioCyc, HMDB, KEGG, MassBank and NIST. In addition, around 700 compounds were matched against mass libraries provided by CD3.1 software along with our own imported ones (including the serum metabolome database (Psychogios et al. 2011) and the COCONUT Natural Products database (Sorokina and Steinbeck 2020)). Of the 2612 metabolic features with MS/MS spectra, 391 were matched with reasonable confidence (≥ 70%) to the mzCloud spectral library. Of these, 211 were fully matched against a proposed molecular formula by CD3.1 providing level 2 identification confidence. These are provided in Supplementary Information 3. These compounds represented diverse metabolite classes such as amino acids, peptides and analogues, lipids, and lipid-like molecules (including fatty acyls and steroids), as well as carboxylic acids and derivatives.

Of the remaining 165 compounds with ≥ 70% match to the mzCloud spectral library, a large proportion lack a full match to a proposed molecular formula, for several reasons. These include better spectral matching with other compounds sharing similar substructures, but with a mass error > 3 ppm. In some cases, closer manual inspection is required whereas in others, the spectral matching provides clues as to the possible underlying substructure of the molecule in question. There is some level of duplication in annotation and putative identifications, either through close matches to mass or spectral libraries, or due to the same precursor mass appearing at more than one retention time (yet with different peak areas and intensities). Some of these can be seen in Supplementary Information 3, where 57 of 211 annotated compounds were duplicated in this way. As an example, we found arachidonic acid eluting at 4 different retention times, yet spectral matching of all these with mzCloud was > 90% in confidence. Another such example was ecgonine, eluting at two different retention times yet both having excellent spectral matching to mzCloud (> 90% match). Further investigation may reveal this to be due to (positional) isomers and/or some compounds simply binding to the column and eluting over different times. This is of lesser importance for present purposes where we aim to find unique markers of transporter substrates.

Taking into account that these results are illustrative of a QC run, the results in Table 3 suggest that our LC-MS/MS method is a clear improvement on those shown in Table 1. The number of metabolic features after correction and exclusion in our study is lower than that in Ganna et al. (Ganna et al. 2015); however, the number of level 2 identified compounds (whether using (Sumner et al. 2007) or (Schymanski et al. 2014)) is nearly double. Whilst this is not the case when we compare against the number of level 2 identified compounds in (Dunn et al. 2015), we expect this number to increase in our methodology as the number of compounds and spectra in mzCloud grows, as we match to spectral libraries outside of Compound Discoverer, and as we continue to increase our in-house library. Furthermore, our elution gradient is shorter (15 min in both ESI+ and ESI− vs. 22 min in ESI+ and 24 min in ESI−), which will provide great time and cost savings.

Table 3 Summary of LC-MS/MS results of serum QC samples obtained following preprocessing using CD3.1

Application of our method to determine the consumption and excretion of serum compounds by mammalian cell lines

One of the main purposes for developing the above untargeted LC-MS/MS methodology is to apply this to measure the uptake and excretion of serum compounds by mammalian cell lines. As a proof of principle, we assessed this by analysis in ESI+ of spent serum samples from 4 cell lines (A549, K562, SAOS2 and U2OS) incubated in serum at two different densities (2 and 4 million) at two timepoints (0 and 20 minutes incubation).

A summary table of the number of compounds detected and identified with various levels of confidence can be found in Supplementary Table S1. The results obtained are comparable to those shown in Table 3, demonstrating good reproducibility of our method.

PCA reveals differences in samples (Fig. 2): the different cell lines fell into distinct groups separated in both PC1 and PC2. Furthermore, separations between time 0 and 20 min of incubation suggested differences in metabolic profile of spent serum influenced by time.

Fig. 2

PCA scores plots of spent serum extracts following incubation for 0 or 20 minutes with 4 different cell lines and two densities. Colour: Cell line and density, shape: incubation time (Color figure online)

To confirm that the separation of cell lines as well as the effect of incubation time and cell density are the result of differences in the uptake and excretion of serum components, we performed simple univariate analyses. The volcano plots in Fig. 3 confirm this is the case: cell lines consumed and excreted serum components. However, the number of compounds consumed or excreted was not always proportional to increasing density, indicating that a rich set of metabolic activities was taking place during this period.

Fig. 3

Volcano plots showing differences in number and magnitude of serum compound consumption and excretion by different cell lines and densities over 20 min. Threshold for significant change: P-value < 0.05 and log2 fold change < − 0.5 or > 0.5. Left panel: 2 million cell density; right panel: 4 million cell density

Fuller results will be reported elsewhere for a much larger panel of cell lines; however, we provide examples of a compound exclusively consumed by one cell line (SAOS2, Fig. 4a), another exclusively secreted by another cell line (A549, Fig. 4b) and finally one where a mix of consumption and secretion was observed (Fig. 4c).

In SAOS2 cells, an unknown compound with a mass of 471.26882 matching the molecular formula C21H37N5O7 was consumed exclusively by this cell line, with consumption increasing in relation to higher cell density (Fig. 4a, log2 fold change − 0.84, P = 0.0004 at 2 million density and log2 fold change − 1.83, P = 0.0002 at 4 million density). Fragmentation data matching against mzCloud suggest that some possible substructures of this compound match fragments for an environmental compound on the NORMAN suspect list (Mistrik et al. 2019), as shown in Supplementary Fig. 3A.

As can be seen in Fig. 4b, A549 cells secreted γ-L-Glutamyl-L-glutamic acid whilst the other 3 cell lines did not. The increase in levels of this metabolite was nearly double when cell density was doubled (log2 fold change 0.60, P = 0.0004 at 2 million density and log2 fold change 0.93, P = 0.0002 at 4 million density) whereas in other cell lines the changes were below our threshold (log2 fold change > 0.5 and P < 0.05). The identification of this compound is at a reasonable level 2, with 90.2% match to this compound in mzCloud, and low mass error (− 0.00029 Da or − 1.06 ppm) as can be seen in Supplementary Fig. 3B.
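
The relationship between a mass difference in Da and a mass error in ppm can be illustrated with a short Python sketch. This is not how CD3.1 computes its values: the function names are ours, the parser handles only simple formulae containing C, H, N and O, and we assume the error is expressed relative to the neutral monoisotopic mass (the software may instead use the m/z of the adduct ion).

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a simple formula such as 'C10H16N2O7'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(measured, theoretical):
    """Mass error in parts per million relative to the theoretical mass."""
    return 1e6 * (measured - theoretical) / theoretical


# gamma-L-glutamyl-L-glutamic acid, C10H16N2O7, reported error of -0.00029 Da.
glu_glu = monoisotopic_mass("C10H16N2O7")
print(round(glu_glu, 5), round(ppm_error(glu_glu - 0.00029, glu_glu), 2))

# Unknown compound with reported mass 471.26882 and formula C21H37N5O7.
unknown = monoisotopic_mass("C21H37N5O7")
print(round(unknown, 5), round(ppm_error(471.26882, unknown), 2))
```

With these assumptions, a difference of − 0.00029 Da at a mass of about 276.0958 Da corresponds to roughly − 1.05 ppm, in line with the reported − 1.06 ppm given the rounding of the Da value.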

Nicotinamide (level 2 identification as shown in Supplementary Fig. 3C) was found to be secreted by SAOS2 cell lines in a density dependent manner (log2 fold change 0.55, P = 0.0010 at 2 million density and log2 fold change 0.98, P = 0.002 at 4 million density) yet consumed by K562 (log2 fold change − 0.89, P = 0.0050) and U2OS (log2 fold change − 1.26, P = 0.0004) cell lines at 4 million density and A549 cells at 2 million density only (log2 fold change of − 1.01, P = 0.0002).
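
The way the three examples above are grouped (exclusive to one cell line, or a mix of consumption and secretion) can be expressed as a simple set-based summary. The Python sketch below is illustrative only and is not the UpSet analysis used in this study; the function name and labels are ours, and the two cell densities are collapsed into a single observation per cell line for simplicity.

```python
from collections import defaultdict

# Direction of change per compound and cell line, from the examples in the text.
observations = [
    ("C21H37N5O7 (unknown)", "SAOS2", "consumed"),
    ("gamma-L-glutamyl-L-glutamic acid", "A549", "secreted"),
    ("nicotinamide", "SAOS2", "secreted"),
    ("nicotinamide", "K562", "consumed"),
    ("nicotinamide", "U2OS", "consumed"),
    ("nicotinamide", "A549", "consumed"),
]


def summarise(observations):
    """Label each compound as exclusive, shared or mixed across cell lines."""
    by_compound = defaultdict(lambda: defaultdict(set))
    for compound, cell_line, direction in observations:
        by_compound[compound][direction].add(cell_line)

    summary = {}
    for compound, directions in by_compound.items():
        consumed = directions.get("consumed", set())
        secreted = directions.get("secreted", set())
        if consumed and secreted:
            label = "mixed"      # consumed by some lines, secreted by others
        elif len(consumed | secreted) == 1:
            label = "exclusive"  # changed in exactly one cell line
        else:
            label = "shared"     # same direction in several cell lines
        summary[compound] = (label, sorted(consumed), sorted(secreted))
    return summary


summary = summarise(observations)
for compound, (label, consumed, secreted) in summary.items():
    print(f"{compound}: {label}; consumed by {consumed}; secreted by {secreted}")
```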

While the biological significance of the consumed or secreted compounds in Figs. 3 and 4 will be discussed elsewhere in due course, the main message from this study is that the methodology employed here (i) is robust and reproducible, (ii) is capable of measuring the transport behaviour of serum compounds in an entirely unbiased way, and (iii) shows the massive differences between individual cell lines (O’Hagan et al. 2018; Wright Muelas et al. 2019).

Fig. 4

Cell-line-specific consumed or secreted compounds. a Unidentified compound with assigned molecular formula C21H37N5O7 consumed by SAOS2 cells only. b γ-L-Glutamyl-L-glutamic acid secreted exclusively by A549 cells. c Nicotinamide secreted by SAOS2 cells but consumed by others. X-axis labels: A549_2, A549 cells at 2 million density; A549_4, A549 cells at 4 million density; K562_2, K562 cells at 2 million density; K562_4, K562 cells at 4 million density; QC_QC, Quality Control; SAOS2_2, SAOS2 cells at 2 million density; SAOS2_4, SAOS2 cells at 4 million density; U2OS_2, U2OS cells at 2 million density; U2OS_4, U2OS cells at 4 million density. Horizontal dashed line added at the median level in QC sample to aid visualisation


We have here described an untargeted LC-MS/MS method developed to maximise the number and diversity of compounds detected in human serum whilst also acquiring sufficient fragmentation data for improved metabolite annotation confidence, all within a reasonable period of 15 minutes. The method enables detection of around 4000–5000 sample-related metabolic features in both ESI+ and ESI−, with excellent reproducibility and mass accuracy: across QC injections, ≥ 80% of sample compounds had QC CVs ≤ 15% (Table 3), and spiked internal standards had QC CVs < 10% and mass errors < 1 ppm (Supplementary Fig. 2).

Annotation and identification of metabolites is by far the greatest bottleneck encountered in untargeted metabolomics (Djoumbou Feunang et al. 2016; Dunn et al. 2013; Misra and van der Hooft 2016). Despite an increasing availability of mass spectral libraries (Vinaixa et al. 2016), only a small proportion of small molecules in these are derived from experimental data using pure standards and, even then, these seem to cover only around 40% of compounds within human genome scale metabolic network reconstructions (Frainay et al. 2018). Note that most serum metabolites have an exogenous source (O’Hagan and Kell 2017). Only a limited number of untargeted metabolomics studies of human serum using LC-MS/MS have sufficient details with which to compare our results (Table 1). Our method enables annotation of a significant number of metabolites at levels 4 to 2 (Table 3). From these, we found 226 metabolites with level 2 identification confidence in ESI+, representing diverse (and relevant) metabolite classes. The number of level 2 identified compounds using our method is also improved in comparison to results reported by Ganna et al. (Ganna et al. 2015). This is not the case when we compare against (Dunn et al. 2015); however, the elution gradient in our method is shorter (15 min in both ESI+ and ESI− vs. 22 min in ESI+ and 24 min in ESI−), which will provide great time and cost savings. The annotation and identification of metabolites to level 2 in our study is likely also limited by our use of the spectral libraries available through Compound Discoverer, namely mzCloud and the local spectral libraries provided as standard with this software within mzVault. Another limitation within our reported data is the small number of level 1 metabolite identifications in our study; our in-house mass and spectral libraries are small and still under development.

In addition to maximising metabolite detection and identification, we have demonstrated the applicability of the LC-MS/MS method described to measure differences in the uptake and secretion of compounds by cell lines following incubation in human serum. This takes inspiration from work by Gründemann and colleagues (Gründemann et al. 2005), taking advantage of the complex mixture of candidate transporter substrates in human serum. The results reveal both the reproducibility of the analyses and distinct metabolic footprints (Allen et al. 2003) of different cell lines in terms of both uptake and secretion (Figs. 2 and 3).

As shown in Fig. 4, some compounds were consumed exclusively by certain cell lines and not others, whilst others were consumed by some but secreted by others. These differences are undoubtedly related to the transporter expression profiles of these cell lines. We have recently shown transporter expression to vary widely between cells and tissues, which can be explained by the requirements of different tissues and cell lines for different amounts of specific substrates (O’Hagan et al. 2018). Fuller and more extensive results using a larger panel of different cell lines and time points will be reported elsewhere, together with the use of transcriptomic and proteomic transporter expression profiles to relate these to the potential substrates of transporter proteins.


We have developed a new, 15-min untargeted metabolomics method using LC-MS/MS that allows for the robust and convenient measurement of a large number of metabolites in human serum. The method additionally acquires fragmentation data to enable improved annotation and identification of compounds. We also describe a protocol for investigating the natural substrates of transporters by way of incubating human cell lines in serum and using the above LC-MS/MS method to measure reproducible and unbiased differences in the uptake of serum compounds.