Introduction

Urinary organic acid analysis is a pivotal technique in selective screening for inborn errors of metabolism (IEMs) (Tanaka et al 1980; de Almeida and Duran 2014). The current state of the art relies on gas-chromatography coupled to mass spectrometry (GC-MS) of derivatized compounds. By its nature, the literally hundreds of organic acids present in human urine, endogenous as well as microbiome, drugs and other xenobiotic-derived metabolites (Blau et al 2014), can be detected by GC-MS. In fact, GC-MS analysis is the forerunner of untargeted metabolomics analysis as we nowadays envision it.

GC-MS has several advantages. It has high separation efficiency of metabolites, high specificity and sensitivity, few matrix effects, and broadly covered mass spectra libraries for identification of metabolites of interest are widely available (Pasikanti et al 2008). Disadvantages are clearly present since organic acids are not volatile and require organic solvent extraction and derivatization prior to GC-MS analysis. This makes the procedure laborious, and compared to underivatized tandem-mass spectrometry methods, time-consuming in terms of analytical run-time. Moreover, a relatively large sample volume is needed, despite efforts to increase throughput and reduce sample volume (Nakagawa et al 2010).

High resolution proton nuclear magnetic resonance (NMR) spectroscopy is a good alternative for organic acid analysis (and even broader IEM screening) with minimal sample preparation and a short experimental time. NMR spectroscopy did, however, not evolve in common IEM screening because of financial constraints and its relatively low sensitivity in the low millimolar range (Moolenaar et al 2003). Alternative liquid chromatography (LC)-quantitative hyphenated tandem mass spectrometry (MS/MS) techniques have been developed that allow high-throughput (Want et al 2010; Bouatra et al 2013). LC-MS/MS has outstanding sensitivity (in low nanomolar range) and specificity but only targeted metabolites, i.e., those a-priori selected in the method, are detected and quantified. The relatively recent introduction of high-resolution (HR) mass spectrometry in the form of time-of flight (TOF) MS and Orbitrap MS specificity allowed a major breakthrough. LC-HR MS combines the analytical power of LC-MS/MS with the unbiased quality of classical GC-MS, and thus enables not only the quantification of target metabolites, but also facilitates untargeted metabolite screening. Until now, LC-HR MS, including LC-QTOF/MS that combined with a quadrupole (Q), has been widely deployed in research settings (Paglia et al 2012), including inborn errors of metabolism (Wikoff et al 2007). Apart from a qualitative untargeted metabolomics approach (Miller et al 2015), no quantitative application suitable for routine diagnostic setting has been reported in the inborn errors of metabolism (IEM) field.

Here, we present our newly developed LC-QTOF/MS method for the quantitative analysis of urinary organic acids. This method covers a panel of critical biomarkers targeting defects in branched-chain amino acid-, lysine- and tryptophan-, aromatic amino acid-, neurotransmitter-, fatty acid- and pyrimidine metabolism as well as disorders in the Krebs cycle and urea cycle, amino acylase deficiencies and various other disorders. In addition, four medication-related metabolites were included in the panel, to allow discrimination between metabolic defects and medication. Analytical and diagnostic suitability were demonstrated in a cohort of individual urine samples, including proven IEMs. To expedite clinical interpretation we introduced z-score value plots. Our results demonstrate the suitability of this new method in the routine setting of selective metabolic screening.

Material and methods

Samples collection

All procedures followed were in accordance with the ethical standards of the “Human tissue and medical research- Code of Conduct for Responsible Use” by Dutch Federation of Biomedical Scientific Societies.

Clinical validation samples

Random urine specimens used in this study were selected from multiple laboratories. The specimens without a diagnosis (N = 88) used in this study were selected from the archive of the Maastricht Laboratory of Clinical Genetics. Samples were analyzed using the presently described method 1–2 years after routine diagnostic work-up and stored at −20 °C. Only urine samples from patients without a confirmed inborn error of metabolism (IEM) or any other biochemical finding or condition that was likely to influence the biochemical read-out were included. Samples of patients (N = 99) with confirmed diagnoses of IEM were included as positive controls. Detailed information on the clinical situation or treatment was not always available. These samples were from the archives of Maastricht Laboratory of Clinical Genetics, Translational Metabolic Laboratory, Radboud University Medical Centre, Nijmegen and from the archives of the Heidelberg and Sheffield qualitative organic acid assessment schemes of ERNDIM (European Research Network for evaluation and improvement of screening, diagnosis and treatment of Inherited disorders of Metabolism; www.erndim.org).

Method comparison samples

We used 28 urine samples of the ERNDIM quantitative organic acids quality assurance scheme from the period 2009–2015 for method cross validation.

Chemicals, standards, and internal standards

All mobile phase solutions were prepared with UPLC-MS grade solvents of water, formic acid, and acetonitrile from Biosolve (Valkenswaard, the Netherlands). Supplemental Table S1 shows the 68 organic acid standards and 19 internal standards (ISTD) and their source. In the case of 2-pyrroloylglycine, hawkinsin, and 3-hydroxysebacic acid, urine samples of diagnosed patients with hyperprolinemia type II (HYRPRO2, OMIM #239510), hawkinsinuria (OMIM #140350), and long-chain 3-hydroxyl-CoA dehydrogenase deficiency (LCADD, OMIM #609016), respectively, served as surrogate standards. The four metabolites of medication were based on urine samples of patients using paracetamol (acetaminophen, glucuronide, and acetaminophensulphate), levetiracetam, and valproic acid (valproic acid glucuronide).

LC-QTOF/MS method

Frozen urine samples were thawed at 37 °C and homogenized by vortex mixing; 25 μl urine adjusted to a creatinine concentration of 0–2 mM was mixed with 350 μL 0.1% v/v formic acid in water and 25 μL ISTD mixture. The samples were analyzed on an LC-QTOF/MS system (Agilent Technologies, Amstelveen, the Netherlands) that consisted of an Infinity II 1290 UHPLC coupled to a 6550 iFunnel QTOF equipped with an electrospray ionization source. The temperature in the multi-sampler was set at 4 °C. LC-separation was performed on an Acquity C18 column UPLC HSS T3 1.8 μm 2.1 × 100 mm with Acquity VanGuard PreColumn UPLC HSS T3; 2.1 × 5 mm (Waters, Manchester, UK) at 22 °C. The injection volume was 2 μL, followed by standard needle wash. The mobile phases for the reversed-phase (RP)-LC consist of solution A) 0.1% v/v formic acid in water and solution B) 0.1% formic acid in 95% acetonitril/5% water. The gradient program of the 35 min-cycle is described in Suppl. Table S2. The MS was tuned for low mass range (up to 1700 m/z) at high resolution slicer mode + 2G Hz extended dynamic range, and run in the negative mode for full scan with parameters listed in Suppl. Table S3. Agilent reference mass solution (containing reference compounds with m/z 112.9856 and m/z 1033.9881) was infused into the MS via a 1260 isocratic pump (Agilent) for continuous mass correction.

Calibration mixture stock solutions of 68 analytes were prepared and stored at −80 °C. In each batch, 6-point calibration curves were made with freshly prepared dilutions (Table 1). The quality of the batch was continuously monitored with a quality control (QC) sample with known amounts of 57 analytes. A QC sample and blank were injected at the start and after every 24th sample throughout the analytical workflow.

Table 1 Targeted biomarker panel

Data analysis and statistics

MassHunter workstation software (Agilent) Acquisition B.06 and Quantitative Analysis for Q-TOF B.07 were used for MS data acquisition and analysis. The monitored ions and their retention times, established with standards or patients samples, and the ISTD of each analyte are listed in Table 1. The peak area was used for quantification. Each analyte was absolutely quantified based on its own standard calibration curve, except the seven aforementioned analytes based on patients’ urine, of which concentrations were calculated with a relative ISTD method assuming the analyte has the same response factor as its ISTD. All concentrations were normalized to urine creatinine concentrations obtained from clinical chemistry lab (Jaffe method).

Z-scores were calculated by comparing analyte log transformed values to the associated mean and standard deviation found in a control cohort without a diagnosed IEM (n = 46) in two age groups (0–2 years and >2 years). Missing values and values below the estimated limit of detection (LOD, concentration equivalent to 3 * S/N) were imputed with LOD before log transformation. For the analytes that were not detected in normal urines, SD was set as 0.774 to report a z-score value of 3 when the measurement result is five times the LOD.

Method analytical validation

The method was validated according to our internal validation procedure for quantitative methods based on the Dutch NEN-EN-ISO 15189 guide “Medical Laboratories-Requirements for quality and competence” (Nederlands Normalisatie-instituut (NEN) 2012).

Method cross validation

Urine samples from the ERNDIM Quantitative Organic Acids External Quality Assessment Scheme were analyzed by GC-MS and the UPLC-QTOF/MS method. The results from GC-MS analyses as performed in the ERNDIM scheme were used as a reference for the new method. The results of the medians of participants in the ERNDIM scheme were retrieved from ERNDIM archive; 98% of the participants used GC-MS (Peters et al 2016).

We compared the results of 24 spiked analytes in 28 ERNDIM samples on recovery and linearity (Martens and Weykamp 2016). Method differences were assessed per analyte by t-test or Mann-Whitney test when proper. A P-value <0.05 was considered to be significant. We also applied Passing-Bablok regression analysis (Passing and Bablok 1983). The above analyses were done in R environment, and with method comparison regression (MCR) package (Manuilova et al 2014; R Core Team 2016).

Results

Analytical characteristics of the LC-QTOF/MS method

The UHPLC coupled QTOF/MS method was able to separate and identify 74 analytes. The 2- and 3-hydroxyadipic acid isomers could not be resolved due to identical retention times and were therefore analyzed as one combined signal. Adipic acid and 3-methylglutaric acid have the same mass and comparable hydrophobicity. At normal physiological conditions, concentrations of both metabolites are low and there is a 0.05 min difference in retention time allowing adequate distinction. At pathological levels, however, the peak of the biomarker compound overwhelms another one. We need to combine the profile of biomarks and clinical presentation to resolve the identity of the peak. The suboptimal peak shapes of 2-methylcitric acid, succinylacetone (Suppl. Figure), citric acid, and oxoadipic acid hampered correct quantification, but analogue isotopic internal standards corrected for that to some extent.

Linearity and recovery

Linear calibration curves (R2 > 0.98) were obtained in water and urine for all 68 analytes (Table 1). The recovery was 100 ± 15% for the vast majority of analytes (63/68) in urine. For 2-methylcitric acid, acetoacetic acid, 4-hydroxybutyric acid, malonic acid, and pyroglutamic acid recovery was out of this range. For 2-methylcitric acid, acetoacetic acid, and malonic acid this can be explained by the chromatographic peak shape. Detection of 4-hydroxybutyric acid and pyroglutamic acid was complicated by ion suppression which could not be completely corrected for by using alternative internal standards.

Limits of detection and quantification

The majority of the analytes had an estimated limit of quantification (LOQ, concentration equivalent to S/N > 10) < 3 μmol/l. Succinic acid, oxoglutaric acid, ethylmalonic acid, 2-oxovaleric acid, 2-methyl-3-hydroxybutyric acid, 4-hydroxyphenylpyruvic acid, fumaric acid, glycolic acid, 3-methylglutaconic acid, malic acid, oxoadipic acid, 3-hydroxyisobutyric acid, 3-hydroxybutyric acid, succinylacetone, pyruvic acid, and methylmalonic acid had a LOQ ≤ 15 μmol/l. Three compounds had a higher LOQ, i.e., 4-hydroxybutyric acid (18 μmol/l), malonic acid (21 μmol/l), and acetoacetic acid (69 μmol/l).

Within- and between-run variation

Within-run variation was satisfactory for all analytes in urine, the coefficient of variation was ≤10% for the vast majority of compounds. The coefficient of variation was slightly higher for lactic acid (11%), 2-methylcitric acid (12%), malic acid (12%), and mevalonic acid (23%). For 2-methylcitric acid and malic acid the higher variation is explained by poor peak-shape, whereas mevalonic acid is unstable in the acid environment.

The between-run variation was satisfactory at low (CV < 20%) and for high calibration levels (CV < 10%) for the majority of compounds. For acetoacetic acid the CV was 13% at 93 μM; for 2-methylcitric acid CV 14% at 88 μM, poor peak-shape being the main explanation.

  1. 1.

    Comparison and correlation between GC-MS and LC-QTOF/MS methods

    The results on 24 organic acids in 28 samples from the ERNDIM quantitative organic acids scheme were used to compare the GC-MS and LC-QTOF/MS methods (Fig. 1). An acceptable recovery for an individual metabolite was defined as a recovery between 80 and 120%. For the LC-QTOF/MS method 20/24 analytes showed an acceptable recovery which was similar to medians of all participating labs in the original ERNDIM scheme (21/24) (Fig. 1). LC-QTOF/MS had significantly improved recovery for: pyroglutamic acid, tiglylglycine, and 3-hydroxyisobutyric acid. For oxoglutaric acid and mevalonic acid, the LC-QTOF/MS recovery was below 80% but still satisfactory for diagnostic purposes. Both GC-MS (medians of all labs) and LC-QTOF/MS methods had excellent linearities for all 24 analytes (Suppl. Table s4). Passing-Bablok regression analysis indicated that the LC-QTOF/MS and the GC-MS (median of all labs) methods were highly correlated and comparable. Mevalonic acid, oxoglutaric acid, 3-hydroxyisobutyric acid, pyroglutamic acid, and tiglylglycine showed a significant difference with a slope out of range 0.8–1.2 (Suppl. Table S4).

  2. 2.

    Clinical validation of the LC-QTOF/MS methods

    In the clinical validation, we analyzed 99 diagnostic samples of patients with an IEM (32 different IEMs included) and 88 control samples. Figure 2 is a summary of the z-scores on critical biomarkers for the 32 different IEM. Most IEM were easily recognized with z-scores of biomarkers clearly >2.5 in the z-score profile. In our daily practice, the actual concentrations are considered in addition to the z-scores profile. In all control urine samples, the majority of the biomarkers had z-scores between −2.5 to 2.5. Figure 3 shows representative z-score profiles of patients with isovaleric aciduria (IVA, OMIM #243500), medium chain acyl-CoA dehydrogenase deficiency (MCADD, OMIM #201450), propionic aciduria (PA, OMIM #606054), glutaric aciduria type 1 (GA I, OMIM #231670), and methylmalonic aciduria (MMA (mut0), OMIM #251000).

Fig. 1
figure 1

Comparison of the general GC-MS method with our new LC-QTOF/MS on 28 ERNDIM QC urine samples containing 24 spiked analytes

Fig. 2
figure 2

Summary of z-scores of critical biomarkers for 32 IEMs. Both urines from patients with an IEM as well as controls negatively tested for IEM (n = 46) were included. Red dot indicates a patient, blue circle is from a control

Fig. 3
figure 3

Z-score profiles of some representative IEMs: isovaleric aciduria (IVA, OMIM #243500), medium chain acyl-CoA dehydrogenase deficiency (MCADD, OMIM #201450), propionic aciduria (PA, OMIM #606054), glutaric aciduria type 1 (GA I, OMIM #231670), and methylmalonic aciduria (MMA, OMIM #251000)

For maple syrup urine disease (MSUD, OMIM #248600), 2-hydroxy-3-methylpentanoic acid and 2-hydroxy-3-methylbutyric acid were the most discriminating biomarkers with z-scores >10 in all patients. Five patients with a glutaric aciduria type I (GA I, OMIM #231670) were easily diagnosed based on elevated concentrations of 3-hydroxy-glutaric acid (z-score > 10); four of them also had significantly elevated glutaric acid. Only one GA I patient had a normal excretion of glutaric acid.

Five patients with glutaric aciduria type II (GA II, multiple acyl-CoA dehydrogenase deficiency (MADD), OMIM #231680) presented with an increased excretion of ethylmalonic acid (EMA), glycine conjugated as well as dicarboxylic acids. Glutaric acid and 2-hydroxyglutaric acid were not consistently elevated in all three cases. A mild MADD case even presented with non-significant excretion of 2-hydroxyglutaric acid (z-score 2.5) but with a clear increase in both EMA and dicarboxylic acids. The excretion of EMA in MADD is comparable in z-score to short-chain acyl-CoA dehydrogenase deficiency (SCADD, OMIM #201470).

In all MCADD patients, phenylpropionylglycine, hexanoylglycine, and suberylglycine were significantly increased with z-scores >5.0.

One patient with 5-oxoprolinuria (pyroglutamic aciduria, OMIM #266130) was included and had a high excretion of pyroglutamic acid (z-score > 10). The absence of paracetamol metabolites acetaminophen glucuronide and acetaminophensulphate excluded medication-related changes. In other patients within our sample panel on paracetamol indicated by the presence of these two medication-related metabolites, only a minor increase of pyroglutamic acid (z-score 2.5–5.0) (data not shown) was observed.

Discussion

Urine organic acid analysis is a pivotal part in the diagnostic workup of inborn errors of metabolism (IEM). We developed a LC-QTOF/MS method for the quantification of 71 metabolites in urine covering all disorders classically identified through GC-MS analysis of organic acids. This targeted analyses is aimed at finding most organic acidurias in a fast manner, with the limitation that some IEM classically detected in GC-MS analysis, such as glycerol kinase deficiency, are not detected. Obviously, in case of strong clinical suspicion additional analyses are required in these cases. However, our method can be easily expanded by adding more relevant biomarkers for other IEM. The method was validated according to Dutch NEN-EN-ISO 15189 standards underlining its fit for purpose in the routine setting of a specialized diagnostic laboratory in the field of IEM. Performance characteristics (i.e., recovery) of our new analytical UHPLC-QTOF/MS assay were comparable or even superior to the classical GC-MS analysis. Cross validation showed on linearities and recoveries that our results compared well to that of the median of all labs for the GC-MS method. The targeted approach with internal standards makes our method robust. In addition, simple sample preparation and short time from unprocessed urine sample-to-authorized lab result of an individual sample (< 3–4 h) are clear benefits for the LC-QTOF/MS assay. However, for two metabolites, i.e., oxoglutaric acid and mevalonic acid, LC-QTOF/MS had a worse recovery than the classical assay. Two cases of classical mevalonic aciduria (MEVA, OMIM #610377) were easily diagnosed based on z-score profiles with abnormal mevalonic acid. Hyper-IgD syndrome (OMIM #260920) diagnostics depends on detecting more subtle increased excretion of mevalonic acid especially in periods of episodic fever (Prietsch et al 2003). We measure only mevalonic acid which is in equilibrium with lactone form (not detected in our method) in mild acid conditions; additional HIDS patient sample (especially in crisis) analysis would be required to evaluate the feasibility of these diagnostics with our new method.

We clinically validated our new assay by evaluating 32 different IEM and in total 187 samples with a representative number of control samples showed that this new diagnostic approach with z-score profiles facilitates diagnostics of the majority of organic acidurias. Most biomarkers associated with the IEMs included as reported in literature (Blau et al 2014) were observed in significantly elevated levels (z-score > 3.0) in the z-score plots for most IEM. Not all metabolites were increased to the same extent in all patients reflecting severity of the disease and clinical condition and/or treatment status (detailed information however was not always available). In patients with MSUD, not all markers were elevated in all patients; however, due to the presence of multiple biomarkers for MSUD in the metabolite panel, this diagnosis could not be missed. Glutaric acid was not elevated in all urine samples from GA I and MADD patients. With urine organic acid analysis, 3-hydroxyglutaric acid is the diagnostic metabolite. Glutaric acid can be completely normal in some patients but 3-hydroxyglutaric acid is pathognomonic for GA I. 3-hydroxyglutaric acid can be hard to identify and quantitate since it may co-elute with 2-hydroxyglutaric acid in GC-MS analyses (Hedlund et al 2006), depending on the analytical column. In our newly established LC-QTOF/MS assay, we could separate 2- and 3-hydroxyglutaric acid quite well, which allows unambiguous annotation and quantification of both metabolites.

All MCADD patients had abnormal z-score profiles. MCADD diagnostics is usually based on acylcarnitine profiling and organic acid analysis. Browning et al (2005) showed that normal acylcarnitine levels during confirmation of abnormal newborn screening have been encountered in some fatty acid oxidation disorders. Our findings are therefore of extra information on diagnostics. For these notoriously difficult IEMs several analyses (both acylcarnitine and organic acid analysis) in different settings (fasting and fed state) could be of added value in the diagnostics. Eventually, enzyme or gene analysis confirms a diagnosis based on biochemical abnormalities.

One patient with 5-oxoprolinase deficiency (pyroglutamic aciduria) could be easily discriminated on the basis of the z-score for secondary causes of 5-oxoprolinuria like a certain drug (paracetamol) (Saudubray et al 2016). The absence of paracetamol related markers in this patient supported the diagnosis. Other causes of 5-oxoprolinuria which include severe burns, inborn errors of metabolism not involving the gamma-glutamyl cycle, e.g., X-linked ornithine transcarbamylase deficiency, urea cycle defects, tyrosinemia, and homocystinuria were, however, not included in this evaluation.

All these factors and representation of results as a long list of numbers made it easy to miss a diagnosis despite reference values. This is why we introduced z-score plots. In the broad panel of IEMs included in this clinical validation, the z-score plot proved to be a useful and easy to use tool for labs to interpret the analytical results of the LC-QTOF/MS method.

The targeted biomarkers approach and number of IEM diseases that can be diagnosed can be continuously expanded to meet the complexity of IEM diagnostics. The full scan method is flexible to include new analytes and update the method to meet this challenge. This is part of our future work. We will explore not only the negative but also the positive ionization mode to cover broad chemical groups, including amino acids, acylcarnitines, purine and pyrimidine metabolites. Eventually, the method can be a generic metabolic screening method on the LC-QTOF/MS platform whereby urine is analyzed as a front-line specimen for screening of IEMs. This would provide a valuable means to efficiently, accurately, and rapidly identify and manage many IEMs (Campeau et al 2008).