Introduction

It has been variously reported that some diabetic patients have low renal function (typically expressed as estimated GFR [eGFR]) even in the absence of proteinuria [1–5]. However, little else is known about the risk factors or mechanisms associated with low eGFR in patients with either type 1 [5] or type 2 diabetes [1–4]. Metabolomics is a technological platform for the identification and quantification of the metabolome, the collection of all small molecules present in an organism or biological sample [6]. Metabolomics has been greatly facilitated by recent developments in mass detectors, allowing techniques such as liquid chromatography/MS (LC/MS) and GC/MS to support analysis of the metabolome [6].

Interrogation of the metabolome potentially offers unprecedented insights into a disease or phenotype and provides initial access to biomarker and pathway discovery [7–9]. We applied these improved techniques to probe the metabolomes of Chinese type 2 diabetic patients with low eGFR. We observed striking associations between low eGFR and several urinary metabolites, which extend beyond the few known uraemic toxins. Besides offering new leads on the possible mechanisms underlying low eGFR, these metabolites could serve as novel biomarkers for the detection of chronic kidney disease.

Methods

Patients and urine samples

All patients for this study were from the Singapore Diabetes Cohort Study (SDCS). Briefly, the recruitment process of SDCS was as follows. Since 2004, all patients previously diagnosed as having type 2 diabetes and treated at primary care facilities of the National Healthcare Group Polyclinics in Singapore were invited to join SDCS. Patients with a history of mental illness were excluded. Of the patients approached, 91% agreed to participate in the study and formed part of the cohort. Consenting patients completed a questionnaire to elicit information on demographics, lifestyle factors and family medical history, and also had their physical measurements taken. Random (not first morning) spot urine specimens were typically collected in the morning at the outpatient polyclinic and used for laboratory analyses. Medical records were reviewed to obtain information on metabolic control and the presence of co-morbidities and complications, including any history of non-diabetic kidney disease. Lipid measurements were performed on fasting blood samples.

The research protocol was approved by both the National University of Singapore Institutional Review Board and the National Healthcare Group Domain-Specific Review Board, and patients participating in this cohort gave informed consent.

Definitions of non-proteinuria and low eGFR

Patients in this metabolomic study were identified as being non-proteinuric using multiple spot urine samples. To thoroughly exclude the presence of proteinuria, urine samples were required to test negative on Labstix (Bayer Corporation, Elkhart, IN, USA) or Micral-Test (Boehringer Mannheim, Mannheim, Germany) or have an albumin/creatinine ratio (ACR) <3.5 μg/μmol (Exocell, Philadelphia, PA, USA) on at least two of the last three urinalyses. Most of the patients were therefore likely to be normoalbuminuric, although it was possible for some to have microalbuminuria especially if this was transient. eGFR was calculated using the simplified Modification of Diet in Renal Disease (MDRD) equation [10], where eGFR is expressed in ml min⁻¹ 1.73 m⁻²: $\mathrm{eGFR} = 186.3 \times (\text{plasma creatinine in μmol/l} \times 0.011)^{-1.154} \times (\text{age in years})^{-0.203} \times (0.742 \text{ for women}) \times (1.21 \text{ if subject is black})$. Cases (n = 44) were defined as patients with eGFR <60 ml min⁻¹ 1.73 m⁻², and controls (n = 46) had eGFR values ≥60 ml min⁻¹ 1.73 m⁻². As a history of cataract was strongly associated with low eGFR in SDCS (data not shown), presence of this complication was used as an exclusion criterion to eliminate potential confounding.
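As an illustration only, the simplified MDRD calculation and the case definition described above can be sketched in a few lines of Python. The function and parameter names are hypothetical and are not taken from the study; the coefficients and the 60 ml min⁻¹ 1.73 m⁻² threshold are those stated in the text.

```python
def egfr_mdrd(creatinine_umol_per_l, age_years, female=False, black=False):
    """Simplified MDRD estimate of GFR in ml/min per 1.73 m^2.

    Hypothetical helper: names are illustrative, coefficients are from the text.
    """
    # The factor 0.011 approximately converts creatinine from umol/l to mg/dl.
    creatinine_mg_per_dl = creatinine_umol_per_l * 0.011
    egfr = 186.3 * (creatinine_mg_per_dl ** -1.154) * (age_years ** -0.203)
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.21
    return egfr


def is_case(egfr):
    """Cases were defined as eGFR below 60 ml/min per 1.73 m^2."""
    return egfr < 60.0
```

Because the estimate depends only on plasma creatinine, age, sex and ethnicity, these variables are expected to differ between cases and controls by construction.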

Metabolomic analysis using GC/MS

Urine samples (20 μl) were incubated with 20 μl (10 mg/ml) urease enzyme for 30 min at 37°C. Then urease and other proteins were precipitated with 180 μl ice-cold methanol, which contained 10 μg/ml 9-fluorenylmethoxycarbonyl (FMOC)-glycine as an internal standard. After separation by centrifugation (16 relative centrifugal force [rcf] × 10 min, 4°C), 100 μl supernatant fraction was dried under nitrogen and derivatised with 150 μl methoxyamine (50 μg/ml in pyridine, 37°C × 2 h) followed by 150 μl MSTFA (37°C × 16 h). After centrifugation (4°C, 6 rcf × 1 min), the supernatant fraction was injected into the GC/MS system. The derivatised sample (1.0 μl) was introduced by splitless injection with an Agilent 7683 Series autosampler into an Agilent 6890 GC System (both from Agilent Technologies, Santa Clara, CA, USA) equipped with a fused-silica capillary column HP-5MSI (30 m × 0.25 mm i.d., 0.25 μm film thickness) as reported previously [11]. The inlet temperature was set at 250°C. Helium was used as the carrier gas at a constant flow rate of 1.0 ml/min. The column effluent was introduced into the ion source of an Agilent 5973 Mass Selective Detector (Agilent Technologies). The transfer line temperature was set at 280°C and the ion source temperature at 230°C. The mass spectrometer was operated in electron impact mode (70 eV). Data acquisition was performed in full scan mode from m/z 50 to 550 with a scan time of 0.5 s. The compounds were identified by comparison of mass spectra and retention time with those of reference standards, and those available in libraries (NIST 0.5). A total of 106 peaks with specific retention times in GC/MS analyses were detected in this study.

Metabolomic analysis using LC/MS

The urine samples were diluted 1:1 with methanol (containing 10 μg/ml FMOC-glycine as an internal standard) before being vortex-mixed for 3 min. After separation by centrifugation (16 rcf × 10 min, 4°C), the supernatant fraction was injected for LC/MS analysis. LC/MS analysis was performed on an Agilent 1200 HPLC system (Agilent Technologies, Waldbronn, Germany) equipped with a 6410 QQQ triple quadrupole mass detector and managed by a MassHunter workstation. The column used for the separation was an Agilent rapid resolution HT Zorbax SB-C18 (2.1 × 50 mm, 1.8 μm; Agilent Technologies, Santa Clara, CA, USA). The oven temperature was set at 50°C. The gradient elution involved a mobile phase consisting of (A) 0.1% formic acid in water and (B) 0.1% formic acid in methanol. The initial condition was set at 5% of B. The following solvent gradient was applied: from 5% B to 100% B within 20 min, then hold for 2 min. Flow rate was set at 0.2 ml/min, and 5 μl of sample was injected. The electrospray ionisation mass spectra were acquired in positive and negative ion mode. The ion spray voltage was set at 4,000 V. The heated capillary temperature was maintained at 350°C. The drying gas flow rate was 10 l/min and the nebuliser nitrogen gas pressure was 207 × 10³ Pa. For full scan mode analysis, spectra were stored from m/z 100 to 1,000. A total of 144 peaks with specific retention times in LC/MS analyses were detected in this study. The compounds were searched for in the Human Metabolome Database (www.hmdb.ca) using ion mass and further identified by either MS/MS fragmentation pattern or reference standards.

Metabolomic data preprocessing

Each chromatogram obtained from GC/MS and LC/MS analysis was processed for baseline correction and peak area calculation manually. The data were combined into a single matrix by aligning peaks with the same mass and retention time for GC/MS and LC/MS data, respectively. The area of each peak was normalised to that of the internal standard in each dataset. There was no further normalisation of the metabolites with respect to creatinine because the level of this metabolite was different between cases and controls (Tables 2 and 3).
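The internal-standard normalisation described above amounts to dividing each peak area by the area of the FMOC-glycine peak in the same sample. A minimal sketch follows, assuming peak areas are held in a dictionary keyed by peak name; this data layout and the function name are hypothetical and were not part of the original workflow.

```python
def normalise_to_internal_standard(peak_areas, standard_name="FMOC-glycine"):
    """Divide every peak area in one sample by the internal standard's area.

    peak_areas: dict mapping peak name -> integrated peak area (hypothetical layout).
    Returns a new dict without the internal standard itself.
    """
    standard_area = peak_areas[standard_name]
    if standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    return {
        name: area / standard_area
        for name, area in peak_areas.items()
        if name != standard_name
    }
```

No creatinine normalisation is applied in this sketch, in keeping with the rationale given in the text.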

Statistical analysis

Statistical comparison of clinical characteristics between cases and controls was performed using two-sample t tests in the case of quantitative traits. In the event where the data distribution deviated from normal distribution, Mann–Whitney tests were used. For qualitative traits, comparison between cases and controls was performed using Fisher’s exact tests.

The preprocessed metabolomic data were exported into Soft Independent Modeling of Class Analysis (SIMCA)-P (version 11.0; Umetrics AB, Umea, Sweden) for orthogonal partial least-squares discriminant analysis (OPLS-DA). To compare median signal intensities of the metabolites between cases and controls, the Mann–Whitney test was applied to each metabolite separately. The resultant p values for all metabolites were subsequently adjusted to account for multiple hypotheses testing. The false discovery rate (FDR) method of Benjamini and Yekutieli [12] was used to perform the adjustment.
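For readers unfamiliar with the Benjamini–Yekutieli adjustment, the following is a minimal pure-Python sketch of how adjusted p values can be obtained from a list of unadjusted ones. It follows the standard step-up formulation and is not the software used in the study; the function name is hypothetical.

```python
def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted p values, in the input order."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic-sum correction that distinguishes BY from Benjamini-Hochberg.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted
```

A metabolite would then be declared significant when its adjusted p value falls below the chosen threshold (0.05 in this study).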

To determine how well these metabolites performed in separating cases from controls, we used two approaches. In the first approach, principal component (PC) analysis was performed for metabolites with a statistically significant association in the univariate analysis. The first and second PC scores were then used as predictors in the logistic regression model, with case status as the outcome. To ensure that the PC estimates were robust, any outlying observations that lay more than four standard deviations from the mean were removed. The receiver operating characteristic (ROC) curve for the logistic model was calculated, and the AUC was used to assess the quality of prediction, with AUC closer to 1 indicating better performance.
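The AUC used here can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes it directly from that definition; it covers only the AUC step, not the PC analysis or the logistic regression, and the function name and the 1/0 label coding are assumptions made for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from predicted scores and 1/0 case labels."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0   # case ranked above control
            elif case_score == control_score:
                wins += 0.5   # ties count as half
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to chance-level ranking and 1 to perfect separation of cases from controls.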

The second approach involved selecting the best subset of metabolites that can be used to predict case status. Least absolute shrinkage and selection operator (LASSO) logistic regression [13] was used, with the optimal LASSO estimate determined using leave-one-out cross-validation. Briefly, LASSO logistic regression is very similar to ordinary logistic regression, except that LASSO places a constraint on the sum of the absolute values of the regression coefficients. This constraint shrinks some coefficients to exactly zero, so that only a subset of metabolites retains non-zero coefficients. In our study, the strength of the constraint, and hence the selected subset, was chosen by leave-one-out cross-validation.
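In its usual penalised form, LASSO logistic regression can be written as the following optimisation problem. The notation is introduced here for illustration and does not appear in the original: $y_i$ is the case status (1 or 0) of participant $i$, $x_i$ the vector of signal intensities of the $m$ candidate metabolites, and $\lambda \ge 0$ the penalty weight, which would be the quantity tuned by cross-validation.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
\left\{
  -\sum_{i=1}^{n} \bigl[\, y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \,\bigr]
  + \lambda \sum_{j=1}^{m} \lvert \beta_j \rvert
\right\},
\qquad
\pi_i = \frac{1}{1 + \exp\{-(\beta_0 + x_i^{\top}\beta)\}}
```

Setting $\lambda = 0$ recovers ordinary logistic regression, while larger values of $\lambda$ force more coefficients $\beta_j$ to zero and so yield a smaller selected subset.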

Results

Patient characteristics

Cases and controls were comparable in most clinical characteristics, except that cases were older at the time of recruitment and had higher serum creatinine values. This would be expected because these variables were directly used to compute eGFR values in the MDRD equation (Table 1). Cases were also older than controls at the time of diabetes diagnosis, but this difference was borderline significant (unadjusted p = 0.0129). ACR values were similar between cases and controls (Table 1).

Table 1 Clinical characteristics of cases and controls

GC/MS analyses

OPLS-DA revealed that cases could be clearly segregated from controls on the basis of the 106 peaks detected by GC/MS (Fig. 1). Univariate analyses of the metabolite signal intensities revealed striking associations between 24 metabolites and low eGFR (unadjusted p < 0.05; Table 2). Of these 24 associations, 11 remained statistically significant after correction to account for multiple hypotheses testing (adjusted p < 0.05). In particular, the p values associated with six metabolites (oxalic acid, octanol, N-acetylglutamine, 3,5-dimethoxymandelic amide, benzamide and phosphoric acid) were very small, with the largest p value = 7.28 × 10⁻⁵ (phosphoric acid) and the smallest p value = 2.62 × 10⁻¹⁴ (oxalic acid). Urinary creatinine was also higher in cases than controls (p = 6.46 × 10⁻⁸; Table 2).

Fig. 1 OPLS-DA score plot obtained from GC/MS data based on 106 peaks. Black circles, controls (n = 46); white circles, cases (n = 44). The x-axis t[1] and y-axis t[2] indicate the first PC and second PC, respectively.

Table 2 Univariate analysis of metabolite signal intensities measured by GC/MS

PC analysis was next performed to determine the clustering of the 24 metabolites. The first two PCs of GC metabolites explained 56.5% of correlations between these metabolites. The first PC axis was a contrast between phosphoric acid (metabolite 4) and 3,5-dimethoxymandelic amide (metabolite 13) on the one hand and the rest of the metabolites on the other (Electronic supplementary material [ESM] Fig. 1). The second axis was largely responsible for the separation of cases and controls, and could be characterised by a group of metabolites that tended to have higher signal intensities in controls (phosphoric acid [metabolite 4], 3,5-dimethoxymandelic amide [metabolite 13], benzamide [metabolite 7], l-serine [metabolite 6], d-glucuronic acid [metabolite 20], oxalic acid [metabolite 3], succinic acid [metabolite 5] and uric acid [metabolite 22]) and metabolites that were of higher signal intensity among cases, including creatinine (metabolite 10), N-acetylglutamine (metabolite 23) and octanol (metabolite 2). On the basis of these first and second PCs, the cases could be separated quite well from controls. With the diagonal line in ESM Fig. 1 used as a simple rule for separating cases from controls, all controls were placed below the line and all but five cases were located above it. In logistic regression where the first and second PC scores were used to classify participants into cases and controls, the ROC curve revealed very good discriminatory power (AUC = 0.999).

The above classification was achieved by using all 24 metabolites. However, from the univariate analysis (Table 2), it is clear that only a subset of metabolites was significantly associated after adjustment for multiple testing, and, furthermore, metabolites tended to show clustering, i.e., their signal intensities tended to vary together across the different samples, as was evident in the PC plots. We therefore next determined whether the same level of classification could be achieved with a smaller select group of metabolites. With the use of LASSO logistic regression, the following metabolites were selected as the best subset for case prediction: octanol, oxalic acid, phosphoric acid, benzamide, creatinine, 3,5-dimethoxymandelic amide and N-acetylglutamine.

Using these seven metabolites to predict case status, a classification (AUC = 0.995) was achieved that was as good as that previously derived using all 24 metabolites (ESM Fig. 2). This led to the conclusion that higher signal intensities of octanol, creatinine and N-acetylglutamine and lower signal intensities of oxalic acid, phosphoric acid, benzamide and 3,5-dimethoxymandelic amide were all independent predictors of low eGFR. Among the clinical variables, only age at diabetes diagnosis, age at recruitment and serum creatinine were statistically significant in univariate analyses (Table 1). In LASSO regression, only serum creatinine was selected into the model when the significant metabolites listed above were included. The AUC of the model with serum creatinine added was 0.996, which is very similar to that of the model without serum creatinine (AUC = 0.995).

LC/MS analyses

OPLS-DA revealed clear segregation of cases from controls on the basis of 144 peaks detected in LC/MS (Fig. 2). Univariate analyses revealed a total of 32 metabolites that were significantly associated with low eGFR (unadjusted p < 0.05). Of these, 19 remained significantly associated after multiple hypotheses testing had been taken into account (adjusted p < 0.05; Table 3). Of these, 17 were detected in positive ion mode, while two were detected in negative ion mode, including indoxyl sulphate, a well-established uraemic toxin [14]. Relative to indoxyl sulphate (adjusted p = 3.03 × 10⁻²), several metabolites clearly showed stronger evidence of statistical association with low eGFR, with p values that were at least an order of magnitude smaller. These included 4-methoxyphenylacetic acid, N6-acetyl-l-lysine, chondroitin sulphate, citric acid, phenylacetyl-l-glutamine, 2-deoxyuridine and deoxypyridinoline (Table 3).

Fig. 2 OPLS-DA score plot obtained from LC/MS data on 144 peaks. Black circles, controls (n = 46); white circles, cases (n = 44). The x-axis t[1] and y-axis t[2] indicate the first PC and second PC, respectively.

Table 3 Univariate analysis of metabolite signal intensities measured by LC/MS

PC analyses of the LC/MS metabolites using the first two PCs explained 53.5% of correlations. The metabolites had higher signal intensities among cases compared with controls, with three clusters of metabolites being evident (ESM Fig. 3). Compared with the GC/MS results, the separation of cases from controls was less optimal, with only moderate discriminatory power (AUC = 0.777), as observed using the ROC curve (data not shown). LASSO regression revealed a subset of seven metabolites (N6-acetyl-l-lysine, caffeine, 4-methoxyphenylacetic acid, chondroitin sulphate, hyocholic acid/cholic acid/ursocholic acid, phenyl sulphate and α-hydroxyhippuric acid) that best predicted case status with an AUC of 0.870, an improvement on that achieved using PC scores (ESM Fig. 4). The potential effect of confounding on the association between metabolites and low eGFR by patient clinical variables was excluded. In univariate analyses, age at diabetes diagnosis, age at recruitment and serum creatinine were statistically significant at the 5% significance level (Table 1). However, only age at recruitment and serum creatinine were selected by LASSO regression when the significant metabolites listed above were included. The AUC for the model in which age at recruitment and serum creatinine were added was 0.978, a substantial improvement over the model with metabolites only (AUC = 0.870).

Validation

To provide an internal validation of the above GC/MS and LC/MS results, 45 individuals (23 controls, 22 cases) were randomly selected from the 90 participants and used to discover the important metabolites. These metabolites were then validated by calculating the AUC based on the 45 remaining unselected participants. This random selection was repeated ten times, yielding a range of AUC values. For GC/MS metabolites, the AUC for the validation set was consistent and ranged from 0.934 to 1.000. For the LC/MS metabolites, the AUC values were far more variable, ranging from 0.477 to 1.000.
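The repeated split-half procedure can be sketched as follows. The sketch assumes that the random selection was stratified by case status, which is consistent with the stated 23 controls and 22 cases but is not explicitly described in the text; the function names, the 1/0 label coding and the seed are hypothetical. Metabolite discovery and AUC calculation on each split are left out.

```python
import random


def stratified_half_split(labels, rng):
    """Split participant indices into discovery and validation halves per class."""
    discovery, validation = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        half = len(members) // 2
        discovery.extend(members[:half])    # used to select metabolites
        validation.extend(members[half:])   # held out for the AUC
    return sorted(discovery), sorted(validation)


def repeated_splits(labels, n_repeats=10, seed=0):
    """Generate n_repeats independent stratified half splits."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return [stratified_half_split(labels, rng) for _ in range(n_repeats)]
```

With 46 controls and 44 cases, each split places 23 controls and 22 cases in the discovery half and the remaining 45 participants in the validation half.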

Discussion

Our study identified a number of urinary metabolites that were associated with low eGFR. Moreover, these novel and statistically robust associations were found in diabetic patients who were persistently non-proteinuric and thus would conventionally have been regarded as being at low risk of chronic kidney disease.

The candidate metabolites for low eGFR extended well beyond the few widely acknowledged uraemic toxins. Indeed, while we did find significant associations with urinary levels of uraemic toxins such as indoxyl sulphate, creatinine and the methoxylated form of phenylacetic acid, at least 13 other metabolites exhibited much stronger evidence of an association with low eGFR, with p values that were at least an order of magnitude smaller than that of indoxyl sulphate (Tables 2 and 3). These new candidate biomarkers include oxalic acid, octanol, 3,5-dimethoxymandelic amide, N-acetylglutamine, benzamide, phosphoric acid, 2-hydroxyadipic acid, N6-acetyl-l-lysine, chondroitin sulphate, citric acid, phenylacetyl-l-glutamine, 2-deoxyuridine and deoxypyridinoline.

Our literature review suggested that only a few of the metabolites, such as creatinine, oxalic acid and phosphoric acid, had been earlier linked to renal disease in humans [15, 16]. While raised serum creatinine serves as an indicator of renal function, urinary creatinine is primarily used in the context of a 24 h creatinine clearance test to measure GFR. Regarding oxalic and phosphoric acids, high urinary levels of these metabolites predisposed to the formation of kidney stones [17], but the effect of kidney stones on reducing renal function remained unclear [18]. It was noteworthy that the urinary levels of these metabolites were actually lower in the cases than controls in our study.

In particular, oxalic acid was detected in 44 out of 46 control participants (95.7%) but in none of the cases (ESM Fig. 5). This striking observation may be consistent with the systemic retention of this metabolite in the presence of low eGFR. In this connection, it is interesting to note that increased plasma oxalic acid levels have previously been linked with chronic kidney disease [19].

Chondroitin sulphate is a type of glycosaminoglycan that is covalently linked to proteins, forming proteoglycans. It is reportedly absent in the normal glomerular basement membrane, but its content increases soon after the onset of experimental diabetes [20, 21]. Semi-quantitative analysis suggested a reduction in total urinary glycosaminoglycan content in diabetic animal models compared with non-diabetic controls with increasing diabetes duration and this appeared to correlate with albuminuria [22]. In normoalbuminuric type 1 diabetic patients, increased urinary excretion of glycosaminoglycans, including chondroitin sulphate, was related to poorer glycaemic control and longer diabetes duration, although the association between the excretion of these metabolites and eGFR was not reported [23, 24].

An important highlight of our study is the previously unrecognised roles of the remaining metabolites in the modulation of eGFR. Even so, there may be some underlying biological plausibility. For example, cases with low eGFR had lower levels of benzamide compared with controls. Benzamide has been shown to be an endogenous inhibitor of poly(ADP-ribose) polymerase-1 (PARP-1) [25, 26], a key enzyme that has been strongly implicated in causing diabetes-associated endothelial dysfunction [27]. Pharmacological inhibition of PARP-1 ameliorated various features associated with nephropathy in both experimental models of type 1 and type 2 diabetes, including albuminuria [28, 29]. Urinary albumin and protein excretion were reduced in the streptozotocin-diabetic mouse model with the endogenous Parp1 gene constitutively ablated by gene knockout [30]. While these previous animal studies concentrated primarily on the effect of PARP-1 in ameliorating albuminuria, our present finding on benzamide provides intriguing fresh evidence of a role for PARP-1 in the modulation of eGFR in diabetic patients.

Despite the salient findings, certain limitations of our study should be acknowledged. First, our study was based on a case–control design, and therefore we could not draw conclusions regarding causation as would be possible in a cohort study. This case–control design was, however, appropriate in the current instance because the comprehensive metabolomic profiling of a large cohort would have been prohibitive in terms of both logistics and costs. The metabolites identified here may now serve as leads in future cohort studies. Second, these same logistic and cost considerations also placed a necessary limit on our sample size, consequently reducing the power of our study. However, this decreased power did not appear to substantially affect our study, as we successfully detected a number of associations that remained robust after correction for multiple hypotheses testing. Third, GFR was estimated using a serum creatinine-based equation rather than being directly measured. This can be expected to have caused some degree of disease misclassification and, conceivably, made it harder to identify potential metabolites that had weaker associations with low eGFR. Another potential shortcoming was that our metabolomic study was confined to urine and did not include the corresponding blood specimens from the patients. This was due to the inherent technical difficulties in metabolomic analysis of blood, a complex biofluid. Characterisation of the blood metabolome may have provided a more holistic context within which our findings could be interpreted.

To balance this, attention should be drawn to several strengths of our study. First, the metabolomic approach used was comprehensive by leveraging both GC/MS and LC/MS platforms. Second, positive and statistically robust associations were uncovered. In addition, the detection of associations with known uraemic toxins provided some degree of validation of our study design including the characterisation of our patient sample. Third, the chance of false positive findings was minimised by careful adjustment for multiple hypotheses testing. Fourth, our study focused on low eGFR in non-proteinuric patients. Aside from yielding new insight into the renal aspects of these patients, a significant benefit of our study design was the minimisation of any potential confounding due to the presence of proteinuria on the associations between the metabolomic profiles and low eGFR. Finally, to our knowledge, this is the first study of the metabolomics of eGFR in human type 2 diabetes. Previous reports have attempted to study changes in the urinary metabolome associated with type 2 diabetes per se, without any attention being paid to their possible association with the accompanying renal traits. Critically, few of these studies were on human specimens [31], with the majority being performed on experimental models of diabetes [32–35].

In conclusion, our investigation of the metabolomics of low eGFR in non-proteinuric type 2 diabetic patients has yielded substantial biological insight. In addition, we have identified several potential candidate biomarkers that may prove useful for detecting and monitoring chronic kidney disease. Individually, these potential biomarkers, especially the seven GC/MS metabolites, showed highly significant associations with low eGFR and had good discriminatory power. When used collectively, the metabolomic signatures appeared to be promising for disease classification.