1 Introduction

Metabolomics, a sub-field of the “omics” technologies, allows the systemic study of low-molecular-weight metabolites present in a biological system (Ceglarek et al. 2009). These metabolites belong to diverse chemical classes such as amino acids, organic acids, fatty acids, or sugars (Chan et al. 2009; Dettmer et al. 2007). They are the final downstream products of transcription and translation and are thus closest to the phenotype (Mamas et al. 2011). Hence, metabolomics promises to play an important role in bridging the genotype-phenotype gap (Cascante and Marin 2008). In cancer, previous research has focused particularly on the transcriptional regulation of cancer-associated gene expression, whereas less effort has been directed at metabolic alterations (Cardoso et al. 2007). Meanwhile, new mass spectrometry-based techniques allow simultaneous, quantitative, in-depth analysis of metabolomic profiles in various biological samples (Ceglarek et al. 2009). These new approaches promise to shed light on the complex metabolic alterations associated with tumorigenesis, which may accelerate the discovery of new diagnostic, prognostic, and predictive biomarkers. Despite this analytical evolution, however, a profound understanding of cancer metabolism is still lacking. This holds especially true for colorectal cancer (CRC), which is the third most common cause of cancer mortality in developed countries, with more than 500,000 deaths annually (Herszenyi and Tulassay 2011). In Germany, colorectal cancer is among the top ten causes of death and accounts for 2% of all fatalities (DESTATIS, Statistisches Bundesamt Deutschland 2009). Comprising 16% of all cancer cases, it is the second most common cancer type and, with 12–14%, also the second most common cause of cancer death.
Colonoscopy/sigmoidoscopy is still the “gold standard” for detecting colorectal carcinoma and high-risk adenomas, but its invasiveness, the discomfort experienced, the potential risk of complications, and the resources needed for screening are disadvantages of concern when compared with the fecal occult blood test (FOBT) (Bretthauer 2010). The recommended immunochemical FOBT (iFOBT), however, delivers sensitivity rates of 61–91% (Duffy et al. 2011), which are far from satisfactory. Currently, no serum screening markers for CRC are available. Carcinoembryonic antigen (CEA), the tumor marker of choice for CRC, is well suited for therapy monitoring but lacks the sensitivity needed for screening purposes (Shimwell et al. 2010). Earlier diagnosis of this cancer and early relapse monitoring after initial therapy are probably the best available options to improve patient survival. Established serum tumor markers such as CEA are useful for monitoring the course of disease on and off treatment, but they do not meet the sensitivity and specificity criteria for screening stratification purposes (Tanaka et al. 2010). In this regard, mass spectrometry-based metabolic profiling may be valuable for the identification of new “disease signatures” and cancer-associated biomarkers. Preceding metabolic investigations primarily aimed at discovering metabolites that differ significantly between controls and CRC patients in serum, urine, and tissue (Wang et al. 2009) by applying standard test statistics [e.g. Student’s t-test, the Wilcoxon test, or the PCA (Principal Component Analysis)-based (O)PLS-DA (Ma et al. 2010; Qiu et al. 2009)] and revealed several potential marker metabolites. These preliminary results encouraged us to investigate serum amino acid profiles and their alterations in CRC using a tandem mass-spectrometric approach (Fiedler et al. 2004; Mueller et al. 2003). In our study, we considered two further aspects. First, to avoid pre-analytical flaws (Issaq et al. 2011), we strictly adhered to previously published protocols (Baumann et al. 2005; Brauer et al. 2011) as well as standardized sample processing (Ceglarek et al. 2002) and reporting (Fiehn et al. 2007). Second, to provide not only significant differences but also, as a core task of laboratory medicine, statistically sound statements on their diagnostic surplus value, we additionally evaluated the markers we found and the conventional tumor marker CEA with respect to non-inferiority and superiority.

2 Materials and methods

2.1 Ethics statement

The study was approved by the ethics committee of the Medical Faculty of the University of Leipzig [Reg. No. 013-2005] and fulfills the requirements of the Helsinki declaration. All subjects gave written informed consent to participate in the study.

2.2 Patients and samples

Patients with CRC (n = 59) and respective controls (n = 58) were recruited at the University Hospital Leipzig in the context of a previously published study (Fiedler et al. 2009). Subjects were matched according to age and gender. Fasting blood sampling from patients was performed before initiation of specific therapy. Healthy controls called in for a checkup showed no evidence of current disease, as confirmed by physical examination and routine laboratory testing (differentials, C-reactive protein (CRP), creatinine, transaminases, alkaline phosphatase, γ-glutamyl transferase, bilirubin, tumor marker CEA). Venous serum samples were collected and stored using standardized techniques and protocols (Baumann et al. 2005; Brauer et al. 2011), including puncture of the cubital vein, 30–60 min coagulation at room temperature, centrifugation for 10 min at 1,400 g, immediate aliquoting, and storage at –80°C until analysis.

2.3 Chemicals, standards and consumables

2.3.1 Materials

Methanol and isopropanol (gradient grade) were purchased from Merck (Darmstadt, Germany). The isotope-labelled amino acid (AA) standard kits (NSK-A, NSK-B, Cambridge Isotope Laboratories, Andover, USA) were used as internal standards. Water (HPLC grade) was obtained from J. T. Baker (Deventer, Netherlands). The derivatization reagent, 3 N butanolic HCl, was prepared in-house by mixing 1-butanol (for spectroscopy) from Merck (Darmstadt, Germany) and acetyl chloride (p.a.) from Sigma-Aldrich (Steinheim, Germany) at 4:1 v/v. 96-well polypropylene microtiter plates were purchased from Greiner Bio-One (Frickenhausen, Germany). Multifly needle sets, polypropylene serum monovettes with clotting activators, and 450 μl CryoTubes for sample storage were obtained from Sarstedt (Nümbrecht, Germany).

2.4 Sample pretreatment

A sample derivatization protocol was used according to our previously described procedures (Brauer et al. 2011; Ceglarek et al. 2002) to enhance the sensitivity of the mass spectrometric detection and thereby minimize the sample volume (Harder et al. 2011). Serum samples were diluted 1:10 with methanol for protein precipitation. After centrifugation we placed 10 μl of the supernatant into 96-well polypropylene microtiter plates and diluted it with 100 μl of the internal standard solution. After evaporation at 70°C for 40 min, we added 60 μl of 3 N butanolic HCl for derivatization at 65°C for 18 min. The residual solution was again evaporated at 70°C for 40 min and then reconstituted with 150 μl of the mobile phase (1/1 v/v isopropanol/water). After 15 min of gentle shaking of the microtiter plate at room temperature, we analyzed the samples by flow injection analysis (FIA)-MS/MS. We arranged the samples in alternating series of 20 controls and cases on two microtiter plates and measured them in one analytical run on one day.

2.5 CEA

CEA was measured in serum samples by an electrochemiluminescence immunoassay (Roche, Germany) on a Modular Analytics E 170 analyzer (Roche, Germany) according to the manufacturer’s instructions.

2.6 Tandem mass spectrometry

An API 3000 tandem mass spectrometer (Applied Biosystems, Germany) with a Turbo Ion Spray (TIS) source, in combination with an HTC Pal autosampler and a PE 200 microgradient pump, was used for flow injection analysis (FIA). A 25 μl aliquot of each sample was injected directly at a flow rate of 80 μl/min, with an analysis time of 1.5 min. We detected amino acids by a neutral loss scan of 102 in the mass range of 130–280 or by multiple reaction monitoring (MRM). Quantitative analysis using internal standards was performed for 26 amino acids using ChemoView™ 1.4.2 (Applied Biosystems, Darmstadt, Germany). A comprehensive overview of mass transitions, internal standards, and performance data for the different amino acids can be found in Brauer et al. (2011).
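As a rough illustration of what the neutral loss scan selects, the following Python sketch estimates the precursor m/z of a butylated amino acid and the fragment remaining after the loss of 102 Da. It is not part of the described workflow. It assumes one butyl ester group per carboxyl group, single protonation, and that the neutral loss corresponds to butyl formate; the monoisotopic masses and all function names are introduced here for illustration only.

```python
# Illustrative sketch only; masses are standard monoisotopic values in Da.
C4H8 = 56.0626           # mass added per carboxyl group by butyl esterification (assumption: full derivatization)
PROTON = 1.00728         # charge-carrying proton of [M+H]+
NEUTRAL_LOSS = 102.0681  # butyl formate, C5H10O2 (assumed identity of the 102 Da loss)

# Hypothetical lookup table of free amino acid masses (not taken from the study)
MONOISOTOPIC = {
    "glycine": 75.0320,
    "alanine": 89.0477,
    "tyrosine": 181.0739,
}

def precursor_mz(name, n_carboxyl=1):
    """m/z of the singly protonated butyl ester."""
    return MONOISOTOPIC[name] + n_carboxyl * C4H8 + PROTON

def fragment_mz(name, n_carboxyl=1):
    """m/z of the fragment left after the neutral loss."""
    return precursor_mz(name, n_carboxyl) - NEUTRAL_LOSS

for aa in MONOISOTOPIC:
    print(f"{aa}: precursor {precursor_mz(aa):.1f}, fragment {fragment_mz(aa):.1f}")
```

Under these assumptions, the three example precursors fall inside the scanned range of 130–280; the transitions actually used are those listed in Brauer et al. (2011).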

2.7 Statistical analysis

Statistical testing was performed using R (R Development Core Team 2008) with the packages ‘nortest’ (Gross 2006), ‘pROC’ (Robin et al. 2010, 2011), ‘BMA’ (Raftery et al. 2010), ‘car’ (Fox and Weisberg 2011), ‘care’ (Zuber and Strimmer 2010a), ‘ltm’ (Rizopoulos 2010), ‘boot’ (Davison and Hinkley 1997), ‘stats’, and ‘rattle’ (Graham 2009). To test for normality we applied the Anderson–Darling test from the ‘nortest’ package (Stephens 2006). Gender distribution in the groups was evaluated by Fisher’s exact test, and group-related differences for the remaining parameters by Wilcoxon’s test (both from the ‘stats’ package). For bootstrapping, we used R’s ‘sample’ function as well as the ‘boot’ package [B = 999 runs with replacement, cf. Carpenter and Bithell (2000)] to compute robust estimates for the 95% confidence intervals (CIs) of the medians as well as for the minima and maxima. For the CIs of the ROC curves and the areas under the ROC curves (AUROCs) we employed the bootstrap built into ‘pROC’ (argument boot.n, also with B = 999 runs). Group-specific differences were evaluated by the Mann–Whitney U test (*P < 0.05; **P < 0.001). Kendall’s correlation plot (Fig. 1) was drawn with ‘rattle’ [see Murdoch and Chow (1996) for details]. We computed point-biserial correlation coefficients r pb with the ‘ltm’ package (Rizopoulos 2010) and their significance levels by the method proposed by Israel (2008). ROC curve and AUROC calculations were performed using the ‘pROC’ package.

Since we presumed that marker models combining different amino acids and/or CEA might be superior to single amino acids and/or CEA regarding their selectivity, we also evaluated combined models. For this purpose, we selected the amino acids that differed significantly between the groups (cf. Table 2) and, following power transformation (In-Kwon and Johnson 2000), performed a Principal Component Analysis [PCA, Eigenvalue <1, applying the ‘princomp’ function of the ‘stats’ package (Venables and Ripley 2002)] on them to minimize collinearities and to set up principal components for subsequent regression analysis and AUROC evaluation. We used combinations of these principal components as well as the concentrations of the significantly differing amino acids together with CEA in binary logistic regression modeling [package ‘BMA’; selection of the best-fitting models and penalization of overfitting via the Bayesian Information Criterion (BIC); monitoring of multicollinearity via the Variance Inflation Factor (VIF); model validation via the CAR score (defined as ‘the marginal correlations adjusted for correlation among explanatory variables’, with ‘CAR’ abbreviating ‘Correlation-Adjusted (marginal) coRrelation’) from the package ‘care’ for the respective number of predictors, cf. Zuber and Strimmer (2010b)] to build predictors and, in turn, to compare their selectivity with that of CEA alone by AUROC analysis. P-values for the comparison of AUCs were computed as proposed by DeLong et al. (1988) for AUCs generated by the ‘pROC’ package [cf. Robin et al. (2010)]. ‘Best’ thresholds determined by the Youden index were computed using the ‘pROC’ package according to the method suggested by Perkins and Schisterman (2006). Non-inferiority and superiority testing was then performed by applying bootstrap techniques (B = 999, package ‘boot’) to ΔAUROC, constructing its CIs, and testing whether the CI of ΔAUROC includes Δ0, according to the methods proposed by Liu et al. (2006), Tunes da Silva et al. (2009), and Lesaffre (2008), at a predefined δ L level of 5%, which we considered medically reasonable (Mascha 2010) when designing the study.
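To make the notion of a ‘best’ threshold more concrete, the following Python sketch selects the cut-off that maximizes the Youden index J = sensitivity + specificity − 1. It is a simplified illustration and not the ‘pROC’ implementation used in the study; the function name and the toy values are hypothetical, and only observed values are tried as candidate thresholds.

```python
def youden_threshold(cases, controls):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1.

    A sample is called positive when its value is >= threshold.
    On ties in J, the lowest threshold is kept.
    """
    best = None
    for t in sorted(set(cases) | set(controls)):
        sens = sum(x >= t for x in cases) / len(cases)
        spec = sum(x < t for x in controls) / len(controls)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]

# Toy marker values (hypothetical, not study data)
cases = [3.0, 3.0, 5.0, 6.2, 7.7]
controls = [0.8, 1.5, 2.0, 2.9, 3.1]
threshold, sensitivity, specificity = youden_threshold(cases, controls)
print(threshold, sensitivity, specificity)
```

For a marker that is decreased in patients, the comparison direction would have to be reversed (positive when the value is below the threshold).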

Fig. 1

Correlation matrix plot of the amino acid concentrations [Kendall’s τb, circle denotes low correlation and oval denotes high correlation, positive, negative, and neutral correlations are displayed blue, red, and white, respectively (see Murdoch and Chow (1996) for details)] to display the mutual collinearities between amino acid concentrations. The top line (shaded) shows point biserial correlations r pb and significance levels for the correlation of the respective amino acid with the health state (*P < 0.05; **P < 0.01; ***P < 0.001)

3 Results

For our investigations we collected serum samples of 59 (37m/22f) colorectal carcinoma patients and 58 [26m/32f; P = not significant (n.s.)] healthy controls. Age, UICC (Union Internationale Contre le Cancer) stagings of the patients, and CEA concentrations are displayed in Table 1.

Table 1 Baseline characteristics

3.1 Descriptives

To generate robust estimators for the amino acid concentrations in the colorectal carcinoma patients and healthy controls covering the value range and the 95% confidence interval of the median for comparative studies, we applied bootstrapping techniques—the resampling results are summarized in Table 2 and displayed separately for both groups. In total, we found 19 of 26 amino acids decreased (11 thereof significantly) and 7 amino acids increased (none significantly) in colorectal carcinoma patients compared to healthy controls. Several amino acids were non-normally distributed across both study groups (Table 2).
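The resampling idea can be sketched in a few lines of Python. This is a minimal percentile-bootstrap illustration under stated assumptions, not the R code used for Table 2: the function name, the seed, and the toy concentrations are hypothetical, and the percentile method is only one of several bootstrap interval types.

```python
import random
import statistics

def bootstrap_median_ci(values, n_boot=999, level=0.95, seed=1):
    """Percentile bootstrap CI for the median (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    alpha = 1.0 - level
    # With B = 999 replicates, (B + 1) * alpha / 2 = 25, so the 25th and
    # 975th ordered replicates bound a 95% interval.
    lo_rank = round((n_boot + 1) * alpha / 2)
    hi_rank = round((n_boot + 1) * (1 - alpha / 2))
    return medians[lo_rank - 1], medians[hi_rank - 1]

# Toy concentrations (hypothetical, not study data)
concentrations = [212, 198, 240, 225, 187, 260, 231, 205, 219, 244, 201]
lo, hi = bootstrap_median_ci(concentrations)
print(f"median {statistics.median(concentrations)}, 95% CI {lo} to {hi}")
```

Choosing B = 999 rather than 1,000 makes the interval limits fall on whole-numbered order statistics, which is a common motivation for this replicate count.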

Table 2 Median, minimum, maximum, and the 95% CI of the median for amino acids in colon cancer patients and controls

3.2 Correlations

To evaluate the correlation of the significantly different amino acids with the health state and with each other, we computed point-biserial correlations (r pb) and Kendall’s τb, respectively. r pb ranged between –0.38 and 0.10 (with P < 0.05 for \( |r_{pb}| \ge 0.199 \)), Kendall’s τb between –0.01 (P = 0.57) and 0.54 (P < 2.2 × 10^−16). The results are shown in Fig. 1.
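For readers less familiar with the point-biserial coefficient, the following Python sketch computes r pb from the group means and the overall standard deviation. It is an illustration only: the study used the ‘ltm’ package and Israel’s method for the significance levels, whereas the t statistic shown here is a common textbook alternative and an assumption of this sketch. Function names and toy values are hypothetical.

```python
import math
import statistics

def point_biserial(values, labels):
    """Point-biserial correlation; labels: 1 = patient, 0 = control."""
    ones = [v for v, g in zip(values, labels) if g == 1]
    zeros = [v for v, g in zip(values, labels) if g == 0]
    n1, n0, n = len(ones), len(zeros), len(values)
    s_n = statistics.pstdev(values)  # population SD of all values
    mean_diff = statistics.mean(ones) - statistics.mean(zeros)
    return mean_diff / s_n * math.sqrt(n1 * n0 / n**2)

def t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for a correlation r."""
    return r * math.sqrt((n - 2) / (1 - r**2))

# Toy concentrations (hypothetical): lower values in patients
values = [180, 190, 200, 210, 230, 240, 250, 260]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
r = point_biserial(values, labels)
print(round(r, 4), round(t_statistic(r, len(values)), 3))
```

A negative r pb, as in this toy example, corresponds to a marker that is lower in patients than in controls.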

3.3 Modelling

As we conjectured that combinatory markers including multiple amino acids and/or CEA could yield additive effects and outperform single amino acids and/or CEA (Schneider et al. 2002; Wild et al. 2010), we built combined models to evaluate their selectivity in comparison with single amino acids and CEA. To control for multicollinearity, which is a significant constraint for logistic regression (Leigh 1988), we performed PCA following power transformation on the significantly different amino acids and processed the resulting principal components (PCs) as well as the corresponding single amino acid concentrations and CEA by binary logistic regression modeling. The best PCA-based model comprised CEA and PC1; the best amino acid concentration-based model comprised CEA, glycine, and tyrosine and had a slightly better fit (ΔBIC = –2.8).

3.4 Receiver–operator-characteristics analysis

We used the single amino acid concentrations as well as the predictors gained from the two best-fitting models for AUROC analysis. The results are displayed in Table 3 for the amino acids and in Fig. 2 for the predictors compared to CEA. The best-discriminating model comprised CEA, glycine, and tyrosine (AUROC 0.878, 95% CI 0.815–0.941), followed by the model comprising CEA and PC1 (AUROC 0.844, 95% CI 0.773–0.916), CEA alone (AUROC 0.794, 95% CI 0.712–0.877), and glycine (AUROC 0.707, 95% CI 0.613–0.801) as the best-discriminating single amino acid. The two predictor models did not differ significantly from each other; the model comprising CEA, glycine, and tyrosine, however, differed significantly from CEA alone (P = 0.015).
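The AUROC values reported here can be read as the probability that a randomly chosen patient scores higher than a randomly chosen control. The following Python sketch computes this quantity by direct pair counting, which is equivalent to scaling the Mann–Whitney U statistic. It is an illustration only and not the ‘pROC’ routine used in the study; function names and toy values are hypothetical.

```python
def auroc(cases, controls):
    """P(case > control) + 0.5 * P(case == control), by pair counting."""
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def auroc_any_direction(cases, controls):
    """AUROC for markers that may be either raised or lowered in cases."""
    a = auroc(cases, controls)
    return max(a, 1.0 - a)

# Toy marker values (hypothetical, not study data)
cases = [1.2, 2.5, 3.1, 4.8]
controls = [0.9, 1.6, 2.5, 2.0]
print(auroc(cases, controls))
```

The direction-agnostic variant is relevant for the amino acids in this study, most of which were decreased in patients.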

Table 3 Amino acid AUROCs and their 95% CIs for the differentiation of colon cancer patients and controls
Fig. 2

ROC curves, 95% confidence bands (B = 999) and AUROCs for CEA (red), the model consisting of CEA and principal component 1 (PC1, orange), and the model consisting of CEA, glycine, and tyrosine (green). Sensitivity and specificity and their 95% confidence intervals (B = 999) are given for the best thresholds and denoted by (☩). P-values between AUCs are computed according to DeLong et al. (1988) as described in Sect. 2

3.5 Non-inferiority and superiority testing

Since a significant difference is not an adequate measure of non-inferiority or superiority, we performed sequential testing for both with an a priori-defined acceptance criterion (equivalence limit) of δ L  = 5% ΔAUROC. We computed the lower and upper limits of the (100 − 2 δ L )% bootstrap confidence interval (0.0239–0.1422) of the estimated ΔAUROC (B = 999) as proposed by Liu et al. (2006). Referring to Mascha (2010), we deduced non-inferiority in a first step and then, as the lower CI limit of ΔAUROC is >0, inferred superiority of the model containing CEA, glycine, and tyrosine over CEA alone.
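The sequential decision logic can be summarized in a short Python sketch. It assumes that ΔAUROC is defined as the AUROC of the new model minus that of the reference marker and that the margin is expressed on the AUROC scale (0.05 for 5%); the function name and return labels are hypothetical and serve only to illustrate the reasoning described above.

```python
def classify_delta_auroc(ci, margin=0.05):
    """Sequential non-inferiority / superiority decision on a CI of ΔAUROC.

    ci: (lower, upper) limits of the bootstrap confidence interval of
        ΔAUROC = AUROC(new model) - AUROC(reference).
    """
    lower, _upper = ci
    if lower <= -margin:
        return "non-inferiority not shown"
    if lower > 0:
        return "superior"
    return "non-inferior"

# Interval reported in this section
print(classify_delta_auroc((0.0239, 0.1422)))  # prints "superior"
```

Because the lower limit of 0.0239 lies above both −0.05 and 0, both steps of the sequence are passed.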

4 Discussion

We found the concentrations of 11 of 26 serum amino acids to be significantly different between CRC patients and healthy controls. To additionally evaluate the diagnostic potential of our results, we applied a bioinformatic pipeline comprising common standard test statistics as well as AUROC analysis followed by non-inferiority/superiority testing and found a model including CEA, glycine, and tyrosine to be the best-discriminating one and superior to CEA alone, with an AUROC of 0.878 (95% CI 0.815–0.941).

The rationale of our study was to detect and evaluate alterations of the amino acid profile in CRC patients as a potential source of complementary metabolic markers by means of standardized preanalytics, a routinely applicable analytical method, and a computational procedure, which allows statistically sound statements (Walker and Nowacki 2010) on the diagnostic surplus value.

The recent methodological advancements in the rapidly emerging field of clinical metabolomics were followed by numerous studies investigating metabolic signatures of colorectal cancer. A variety of mass-spectrometric platforms has been applied, yielding different marker analytes. Comparable to our results, Qiu et al. (2009) also found serum lysine, leucine, threonine, valine, and tyrosine decreased [the latter with both applied techniques, gas chromatography time-of-flight mass spectrometry (GC-TOF MS) and ultraperformance liquid chromatography-quadrupole time-of-flight mass spectrometry (UPLC-QTOF MS)], and Ma et al. (2009) found valine, threonine, and glycine significantly decreased in CRC patients using GC-MS. In a comprehensive study, Denkert et al. (2008) found alanine, methionine, threonine, leucine, isoleucine, valine, and, less significantly, also glycine and lysine increased in human CRC tissues (GC-TOF MS). Chan et al. (2009) as well as Tessem et al. (2010) [both with high-resolution magic angle spinning nuclear magnetic resonance spectrometry (HR-MAS NMR)] and Hirayama et al. (2009) [capillary electrophoresis time-of-flight mass spectrometry (CE-TOF MS)] found glycine, and Ong et al. (2010) [gas chromatography mass spectrometry (GC-MS)] methionine and tyrosine, elevated in CRC tissue samples, whereas Ma et al. (2010) (also with GC-MS), comparing pre-/postoperative samples, found serum valine decreased and tyrosine increased after surgery. Another study by Qiu et al. (2010) focused on urinary metabolites and revealed lower histidine concentrations in CRC urine samples. The comparability of these studies with each other and with our study is limited owing to differences in analytical techniques, sample material and provenance, and patient classification. To date, there is no published study comparing MS/MS-based serum amino acid profiles of CRC patients and controls (Wang et al. 2010). Additionally, the investigations of Ma et al. (2010) and Qiu et al. (2009) primarily aimed at detecting significant differences in the metabolites of interest and not at evaluating their diagnostic selectivity by canonical methods such as (area under the) receiver–operator-characteristics [(AU)ROC] analysis, as was done, for example, by Ritchie et al. (2010), who performed Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) experiments on ultra long-chain fatty acids, and by Chan et al. (2009), who performed HR-MAS NMR analyses of tissue samples. In both of these last-mentioned studies the discriminating analytes were evaluated, as “standalone” markers or as OPLS-DA models, independently of the conventional tumor marker CEA, and their equivalence was not assessed.

Despite the limited comparability of the above-mentioned investigations with our study, the eleven significantly decreased amino acids we detected were—except for sarcosine, and with varying significance—also identified as marker metabolites for CRC in other studies.

The best discriminating amino acid glycine (AUROC 0.707), which was also found by Ma et al. (2009) and Tessem et al. (2010), is an important intermediate in the folate metabolism, which is especially altered in colon cancer (Stover and MacFarlane 2008). Due to its recently demonstrated bowel-protective effects and its easy applicability it is also a highly interesting candidate for therapeutic approaches (de Aguiar Picanco et al. 2011).

In biomarker research, it is common practice to determine marker molecules analytically, compute the statistical significance of their differences between groups, and perform e.g. PCA-based analyses resulting in rotated, non-linear, and non-retraceable (and therefore non-comparable), but stunningly separating “component plots”, leaving the reader alone with their interpretation. Only a few studies implemented AUROC analyses (Chan et al. 2009; Ritchie et al. 2010), and none evaluated non-inferiority/superiority. It is a core task and competence of laboratory medicine not only to investigate potential biomarkers but also to generate lucid test characteristics that allow a comprehensible interpretation of their diagnostic value and their translation into clinical practice. Therefore, we augmented our study by four add-ons. First, we generated robust estimates for summary statistics of the amino acid concentrations to facilitate comparability with other studies despite the relatively small number of samples. Second, we integrated the CEA concentrations, which were available anyway, into BIC-based logistic regression modeling to utilize their auxiliary selectivity. Third, we compared the amino acid concentrations themselves with the typically used principal components derived from them. Fourth, we specified the non-inferiority/superiority of our models compared to the conventional tumor marker CEA based on AUROC analyses.

Tyrosine was identified as a marker metabolite by Qiu et al. (2009), Ma et al. (2009, 2010), Vecer et al. (1998), and Ong et al. (2010) and associated with cancer-related alterations of the TCA cycle by Hirayama et al. (2009), but in contrast to glycine, it displayed inferior selectivity in our study, with an AUROC of 0.636. However, combined with CEA and glycine, it is surprisingly part of the overall best-discriminating model, suggesting that its contribution to the selectivity of the model captures aspects that differ from, or are stronger than, those of other amino acids, even if the latter (e.g. histidine or alanine) are individually stronger discriminators. Interestingly, the model comprising the two amino acid concentrations was slightly more selective than the one based on PCA, which suggests that variance-based techniques, despite their frequent application, might not always yield optimal classifications.

Even though the AUROCs of the best-discriminating amino acids almost reach the range of CEA, they are of limited use as single discriminators. In combination with the conventional tumor marker CEA, which had an AUROC (0.794) comparable to the recent literature (Wild et al. 2010), glycine and tyrosine, however, introduced a surplus of 8%, suggesting that—regarding their AUROCs of 0.707 and 0.636 as single markers—their principal value might be of additive nature and be missed when routinely available CEA concentrations are not taken into account.

Besides these encouraging results, there are some limitations to consider. First, whereas the determination of glycine and tyrosine is based on isotope-labeled internal standards and is quantitative, the concentrations of several other amino acids (cf. Table 2) have to be considered relative. For AUROC analyses this might not be relevant, since deviations affect both controls and patients. Nonetheless, direct comparisons with other studies are limited for these parameters. Another criticism might arise from the considerable overlap of the amino acid concentrations between patients and controls, as outlined by Issaq et al. (2011), who reviewed a metabolomic tumor marker study by Kim et al. (2010) and denied the potential use of the analytes presented there as biomarkers or for population studies. However, if the same standards were applied to established tumor markers, even CEA would inevitably fail. To minimize this inherent overlap problem, we generated multiparametric (and thereby multidimensional) models (Robin et al. 2009; Wild et al. 2010). A third constraint lies in our dichotomous study design consisting of two well-separated cohorts, colorectal tumor patients on the one side and healthy controls on the other. This pilot study design is suited to maximizing the differences between the two groups and thus enhancing pre-test probability, but it cannot differentiate between CRC and, e.g., inflammatory bowel diseases. As the differences between cohorts become smaller, statistical power diminishes, and the number of study participants must be raised. With our preliminary study we tried to find an optimal significance of the results, balancing study size and study group dissimilarity.

5 Concluding remarks

We analyzed 117 highly standardized serum samples and generated amino acid profiles by applying a relatively simple high-throughput mass-spectrometric technique. In addition to the common reporting of the significance of the concentration difference between groups and PCA-based modeling, our bioinformatic pipeline included (AU)ROC analyses, integrative BIC-based logistic regression modeling and non-inferiority/superiority testing to exemplarily and numerically determine the diagnostic surplus value for the clinician and to avoid the ambiguity remaining with a significant difference. In comparison with the conventional tumor marker for colorectal cancer CEA, our model additionally containing glycine and tyrosine was superior. Further large-scale studies are necessary to elucidate the potential of this model also to discriminate between cancer and potential differential diagnoses.