Pancreatic carcinoma, pancreatitis, and healthy controls: metabolite models in a three-class diagnostic dilemma

Metabolomics as one of the most rapidly growing technologies in the “-omics” field denotes the comprehensive analysis of low molecular-weight compounds and their pathways. Cancer-specific alterations of the metabolome can be detected by high-throughput mass-spectrometric metabolite profiling and serve as a considerable source of new markers for the early differentiation of malignant diseases as well as their distinction from benign states. However, a comprehensive framework for the statistical evaluation of marker panels in a multi-class setting has not yet been established. We collected serum samples of 40 pancreatic carcinoma patients, 40 controls, and 23 pancreatitis patients according to standard protocols and generated amino acid profiles by routine mass-spectrometry. In an intrinsic three-class bioinformatic approach we compared these profiles, evaluated their selectivity and computed multi-marker panels combined with the conventional tumor marker CA 19-9. Additionally, we tested for non-inferiority and superiority to determine the diagnostic surplus value of our multi-metabolite marker panels. Compared to CA 19-9 alone, the combined amino acid-based metabolite panel had a superior selectivity for the discrimination of healthy controls, pancreatitis, and pancreatic carcinoma patients [volume under ROC surface (VUS) = 0.891 (95 % CI 0.794-0.968)]. We combined highly standardized samples, a three-class study design, a high-throughput mass-spectrometric technique, and a comprehensive bioinformatic framework to identify metabolite panels selective for all three groups in a single approach.
Our results suggest that metabolomic profiling necessitates appropriate evaluation strategies and—despite all its current limitations—can deliver marker panels with high selectivity even in multi-class settings.

Pancreatic cancer is the fourth leading cause of cancer death in the United States, and most patients diagnosed with pancreatic cancer develop clinical symptoms usually late in the course of the disease (Lowenfels and Maisonneuve 2006). Therefore, only 20 % of patients can be treated with a potentially curative therapy and only about 3-5 % survive at least 5 years (Michl et al. 2006). For these patients, time, especially the so called 'biomarker lead time' between the onset of asymptomatic cancer still localized to the organ of origin and clinical diagnosis (Konforte and Diamandis 2012), is crucially important (Hazelton and Luebeck 2011).
Although recent modeling studies have illustrated that blood-based biomarkers might provide a successful tool for the early detection and differentiation of premalignant lesions, substantial methodological enhancements of unanticipated extent (Burgess 2012) are still required. Yachida et al. (2010) demonstrated a latency of about 17 years from the initiating mutation to pancreatic cancer death. Similarly, Hori and Gambhir (2011) stated "that shedding rates of current clinical blood biomarkers are likely 10⁴-fold too low to enable detection of a developing tumor within the first decade of tumor growth" and suggested increasing sensitivity and specificity by introducing multi-marker panels of up to 10 biomarkers. In a proof-of-principle study evaluating the utility of multiplexed circulating biomarkers, Brand et al. (2011) investigated the selectivity of 83 proteins and their combinations. Two panels, one consisting of CA 19-9, ICAM-1, and OPG and the other of CA 19-9, CEA, and TIMP-1, were found to discriminate pancreatic cancer patients from healthy control subjects and from patients with benign pancreatic conditions, respectively. Since the cohorts in this study were compared separately, an integral model encompassing all three disease states was not obtained. Whereas Brand et al. (2011) focused on known tumor markers, tumor-associated peptides, etc., other studies have employed several of the emerging "-omics" subspecialties, such as proteomics (Fiedler et al. 2009), transcriptomics (Zhang et al. 2010), and, probably closest to the "bedside" (Van and Veenstra 2009), metabolomics (Bathe et al. 2011; Ceglarek et al. 2009; Nishiumi et al. 2010; OuYang et al. 2011; Urayama et al. 2010; Zhang et al. 2011). Metabolomics has the chance to learn from the intricacies that have plagued "-omics" researchers over recent years, among them standardization (Van and Veenstra 2009), data processing (Blekherman et al. 2011), and data interpretation (Kholodenko et al. 2012). In this study, we addressed these challenges by using a three-class study design. We collected highly standardized samples of pancreatic cancer patients, subjects with pancreatitis, and healthy controls. Following tandem mass spectrometric metabolite profiling, we evaluated the differences between groups and applied Bayesian methodology to identify multi-metabolite models as "metamarkers", which are selective for each of the three study groups and provide improved diagnostic performance compared to CA 19-9, the conventional tumor marker.

Patients and samples
We recruited patients suffering from pancreatic cancer (n = 40), healthy controls (n = 40), and patients hospitalized due to acute pancreatitis (n = 26) at the University Hospital of Leipzig in the context of previously published studies (Fiedler et al. 2009; Leichtle et al. 2012). We collected cubital vein fasting samples of cancer patients and healthy controls in two independent sets. Additionally, we collected fasting serum samples of 26 patients with pancreatitis as inflammatory control group (sets A, C, and D; n_total = 106; Table 1). We adjusted subjects according to age and gender and performed blood sampling from patients before the initiation of specific therapy. Diagnosis of pancreatic cancer was confirmed by histologic examination in all cases. Healthy controls showed no evidence of actual disease in physical examination and routine laboratory testing [alkaline phosphatase, bilirubin, C-reactive protein, CA 19-9, CEA, creatinine, γ-glutamyltransferase, transaminases (Roche Modular, Germany)]. Pancreatitis patients were diagnosed clinically without proof of pancreatic carcinoma during the study period. Serum samples were collected and stored (at -80 °C) using standardized techniques and protocols (Baumann et al. 2005).

Chemicals, standards and consumables
Methanol and isopropanol (gradient grade) were purchased from Merck (Darmstadt, Germany). The amino acid isotope labelled standard kit (NSK-A, Cambridge Isotope Laboratories, Andover, USA) was used as internal standard. Water (HPLC grade) was obtained from J. T. Baker (Deventer, Netherlands). The derivatization reagent 3 N butanolic HCl was made in-house by mixing 4:1 v/v of 1-butanol (for spectroscopy) from Merck (Darmstadt, Germany) and acetyl chloride (p.a.) from Sigma-Aldrich (Steinheim, Germany). 96-well polypropylene microtiter plates were purchased from Greiner Bio-One (Frickenhausen, Germany). Sampling material was obtained from Sarstedt (Nümbrecht, Germany). For sample storage, 450 µL CryoTubes™ were purchased from Sarstedt.

Sample pretreatment
Sample derivatization was performed according to our previously described protocols (Brauer et al. 2011). Briefly, serum samples were diluted 1:10 with methanol for protein precipitation. After centrifugation we placed 10 µL of the supernatant into 96-well polypropylene microtiter plates and diluted it with 100 µL of the internal standard solution. Following evaporation at 70 °C for 40 min, we added 60 µL of 3 N butanolic HCl for derivatization at 65 °C for 18 min. The residual solution was again evaporated at 70 °C for 40 min and then reconstituted with 150 µL of the mobile phase (1/1 v/v isopropanol/water). After 15 min of gentle shaking of the microtiter plate at room temperature, we analyzed the samples by flow injection analysis (FIA)-tandem mass spectrometry (MS/MS).

Tandem mass spectrometry
An API 3000 MS/MS (Applied Biosystems, Germany) equipped with a Turbo Ion Spray Source (TIS) in combination with an HTC Pal autosampler and a PE 200 microgradient pump was used for flow injection analysis (FIA). A sample volume of 25 µL was directly injected at a flow rate of 80 µL/min with an analysis time of 1.5 min. We detected amino acids by a neutral loss scan of 102 Da in the mass range of m/z 130-280 or by multiple reaction monitoring (MRM). Quantitative analysis using internal standards for 26 amino acids was performed using ChemoView™ 1.4.2 (Applied Biosystems, Germany). A comprehensive overview of mass transitions, internal standards, and performance data for the different amino acids is given in Brauer et al. (2011).

Bioinformatic analysis
Statistical analyses were conducted (unless otherwise stated) using R for Windows (Version 2.14.2) and its related CRAN packages (http://cran.r-project.org/). We tested data for normality by the Anderson-Darling test (nortest) and the gender distribution in the subgroups by the binomial test (stats). The homogeneity of variances of the quantitative routine laboratory data was evaluated with the approximate Fligner-Killeen test (stats), whereas the pairwise differences were investigated by Games-Howell testing (script source: http://aoki2.si.gunma-u.ac.jp/R/src/tukey.R). Three missing CA 19-9 values were imputed by multiple imputation (MI) with 3 chains of imputation (which were averaged thereafter), an R̂ value of 1.1, and bootstrap as random imputation method until convergence (after 7111 iterations). Three pancreatitis samples with non-random missing data as a consequence of insufficient sample volume were excluded from further analysis. The selectivity of single amino acid concentrations was assessed in all disease states simultaneously via the volume under ROC surface (VUS), which is the three-dimensional analogue of AUROC analysis (Nakas and Yiannoutsos 2004). The VUS values and their associated confidence intervals were calculated nonparametrically using B = 2,000 bootstraps and 50 k subdivisions on the amino acid concentrations (DiagTest3Grp). As we assumed a high degree of collinearity in the amino acid concentrations, we computed Kendall's τ as well as its Hochberg-adjusted significance (ltm, corrplot) and plotted the correlation matrix (Fig. 1).
Table 1 Baseline data for the two study sets (A and C) as well as the pancreatitis cohort (D). Data are shown as numbers or median (percentiles 2.5-97.5 %). Normality was tested with the Anderson-Darling test; non-normally distributed data are denoted in italics. N.c., not classified.
Based on our previous results indicating that marker models comprising combinations of different amino acids and/or CA 19-9 might be superior to single amino acids and/or CA 19-9 with respect to their selectivity (Leichtle et al. 2012), we also evaluated combined models. These models consisted on the one hand of the conventional tumor marker CA 19-9 combined with different principal components (PC1, PC2, …) of the different amino acid concentrations to control for multicollinearity, which is a significant constraint on variable selection (Leigh 1988).
On the other hand, we also combined CA 19-9 with mere amino acid concentrations to avoid potentially unnecessary transformation steps prior to modeling. After Yeo-Johnson transformation (car) of the amino acid concentrations, we generated PCs (princomp and FactoMineR), of which the first six PCs had eigenvalues > 1.0 and cumulatively covered 78.7 % of the variance. For the modeling in a three-state design, we merged sample set A with C and obtained one tumor group, one control group, and one pancreatitis group (set D). In the second step we used CA 19-9 alone and combined with the PCs of the amino acid concentrations as well as with mere concentrations for Bayes-averaged multinomial logit modeling [mlogitBMA, bic.mlogit(mlogitBMA)] using the Begg and Gray approximation. We validated the latter model by CAR ['Correlation-Adjusted (marginal) coRrelation'] scores (care) assuming empirical values of 1.0, 0.3, and 0.1 as responses of the pancreatic carcinoma, the healthy control, and the pancreatitis groups, respectively, in a CAR model truncated to a number of variables comparable to the penalized multinomial logit model. We computed the VUS for the four predictors, namely CA 19-9, the PCA-based mlogitBMA predictor (PCA), the amino acid concentration-based mlogitBMA predictor (AA), and the amino acid concentration-based CAR model predictor (CAR), analogously to the VUS values of the amino acid concentrations. The ROC surface plots (Fig. 2) were drawn using MATLAB (The MathWorks, Natick, MA, USA). Since significant differences of the VUS between predictors and CA 19-9 do not imply inferiority or superiority, we performed non-inferiority and superiority testing applying bootstrap techniques on ΔVUS (B_outer = 1,000, boot; B_inner = 1,000, DiagTest3Grp; UBELIX cluster of the University of Bern). We constructed the corresponding CIs and tested for the overlap of Δ0 and the CI of ΔVUS according to the methods proposed by Liu et al. (2006), Tunes da Silva et al. (2009), and Lesaffre (2008) at a predefined δ level of 5 %, which was considered to be medically reasonable when designing this and a previous study (Leichtle et al. 2012). We visualized the performance data in a forest plot [Fig. 3, forestplot(rmeta), R version 2.15.0, cf. Mascha (2010)].
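The two-step decision on the difference in VUS between a panel predictor and CA 19-9 (ΔVUS) can be sketched in a few lines. The following Python sketch is illustrative only and is not the R code used in the study: the function names are hypothetical, the percentile interval is a simple nearest-rank version, and the rule "non-inferior if the lower confidence limit exceeds -δ, superior if it also exceeds 0" is an assumption that follows the usual reading of Mascha (2010).

```python
def percentile_interval(samples, level):
    """Nearest-rank percentile interval covering `level` of the bootstrap values."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[round(tail * (n - 1))]
    upper = ordered[round((1.0 - tail) * (n - 1))]
    return lower, upper


def two_step_test(delta_vus_bootstrap, delta=0.05):
    """Non-inferiority first, then superiority, on bootstrapped delta-VUS values.

    delta_vus_bootstrap: bootstrap replicates of VUS(panel) - VUS(CA 19-9).
    delta: equivalence limit; the interval level is (100 - 2*delta) %.
    """
    level = 1.0 - 2.0 * delta
    lower, upper = percentile_interval(delta_vus_bootstrap, level)
    # Assumed rule: the whole interval must lie above -delta for non-inferiority.
    non_inferior = lower > -delta
    # Superiority is only examined once non-inferiority has been shown.
    superior = non_inferior and lower > 0.0
    return {"ci": (lower, upper), "non_inferior": non_inferior, "superior": superior}
```

With δ = 5 % the interval is a 90 % interval, so a predictor whose interval lies entirely above zero is declared superior, while one whose lower limit falls between -0.05 and 0 is only non-inferior.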

Ethics
The study was approved by the Ethics Committee of the Medical Faculty of the University of Leipzig (Reg. No. 013-2005) and it fulfills the requirements of the Helsinki declaration. All study subjects gave written informed consent to participate in the study.

Results and discussion

Descriptives
We collected serum samples of 40 (20 males/20 females) pancreatic carcinoma patients, 26 (22 males/4 females, P_binomial = 0.0005) pancreatitis patients, and 40 (20 males/20 females) healthy controls. Table 1 displays the distributions of age, BMI, UICC cancer staging of the patients, and the CA 19-9, bilirubin, and HbA1c concentrations in the three different sample sets. Of 26 amino acid concentrations, we found only four (arginine, glutamic acid, phenylalanine, and tryptophan) unaltered between the study groups (Table 2). Several amino acid concentrations were non-normally distributed. In order to detect sample set-specific alterations in the values, we also compared the amino acid concentrations of the tumor patients and the healthy control group between the two sample sets and found no significant differences.

Correlations
We evaluated the multicollinearity of the amino acid concentrations by generating their correlation matrix (Kendall's τ and its Hochberg-adjusted significance). Kendall's τ values ranged between -0.516 (aspartic acid with glutamine) and 0.709 (threonine with glutamine), the P values between 0.001 and 0.97 (Fig. 1).
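To make the two ingredients of the correlation matrix concrete, the sketch below shows a pure-Python Kendall rank correlation (the tie-corrected tau-b variant) and the Hochberg step-up adjustment of a set of P values. It is an illustration under assumptions, not the ltm/corrplot code used in the study: the function names are hypothetical, tau-b is assumed as the tie handling, and the computation of the raw P values is omitted.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long numeric sequences (O(n^2) pair scan)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: contributes to neither term
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


def hochberg_adjust(p_values):
    """Hochberg step-up adjusted P values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest P value down; rank 1 is the largest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order, start=1):
        running_min = min(running_min, rank * p_values[idx])
        adjusted[idx] = running_min
    return adjusted
```

For 26 amino acids the matrix holds 26 × 25 / 2 = 325 distinct pairs, so the adjustment would be applied to 325 P values at once.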

Modeling
We hypothesized that combinatory markers including several amino acids and/or CA 19-9 might have additive or even multiplicative effects and thus be superior to single amino acids and/or CA 19-9 in the diagnosis of pancreatic cancer (Brand et al. 2011). Therefore, in addition to evaluating the single VUS of the amino acid concentrations, we also generated models based on CA 19-9 combined with PCs and Bayesian multinomial logit model averaging (mlogitBMA) as well as on CA 19-9 combined with amino acid concentrations. Furthermore, we used models based on CAR scores to evaluate their three-class selectivity in comparison with that of single amino acid concentrations and CA 19-9 as a validation method for the mlogitBMA models. The best PC-based mlogitBMA model comprised CA 19-9 and PC2. The best amino acid concentration-based mlogitBMA model contained CA 19-9 and aspartic acid, both of which were also contained in the truncated amino acid concentration-based CAR score model. To evaluate the selectivity of both amino acid concentrations and modeled predictors, we computed their volume under the ROC surface (VUS) (Table 3) and compared it with that of a random classifier (Landgrebe and Duin 2007), which has a value of 1/6. To illustrate the selectivity of CA 19-9 and the predictors, we generated true class rate plots depicting the ROC surface (Fig. 2).
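As an illustration of what a VUS value measures, the sketch below estimates it nonparametrically as the fraction of triples, one value drawn from each group, that appear in the expected order, and attaches a bootstrap percentile interval. This is a minimal stand-in and not the DiagTest3Grp implementation used in the study: the function names are hypothetical, ties are simply counted as misordered, and the group order (lowest to highest marker values) is an assumption the analyst has to supply. With three exchangeable groups each of the 3! = 6 orderings is equally likely, which is why a chance-level classifier has a VUS of 1/6.

```python
import random


def vus(low, mid, high):
    """Fraction of (low, mid, high) triples that are strictly ordered low < mid < high."""
    ordered = 0
    for a in low:
        for b in mid:
            if a < b:
                for c in high:
                    if b < c:
                        ordered += 1
    return ordered / (len(low) * len(mid) * len(high))


def bootstrap_vus_ci(low, mid, high, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the VUS, resampling within each group."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        estimates.append(
            vus(
                rng.choices(low, k=len(low)),
                rng.choices(mid, k=len(mid)),
                rng.choices(high, k=len(high)),
            )
        )
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[round(tail * (n_boot - 1))]
    upper = estimates[round((1.0 - tail) * (n_boot - 1))]
    return lower, upper


if __name__ == "__main__":
    # Made-up toy values, for illustration only (not study data).
    group_low = [1.0, 4.0]
    group_mid = [2.0, 5.0]
    group_high = [3.0, 6.0]
    print(vus(group_low, group_mid, group_high))
```

The triple loop is cubic in the group sizes, which is acceptable for groups of about 40 samples but is the reason why bootstrap-based intervals are comparatively expensive to compute.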

Non-inferiority and superiority testing
Since it is generally accepted that significant difference alone might not be an adequate measure of non-inferiority or superiority, we sequentially tested for both with an a priori-defined acceptance criterion (equivalence limit) of δ = 5 % ΔVUS. We computed the lower and upper limits of the (100 - 2δ) % bootstrap confidence interval of the estimated ΔVUS as proposed by Liu et al. (2006). The results are shown in Fig. 3. Following the criteria by Mascha (2010) we deduced non-inferiority for predictors AA and CAR compared to CA 19-9. Furthermore, superiority of the AA and CAR predictors over CA 19-9 alone was derived in a second step, since the lower CI limit of ΔVUS was positive.
"The ideal biological marker(s) for cancer risk assessment and early detection must have high sensitivity and specificity, be found in a biosample obtained using minimally invasive procedures, and be analyzed using a high-throughput, cost-effective assay." These requirements stated by Van and Veenstra (2009) are challenging to fulfill. In pancreatic cancer diagnostics in particular, this challenge is even bigger due to the number of differential diagnoses, which are difficult to discern from malignant disease even for experienced clinicians (Gong et al. 2012). Furthermore, chronic pancreatitis patients have a 15-fold higher risk of developing pancreatic cancer than the general population (Huggett and Pereira 2011). In order to identify biomarkers capable of discriminating different disease states, we designed a study including pancreatic cancer patients and healthy controls of two independently collected sample sets as well as an additional group of pancreatitis patients, since the principal feasibility of the metabolomic approach to pancreatic cancer was recently shown (Bathe et al. 2011; Tesiram et al. 2012; Zhang et al. 2012). Samples were processed following highly standardized preanalytical protocols and applying a routinely used tandem mass spectrometric technique.
By comparing the sample groups, we found 22 of 26 amino acids altered in at least one out of ten possible comparisons. The number of different metabolites is comparable to that given by Bathe et al. (2011)  healthy volunteers identifying 18 of 60 metabolites as significantly different. In addition to the inter-class comparisons of the different sample sets, we also evaluated the inter-sample set differences in the respective classes (cancer A vs. cancer C and control A vs. control C) and found no significant differences. Since the sample groups were homogeneous, we preferred a joint analysis in the modeling approach over a split-half design to keep the degree of random error as low as possible (Knottnerus and Muris 2003; Ransohoff and Gourlay 2010). Although the previously published metabolome profiling studies of pancreatic carcinoma are heterogeneous regarding the mass-spectrometric techniques used and the metabolites studied, they all share canonical variance-based evaluation strategies with two-class comparisons. Additionally, only one of the studies (Bathe et al. 2011) assessed the selectivity (e.g. AUROC or VUS analyses) of the marker metabolites. Our aim was to perform a comprehensive data analysis that also allows a clear interpretation of the diagnostic value of the markers (Leichtle et al. 2012). To this end, we implemented four novel features in our bioinformatic pipeline: (1) the computation of three-class VUS values of the single amino acid concentrations as an integral selectivity measure, (2) a Bayesian multinomial logit model averaging procedure to extend the previously used binomial logistic regression modeling (Leichtle et al. 2012) to the three-class study design to generate multi-marker panels (including CA 19-9), (3) the VUS-based analysis of the panel predictors, and, finally, (4) their non-inferiority and superiority determination.
Table 3 Volume under the receiver operating characteristic surface (VUS) and 95 % confidence intervals of the amino acid and CA 19-9 concentrations as well as of a random classifier [cf. Landgrebe and Duin (2007)] with respect to the discriminatory power between pancreatic cancer patients, healthy controls, and pancreatitis patients. For compound abbreviations, see Table 2.
The VUS values of the single amino acid concentrations ranged from 0.18, slightly above a random classifier, to 0.85 (glutamine), which is close to the best panel predictors. As none of the previous metabolite profiling studies on pancreatic cancer performed VUS analysis, we can only rely on the utterly inconsistent P values they present for glutamine: 0.000021 in the case of Urayama et al. (2010) and 0.97 in the case of Nishiumi et al. (2010). CA 19-9 alone reached the 10th rank, which is probably attributable to its low selectivity between benign and malignant pancreatic diseases (Fig. 2a). Since our previous investigations (Leichtle et al. 2012) indicated a high degree of multicollinearity in the amino acid concentrations, which is known to impede many feature selection techniques (Jesneck et al. 2009; Leigh 1988), we set up a Kendall's correlation matrix to visualize the multicollinearity and its significance. As expected, the full range of correlation spanned from -0.516 to 0.709, which supported the inclusion of the frequently recommended PC-based analysis approach, although it has been shown that variance-based techniques might not always yield optimal predictors (Leichtle et al. 2012). To compute robust predictive multi-metabolite marker panels, we combined CA 19-9 and the PCs as well as CA 19-9 and the mere amino acid concentrations and used a Bayesian multinomial logit model averaging procedure for our categorical three-class study design (Robin et al. 2009).
The first model consisted of CA 19-9 and PC2, providing a two-marker "panel" predictor (PCA) with a VUS of 0.604. The omission of PC1 and the preference for PC2, which contributes less to the explained variance, during the mlogitBMA procedure is an astounding finding, possibly reflecting a predilection for variables sharing covariance with CA 19-9. The second model, based on amino acid concentrations, included CA 19-9 and aspartic acid, providing a two-marker "panel" predictor (AA) with a VUS of 0.891. Our results indicate that CA 19-9 provides the selectivity mainly for the discrimination between healthy controls and pancreatic cancer patients (  (Brauer et al. 2011). On the other hand, regarding the substantial impact of especially pancreatic disease on nutrition, it was not unexpected to find models different from those of our previous study on colorectal cancer (Leichtle et al. 2012). The mechanisms disturbing amino acid homeostasis and enabling the discrimination of pancreatic cancer patients from pancreatitis patients on the basis of metabolite profiles are not entirely elucidated. Schrader et al. (2009) suggested, apart from malnutrition, mainly inflammatory effects and pointed to the inverse relationship between the circulating amino acid concentrations and the degree of inflammation present, e.g., in hemodialysis patients. Whether increased tumor-associated proteolytic activity (Findeisen and Neumaier 2012) contributes not only to the generation of specific peptide decay profiles, but also to the specificity of the amino acid profiles is still unknown. To validate our results and the Bayesian modeling approach, we also applied model selection techniques based on CAR scores (Zuber and Strimmer 2011) as a non-Bayesian linear alternative. Since our study covered three more or less independent classes, we could neither rely on a binary (CAT score) nor on a metric (CAR score) response.
Therefore, we assumed empirical values of 1.0, 0.3, and 0.1 as "responses" of the respective groups while acknowledging that such a procedure might be somewhat artificial and not necessarily justified by the underlying pathophysiology. To obtain a number of model variables comparable to that of the penalized mlogitBMA model, and thereby at least a limited comparability, we used a two-predictor CAR model including CA 19-9 and aspartic acid. The CAR panel predictor had a VUS of 0.871, similar to the value obtained with the Bayesian modeling approach. As the final evaluation step, we performed two-step non-inferiority and superiority testing based on the bootstrapped ΔVUS and on a ± δ equivalence range of 5 % as outlined in a previous study (Leichtle et al. 2012). CA 19-9's VUS ± δ served as the reference against which we tested the other predictors' ΔVUS. In the first step, we observed non-inferiority only for the AA and CAR panel predictors, but not for the PCA panel predictor, whereas in the second step, we determined superiority of the AA and CAR panel predictors (Fig. 3). These encouraging results indicate an improved selectivity of the models compared to CA 19-9 alone. Our study has several limitations to be considered. First, we merged the sample sets A and C to keep the degree of random error as low as possible in our modeling analysis (Knottnerus and Muris 2003). However, the "external" validity of the results could thus not be evaluated (Ransohoff and Gourlay 2010). Therefore, subsequent studies are necessary in order to assess the generalizability of our predictor models. Second, due to high preanalytical standardization and refinement of our bioinformatic methodology, the variability of the analytical method itself might have become the main source of bias.
With our study design and evaluation strategy, we have probably reached an analytical boundary that still requires substantial improvements (Hori and Gambhir 2011). Therefore, new analytical techniques are necessary to reach both superior sensitivity and stability at the same time. The third limitation of the study originates from the strong penalization of our Bayesian modeling approach: the predictor panels generated by the mlogitBMA procedure were both two-component panels consisting of CA 19-9 and another variable. Especially in the case of PC-based modeling and the selection of the second PC while leaving out the first, a considerable amount of selectivity might have been lost. On the other hand, the amino acid concentration-based model was superior in selectivity [without taking misclassification costs into account (Klawonn et al. 2011)], suggesting that PCs might not serve as optimal modeling variables when Occam's razor is strictly applied. Finally, rather than proposing a superior diagnostic metabolite model or "meta-marker", our results suggest that our bioinformatic framework combined with a methodology refined to sufficient sensitivity and stability might provide a valuable diagnostic tool for metabolic profiling even in the three-class differentiation dilemma of health, inflammation, and malignancy.

Short summary
Multi-marker panels have been suggested to improve the selectivity of pancreatic cancer diagnostics and its differentiation from various benign lesions. However, a comprehensive framework for the statistical evaluation of marker panels in a multi-class setting has not yet been established. Using a disease model encompassing pancreatic cancer, pancreatitis, and healthy controls, 106 standardized serum samples, and metabolic profiling, we generated models to discriminate between the three study groups.
Multi-marker models are superior to the conventional tumor marker CA 19-9 in simultaneously differentiating between pancreatic cancer, pancreatitis, and healthy controls.
Our comprehensive bioinformatic approach provides a novel framework to address a common diagnostic challenge, and thus paves the way for biomarker validation in a clinical three-class setting.