Introduction

Glioblastoma (GBM) is a high-grade astrocytoma and one of the most aggressive brain tumors. In most cases it is diagnosed only after clinical symptoms become apparent, and diagnosis depends on histological investigation of tumor samples obtained by biopsy or resection. Gene and protein expression profiling of astrocytic tumors, including cDNA microarray gene expression profiling of brain tumor tissue and cell lines [1–11] and quantitative proteomics analysis [12, 13], has recently been proposed as a source of additional and refined diagnostic criteria. The analysis of microvesicles of glioma cells in culture revealed RNA and proteins that are potentially useful biomarkers [14]. Secreted proteins coded by astrocytoma-expressed genes may be detectable in the circulation and are thus candidates for noninvasive verification of diagnosis. Serum biomarkers have been detected by analysis of gene expression profiles of non-brain tumors, with promising results [15–17]. A few secreted proteins, including soluble CD95, YKL-40, frizzled, serum protein kinases, apolipoprotein E, cell adhesion molecules, and angiopoietins-1 and 2, have been proposed as potential serum biomarkers of astrocytomas [18–27]. None of these proteins alone was sufficiently specific and sensitive to serve as a diagnostic marker.

Because previous attempts to find surrogate serum markers for GBM failed when based on a single or only a few candidate factors, we analyzed multiple factors to identify profiles associated with GBM. We screened astrocytoma gene and protein expression databases for extracellular and secreted proteins differentially expressed in astrocytoma. The serum concentrations of the identified proteins were determined for GBM patients by use of standard ELISA techniques. To identify a protein concentration pattern as a predictive model for the presence of a GBM, we chose unsupervised data mining as a first identification tool because it enables recognition of structures in the available data independent of a preset hypothesis [28]. Because of the limited sample size, we also used standard statistics for data analysis. The data show that serum profiles associated with GBM diagnosis and survival can be defined within the analyzed sample.

Methods

Fourteen secreted candidate proteins were identified from databases on astrocytoma gene expression. Tumor-specific protein expression was verified on tissue microarrays by immunohistochemistry. The concentrations of the proteins were analyzed in the serum of 23 patients with GBM and 12 control subjects. The data were profiled and associated with the disease by data mining, and the associations were confirmed by bootstrapping. Filters were developed for serum protein profiles that enable diagnosis of GBM and prediction of survival of more or less than 15 months after GBM diagnosis within the study cohort.

Patient recruitment

Twenty-three patients with GBM were recruited at the Departments of Neurosurgery at Massachusetts General Hospital, Boston, USA, Charité-Universitätsmedizin Berlin, Germany, Helios Klinikum Berlin-Buch, Germany, and Vivantes Klinikum Berlin-Neukölln, Germany. All tumor patients underwent gross tumor resection. Sera from nine healthy controls were acquired from the Department of Neurosurgery at Helios Klinikum Berlin. Control sera from three patients with rheumatoid arthritis treated with glucocorticoids (5–15 mg prednisolone/day) were obtained from the Department of Rheumatology and Clinical Immunology, Charité-Universitätsmedizin Berlin. The control group was age-matched to the study cohort (mean/median age for GBM group 55.3/58.6 years; mean/median age for control group 55.2/51.7 years). Clinical data and histopathological data for all patients were deposited in a phenotype database (SecBase). Tumor volumes were approximated from MRI images using the ellipsoid formula V (mm³) = 4/3 × π × r(sag) × r(ax) × r(cor), or from removed tumor masses, where possible. Patient data are shown in Table 1. Peripheral blood was collected before surgery, and serum was separated within 2–12 h, subdivided, and stored at −80°C until use. We determined that no significant change in serum concentrations for the tested proteins occurred in this time window. All samples and data were acquired in accordance with applicable ethical standards and approved by the local ethics review boards.
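
The volume approximation above can be illustrated with a minimal sketch. It assumes that r(sag), r(ax), and r(cor) are the tumor semi-axes in millimetres measured in the sagittal, axial, and coronal planes; the function name and the example radii are hypothetical and not taken from the study data.

```python
import math


def ellipsoid_volume_mm3(r_sag: float, r_ax: float, r_cor: float) -> float:
    """Approximate tumor volume (mm^3) as an ellipsoid from three semi-axes (mm)."""
    if min(r_sag, r_ax, r_cor) < 0:
        raise ValueError("radii must be non-negative")
    return 4.0 / 3.0 * math.pi * r_sag * r_ax * r_cor


# Hypothetical radii in mm, for illustration only.
volume = ellipsoid_volume_mm3(15.0, 12.0, 10.0)
print(f"{volume:.0f} mm^3")
```

With equal radii the expression reduces to the volume of a sphere, which is a quick sanity check on the formula.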

Table 1 Selected clinical data for GBM patients included in this study

Selection of candidate serum marker proteins

Genes potentially overexpressed in astrocytoma cells were identified by use of expression profiles in the serial analysis of gene expression (SAGE) database (http://cgap.nci.nih.gov/SAGE) using the Gene Expression Displayer tool as described [29]. The SAGE data for 24 astrocytoma and 12 normal brain samples were compared. RNA tags that showed more than a fivefold difference in relative expression level in astrocytic tumors, and whose Gene Ontology (GO) annotations describe them as coding for potentially secreted proteins, were selected. For analysis, a database was built in which the astrocytoma gene and protein expression levels were listed. GO terms were associated with each entry. In addition, clinical data, tumor features, and previous reports on the presence of specific proteins in serum, where available, were recorded in the database. The GO terms used for selection of candidate secreted proteins were extracellular/extracellular space/matrix, cell surface receptor/integral to membrane, basement membrane, cell growth/maintenance, cell–cell signaling/cell communication, cell motility, cell adhesion, cytokine activity, and the non-GO annotation serum factor. The terms were selected for their association with extracellular location and therefore enhanced likelihood of presence in the circulation [15, 16]. Candidate serum markers were further verified using published data on gene and protein overexpression in astrocytic tumors [1–5, 9, 12, 13, 19, 30–32].
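
The two selection criteria described above (more than a fivefold expression difference and a secretion-related annotation) can be sketched as a simple filter. This is an illustration only, not the Gene Expression Displayer workflow: the record layout, the gene names, the tag counts, and the pseudocount used to avoid division by zero are all assumptions, and only a subset of the listed annotation terms is shown.

```python
# Subset of the annotation terms named in the text (illustrative, not complete).
SECRETION_TERMS = {
    "extracellular",
    "extracellular space",
    "basement membrane",
    "cell adhesion",
    "cytokine activity",
}


def fold_difference(tumor: float, normal: float, pseudocount: float = 1.0) -> float:
    """Symmetric fold difference, so over- and under-expression are both captured."""
    ratio = (tumor + pseudocount) / (normal + pseudocount)
    return max(ratio, 1.0 / ratio)


def select_candidates(tags, min_fold: float = 5.0):
    """Keep tags that differ more than min_fold and carry a secretion-related term."""
    selected = []
    for tag in tags:
        differs = fold_difference(tag["tumor"], tag["normal"]) > min_fold
        secreted = bool(SECRETION_TERMS & set(tag["go_terms"]))
        if differs and secreted:
            selected.append(tag["gene"])
    return selected


# Hypothetical tag counts; gene names are placeholders.
tags = [
    {"gene": "GENE_A", "tumor": 120, "normal": 10, "go_terms": ["extracellular space"]},
    {"gene": "GENE_B", "tumor": 30, "normal": 20, "go_terms": ["cytokine activity"]},
    {"gene": "GENE_C", "tumor": 2, "normal": 90, "go_terms": ["cell adhesion"]},
    {"gene": "GENE_D", "tumor": 200, "normal": 5, "go_terms": ["nucleus"]},
]
candidates = select_candidates(tags)
print(candidates)
```

GENE_B fails the fold criterion and GENE_D fails the annotation criterion, so only the first and third placeholder genes pass both filters.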

Serum analysis

Serum samples were analyzed using commercially available ELISA kits for platelet factor 4 (PF4) (Diagnostica Stago, Asnières-sur-Seine, France), serotonin (IBL Immuno-Biological Laboratories, Hamburg, Germany), bone morphogenetic protein 2 (BMP2), tumor necrosis factor-beta (TNF-β), stem cell factor (SCF/KITLG), insulin-like growth factor binding protein-3 (IGFBP3), chemokine (C-X-C motif) ligand 10 (CXCL10), interleukin-1α (IL-1α), fractalkine/chemokine (C-X3-C motif) ligand 1 (CX3CL1) (R&D Systems, Minneapolis, MN, USA), thrombospondin-1 (TSP1) (Chemicon International, Temecula, CA, USA), heat shock 70-kDa protein (HSP70) (Stressgen Biotechnologies, Victoria, BC, Canada), retinol-binding protein-4 (RBP4) (American Laboratory Products Company, Windham, USA), and midkine (MDK) (CellSignals, Yokohama, Japan). Serum concentrations for fatty acid binding protein-7 (FABP7) (Sanbio, Munich, Germany) were determined by slot-blot analysis (Bio-Rad Laboratories, Munich, Germany). All samples were tested at least in triplicate. Specific concentration standards for the tested factors were always included in the assays and standard adjustment of serum samples was performed before testing.

Histology and immunohistochemistry

We compiled samples from two or three representative areas of 15 different GBM on a tissue microarray (TMA) as previously described [33]. Kidney, liver, and non-neoplastic brain (temporal lobe tissue resected from epilepsy patients) specimens from different subjects served as controls. The TMA were probed with antibodies against HSP70 (Stressgen Biotechnologies), IGFBP3, TSP1 (Calbiochem, Bad Soden, Germany), BMP2 (R&D Systems), MDK (CellSignals), and FABP7 (kindly provided by Dr T. Müller, Berlin, Germany). Immunohistochemistry was performed using a BenchMark Automatic Staining Instrument (Ventana Medical Systems, Strasbourg, France). Briefly, paraffin sections were rehydrated and blocked in 10% serum. Automated cell conditioning for enhancement of antigen presentation was performed at 95°C for 8 min, at 100°C for 44 min, and at 42°C for 2 min. Primary antibodies were applied manually and incubated at 42°C for 30 min, followed by automated application of biotinylated secondary antibody, horseradish peroxidase-labelled streptavidin, and DAB staining (Ventana Medical Systems) with intermediate washing steps. Finally, automated application of counterstain and bluing reagent (Ventana Medical Systems) was performed. Immunostaining was semiquantitatively scored by a histoscore index (percentage of positive cells × staining intensity) in a blinded fashion.
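
The histoscore index can be written out as a short sketch. The text defines it only as the percentage of positive cells multiplied by staining intensity; the 0 to 3 intensity scale used here is an assumption (it is a common convention but is not stated in the text), and the function name is hypothetical.

```python
def histoscore(percent_positive: float, intensity: int) -> float:
    """Histoscore = percentage of positive cells x staining intensity.

    Assumes an intensity scale of 0 (none) to 3 (strong); the scale is not
    specified in the text.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return percent_positive * intensity


# Hypothetical core: 60% positive cells with moderate staining.
score = histoscore(60, 2)
print(score)
```

Under this assumed scale the index ranges from 0 to 300.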

Data analysis

To detect associations between the serum protein concentration data set and the clinical diagnosis and survival time data sets, we used the association rule discovery method [34] implemented by the software Magnum Opus of RuleQuest [35]. Magnum Opus employs k-most-interesting pattern-discovery techniques. We used the Windows version of Magnum Opus V2.3. The association rules found were selected manually to build decision trees for predictive engines [36]. In the first step all patient sets were combined to establish the decision tree (training set). The test sets are copies of the training set. Further, to validate the associations found by the applied method we performed bootstrapping, because this is generally superior to ANOVA for small data sets [37]. In this step we successively excluded one case from the training set and rebuilt the decision tree with the reduced training set (a leave-one-out scheme). The excluded case was then used to test the reduced training set. The bootstrap results were obtained by repeating this procedure for all cases of the data set. Because every case was thus classified by a tree built without it, the results provide an internal validation of the associations within the cohort.
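
The validation loop described above (exclude one case, rebuild the model on the remaining cases, test the excluded case, repeat for every case) can be sketched as follows. This is not Magnum Opus and not the authors' decision tree: a single-threshold rule on one marker stands in for the tree, and the concentrations and labels are invented for illustration.

```python
def fit_stump(values, labels):
    """Choose the threshold and direction that classify the training cases best."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values to place a threshold")
    # Candidate thresholds are midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for threshold in candidates:
        for above_is_case in (True, False):
            correct = sum(
                ((v > threshold) == above_is_case) == label
                for v, label in zip(values, labels)
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, above_is_case)
    return best[1], best[2]


def predict(stump, value):
    """Return True (case) or False (control) for one concentration."""
    threshold, above_is_case = stump
    return (value > threshold) == above_is_case


def leave_one_out(values, labels):
    """Fraction of cases classified correctly by a model built without them."""
    hits = 0
    for i in range(len(values)):
        train_values = values[:i] + values[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        stump = fit_stump(train_values, train_labels)
        hits += predict(stump, values[i]) == labels[i]
    return hits / len(values)


# Invented concentrations: four controls (False) and four cases (True).
values = [50.0, 60.0, 70.0, 80.0, 220.0, 240.0, 260.0, 300.0]
labels = [False, False, False, False, True, True, True, True]
accuracy = leave_one_out(values, labels)
print(accuracy)
```

The key property of the scheme is that the model is refitted inside the loop, so the held-out case never influences the rule used to classify it.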

For statistical analysis of serum protein concentrations for each of the 14 single candidate proteins, a t test was applied. The validity of the test was calculated by use of Fisher’s exact test.
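
For readers unfamiliar with Fisher's exact test, a minimal two-sided implementation for a 2 × 2 table is sketched below. It sums the hypergeometric probabilities of all tables that share the observed margins and are no more probable than the observed table. The function name and the example counts are hypothetical and do not come from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(x: int) -> int:
        # Unnormalised hypergeometric weight of the table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Integer weights keep the "no more probable than observed" comparison exact.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(total, col1)


# Hypothetical table: rows are true groups, columns are assigned groups.
p_value = fisher_exact_two_sided(9, 1, 2, 8)
print(f"{p_value:.4f}")
```

Working with integer weights rather than floating-point probabilities avoids tie-breaking errors when two tables are equally likely.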

Results

Identification and selection of proteins potentially secreted by astrocytoma

SAGE expression data revealed 328 mRNA species highly expressed or underrepresented in astrocytomas compared with normal brain tissue. Thirty-six of these were identified as potentially astrocytoma-secreted transcripts based on GO-term assignments. Thirty-two proteins were identified by screening previously published gene and protein expression data from glioma [2–5, 9, 12, 13, 24, 30–32]. The final pool of candidate serum markers consisted of 68 proteins. Based on the availability of suitable detection systems, 14 of the 68 candidate proteins were selected (Table 2).

Table 2 Diagnostic candidate proteins selected for serum profiling in healthy and astrocytoma subjects

Serum analysis of single candidate proteins

Analysis (t test) of serum protein concentrations for each of the 14 candidate proteins revealed raised serum concentrations in GBM patients compared with controls for HSP70 (p = 0.09), RBP4 (p = 0.02), serotonin (p = 0.04), and SCF (p = 0.03), and reduced serum concentrations for CXCL10 (p < 0.001). Slightly different serum levels were found for BMP2, TSP1, MDK, PF4, and CX3CL1 (p < 0.1). No differences were found between the two groups for IGFBP3, IL-1α, and TNF-β. For FABP7, results were not reproducible with the antibodies used. There was no correlation between the size of the tumor at diagnosis and the serum levels of any of the tested proteins, based on the limited number of subjects with available tumor volume (Table 1). No significant difference in serum protein concentration was found between patients with rheumatoid arthritis (RA) and healthy subjects for the proteins tested (data not shown).

The GBM group was subdivided into two groups comprising patients surviving more than 15 months (n = 12) and those surviving less than 15 months post surgery (n = 11). Significant differences in serum protein concentration between these groups were found for TSP1 (p = 0.001) and IGFBP3 (p = 0.03) (t test).

Data mining analysis reveals protein profiles in GBM serum

Unsupervised data mining was used to propose potential diagnostic serum protein profiles consisting of at least two proteins. Thresholds of serum protein concentrations for maximum differentiation between GBM and control group were: 208 pg/ml (BMP2), 0.24 ng/ml (HSP70), 3.8 μg/ml (IGFBP3), 32.8 μg/ml (TSP1), 33.9 μg/ml (RBP4), 299 ng/ml (MDK), 1.1 ng/ml (CX3CL1), and 65.5 pg/ml (CXCL10). Except for CX3CL1 and CXCL10, protein concentrations above these thresholds were associated with a GBM. No thresholds were found for the concentrations of the remaining proteins (serotonin, SCF, MDK, FABP7, PF4, IL-1α, and TNF-β).
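
Applying such thresholds amounts to converting each measured concentration into a binary above/below flag before association analysis. The sketch below uses the thresholds quoted above for the five proteins that appear in the reported profiles; the sample values and the function name are hypothetical, and the code does not reproduce the structure of the published decision trees.

```python
# Thresholds as quoted in the text: (value, unit).
THRESHOLDS = {
    "BMP2": (208.0, "pg/ml"),
    "CXCL10": (65.5, "pg/ml"),
    "HSP70": (0.24, "ng/ml"),
    "TSP1": (32.8, "ug/ml"),
    "IGFBP3": (3.8, "ug/ml"),
}


def dichotomize(sample: dict) -> dict:
    """Map each measured protein to True if its concentration exceeds the threshold.

    Concentrations must be given in the unit listed in THRESHOLDS.
    """
    return {
        protein: concentration > THRESHOLDS[protein][0]
        for protein, concentration in sample.items()
        if protein in THRESHOLDS
    }


# Hypothetical serum sample (units as in THRESHOLDS).
sample = {"BMP2": 250.0, "CXCL10": 40.0, "HSP70": 0.31}
flags = dichotomize(sample)
print(flags)
```

In this invented sample BMP2 and HSP70 lie above and CXCL10 lies below its threshold, which is the direction the text associates with GBM for each of these three proteins.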

The defined threshold concentrations were subsequently used for identification of protein profiles by association analysis. The serum profile formed by BMP2, CXCL10, and HSP70 was associated with the clinical feature presence of GBM (Table 3, Fig. 1a). The profile correctly assigned 96% of the GBM subjects and 89% of control subjects by bootstrap validation (p < 0.0001, Fisher’s exact test).
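
The reported percentages can be related to case counts with a short calculation. The counts used here are an inference, not data stated in this section: one misassigned subject among the 23 GBM patients gives 22/23, and a rate of 89% with a single misassigned control is consistent with nine evaluated control subjects (8/9), which is treated as an assumption.

```python
def proportion(correct: int, total: int) -> float:
    """Fraction of subjects assigned to their true group."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("counts must satisfy 0 <= correct <= total and total > 0")
    return correct / total


# Assumed counts, inferred from the reported percentages.
gbm_correct, gbm_total = 22, 23
control_correct, control_total = 8, 9

sensitivity = proportion(gbm_correct, gbm_total)
specificity = proportion(control_correct, control_total)
print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
```

Rounded to whole percentages, these proportions reproduce the 96% and 89% given in the text.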

Table 3 Potential diagnostic serum protein profile (BMP2, CXCL10, and HSP70) associated with the clinical feature presence of a GBM
Fig. 1 Decision trees. a Clinical feature presence of a GBM. Potential diagnostic serum protein profile BMP2, CXCL10, and HSP70. Concentrations of BMP2 and CXCL10 in pg/ml. Concentration of HSP70 in ng/ml. b Clinical feature 15 months survival post surgery. Potential diagnostic serum protein profile TSP1, HSP70, and IGFBP3. Concentrations of TSP1 and IGFBP3 in μg/ml. Concentration of HSP70 in ng/ml. c Comparison of serum concentrations for HSP70, IGFBP3, and TSP1 between the groups S (survival less than 15 months) and L (survival more than 15 months) of the study cohort

For the clinical feature 15 months survival post surgery, association was achieved with serum profiles based on concentrations of TSP1, HSP70, and IGFBP3. Whereas analysis of TSP1 alone predicted the survival group for 80% of the GBM patients, inclusion of the other two factors increased predictability to 100%, as validated by a bootstrap algorithm (p < 0.0001, Fisher’s exact test) (Table 4, Fig. 1b, c). The two survival groups were not distinguished by different treatment schemes, because these were comparable: standard palliative therapy including radiotherapy and chemotherapy was used for both groups (Table 1). There was also no association between tumor size and survival. Although the mean age for the long-term survivor group was 50.7 years compared with 60.3 years for the short-term group (non-significant, p = 0.095, t test), the serum protein profile did not correlate with age.

Table 4 Potential diagnostic serum protein profile (TSP1, HSP70, and IGFBP3) associated with the clinical feature 15-month survival post surgery

Immunohistochemical detection of secreted candidate proteins in glioblastoma

Immunohistochemistry showed strong cytosolic expression of HSP70 and FABP7 in most GBM (Fig. 2). Some tumor cells showed nuclear presence of FABP7. Strong nuclear and perinuclear immunostaining of tumor cells was detected for TSP1, whereas IGFBP3 was expressed moderately in the cytoplasm. MDK expression was diffusely present in a minority of the tumors. BMP2 expression was negligible (not shown). In control brain sections IGFBP3 was not detectable; HSP70 was detected only in neurons and in a small number of astrocytes; TSP1 was strongly expressed; and FABP7 expression was found in reactive astrocytes. Histoscore data for individual tumors and serum levels of matched samples did not correlate, even when tumor size was taken into account (not shown).

Fig. 2 Tissue-array-based detection of protein expression in astrocytoma by immunohistochemistry. Magnification is ×400 for all sections. a Overview section of the tissue array used after detection of HSP70 by immunostaining. Each section represents an individual tumor or control tissue. Non-tumor control tissues are labeled with arrowheads. b, c FABP7 detection in b GBM and c control brain tissues. d, g HSP70 detection in d GBM and g control brain tissues. Arrows neurons, arrowheads glial cells. e, h IGFBP3 detection in e GBM and h control brain tissues. f, i TSP1 detection in f GBM and i control brain tissues

Discussion

This is the first study to search systematically for a diagnostic serum profile in GBM patients. Single molecular markers are mostly insufficient to follow the dynamics of diseases such as cancer, and will be replaced by multiple marker profiles [38]. The number of proteins needed for a defined profile is not known, and most likely varies between diseases. In gene expression-based diagnostics, profiles often consist of fewer than 10, but sometimes of more than 20, genes [32, 39]. To obtain an estimate of this number in GBM, we analyzed serum samples of a small cohort of GBM patients and controls and adopted data mining to detect protein profiles. The data mining method used here delivered a testable hypothesis.

Currently, few secreted proteins expressed by astrocytomas have been proposed as potential serum markers [18–27]. When tested in serum, none of these was sufficiently specific to serve as a diagnostic marker. We evaluated the serum concentrations of 14 proteins, which were selected by screening gene and protein expression profiles of astrocytomas for proteins potentially released by the tumor cells [15]. Included were also two classes of cytoplasmic proteins associated with cell stress (HSP70) [40] and with neural stem cells (FABP7) [41]. Members of both protein classes have previously been found in serum and proposed as biomarkers [42, 43]. More generally, evaluation of the serum proteome revealed that numerous typically cytoplasmic proteins are found in serum [44].

In contrast with single biomarkers, in a complex diagnostic profile of multiple proteins it may not be necessary for the concentrations of each protein to be significantly different between two groups. Our results are in agreement with this assumption. Using the complete dataset to identify patterns of proteins by applying data mining, it was possible to associate combinations of serum proteins with the clinical diagnosis “presence of a GBM”. A profile with a relatively small number of proteins (BMP2, CXCL10, HSP70) was sufficient to correctly assign 96% of the GBM and 89% of the control subjects. One subject from the GBM and one from the control group were misdiagnosed using this profile.

The proteins constituting the identified GBM profiles are functionally diverse and not directly correlated with tumor status. The interferon-γ (IFN-γ) inducible chemokine CXCL10, concentrations of which were reduced in GBM patients, attracts CXCR3-receptor-carrying cells. Besides its possible transmitter function in the brain [45], it inhibits angiogenesis and has anti-tumor activity in vivo [46], and its reduced concentration may thus provide a growth advantage for GBM. HSP70 is an anti-apoptotic chaperone with tumor-promoting activity and has been described as overexpressed in GBM, although its overall role in this type of cancer remains unclear [47]. Increased HSP70 antibody levels have been found in serum of lung cancer patients, indicating that it may be a biomarker for several tumor types [48]. BMP2 is a member of the transforming growth factor beta (TGF-β) family, which inhibits growth in gliomas [49] and medulloblastomas [50], although the latter tumors are able to secrete BMP2 themselves. On the other hand, BMP2 is linked to angiogenesis [51, 52], which is consistent with glioblastoma being one of the most vascularized tumors in man [53]. The role of BMP2 in glioblastoma has not yet been elucidated. TSP1 has tumor-inhibitory properties because of its anti-angiogenic function. Reduced serum levels of TSP1 have been reported in small-cell lung cancer patients [54]. TSP1 has been shown to be upregulated in microvascular hyperplasia during astrocytoma progression [55], which may explain the increased serum levels found in patients with short survival. IGFBP3 mediates p53-dependent apoptosis in GBM [56] and was associated with enhanced survival in the GBM group in this study.

Our data suggest a method for hypothesis generation and provide only a first indication of the presence of GBM specific serum profiles. Prospective and retrospective validation of the data on larger sample sizes will be necessary to evaluate and, perhaps, adjust the presented profiles further.

Conclusion

The study shows that robust serum profiles for GBM are identifiable by data mining based on a relatively small study cohort. The presence of a glioblastoma was distinguished by serum concentrations of BMP2, CXCL10, and HSP70 with a sensitivity of 96% and specificity of 89% using the decision tree. The TSP1, HSP70, and IGFBP3 serum profile could be used to assign a survival prognosis in our data set. These findings will be a basis for validation with a larger sample size and with recurrent glioma.