Introduction

Lung cancer is the leading cause of cancer-related deaths in the United States, with an estimated 224,390 new cases and 158,080 deaths for 2016 [1]. Non-small cell lung cancer (NSCLC) is a highly heterogeneous disease accounting for 85% of all lung cancer diagnoses and is commonly divided into three subtypes: adenocarcinoma (ADC), squamous cell carcinoma (SqCC), and large cell carcinoma (LCC). SqCC is an aggressive subtype accounting for 25%–30% of NSCLC cases, making it the second most common type of lung cancer [2]. SqCC is notoriously difficult to treat because patients tend to be older, present with advanced disease, and have centrally located tumors [3, 4]. Smoking is strongly associated with SqCC, so other comorbidities require clinical attention when therapeutic options are considered [5]. Five-year survival rates for SqCC patients are reasonable for stage I and II disease following surgical resection but drop precipitously among patients with stage III or disseminated disease [6, 7].

There has been significant improvement in NSCLC patient survival over the last 20 years due, in part, to clinical deployment of targeted therapies [8]. Over the past several years, important strides have been made using “checkpoint inhibitors” for immunotherapy of NSCLC, particularly SqCC [9]. In determining which patient subsets benefited from checkpoint inhibitor therapy, measurement of tumor and microenvironmental biomarkers, along with determination of the number of neoantigens created, was important [10]. SqCC is not strongly associated with the established lung cancer driver mutations found in ADC, such as EGFR, ALK, or KRAS [11, 12]. Recent work is beginning to highlight the complexity of SqCC, with somatic mutation rates higher than those seen in glioblastoma, breast, or colorectal cancers [13]. Moreover, identification of potentially targetable amplifications in FGFR1, PDGFR, MET, and PIK3CA has bolstered the notion that targeted therapy for SqCC patients will be possible [13,14,15,16]. However, the need for novel drug and immunotherapy targets and biomarkers remains unmet.

A critical issue for the management of lung cancer patients is proper histologic subtyping, which ultimately informs therapeutic choice. Better understanding of NSCLC has led to discoveries that are differentiating standard of care between ADC and SqCC patients, making proper pretreatment diagnostics essential. Surveillance, Epidemiology and End Results (SEER) data from 2012 illustrate the complexities of lung cancer pathologic diagnosis, with nearly 22,000 patients in the U.S.A. per year given “unclassified” or “not otherwise specified” (NOS) diagnoses [17]. While adequate tumor material is obtained at the time of surgical resection, in nearly 70% of lung cancer cases surgical resection is not possible, so tissue for diagnostic and biomarker studies must be obtained by less invasive techniques such as core needle biopsies (CNB) [18]. Although standard histology is the current gold standard for lung cancer pathologic diagnosis, molecular and immunohistochemical (IHC) approaches can aid in subtype characterization [19, 20].

Recent advances in mass spectrometry (MS) have enabled quantitative proteome analysis in tissue with high reproducibility, sensitivity, and accuracy in a comparatively short time [21]. As important tools for biomarker discovery, MS technologies could potentially be deployed to (1) facilitate clinical diagnosis of a particular NSCLC subtype, (2) elucidate pathways that are aberrantly activated, (3) identify individuals with higher risk of disease progression and recurrence, and (4) identify patients who will likely respond to specific therapies, including immunotherapy. Frozen tissue from resection or CNB would be ideal for proteomic analysis, but is not always available. Most clinical tumor samples, including CNBs, are either preserved in OCT for immediate pathology diagnostic use or fixed in formalin and paraffin embedded (FFPE) for immunohistochemistry and long-term storage [22, 23]. However, formalin fixation produces inter- and intra-cross-linked proteins and peptide modifications that could compromise subsequent mass spectrometry-based proteomic analyses [24]. Conservative estimates suggest that at least 15%–20% of the proteome becomes inaccessible after FFPE fixation [25]. A number of other studies have also provided evidence that after deparaffinization, similar proteins could be identified between frozen and FFPE specimens of the same tissues [26]. Alternatively, OCT-embedded tissues are not exposed to fixatives and may better represent the original tissue proteome. OCT compound is a cryo-preservative consisting of nonreactive ingredients, including polyvinyl alcohol (PVA) and polyethylene glycol (PEG) [27]. Proteomic studies with OCT-embedded specimens, however, are challenged by the interference of the OCT polymers with MS analysis: the polymers compete with peptides for LC column binding during sample preparation and separation, and suppress ionization, leading to reduced peptide identification [28]. To address this, several methods have been developed to specifically remove OCT from tissue samples for downstream proteomic analysis [27, 28]. However, proteomic analysis of OCT-embedded samples still represents a difficult challenge when total amounts of protein are limited [29].

To begin to address this problem, we developed a simple method for extracting protein from OCT-embedded samples for LC-MS/MS analysis. As an example, we used resected tumor tissue from a lung cancer patient whose pathologic diagnosis was “poorly differentiated squamous cell carcinoma.” To demonstrate LC-MS/MS compatibility with other clinical pathology techniques, we optimized a workflow to purify whole proteome from SqCC chunk tissue embedded in OCT compound. Using a facile trichloroacetic acid (TCA)-based protein precipitation procedure, we were able to remove interfering OCT polymers from the tumor protein preparation and rapidly identify more than 9200 proteins using shotgun proteomic analysis. Comparisons of OCT-embedded chunk tissue with frozen tissue derived from the same tumor demonstrated excellent correlation in both number and identity of the detected proteins, indicating excellent recovery of the proteome. Classification of the most highly expressed tumor proteins revealed a strong cell proliferation theme, characteristic of “poorly differentiated” tumors. We then applied our TCA protocol to more clinically challenging specimens of OCT-embedded CNBs. Utilizing a tandem mass tag (TMT)-based quantitative proteomic platform [30], we identified and quantified more than 5400 proteins in OCT-embedded CNBs from tumor and normal SqCC tissues. Replicate TMT analysis of CNB samples showed near perfect correlation. Quantitative comparison of differentially expressed proteins between CNB-tumor and CNB-normal samples substantiated the poorly differentiated SqCC pathology diagnosis. We were able to quantify proteins relevant to squamous tumor biology and immune modulation, as well as identify potentially novel biomarkers and drug targets. Following the removal of the OCT compound using our protocol, no enrichment or systematic protein loss was observed when comparing datasets for compartments/organelles.

Materials and Methods

Tissue Lysis

Chunk tissues (frozen or OCT-embedded tumor samples) were placed in a pre-chilled clean mortar filled halfway with liquid nitrogen, and were ground into powder with a pestle. The powder was then transferred to a tissue grinder (Fisher # K885452-0020), and was homogenized with an SDS lysis buffer (3% SDS, 10 mM HEPES, pH 7.0, 2 mM MgCl2, 0.05% v/v universal nuclease 20 U/mL). Core needle biopsy (CNB) samples (OCT-embedded normal or lung tumor samples) were placed directly into the tissue grinder and homogenized with the abovementioned SDS lysis buffer. The samples were centrifuged at 18,000 × g and the supernatant was collected for subsequent proteomic analysis. The normal lung CNB sample was collected from the same lobe as the tumor within 30 min after the surgical procedure. The specimen was collected such that the normal sample was far from the tumor, and the normal sample was confirmed microscopically to be free of tumor contamination. All the surgically resected samples and CNBs were from the same SqCC patient.

Trichloroacetic Acid (TCA) Precipitation

Ice-cold 100% (w/v) TCA was added to the lysates to a final concentration of 20% TCA. After incubation on ice for 20 min, precipitated proteins were collected by centrifugation for 20 min at 16,000 × g at 4 °C. The supernatant was discarded and the pellets were washed with 500 μL ice-cold 10% (w/v) TCA. This washing step was repeated four times. The pellets were then washed four more times with 1 mL pre-chilled acetone (–20 °C). After the final wash, the supernatant was carefully removed and the pellets were solubilized in freshly prepared 8 M urea buffer containing 50 mM Tris, 10 mM EDTA, pH 7.5.
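The volume of TCA stock needed to reach the stated final concentration follows from a simple mass balance. The sketch below illustrates that arithmetic only; the function name and the 400 μL lysate volume are hypothetical and are not taken from the protocol.

```python
def tca_volume_to_add(lysate_volume_ul, stock_pct=100.0, final_pct=20.0):
    """Return the volume of TCA stock (in uL) that brings a lysate to final_pct TCA.

    Mass balance: stock_pct * V_add = final_pct * (V_lysate + V_add)
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final_pct must be greater than 0 and less than stock_pct")
    return lysate_volume_ul * final_pct / (stock_pct - final_pct)


# Hypothetical example: 400 uL of lysate (volume assumed for illustration)
volume = tca_volume_to_add(400.0)
print(f"Add {volume:.0f} uL of 100% TCA to 400 uL of lysate")
```

With a 100% stock and a 20% target, the added volume is one quarter of the lysate volume.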

Protein Digestion and Sample Preparation for MS Analysis

For chunk tissue samples, proteins were reduced by adding DTT to a final concentration of 5 mM and incubated at room temperature for 15 min. Proteins were then alkylated by adding iodoacetamide to a final concentration of 50 mM and were incubated in the dark for 20 min. Proteins were digested with Lys-C (Wako, VA, USA) at a 1:100 enzyme/protein ratio for 2 hours, then were diluted with freshly prepared 100 mM ammonium bicarbonate solution to 2 M urea. Proteins were then digested with trypsin (Promega, WI, USA) at a 1:100 enzyme/protein ratio overnight at room temperature. Peptides were desalted using solid-phase C18 extraction cartridges (Sep-Pak, Waters, MA, USA) according to the manufacturer’s instructions and were lyophilized.
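The dilution and enzyme quantities in this step can be worked out as below. This is a minimal sketch that assumes the sample is at 8 M urea before dilution and ignores the small volumes added with DTT, iodoacetamide, and Lys-C; the function names, the 100 μL sample volume, and the 200 μg protein amount are hypothetical.

```python
def diluent_volume_ul(sample_volume_ul, start_molar=8.0, target_molar=2.0):
    """Volume of diluent needed to lower urea from start_molar to target_molar.

    From C1 * V1 = C2 * (V1 + V_diluent).
    """
    if not 0 < target_molar < start_molar:
        raise ValueError("target_molar must be greater than 0 and less than start_molar")
    return sample_volume_ul * (start_molar / target_molar - 1.0)


def enzyme_mass_ug(protein_ug, protein_per_enzyme=100.0):
    """Enzyme mass for a 1:protein_per_enzyme (w/w) enzyme/protein ratio."""
    return protein_ug / protein_per_enzyme


# Hypothetical example values, not taken from the study
print(diluent_volume_ul(100.0))  # uL of 100 mM ammonium bicarbonate to add
print(enzyme_mass_ug(200.0))     # ug of Lys-C or trypsin at 1:100
```

Going from 8 M to 2 M urea is a 4-fold dilution, so three sample volumes of diluent are added.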

Desalted peptides were then resuspended in 200 mM HEPES (pH 8.5) to a final concentration of 1 μg/μL. Peptides were fractionated by basic pH reversed-phase HPLC (bRPLC) [31] at a flow rate of 0.2 mL/min on a ZORBAX 300 Extend-C18 column (Narrow Bore RR 2.1 mm × 100 mm, 3.5 μm particle size, 300 Å pore size). Buffer A was 10 mM ammonium formate in H2O, pH 10.0. The gradient was developed from 0% to 70% buffer B (1 mM ammonium formate, pH 10.0, 90% ACN). Seventeen fractions were collected, which were lyophilized, desalted, and analyzed by LC-MS/MS.

For core needle biopsy samples, proteins were quantified by the BCA method [30]. Proteins were reduced, alkylated, and digested using the abovementioned protocol. Three replicate analyses were performed for each condition, i.e., TMT channels 126, 127, and 128 for the SqCC core needle biopsy sample and channels 129, 130, and 131 for the normal lung core needle biopsy sample from the same patient. For each TMT channel, approximately 15 μg peptides were reacted with the corresponding amine-based TMT six-plex reagent (Thermo Fisher, CA, USA) for 1 h at room temperature. Hydroxylamine solution was added to quench the reaction and the labeled peptide samples were combined. The combined sample was then desalted and fractionated by bRPLC.

LC-MS/MS Analysis

The LC-MS/MS experiments were performed as described previously [30]. Briefly, peptides were separated on a hand-pulled fused silica microcapillary (75 μm × 15 cm, packed with Magic C18AQ; Michrom Bioresources, Auburn, CA, USA) using a 75 min linear gradient ranging from 7% to 32% acetonitrile in 0.1% formic acid at 300 nL/min (Thermo EASY-nLC system). Samples were then analyzed on an LTQ Velos Pro Orbitrap mass spectrometer (Thermo, San Jose, CA, USA) using either a top 20 CID method (for protein identification) or a top 10 higher-energy collisional dissociation (HCD) method (for TMT protein quantification). Precursor ions were selected using an isolation window of 2 Th, and dynamic exclusion was enabled so that the same precursor ion was excluded from repeated MS2 analyses for 60 s.

MS/MS spectra were searched against a composite database of the human IPI protein database (ver. 3.60, which contains a total of 80,412 entries) and its reversed complement using the Sequest (Rev28) algorithm. Search parameters allowed for static modifications of 57.02146, 229.16293, and 229.16293 Da for Cys (carbamidomethyl), Lys (TMT), and peptide N-terminus (TMT), respectively. A dynamic modification of oxidation (15.99491 Da) was considered for Met. Search results were filtered to include <1% (at both peptide and protein levels) matches to the reverse database by the linear discriminator function [32]. Protein TMT ratios were determined using the procedure described previously [30]. Specifically, a 0.03 Th window was scanned around the theoretical m/z of each reporter ion (126:126.127725; 127:127.124760; 128:128.134433; 129:129.131468; 130:130.141141; 131:131.138176) to detect the presence of these ions, and the maximum intensity of each ion was extracted. For each reporter ion channel, the observed signal-to-noise ratio was summed across all quantified proteins and was normalized (equal amounts of proteins were digested and labeled for the six different channels). Gene ontology (GO) analyses were performed using the DAVID and PANTHER databases [30, 33, 34].
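The reporter ion extraction and channel normalization described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of reference [30]: the 0.03 Th window is treated as centered on the theoretical m/z (±0.015 Th), each channel is scaled to the mean of the channel totals, and the function names and input layout are hypothetical.

```python
# Theoretical reporter ion m/z values as listed in the text
REPORTER_MZ = {
    126: 126.127725, 127: 127.124760, 128: 128.134433,
    129: 129.131468, 130: 130.141141, 131: 131.138176,
}


def extract_reporters(peaks, window_th=0.03):
    """Return the maximum intensity found near each reporter ion.

    peaks: list of (mz, intensity) tuples from one MS2 spectrum.
    Assumption: the window is centered, i.e. +/- window_th / 2.
    """
    half = window_th / 2.0
    result = {}
    for channel, target in REPORTER_MZ.items():
        hits = [inten for mz, inten in peaks if abs(mz - target) <= half]
        result[channel] = max(hits) if hits else 0.0
    return result


def normalize_channels(protein_sn):
    """Scale each channel so that all channel totals become equal.

    protein_sn: {protein: {channel: summed signal-to-noise}}.
    Assumption: totals are scaled to the mean of the six channel totals.
    """
    totals = {ch: sum(sn[ch] for sn in protein_sn.values()) for ch in REPORTER_MZ}
    mean_total = sum(totals.values()) / len(totals)
    factors = {ch: mean_total / t if t > 0 else 1.0 for ch, t in totals.items()}
    return {
        name: {ch: sn[ch] * factors[ch] for ch in REPORTER_MZ}
        for name, sn in protein_sn.items()
    }
```

Equalizing channel totals reflects the statement that equal amounts of protein were digested and labeled in each channel, so a systematic offset between channels is treated as a loading difference.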

Meta-Analysis of Gene Expression in Lung ADC and SqCC

RNAseq data were downloaded from the TCGA portal. The archive filenames were “unc.edu_LUAD.IlluminaHiSeq_RNASeqV2.1.13.0” for ADC and “unc.edu_LUSC.IlluminaHiSeq_RNASeqV2.1.10.0” for SqCC. These files represent level 3 (processed) data. The unzipped files consisted of 490 ADC and 489 SqCC samples, one file per sample. The FPKM values were pooled into one spreadsheet, log2-transformed, and median-normalized across all samples. Expression was compared between ADC and SqCC, and box plots were generated with significance determined by unpaired t-test. To assess differences between overlapping RNA expression distributions for each gene in SqCC versus ADC comparisons, a Kolmogorov–Smirnov test was applied with D- and P-values calculated.
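The transformation and the two-sample Kolmogorov–Smirnov D statistic can be illustrated with standard-library Python. The sketch assumes that median normalization means centering each sample on its own median log2 value and that a pseudocount of 1 is added before the log transform; neither detail is stated in the text. The function names are hypothetical, and the P-value calculation is omitted.

```python
import math
from bisect import bisect_right
from statistics import median


def log2_median_normalize(samples, pseudocount=1.0):
    """Log2-transform expression values and center each sample on its median.

    samples: {sample_id: {gene: expression value}}.
    The pseudocount (an assumption) avoids log2(0).
    """
    normalized = {}
    for sample_id, genes in samples.items():
        logged = {g: math.log2(v + pseudocount) for g, v in genes.items()}
        center = median(logged.values())
        normalized[sample_id] = {g: x - center for g, x in logged.items()}
    return normalized


def ks_statistic(a, b):
    """Two-sample KS D: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a + b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap), which is why it complements the t-test when two expression distributions partly overlap.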

Results

Tissue with residual OCT compound contains polymers like polyvinyl alcohol and polyethylene glycol, which can interfere with LC-MS/MS performance [27, 28]. Supplementary Figure S1 shows an example of typical polymer signals generated from direct analysis of unprocessed OCT-embedded samples (MS1 spectrum extracted from retention time at approximately 59 min). In this case, the peptide signals were completely suppressed by ions derived from OCT polymers, which are in large fold-excess.

Because OCT contains polymers that are mostly water-soluble, we reasoned that they could potentially be removed by the TCA precipitation method [35]. We generated lysates from surgically resected SqCC samples that were embedded in OCT, and performed TCA precipitation. The extracted proteins were extensively washed to ensure complete removal of the OCT polymers.

Shotgun LC-MS/MS proteome analysis followed by database query resulted in the identification of 9228 proteins (1% FDR at the protein level, 7339 proteins identified with two or more unique peptides) in the OCT-embedded tissue sample. As a comparison, we also performed similar shotgun proteomic analysis on a frozen sample (both the OCT-embedded and frozen specimens were derived from the same tumor), from which we identified 9637 proteins (1% FDR at the protein level, 7710 proteins identified with two or more unique peptides). A total of 8583 proteins were commonly identified between the two samples (Figure 1a). Using gene ontology (GO) analysis (www.pantherdb.org/), we observed no bias in protein recovery from any particular compartment or organelle, and similar numbers of categorized proteins were recovered from OCT-embedded and frozen tissue (Figure 1b). Moreover, we observed no bias in identification of the whole proteome based on analysis of GO biological process, with over 70% of proteins annotated in the top five processes, which include metabolic, cellular, biological regulation, localization, and development (Figure 1c). These results indicate that our TCA-based protocol resulted in excellent recovery of the proteome from OCT-embedded samples and did not have any appreciable effects on the number or identity of proteins detected during MS analysis.
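The overlap reported above can be restated as simple set arithmetic. The sketch uses only the three counts given in the text; the variable names are hypothetical, and the Jaccard index is added here as an illustrative summary rather than a statistic reported by the study.

```python
# Protein counts reported in the text
oct_total = 9228      # identified in the OCT-embedded sample
frozen_total = 9637   # identified in the frozen sample
shared = 8583         # identified in both samples

oct_only = oct_total - shared
frozen_only = frozen_total - shared
union = oct_total + frozen_total - shared

pct_of_oct = 100.0 * shared / oct_total
pct_of_frozen = 100.0 * shared / frozen_total
jaccard = shared / union  # illustrative summary, not reported in the study

print(f"OCT only: {oct_only}, frozen only: {frozen_only}, union: {union}")
print(f"Shared: {pct_of_oct:.1f}% of OCT, {pct_of_frozen:.1f}% of frozen")
print(f"Jaccard index: {jaccard:.3f}")
```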

Figure 1

Qualitative evaluation of the TCA-based OCT-removal protocol for surgically resected tumor samples. (a) Proteins identified from a frozen surgically resected SqCC tumor, and the same tumor that was embedded in OCT. The Venn diagram shows the proteins that were commonly identified from the two types of samples. Gene ontology analysis shows the proteins identified from the frozen and OCT-embedded samples were equally represented in terms of cellular component (b) and biological process (c). (d) The number of peptides identified from several representative proteins in the abovementioned two samples. Peptide numbers were normalized by GAPDH and actin

SqCC is usually identified by its classic pathologic features, including keratinization and intercellular bridges, which distinguish it from other NSCLC subtypes [36]. One of the highly expressed proteins identified in our proteomic analysis is KRT17 (Figure 1d), but many of the other keratins were not detected. This was not a surprising result because the patient tumor used for this study was deemed “poorly differentiated” in standard pathology analysis. Recent work from Wilkerson et al. proposed a SqCC mRNA expression-based classification schema with four subgroups: primitive, classic, basal, and secretory [37]. Among the four proposed classes, the “primitive” tumors exhibited the most poorly differentiated features, had the lowest overall survival (OS) and worst recurrence-free survival (RFS), and exhibited the strongest cell proliferation mRNA signature, with MCM6 as the most predictive biomarker [37]. Interestingly, MCM6 was one of the highly expressed proteins detected in this SqCC tissue (Figure 1d).

Using our MS data, we chose to examine two groups of proteins: one group related to cell proliferation, which includes MCM family members, Ki67, and PCNA, and a second group commonly associated with overexpression in squamous tumors, such as KRT17, SERPINB4, SERPINB3, SERPINB5, DCUN1D1, and S100A2 [36, 38,39,40,41]. The total peptide spectral counts for all selected proteins from both OCT and frozen samples were normalized using GAPDH and ACTB. In this case, we observed no significant difference in the expression of these two groups of proteins between the two samples (Figure 1d). Moreover, GO analysis (DAVID 6.7) revealed that the top 1000 highly expressed proteins (highest number of detected peptides) were enriched for translation and RNA processing, which are strongly related to cell proliferation (Supplementary Figure S2). These results indicate that the TCA precipitation procedure successfully removes OCT and does not significantly influence the number of proteins detected or the biological pathways they represent.
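One way to normalize spectral counts to the two reference proteins is sketched below. The text does not give the exact formula, so dividing by the mean of the GAPDH and ACTB counts is an assumption, and the function name and the counts in the example are hypothetical.

```python
def normalize_spectral_counts(counts, references=("GAPDH", "ACTB")):
    """Divide each protein's spectral count by the mean reference count.

    counts: {protein: peptide spectral count} for one sample.
    Assumption: the reference value is the mean of the reference counts.
    """
    ref_mean = sum(counts[r] for r in references) / len(references)
    if ref_mean <= 0:
        raise ValueError("reference proteins must have non-zero counts")
    return {
        protein: count / ref_mean
        for protein, count in counts.items()
        if protein not in references
    }


# Hypothetical counts for illustration only
example = {"GAPDH": 100, "ACTB": 300, "MCM6": 50, "KRT17": 120}
print(normalize_spectral_counts(example))
```

Applying the same function to the OCT-embedded and frozen samples puts their counts on a common scale before they are compared.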

We then set out to demonstrate the applicability of our TCA-based protocol to more clinically challenging samples by using OCT-embedded CNB tumor and normal specimens. CNB samples preserved in OCT are difficult for quantitative proteomic analysis not only because of the purification required but also due to the minimal amount of tumor as demonstrated in Figure 2a. Specifically, after TCA precipitation, we performed BCA quantification, and found that a total of 42 μg and 166 μg protein was recovered from the CNB-normal and CNB-tumor sample, respectively.

Figure 2

Quantitative proteomic analysis of OCT-embedded core needle biopsy (CNB) samples. (a) A photograph of the OCT-embedded CNB sample. Red and white arrows indicate the tissue and OCT compound, respectively. (b) Three technical replicate experiments were performed for each sample, i.e., 126,127, and 128 for CNB-tumor; 129, 130, and 131 for CNB-normal. The TMT reporter ion intensity (Signal-to-noise, SN) from all the peptides of a protein was summed, and was plotted, on a Log2 scale, for the technical replicate experiments. (c) Representative proteins that are overexpressed in CNB-tumor compared with CNB-normal

To assess the reproducibility of our method, we performed triplicate analyses for each sample (channels 126, 127, and 128 for CNB-tumor and channels 129, 130, and 131 for CNB-normal). After TMT labeling, peptides were combined and subjected to bRPLC fractionation. A total of 17 fractions were collected and analyzed by LC-MS/MS. From this set of samples, we were able to identify and quantify 5411 proteins (1% FDR at the protein level, 4505 proteins quantified based on two or more peptides). We achieved excellent correlation in signal intensities between TMT replicates of tumor tissue (126 to 127, R² = 0.9972) and normal tissue (129 to 130, R² = 0.9941) (Figure 2b, Supplementary Figure S3A, B).
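A replicate correlation of this kind can be computed as the squared Pearson coefficient of the summed signal-to-noise values from two channels. The sketch below assumes the values are log2-transformed first, as in the plots of Figure 2b; the function names are hypothetical and this is not necessarily the exact calculation used in the study.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def replicate_r_squared(sn_a, sn_b):
    """R^2 between two TMT channels after log2 transformation.

    sn_a, sn_b: summed signal-to-noise per protein, in the same protein order.
    Proteins with a zero value in either channel are skipped.
    """
    pairs = [(a, b) for a, b in zip(sn_a, sn_b) if a > 0 and b > 0]
    return r_squared([math.log2(a) for a, _ in pairs],
                     [math.log2(b) for _, b in pairs])
```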

After proper quality control analysis, we interrogated the dataset for protein expression differences between CNB-tumor and CNB-normal. To determine the expression level for each protein, we summed signal-to-noise (SN) intensities for channels 126, 127, and 128 for CNB-tumor and SN intensities of channels 129, 130, and 131 for CNB-normal. Each data point in the graph in Figure 3a represents the log2-transformed value of the tumor/normal ratio for a single protein. We selected proteins that were up- or down-regulated by at least 3-fold in tumor compared with normal and performed GO analysis. Figure 3b demonstrates that proteins up-regulated in tumor tissue are enriched for biological processes such as translation, ribosome biogenesis, and DNA replication, which support the cell proliferation signature noted in poorly differentiated SqCC. Moreover, cellular component analysis of the tumor up-regulated proteins indicates that they represent a diverse array of cellular organelles, including membrane-enclosed lumen, nucleolus, and ribosome (Supplementary Figure S3C).
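The ratio calculation and the 3-fold selection described above can be sketched as follows. The channel assignments and the cutoff come from the text; the function names, the input layout, and the handling of zero signal are assumptions made for illustration.

```python
import math

TUMOR_CHANNELS = (126, 127, 128)   # CNB-tumor replicates
NORMAL_CHANNELS = (129, 130, 131)  # CNB-normal replicates


def log2_tumor_normal_ratio(sn):
    """log2 of summed tumor SN over summed normal SN for one protein.

    sn: {channel: signal-to-noise}. Zero sums are rejected (an assumption).
    """
    tumor = sum(sn[ch] for ch in TUMOR_CHANNELS)
    normal = sum(sn[ch] for ch in NORMAL_CHANNELS)
    if tumor <= 0 or normal <= 0:
        raise ValueError("both conditions need non-zero summed signal")
    return math.log2(tumor / normal)


def classify(sn, fold=3.0):
    """Label a protein as 'up', 'down', or 'unchanged' at the given fold cutoff."""
    ratio = log2_tumor_normal_ratio(sn)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "up"
    if ratio <= -cutoff:
        return "down"
    return "unchanged"
```

On the log2 scale a 3-fold change corresponds to roughly ±1.585, so up- and down-regulation are treated symmetrically around zero.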

Figure 3

Gene ontology analysis of the SqCC proteome. (a) Comparison of protein expression between CNB-tumor and CNB-normal samples (ratio converted to a log2 scale). The OCT-embedded CNB-tumor and CNB-normal samples were subject to the TCA-based OCT removal protocol. Proteins were extracted and digested. The resulting peptides were labeled with the TMT reagents, and were quantified using LC-MS/MS experiments. Proteins that are up-regulated or down-regulated by at least 3-fold are highlighted. Biological processes that are represented by these proteins are indicated in (b) (up-regulated) and (c) (down-regulated)

When we examined the GO terms of the proteins down-regulated in the tumor samples (Figure 3c), we noted that these proteins are strongly linked to acute inflammatory response, response to wounding, cell adhesion, and coagulation, with most of the proteins associated with the plasma membrane and the extracellular compartment (Supplementary Figure S3D). Down-regulation of cell adhesion-related proteins (e.g., NCAM1, NCAM2, CADM1, and DPT) has been associated with more aggressive tumors and suggests that the poorly differentiated SqCC tumor may have increased mobility, invasive capacity, and poor prognosis [42,43,44]. Another set of tumor-associated down-regulated proteins that drew our attention was related to inflammatory response and wound healing (e.g., C1QA, C4BPA, C4A, C5, C7, C8A, C9, and C3). Many of these proteins are serum proteins, and their down-regulation in tumor specimens may reflect the high perfusion of normal lung tissue. These results nevertheless suggest that the microenvironment of the tumor is significantly different from that of the normal lung, with important immune components differentially expressed in these two types of samples.

We then chose a panel of the highly expressed proliferative and SqCC marker proteins identified in Figure 1d and investigated their expression in tumor versus normal comparisons (note that all the surgically resected samples and CNBs were from the same patient). Indeed, we found that the MCM family (see below for further discussion) as well as MKI67, PCNA, KRT17, SERPINB4, SERPINB3, SERPINB5, DCUN1D1, and S100A2 were expressed more than 2-fold higher in tumor compared with normal CNBs (Figure 2c and Figure 4). The protein showing the greatest bias in tumor expression was S100A2, which has been identified previously as a potential squamous lung cancer marker [39]. TMT-MS analysis showed that the S100A2 protein level was up-regulated by more than 10-fold in the tumor. For example, we observed strong signal for an S100A2 peptide (YSCQEGDK) in the tumor sample, whereas it was virtually undetectable in the normal CNB specimen (Figure 4c). We also show similar results for two other proteins, KRT17 (Figure 4a) and SERPINB5 (Figure 4b), both of which have been strongly associated with squamous lung cancer [41, 45]. One distinct advantage of quantitative proteome analysis over standard pathology techniques is the ability to simultaneously monitor multiple closely related proteins. We identified and quantified six related MCM polypeptides (MCM2 through MCM7), which are known to function as a DNA helicase complex required for DNA replication [46]. We highlight the data for MCM6 as it has been proposed to serve as a biomarker for primitive or poorly differentiated SqCC tumors (Figure 4d) [37]. Analysis of 489 SqCC and 490 ADC patient samples from the TCGA mRNA expression dataset indicates that these markers (KRT17, SERPINB5, S100A2, and MCM6) are more highly expressed in SqCC than ADC (Figure 5).
Further analysis using the Kolmogorov–Smirnov test (KS test) shows that although the expression distributions for each gene somewhat overlap, SqCC versus ADC comparisons are significantly different (Supplementary Figure S5). We believe that the identification and quantitation of these proteins highlight the potential applicability of TMT mass spectrometry as a diagnostic tool. The successful identification of predictable squamous markers motivated us to look into the data for less well-characterized protein biomarkers and potentially interesting drug targets. For example, we identified DHFR (dihydrofolate reductase, the target of methotrexate) as a protein that is highly overexpressed in the SqCC tumor-CNB sample (~8-fold compared with normal) (Supplementary Figure S4A).

Figure 4

Representative overexpressed proteins identified from the OCT-embedded core needle biopsy samples. The MS2 spectra are shown for (a) KRT17, (b) SERPINB5, (c) S100A2, and (d) MCM6. Three replicate analyses were performed (126, 127, and 128 for CNB-tumor, and 129, 130, and 131 for CNB-normal). The TMT ion cluster is shown with the following channel information: channel 1, 2, 3, 4, 5, 6 for TMT 126, 127, 128, 129, 130, and 131, respectively

Figure 5

Meta-analysis of the expression of representative SqCC-specific proteins in different lung cancer subtypes. RNA expression levels for (a) KRT17, (b) SERPINB5, (c) S100A2, and (d) MCM6 were extracted from publicly available TCGA datasets on lung SqCC, ADC, and SqCC matched normal tissues, and are shown as box and whisker plots (*** = unpaired t-test, P < 1E–15)

Considering the strong correlation of smoking with SqCC along with the proliferative signature noted in poorly differentiated (primitive) tumors, we examined proteins involved in DNA damage repair [47]. We focused on proteins up-regulated by more than 3-fold and discovered four proteins (FEN1, POLA1, SOD2, and TRIP13) involved in double-strand break repair (P-value = 0.066, data not shown). From this group, we provide more data on TRIP13 (Supplementary Figure S4B), a protein that has recently been shown to bind to DNA-PKcs and mediate non-homologous end joining (NHEJ) repair of DNA strand breaks. Overexpression of TRIP13 is associated with aggressive tumors, and may contribute to platinum resistance in squamous head and neck cancers [48].

Finally, to extend these findings, we examined the top 200 most overexpressed proteins in the tumor versus normal comparison and entered them into the Drug Gene Interaction Database (DGIdb, http://dgidb.genome.wustl.edu/). After curating the list, we chose six examples of genes and their associated drugs, some of which are already used in the cancer clinic and some of which are in various stages of clinical development (Table 1). As an example of a novel squamous drug target identified in this screen, we focused on KIF11, which is involved in spindle dynamics during mitosis [49]. KIF11 mRNA is also overexpressed in the TCGA comparison of SqCC to ADC (data not shown). The effectiveness of drugs targeting kinesin motors, such as ispinesib and filanesib, is currently being explored in clinical trials [50], but we are not aware of any publications discussing these drugs in lung SqCC.

Table 1 Potential Druggable Proteins that are Overexpressed in the SqCC Tumor Sample (OCT-embedded core needle biopsy samples)

Discussion

In this study, we provide a simple method for purifying whole proteome from OCT embedded lung tumor tissue. The use of OCT embedding in the cancer clinic and surgical suite is common because of its simplicity and the ability to aid in the delivery of quick diagnostic information about unknown tissue masses. To our knowledge, the long-term tissue and molecular stability of OCT tumor-embedded specimens has not been well studied but it is believed that they are stable for reasonable durations if care is taken to keep them frozen [23]. OCT-tumor embedded specimens are not widely used by pathologists who need to perform general H and E staining and downstream IHC subtyping analysis as FFPE produces superior results for sectioning and staining. Tumors fixed and embedded in FFPE have proven to be very stable over long durations (decades) and have been successfully used in many archival genomics studies. However, for characterization of whole proteome, the inherent nature of fixation becomes problematic in terms of protein recovery and extensive chemical modification of native proteins. A number of attempts have been made to characterize the proteome from FFPE using LC-MS/MS, with varying success [51,52,53]. Conservative estimates suggest 15%–20% of the proteome may be inaccessible after formalin fixation [25]. With regards to OCT-embedded specimens, which do not involve any chemical cross-linking reagents, whole proteome analysis could be complicated by the large amounts of soluble polymers present in OCT [27, 28]. We have addressed this issue by developing a pipeline that allows for quantitative proteome analysis using a simple TCA-based OCT removal and protein purification scheme.

Implementation of our protocol resulted in identification of more than 9200 and 5400 proteins from OCT-embedded surgically resected and CNB samples, respectively. We demonstrated excellent concordance with respect to protein identity and intracellular localization between frozen SqCC tumor tissue and OCT-embedded tissue from the same patient. Pathology (H and E and IHC) subtyped this patient’s SqCC tumor as “poorly differentiated,” which has been noted for strong cell proliferation signatures in other studies. MS examination of the most highly expressed proteins in tumor tissue was in agreement, as many of the proteins are suggestive of high proliferative capacity.

Because resected tumor is not always available, we challenged the detection limits of our platform using OCT-embedded core needle biopsies of both tumor and normal tissue. Using a quantitative TMT platform, we were able to identify and quantify more than 5400 proteins from OCT-embedded tumor and normal tissue. Moreover, we were able to demonstrate that results from our TCA-based protein purification protocol led to the identification of predictably up-regulated squamous specific proteins as well as potentially novel squamous related proteins. It is important to note that although the magnitude of change as measured by the TMT assay could be suppressed due to co-isolated ions, the proteins defined as regulated using the TMT approach are usually undistorted [54].

The most recent report from the World Health Organization (2015) has begun to recognize the inherent heterogeneity in lung cancer and the need for further classification of lung cancer based on expanded IHC for small biopsies, cytology, and resected specimens [55]. Publication of these new guidelines (the first since 2004) is tacit recognition of the rapid discoveries regarding the heterogeneity and targeted therapeutic options being discovered in lung cancer [56]. This means that more high quality, in-depth pathologic diagnostics will be required for lung cancer since that is the clinical basis for determining therapeutic regimen. Although IHC is and remains the gold standard for clinical diagnostics, the results of our analysis are encouraging in terms of moving forward with the use of LC-MS/MS as a future tool that could aid in pathology and diagnostic analysis. In this regard, the ability to detect a larger panel of proteins in a high throughput manner, while difficult for standard clinical pathology, is a distinct advantage of the proteomic technologies, which could potentially provide more information with regards to tumor biomarkers and diagnostics.