Identification of novel mutations in FFPE lung adenocarcinomas using DEPArray sorting technology and next-generation sequencing

Formalin-fixed paraffin-embedded (FFPE) tissues are the standard diagnostic material in pathology laboratories. However, admixture of unwanted tissues and a shortage of matched normal samples, which are needed to detect somatic mutations, are critical obstacles to accurate cancer diagnosis. To address these challenges, we sorted pure tumor cells from 22 FFPE lung adenocarcinoma tissues using Di-Electro-Phoretic Array (DEPArray) technology, a new cell sorting technology, and analyzed the variants with next-generation sequencing (NGS). The allele frequencies of all gene mutations increased 1.2-fold in cells sorted via DEPArray (tumor suppressor genes, 1.3–10.1-fold; oncogenes, 1.3–2.6-fold). Sequencing of DEPArray-sorted cells identified 16 novel mutations, compared with 4 novel mutations detected by sequencing of unsorted cells. Using this analysis, we also found that six genes (TP53, EGFR, PTEN, RB1, KRAS, and CTNNB1) were somatically mutated in multiple homogeneous lung adenocarcinomas. Together, we sorted pure tumor cells from 22 FFPE lung adenocarcinomas by DEPArray technology and identified 16 novel somatic mutations. We also established a precise genomic landscape of the 22 lung adenocarcinomas, based on mutations detected in pure tumor cells, to support more accurate diagnosis. The results obtained in this study could offer new avenues for the treatment and diagnosis of lung adenocarcinomas.


Introduction
Formalin-fixed, paraffin-embedded (FFPE) tissues are used for diagnostic purposes in patients with cancer because they stain well immunohistochemically and can be stored at room temperature, which is convenient and cost-effective (Greytak et al. 2015). Next-generation sequencing (NGS) of FFPE tissue has also been explored as a valuable tool for genetic diagnosis of cancer (Einaga et al. 2017;Ying 2016). However, obtaining accurate NGS data from FFPE tissue faces a major obstacle: somatic and tumor-specific variants are difficult to identify because of sequencing artifacts, the lack of normal samples, and heterogeneity of the FFPE tissue (Bernstein et al. 2002;Do et al. 2013;Wong et al. 1998). Therefore, NGS data from FFPE tissue are insufficient for assessing the risk of cancer (Petersen et al. 2016). To date, traditional methods such as Sanger sequencing of blood, saliva, and buccal smears have been used to diagnose cancer, together with review of hematoxylin and eosin (H&E)-stained slides by a pathologist (Snow et al. 2014). However, recent studies have shown that pure tumor cells and pure stromal cells can be sorted from blood cells and live cell lines using the Di-Electro-Phoretic Array (DEPArray) system, which is based on the electro-kinetic principle (Fabbri et al. 2013;Fuchs et al. 2006). Additionally, this technology enables pure tumor cells to be sorted from small clinical samples and from samples with low tumor cellularity, such as FFPE samples (Bolognesi et al. 2016). It can therefore be an efficient research method for avoiding bias from the heterogeneity of FFPE samples of adenocarcinoma, the most common type of lung cancer (Calvayrac et al. 2017;Dunne et al. 2016).
Although many laboratories have studied lung adenocarcinoma, most of them have stored FFPE samples, because fresh lung adenocarcinoma tissues are difficult to collect and FFPE is the standard method for long-term preservation of most archived pathological specimens (Lin et al. 2009). Therefore, a new technology is needed that improves the quality of examination and allows a more accurate diagnosis of lung cancer from FFPE samples. Here, we isolated pure tumor cells from FFPE samples via DEPArray technology and demonstrated more precise genetic analysis using genetic variants from the sorted pure cells.

Materials and methods
Information of 22 FFPE lung adenocarcinoma samples
FFPE lung adenocarcinomas were obtained from Korean patients of Seoul National University Hospital in South Korea. The storage time was between 12 and 61 days. Twenty-two FFPE tissue sections (50 μm thickness) were obtained from lung adenocarcinoma tissue blocks using a standard microtome. After dissociation, the total number of cells was between 39,000 and 675,000 (Supplementary Table 1). After sorting with the DEPArray system (Silicon Biosystems, Bologna, Italy), pure tumor cells (100–300), pure stromal cells (100–300), and other minority putative tumor cells (50–90) were isolated from the dissociated cells of the 22 FFPE lung adenocarcinomas (Supplementary Fig. S1).
Cell isolation from FFPE samples
FFPE tissue sections (50 μm thickness) were washed with 10 ml of 100% xylene for 10 min at room temperature. After three washes with xylene, the samples were rehydrated with 100% ethanol, 70% ethanol, 50% ethanol, and Milli-Q water. After deparaffinization, the samples were kept in heat-induced antigen retrieval (HIAR) solution (10 mM sodium citrate buffer) for 5 min at room temperature and for 1 h at 80°C. The samples were then cooled for 20 min at room temperature and washed with 10 ml of RPMI 1640 (Gibco) at room temperature. Next, the samples were dissociated with dissociation buffer (0.1% collagenase Ia (Sigma), 0.1% dispase (Life tech), RPMI) and filtered through a 100-μm mesh nylon filter into a 15-ml tube. The samples were washed with ice-cold PBATw (0.05% tween 20, PBS, 1% BSA).
For the sorting process, 5,000–10,000 stained cells were loaded into the DEPArray system and analyzed with the DEPArray software to isolate pure cells. Three populations were gated and sorted: Keratin−/Vimentin+ (pure stromal cells), Keratin+/Vimentin− (pure tumor cells), and Keratin+/Vimentin+ (other minority putative tumor cells).
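The gating scheme can be summarized in a minimal sketch. The function name `classify_cell`, the boolean inputs, and the label strings are illustrative assumptions: the DEPArray software gates on fluorescence intensities, and the thresholds used in this study are not stated, so marker positivity is assumed to be decided upstream.

```python
from typing import Optional


def classify_cell(keratin_positive: bool, vimentin_positive: bool) -> Optional[str]:
    """Map keratin/vimentin status to the sorted population named in the text.

    Hypothetical helper: positivity calls are assumed to be made beforehand.
    """
    if keratin_positive and not vimentin_positive:
        return "pure tumor cell"
    if vimentin_positive and not keratin_positive:
        return "pure stromal cell"
    if keratin_positive and vimentin_positive:
        return "other minority putative tumor cell"
    # Double-negative events are not among the three gated populations.
    return None
```

Returning `None` for double-negative events is a design choice of this sketch; the text does not say how such events were handled.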

Targeted sequencing
Next-generation sequencing was performed using the Ion AmpliSeq Cancer Panel v2 (Life Technologies), which can detect 2800 COSMIC mutations in 50 oncogenes and tumor suppressor genes.
The Ion Torrent libraries were prepared with the Ion AmpliSeq Library Kit 2.0 (Life Technologies) and quantified with the Qubit dsDNA HS Assay kit (Life Technologies), and library sizes were analyzed with the Agilent Bioanalyzer 2100 system. Library enrichment was performed using the Ion Personal Genome Machine (PGM) Template OT2 200 Template Kit and the Ion One Touch 2 instrument. The prepared libraries were pooled, six libraries per 316™ Chip (Life Technologies), and sequenced on the Ion Torrent Ion Personal Genome Machine (PGM) system™ (Life Technologies). All targeted sequencing procedures for the Ion AmpliSeq Cancer Panel v2 (Life Technologies) were conducted according to the manufacturer's protocol.

Data analysis
The sequence data were processed with Torrent Suite 4.4.3 and aligned to the Homo sapiens hg19 reference genome. Variants were called with the Torrent Variant Caller and annotated with ANNOVAR (Wang et al. 2010) using databases and tools such as dbSNP138 (Smigielski et al. 2000), ClinVar (Landrum et al. 2016), 1000 Genomes, PolyPhen-2, the Exome Aggregation Consortium (ExAC), and the sorting intolerant from tolerant (SIFT) algorithm (Ng and Henikoff 2003). The variants were visually validated using the Integrative Genomics Viewer (IGV) (Robinson et al. 2017;Thorvaldsdottir et al. 2013). Variants that arose from misalignments were excluded as false positives.

Somatic mutation and germline mutation analysis
Somatic mutations and germline mutations were distinguished by comparing the variants called in sorted pure stromal cells with the variants called in sorted pure tumor cells from the same sample.
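The comparison of tumor and stromal calls can be sketched with set operations. This is an illustration under stated assumptions, not the exact procedure of the study: variants seen only in sorted tumor cells are treated as somatic candidates, variants shared by tumor and stromal cells as germline candidates, and variants seen only in stromal cells as stromal-specific. The function name, the tuple layout, and the toy coordinates are hypothetical.

```python
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def partition_variants(tumor: Set[Variant], stroma: Set[Variant]) -> Dict[str, Set[Variant]]:
    """Split variant calls from one sample into three groups (assumed rule)."""
    return {
        "somatic_candidates": tumor - stroma,   # only in sorted tumor cells
        "germline_candidates": tumor & stroma,  # shared by both populations
        "stromal_specific": stroma - tumor,     # only in sorted stromal cells
    }


# Toy calls with made-up coordinates, for illustration only.
tumor_calls = {("chr17", 100, "C", "T"), ("chr10", 200, "G", "A")}
stroma_calls = {("chr10", 200, "G", "A"), ("chr13", 300, "A", "G")}
result = partition_variants(tumor_calls, stroma_calls)
```

Keying variants by position and alleles keeps the comparison independent of annotation, so the same partition can be computed before or after the ANNOVAR step.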

Pathway analysis
Pathway analysis was performed for the genes mutated in each tumor using the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa 2002). Mutational spectra of the mutated genes were screened against published papers, and the genes were manually searched in the KEGG pathway database.

Summary of workflow
Detecting specific variants in FFPE samples is important for accurate cancer diagnosis and precise treatment (Mafficini et al. 2014a). Attempts have been made to identify variants in FFPE samples, but they have been hampered by technical issues, including the heterogeneity of FFPE tissues and sequence artifacts in DNA from FFPE (Adank et al. 2006). We sorted pure stromal cells and pure tumor cells from 22 lung adenocarcinoma FFPE blocks via the DEPArray system to perform a more precise genetic variant analysis of pure FFPE tumor tissue. We called variants separately in the pure stromal cells and the pure tumor cells collected from each of the 22 FFPE samples, both to improve the homogeneity of the tumor cells and to identify somatic mutations. Pure double-positive cells (keratin+/vimentin+) were also recovered from four FFPE samples to analyze cells other than stromal cells and tumor cells in the FFPE samples. We extracted DNA from sorted cells and unsorted cells. The DNA samples were sequenced with cancer hotspot panels (Life Technologies, Waltham, MA, USA) on the Ion Torrent PGM (Life Technologies, Waltham, MA, USA). The functional effect of the variants was predicted with PolyPhen-2 and SIFT. The variant results were analyzed to explore the heterogeneity and characteristics of the FFPE samples (Fig. 1).

Heterogeneity of FFPE samples
Although FFPE samples are prepared for tumor diagnosis, FFPE blocks also contain non-tumor cells such as stromal cells, which makes it difficult to extract pure tumor DNA. Heterogeneity of FFPE samples has been reported previously: significant differences in variants were found even within the same tumor FFPE samples (Mafficini et al. 2014). Improving the homogeneity of FFPE tumor samples is very important for developing targeted gene therapies. To improve the homogeneity of tumor cells and to detect tumor variants for more accurate cancer diagnosis and research, we analyzed the cell populations in FFPE lung adenocarcinoma and sorted the stromal cell population (Keratin−/Vimentin+), the tumor cell population (Keratin+/Vimentin−), and the double-positive cell population (Keratin+/Vimentin+) from 22 FFPE lung adenocarcinoma samples via the DEPArray system (Fig. 2 and Supplementary Fig. S1). We analyzed variants in sorted pure tumor cells and sorted pure stromal cells to investigate the heterogeneity of FFPE samples and discovered 34 tumor-specific somatic variants in sorted tumor samples. We found that each subgroup sorted from the FFPE samples showed a different mutation pattern (Fig. 3).

Improved detection of variants in sorted cells from FFPE samples
To improve the accuracy of tumor variant detection, we isolated 100–300 pure tumor cells per sample and sequenced them to detect variants in cancer hotspot regions. Using DEPArray technology and NGS, we identified 20 stromal-specific variants in the sequencing data of unsorted FFPE samples; such variants would bias diagnosis. We also found 34 tumor-specific variants that were detected only in sorted tumor cells (Fig. 3). The allele frequencies of variants in sorted tumor cells increased 1.3–10.1-fold in three tumor suppressor genes (TP53, PTEN, and RB1) (Fig. 4a) and 1.3–2.6-fold in three oncogenes (KRAS, CTNNB1, and EGFR) (Fig. 4b).
Across all gene mutations, allele frequencies increased 1.2-fold in sorted cells (Fig. 4c). These results suggest that more accurate mutation information is obtained by combining DEPArray technology with NGS.
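The fold increases reported above can be read as the ratio of a variant's allele frequency in sorted cells to its allele frequency in unsorted cells. The sketch below assumes that definition; the function name and the example frequencies are hypothetical and are not data from this study.

```python
def allele_frequency_fold_change(af_unsorted: float, af_sorted: float) -> float:
    """Return af_sorted / af_unsorted for a single variant (assumed definition)."""
    if af_unsorted <= 0:
        # A variant absent from the unsorted sample has no defined fold change.
        raise ValueError("allele frequency in the unsorted sample must be > 0")
    return af_sorted / af_unsorted


# Made-up example: a variant at 10% in unsorted tissue and 25% after sorting.
example_fold = allele_frequency_fold_change(0.10, 0.25)
```

Variants detected only after sorting have no unsorted allele frequency, so this sketch raises an error for them instead of reporting an infinite fold change.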

Novel mutations detected by sorted cell sequencing and characteristic of somatic mutations in lung adenocarcinomas
Thirty-four somatic mutations across 16 genes were identified in 22 pure sorted lung adenocarcinomas. Sixteen of the 34 somatic mutations were novel and unreported in the dbSNP, COSMIC, ExAC, and 1000 Genomes databases (Table 1). We found four novel mutations by sequencing unsorted cells and 12 more novel mutations by sequencing sorted tumor cells (Supplementary Fig. S2). One hundred twenty-six germline mutations were also discovered, three of which were unreported in the dbSNP, COSMIC, ExAC, and 1000 Genomes databases (Supplementary Table 3). In particular, RB1 (p.I680T), one of the 16 newly identified somatic mutations, was predicted to be deleterious by PROVEAN and SIFT.
[Workflow figure: Step 1, deparaffinization and dissociation; Step 2, staining with antibodies; Step 3, loading on DEPArray; Step 4, identification and sorting of three cell populations; Step 5, Ion Torrent PGM sequencing with the Cancer Hotspot Panel v2]
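The novelty criterion used here (absence from dbSNP, COSMIC, ExAC, and 1000 Genomes) can be expressed as a simple filter over annotated variants. This is a sketch only: the dictionary keys and the function name are hypothetical, and a missing annotation is assumed to be encoded as `None`, an empty string, or ".".

```python
from typing import Mapping, Optional

# Hypothetical annotation keys, one per database named in the text.
DATABASES = ("dbSNP", "COSMIC", "ExAC", "1000G")

# Values assumed to mean "no record in this database".
MISSING = (None, "", ".")


def is_novel(annotation: Mapping[str, Optional[str]]) -> bool:
    """Return True when none of the four databases reports the variant."""
    return all(annotation.get(db) in MISSING for db in DATABASES)


# Toy annotations for illustration; the identifier is a placeholder, not a real record.
known_variant = {"dbSNP": "rs_placeholder", "COSMIC": ".", "ExAC": ".", "1000G": "."}
unreported_variant = {"dbSNP": ".", "COSMIC": ".", "ExAC": ".", "1000G": "."}
```

A variant counts as novel only if every database lacks it, so a single hit in any one database is enough to exclude it.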

Discussion
Next-generation sequencing (NGS) technology has now moved from the research environment into clinical practice (Shen et al. 2015). Accuracy and precision of NGS technology are required for making a clinical diagnosis (Pinho 2017). To identify the causes of lung adenocarcinoma and to develop strategies for its prevention, diagnosis, and treatment, it is very important to distinguish somatic variants, which arise in cancer through mutagens, from germline variants, which are passed from a parent to [...] Table 3). However, in the somatic mutation analysis, we discovered 20 somatic mutations, including 4 novel somatic mutations, by sequencing unsorted cells, and 14 more somatic mutations, including 12 novel mutations, by sequencing sorted tumor cells (Supplementary Fig. S2b). These results imply that germline mutations were fully detected by conventional next-generation sequencing, whereas tumor-specific somatic mutations, which are a significant factor in cancer diagnostics, were detected more sensitively by sequencing sorted pure tumor cells. Sorted cell sequencing is therefore more accurate for somatic mutation diagnosis.
We found that there are epithelial-to-mesenchymal transition (EMT) sub-populations in FFPE samples. EMT, the conversion of epithelial cells to migratory mesenchymal cells, is involved in embryonic development and cancer progression and has been characterized by intermediate keratin/vimentin expression ratios (Polioudaki et al. 2015); we sorted stromal and tumor cells with vimentin and keratin antibodies. Further study of cells sorted by keratin/vimentin expression ratio is needed to assess EMT characteristics in lung adenocarcinoma. The results of the current study show that the DEPArray system is a very useful tool for identifying mutations from a small number of tumor cells, avoiding false-positive mutations, and finding the most accurate mutations in FFPE tumor samples. However, the system also has a limitation: because of sorting time and expense, it is difficult to process large numbers of cells from large tumor volumes.
In conclusion, we successfully established a precise mutational analysis of lung adenocarcinoma and identified 16 unreported somatic mutations and 10 germline mutations in FFPE blocks using NGS applied to sorted cells. The newly detected mutations and our accurate mutational profiling will be useful for investigating the main causes of lung adenocarcinoma and the critical factors for its precision medicine. Additionally, the characteristics of all variants were considered, because somatic variants are a feature of cancer and germline variants are a cause of heritable diseases.