Establishment and characterization of 38 novel patient-derived primary cancer cell lines using multi-region sampling revealing intra-tumor heterogeneity of gallbladder carcinoma

Gallbladder carcinoma (GBC) is a lethal biliary tract malignant neoplasm. Patient-derived primary cancer cell lines (PDPCs) are appropriate models to explore biological characteristics and potential therapeutics; however, there is a lack of PDPCs in GBC. In this study, we aimed to establish and characterize the GBC PDPCs, and further investigated the intra-tumor heterogeneity (ITH). Multi-region sampling (3–9 regions) of the operable tumor tissue samples was used to establish PDPCs. Short tandem repeat genotyping for cell authentication and karyotyping was performed, followed by whole-exome sequencing and RNA sequencing to assess the ITH at the genetic and transcriptional levels, respectively. Thirty-eight PDPCs were successfully established from seven GBC patients and characterized. ITH was observed with a median of 38.3% mutations being heterogeneous (range, 26.6–59.4%) across all patients. Similar with other tumor types, TP53 mutations were always truncal. In addition, there were three genes, KMT2C, CDKN2A, and ARID1A, with truncal mutations in at least two patients. A median of 370 differentially expressed genes (DEGs) was identified per patient. Distinct expression patterns were observed between major histocompatibility complex (MHC) class I and II genes. We found the expression of MHC class II genes in the PDPC samples was closely regulated by CIITA, while that of MHC class I genes were not correlated with CIITA expression. The PDPCs established from GBC patients can serve as novel in vitro models to identify the ITH, which may pave a crucial molecular foundation for enhanced understanding of tumorigenesis and progression. Supplementary Information The online version contains supplementary material available at 10.1007/s13577-021-00492-5.


Introduction
Gallbladder carcinoma (GBC) is one of the lethal biliary tract cancers with limited therapeutic options and unsatisfactory treatments [1,2]. Surgical resection is the preferred treatment option, but many GBC patients are not suitable for curative surgery when they are detected [3,4]. Patients with advanced or metastatic GBCs have a poor prognosis with a 5-year survival rate of less than 10% [5,6]. Among all the potential reasons, the lack of understanding regarding the tumorigenesis and progression has been proposed as a major obstacle to discovering a new strategy for more precise and effective treatment to improve GBC patients' prognosis.

3
GBCs were described, which may be regarded as the potential candidates for personalized targeted therapy. Actually, treatment development and improvement may require not only appreciation of genetic alterations but also understanding of intra-tumor heterogeneity (ITH) [14], which is a wellestablished phenomenon and may foster tumor adaptation, and drug resistance to chemotherapy and molecularly targeted agents [15]. Some studies have demonstrated the universal prevalence of ITH in many types of tumors including renal-cell carcinoma, lung adenocarcinomas and glioblastoma [16][17][18]. However, little is known about the ITH and its impact on the progression of GBC.
Patient-derived primary cancer cells (PDPCs) derived from tumor tissue samples are not only attractive for testing drug sensitivity and exploring the biological functions, but also preferable for the tumor features using genomic sequencing [19]. Moreover, previous studies on hepatocellular carcinoma and intrahepatic cholangiocarcinoma (ICC) have shown that PDPCs can enable an accurate assessment of ITH [14,19]. In the present study, we established thirty-eight PDPCs using multi-region sampling and took advantage of them combined with whole-exome sequencing (WES) and RNA sequencing (RNA-seq) to explore the ITH in GBC.

Patients and clinical samples
Patients with GBCs were enrolled in this study from August 2015 to March 2016. All patients provided fresh tumor tissue samples that were obtained from the clinical sample bank at the Department of Biliary I, Shanghai Eastern Hepatobiliary Surgery Hospital, Navy Military Medical University, and samples were collected. All the patient was diagnosed by surgical pathology. Clinical information of these patients is shown in Table 1. Based on the seventh edition of the American Joint Committee on Cancer (AJCC) staging system for esophageal cancer [20], one patient was stage IIa, four patients were stage IIIa, and two patients were stage IIIb. All patients provided written informed consent for their samples to be examined and their clinical data to be utilized. This study was approved by the Institutional Review Board of Shanghai Eastern Hepatobiliary Surgery Hospital, Navy Military Medical University (No. EHBHKY2015-02-010).

Establishment and culture of PDPCs
The spatially distinct multi-regions of the operable GBC tissue samples were collected from seven patients and PDPCs were established by following the standard procedures. In brief, fresh tissues were washed with sterile phosphatebuffered saline (PBS) and cut into 1 mm 3 pieces. Then, small pieces of the tissue were placed into 10 cm 2 dishes and incubated with DMEM/F12 medium with 10% fetal bovine serum (FBS) (Gibco, Thermo Fisher Scientific, Inc., Waltham, MA, USA), 1% non-essential amino acids, and 1% penicillin/streptomycin (Thermo Fisher Scientific, Inc., Waltham, MA, USA). The cells were trypsinized with 0.25% trypsin (Gibco, Thermo Fisher Scientific, Inc., Waltham, MA, USA) when they reached about 30% confluence and were transferred into 24-well plates for subculture. Thereafter, all PDPCs were grown in a 37 °C incubator at 5% CO 2 with DMEM/F12 medium supplemented with 10% FBS. The mycoplasma testing has been done for all PDPCs established in this study. The passage of each PDPC used in the following experiment was different and the passage information for each PDPC in the assays ranged between 15 and 25.

3
DNA Analyzer (Applied Biosystems Inc., Foster City, CA, USA). For karyotyping, exponentially growing GBC PDPCs were exposed to colchicine (0.01 mg/ml) for 16 h and then to hypotonic treatment (0.075 mol/L KCl) for 20 min. After fixation in a methanol and acetic acid mixture (3:1 by volume), cell suspensions were dropped onto ice-cold slides. Slides were then treated in trypsin for 30-60 s and stained with Giemsa. Chromosomes from at least 20 metaphases per sample were analyzed under a microscope.

WES and RNA-seq
All genomic DNA and total RNA samples were shipped on dry ice after extraction, and WES was performed as previously described [21]. DNA library preparation, and sequence capture and next-generation sequencing were performed at WuXi Next CODE (Shanghai, China). Deep sequencing of exome captured DNA was performed on an Illumina NovaSeq S4 PE150 instrument. RNA-seq was performed as previously described [22]. Randomly interrupted mRNA and the first cDNA strand were synthesized, and then a second cDNA strand was synthesized. RNA-seq library construction and hybrid capture and sequencing were performed at WuXi Next CODE (Shanghai, China). RNA-seq was performed using an Illumina HiSeq X Ten PE150 instrument.

Phylogenetic analysis and ITH estimation
In each individual, both non-silent and silent somatic mutations occurred in at least one sample were used to construct the phylogenetic tree. A binary presence/absence matrix was generated according to the occurrence of mutations across all samples in each individual. Phylogenetic trees were built across all individuals based on the binary matrix using the Penny program in the PHYLIP package (version 3.679, http:// evolu tion. genet ics. washi ngton. edu/ phylip. html) with the parsimony ratchet method. Trees were manually reconstructed; thus, making the trunk/branch lengths scaled in proportion to the number of mutations. For each individual, variants were classified as trunk, shared and private mutations if they were identified ubiquitously across all the PDPC samples, in more than one but not all samples, and in only a single sample, respectively. For the convenience of description, shared and private mutations together were further classified as non-trunk mutations. To estimate the ITH for each individual, a percentage was calculated as the number of trunk mutations divided by the number of nontrunk mutations.

Identification of putative driver mutations and analysis of mutational signatures
To identify driver mutations in GBC patients, a list of 573 candidate genes was collected from the COSMIC cancer gene census (December 2018) and recent large-scale GBC and cholangiocarcinoma (CCA) sequencing studies [7,[32][33][34]. Putative driver mutations were then identified from variants in these genes if they met one of the following criteria: (1) predicted to be nonsense, splicing, or frameshift mutations and (2) predicted to be deleterious in at least one of the SIFT and PolyPhen programs (annotated using ANNOVAR) for missense mutations.
To extract the composition of mutational signatures in each PDPC, deconstructSigs [35] was performed on somatic SNVs (both synonymous and non-synonymous) for trunk and non-trunk mutations separately. Mutational signatures were based on the Wellcome Trust Sanger Institute Mutational Signature Framework [36]. Overall distribution of mutational signatures was compared between trunk and non-trunk mutations within each individual.

Identification of somatic copy number variations (CNVs)
To identify CNVs in each PDPC, CNVkit [37] (version: v0.8.5) with default parameters was run on the mapped reads. A Panel of Normal (PON) was built as a normal control using a set of 10 blood samples from other individuals with the same library method. The initial copy number profile of each PDPC was derived using log2 ratio of read coverage between the tumor and the PON by each segment. Segments located on chromosomes X and Y were excluded from further analysis. Segment results in log2 ratio of all samples were combined together to identify regions that were significantly amplified or deleted across our cohort using GISTIC2.0 [38]. The GISTIC2.0 command line was as follows: gistic2 -b output -seg samples.seg -mk samples. marker -refgene hg19.mat -genegistic 1 -smallmem 1 -broad 1 -brlen 0.8 -conf 0.99 -ta 0.3 -td 0.3 -qvt 0.01 -armpeel 1 -savegene 1 -gcm extreme. The GISTIC2.0 peak regions at the cytoband level were then selected to examine ITH in CNVs.

Gene expression profiling
Raw reads from RNA-seq were mapped to the human reference genome hg19 using STAR [39] (version: v2.5.2b) with default parameters. RSeQC [40] (version 2.4) was run on the mapped reads to visualize the read distribution over genome features, such as CDS exon, 5′UTR exon, 3′ UTR exon, Intron, and Intergenic regions. Raw counts of each gene in RefSeq (February 2017) for each sample were calculated using featureCounts [41] (version 1.5.2). The expression profile of each sample at the count level was normalized to reads per kilo base per million mapped reads (RPKM) using an in-house R script. All profiles were combined together as an integrated expression matrix for further analysis. To investigate the ITH at the gene expression level for each GBC patient, lowly expressed genes (average log 2 RPKM < 1) were first removed within each patient. Next, the most varied genes (median absolute deviation log 2 RPKM > = 1) across all samples in each patient were selected for further analysis. A hierarchical cluster analysis was performed on these genes and samples were classified into two groups based on the hierarchical tree. Differentially expressed genes (DEGs) were then identified as fold change > = 2 or < = 0.5 between the two groups. To visualize ITH in the expression profiles for each patient, a heat map was drawn on these DEGs. To explore the potential biological relevance of these DEGs, Gene Ontology (GO) enrichment analyses were performed using R package ClusterProfiler [42].

Statistical analysis
The differences between trunk and non-trunk mutations were evaluated using proportions test with continuity correction. P < 0.05 was considered statistically significant. Statistical analysis was performed using R software, version 3.5.0 (R Foundation for Statistical Computing).

Establishment and characterization of PDPCs
In this study, we successfully established thirty-eight PDPCs derived from spatially distinct multi-regions of the operable tumor tissue samples of seven GBC patients (Table 1), ranging from three to nine PDPC samples each patient (Fig. 1). To characterize the 38 PDPCs, morphology, karyotype examination and short tandem repeats (STRs) were performed ( Supplementary Fig. 1, Supplementary Table 1 and  Supplementary Table 2). Furthermore, comparison analyses of STRs of these GBC PDPCs with those from American Type Culture Collection (ATCC) and Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ) suggested that GBC PDPCs did not match those in existing databases, Fig. 1 Flowchart of study design. Multiple spatially separated tumor tissue regions (ranging of 3-9 regions) from seven patients with GBC were sampled for primary culture. After the characterization, the established PDPCs were then subjected to WES and RNA-seq analysis for the ITH of GBC at the genetic and transcriptional levels devoid of cross contamination with other known cancer cell lines. In addition, the results of single-nucleotide polymorphisms (SNPs) showed that PDPCs from the same patient were clustered together ( Supplementary Fig. 2), which confirmed the origin of these PDPC samples.

Genomic architecture and ITH of mutations in GBC
WES was performed on the genomic DNA from 38 PDPCs. A total of thirty-eight PDPCs (range, 3-9 per patient) were sequenced, achieving a median target depth of 136X. A total of 809 non-silent mutations (affecting 734 genes) and 291 silent mutations were identified in these 38 PDPCs. To assess the ITH and clonal evolution in GBC patients, a phylogenetic tree was constructed based on the occurrence of mutations in multiple PDPCs derived from each patients (Fig. 2). The phylogenetic trees exhibited different evolution models across different patients, with a median of 38.3% of variants having spatial heterogeneity (range, 26.6-59.4%). The phylogenetic tree structure varied across GBC patients. For example, the phylogenetic tree of patient 902 had a longer branch than its trunk, whereas the one of patient 4160 displayed a much more homogenous mutational pattern. Notably, patient 902 had the highest ITH in our cohort and two PDPCs (R3 and R5) showed huge big difference compared to the other seven PDPCs. The patients harbored mutations in different candidate driver genes (displayed near the trunk of each phylogenetic tree) in the trunk branch, exhibiting distinct driven patterns across these patients.

Mutation status of putative driver genes
To better understand the cancer genome evolution of GBC, a list of 573 manually curated genes was further inspected to identify the potential driver mutations. There were 24 (24/573) genes with putative driver mutations in at least one sample of our cohort, and the majority mutations in these genes were truncal (Fig. 3). Notably, most of these genes (19 out of 24 genes, except TP53, KMT2C, CDKN2A, ARID1B, and ARID1A) were exclusively mutated in patients, indicating diverse driver events during tumorigenesis in each patient. Consistent with previous studies in other cancer types, TP53 was the most recurrently mutated gene (4 out of 7 patients) in our cohort and mutations of this gene were always truncal despite distinct mutation types across different individuals. Similarly, KMT2C (MLL3) had the same mutation frequency as TP53 and mutations in this gene were also truncal across different patients. There were two genes, CDKN2A and ARID1A, with truncal driver mutations in two out of seven patients.
Although most mutations were homogeneous across different PDPCs, a few mutations showed heterogeneity. The ARID1B mutation was truncal in patient 902 while only one PDPC from patient 1405 harbored the ARID1B mutation. A nonsense mutation of MUTYH was found in 5 out of 9 PDPCs from patient 902, while a missense mutation of FGFR4 was identified in 2 out of 4 PDPCs from patient 1405. Meanwhile, a frameshift deletion of PTEN and NF2 was discovered in 3 out of 4 PDPCs from patient 668 and in 2 out of 3 PDPCs from patient 4256, respectively.

Dissecting mutational spectra and signatures
Next, we analyzed the mutational spectra of mutations on both trunk and non-trunk to determine the dynamics of mutagenic processes in GBC. The C > T transition was the most dominant change in GBC, which was consistent with a previous study of GBC [7]. The C > A transition and the T > C transition were the second and the third dominant changes in GBC (Fig. 4a). The C > T transition is prevalent in many tumors [16,17], while the C > A and T > C changes are considered the characteristic signatures of GBC genome [7]. These results demonstrated that these established PDPCs inherited the mutation patterns from the source GBC tissues. The distribution of the six mutation classes in trunk and non-trunk mutations varied from patient to patient. The overall distribution was significantly different (P < 0.001, proportion test) between trunk and non-trunk mutations for patients 1279, 4160, 4256, 668 and 902, while no statistical differences were detected for patients 1405 and 1436 (Fig. 4a). In addition to the different distribution in the six mutation classes, differences in the 96 trinucleotide mutational signatures were also observed (Fig. 4b). We next deconstructed contributions of individual mutational signatures to each patient, and identified several dominant signatures in these tumors, including Signature 1 (associated with age), Signature 2 (associated with APOBEC), Signature 4 (associated with smoking), Signatures 6 and 15 (associated with DNA mismatch repair), and Signatures 7 and 17 (Fig. 4c). Moreover, the number of contributed signatures in non-trunk mutations was higher than that in trunk mutations for most of the patients, except patient 1436. Signature 1 was always dominant for trunk mutations across all patients, while signature compositions were quite complex for nontrunk mutations in most patients. For example, signature 6 was dominant in patient 902, signature 15 was dominant in patient 1405, and signatures 1 and 4 were almost equivalent in patient 668.

ITH of CNVs in GBC
We next analyzed the ITH at the CNV level in GBC, focusing on the cytoband level regions. The 23 and 22 CNV peaks were identified as amplifications and deletions in our cohort using GISTIC2.0, respectively (Fig. 5a, b). Notably, six oncogenes were ubiquitously amplified in at least one patient, including PTK6, HRAS, NOTCH1, CCND1, FGFR3, and TERT. CDKN2A deletion was ubiquitously identified in 4 patients. ITH varied at the CNV level across different Fig. 4 Dissecting mutational spectra and signatures. a The distribution of variant changes between trunk and non-trunk mutations. The differences between trunk and non-trunk mutations were evaluated using proportions test with continuity correction. * < 0.05, 0.001 < ** < 0.01, *** < 0.001, n.s., no significance. b Mutational signatures of all trunk and non-trunk mutations was inferred by decon-structSigs. Signatures were displayed according to the 96-substitution classification defined by the change class and sequence context. c Pie chart showed contributions of mutational signatures to each patient. Signatures were based on the Wellcome Trust Sanger Institute Mutational Signature Framework patients. For example, most CNVs (both amplifications and deletions) were ubiquitously identified across different samples of patient 1279, while the CNVs in patient 902 showed much more heterogeneity (especially for deletions) in patient 902.

ITH of the gene expression profile in GBC
Gene expression profiling is a powerful technique widely used in cancer research. To investigate the ITH at the gene expression level in GBC, RNA-seq was performed on an Illumina X Ten instrument for all the 38 PDPCs. A median of 25.7 M read pairs was obtained per sample. To explore the ITH in each GBC patient, candidate DEGs were used for visualization and analysis. A median of 370 DEGs (range, 114-693) was identified per patient according to our method (Fig. 6). Various degrees of ITH were observed in out cohort. There were 114 DEGs identified across samples from patient 1436, which indicated the lowest ITH; while patient 668 exhibited the highest

Heterogeneous expression profile of human leukocyte antigen (HLA) genes
To explore the potential biological relevance underlying these DEGs, we performed a GO enrichment analysis on these DEGs for each patient. It was found these DEGs were involved in cancer-related biological processes, such as cell motility, cell migration and cell proliferation, in several patients (Supplementary Fig. 3). Interestingly, DEGs identified from patient 902 and patient 1436 were enriched in biological processes related to the innate immune response and immune system process. Further investigation revealed that DEGs involved in these processes included several MHC class II genes, such as HLADG (CD74), HLA-DPA1, HLA-DRA, HLA-DRB1, and HLA-DRB5. We then extended to all the HLA genes (including 9 MHC class I and 16 MHC class II genes) to investigate the expression profile across all our patients. Notably, distinct expression patterns were observed in these genes across different patients (Fig. 7a). Most of the MHC class I genes (6 out of 9 genes, except HLA-G, HLA-L, and HLA-J, which were always lowly expressed) were always highly expressed across all patients, while the expression patterns of MHC class II genes were quite complex across different patients. In general, expression of MHC class II genes was much lower than that of MHC class I genes. Several genes (such as HLA − DPB2, HLA − DRB6, HLA − DOA, HLA − DQA2, and HLA − DQB2) were barely expressed across all samples, while some other genes, such as HLADG (CD74), HLA-DPA1, HLA-DRA, HLA-DRB1, and HLA-DRB5, were found to be expressed in a part of PDPCs in some of the patients. For example, two PDPCs (R3 and R5) from patient 902 did not express these genes, while the remaining PDPCs from this patient had a relatively high expression. Moreover, these genes were ubiquitously expressed across all PDPCs from patient 1405 and patient 4160. In summary, expression of these MHC class II genes varied across different PDPC samples and GBC patients, indicating the transcriptional ITH in GBC.
To investigate which gene may regulate the expression of these MHC class II genes, we next performed a paired association analysis using these MHC class II genes and the other genes in the profiling across all 38 PDPCs. The expression of CIITA was most correlated to MHC class II genes (HLA − DRA, HLA − DRB5, and HLA − DRB1, R 2 = 0.87, 0.82 and 0.79, respectively; Fig. 7b), while its expression was not correlated to MHC class I genes ( Supplementary  Fig. 4).

Discussion
GBC is an aggressive carcinoma with poor prognosis. Due to relatively low frequency compared to other cancers, the mutation spectrum of this cancer is limited [7,8]. Rare study assessing ITH of GBC has been reported. To our knowledge, this is the first time to integrate PDPC model, multiregional WES, and RNA-seq to investigate the genomic and transcriptomic ITH of GBC.
Most previous studies have adopt the sampling in different regions of tumor tissue to assess the ITH of various carcinomas [16,17]. However, low tumor purity of sampling the tumor tissue samples resulted in the impacts on the accuracy of ITH assessment [14]. PDPCs were derived from tumor tissue samples with their high purity and cell population representativeness [43]. Using PDPC models, previous studies identified a median of 60.3% ITH index in ICC [14] and a mean 39.7% ITH index in hepatocellular carcinomas [19]. Several previous studies have reported the PDPCs of GBC [44][45][46][47][48][49], but none of them explored the ITH of GBC. In this study, we successfully constructed 38 GBC PDPCs, which is the most in all the GBC reports, from the tumor tissue samples of seven patients. Overall, a median 38.3% ITH score of GBC was identified in the PDPC model for the first time.
A limited number of putative driver mutations were identified in each GBC patient, and majority of the mutations were exclusive to patients. TP53 mutation was ubiquitously truncal across the PDPCs of four out of seven GBC patients. In addition, there were three genes, KMT2C, CDKN2A, and ARID1A, with truncal mutations in at least two GBC patients. Besides that, we found that 12 CCA PDPCs derived from three patients were identified to carry the mutations of recurrently putative driver genes including TP53, KMT2C, ERBB2, and JAK3 ( Supplementary  Fig. 5). The mutational results suggested that our GBC and CCA PDPCs recaptured some genomic characteristics of the tumor tissue samples of patients [7][8][9]. Therefore, in addition to describing ITH features, these PDPCs have the potential to be powerful tools for the studies on drug sensitivity or resistance as well as functional researches.
Transcriptomic analysis has been widely used in cancer studies through a single-sampling approach [50,51]. However, little is known about the intra-tumoral diversity at the expression level. Herein, we identified transcriptional ITH in GBC. GO enrichment analysis with heterogeneously expressed genes across different samples revealed several biological processes related to tumor development. Notably, the expression patterns of HLA genes were found to be highly diverse from patient to patient. As already known, MHC class II genes are constitutively expressed in antigenpresenting cells (APCs). However, the expression of these genes is undetectable or very low in tumor cells. As previously reported [52], the expression of these genes was highly correlated to CIITA in our cohort, which confirmed the regulatory role of CIITA in these genes. CIITA is solely expressed on professional APCs; however, its expression can be induced by interferon-γ. In a previous study, HLA class II gene expression was identified as a favorable prognostic marker in colorectal carcinoma [53]. Thus, our study provides a clue for the prediction of prognosis in GBC patients.
Although the GBC PDPCs were successfully established and were used to clarify the genomic and transcriptomic ITH, the epigenetics and metabolic ITH were not figured out in this study. In addition, there are several studies on ITH and the drug sensitivity [14,19,54]. The relationship between these ITH of GBC and drug sensitivities was not explored. Moreover, we could not explore the association between these ITH and the prognosis of GBC patients due to the lack of therapeutic and prognostic information. Several studies have reported that ITH was considered to be associated with other tumor prognosis [55][56][57]. Thus, treatment and prognosis information is required to create a comprehensive picture between GBC ITH and prognosis.

Conclusion
Collectively, we have established and characterized the 38 PDPCs from seven Chinese patients who were diagnosed with gallbladder carcinoma. In addition, we integrated GBC PDPC model, WES, and RNA-seq to reveal the ITH at the gene mutation and transcription levels, which may provide an important molecular foundation for enhanced understanding of tumorigenesis and progression in GBC. These established PDPCs derived from GBC patients can serve as new in vitro models for investigating ITH.