Introduction

It is well established that cancer cell lines bear only a partial resemblance to the tumors from which they originate. This issue has been addressed many times [1–6]; however, no practical guidelines on how to deal with this incongruity have been proposed so far, even though basic and preclinical research relies to a large extent on cell line analysis (mostly because no better model is available for large-scale use). We have already described the discrepancies in the TP53 mutational profile between surgical samples and cell lines [7]. Moreover, we have recently reported differences in the status of several tumor suppressor genes (TSGs) between glioma cell lines and surgical specimens [8]. Since a better model is not available, recognizing such differences might facilitate adequate interpretation of results and thus may be useful in various research projects, from comparative biology studies to in vitro drug testing. Therefore, we decided to compare the in vivo and in vitro mutational profiles of other tumor suppressor genes. This is the next step in understanding the discrepancies between surgical samples and cell lines, both in the frequency of homo- and heterozygous mutations and in their effect on protein sequence and structure.

Currently, the foundations of oncology rest on Knudson's two-hit hypothesis, according to which tumorigenesis requires the elimination of both alleles of a given tumor suppressor gene [9]. However, this model does not account for the influence of the elimination of the first allele on cell biology. The exceptions to this pattern, such as single heterozygous mutations of TP53, APC, or PTEN in tumor samples, are usually explained as a consequence of a dominant-negative effect (DNE), gain of function (GOF), or haploinsufficiency. Recently, Berger et al. proposed an extension of the two-hit model, the continuum/quasi-sufficiency model, which suggests that the elimination/mutation of one allele is sufficient to substantially influence cell biology, and that certain genes, such as PTEN, can be entirely eliminated only at the latest stages of carcinogenesis, since their action temporarily protects preneoplastic cells against senescence/apoptosis [10]. This extended model inspired our analyses, since we assumed that cancer cell lines represent the most advanced stages of carcinogenesis. Furthermore, various neoplastic cells, for example, those with an IDH1 mutation or EGFRvIII, require special conditions to survive in vitro [6, 11]. All these premises encouraged us to inquire whether (and why) the status of tumor suppressor genes differs between surgical tumor specimens and cell lines.

Materials and methods

The mutation frequency and microsatellite instability (MSI) data were extracted from the Sanger Institute Catalogue Of Somatic Mutations In Cancer (COSMIC) database (v61 release), which gathers information on somatic mutations in human tumor samples and tumor cell lines, taken from the literature and from in-house sequencing [12–15]. For each gene in the database, the numbers of normal and mutated samples were extracted. Moreover, the mutated samples were divided into groups according to mutation zygosity (homozygous, heterozygous) and type (Table 1). Two main mutation groups were distinguished: mutations modifying or partially abolishing the protein function (mostly missense mutations) and mutations completely abolishing the protein function (mostly nonsense mutations and whole gene deletions) (Table 1). Splice site mutations were not classified into either group because their effect on protein structure and function is usually unpredictable. The analyses were performed for each gene with Fisher’s exact test for 2 × 2 contingency tables, calculated with MATLAB R2010b (MathWorks) and R 2.15.1. The analysis comprised the comparison of homozygous versus heterozygous mutations and of mutations partially versus completely abolishing the protein function, in all their possible combinations.

Table 1 Classification of mutation type on the basis of the effect on protein structure and function
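
The grouping described above can be illustrated with a minimal Python sketch. It is not the code used in the study (the analyses were run in MATLAB and R); the mapping `EFFECT_GROUP`, the helper names `tally` and `zygosity_table`, and the example records are all hypothetical and only mirror the rules stated in the text: missense mutations go to the "modified protein" group (MS), nonsense mutations and whole gene deletions to the "lack of protein" group (NS), and splice site mutations are left unclassified.

```python
from collections import Counter

# Hypothetical mapping that mirrors the text; Table 1 remains the authority.
EFFECT_GROUP = {
    "missense": "MS",             # modifies or partially abolishes protein function
    "nonsense": "NS",             # completely abolishes protein function
    "whole_gene_deletion": "NS",  # completely abolishes protein function
    "splice_site": None,          # not classified into either group
}


def tally(records):
    """Count (source, zygosity, effect group) combinations, skipping unclassified types."""
    counts = Counter()
    for source, zygosity, mutation_type in records:
        group = EFFECT_GROUP.get(mutation_type)
        if group is None:
            continue
        counts[(source, zygosity, group)] += 1
    return counts


def zygosity_table(counts):
    """Build a 2 x 2 table: rows = (tumor, cell_line), columns = (HO, HE)."""
    table = []
    for source in ("tumor", "cell_line"):
        row = []
        for zygosity in ("HO", "HE"):
            row.append(sum(n for (s, z, _), n in counts.items()
                           if s == source and z == zygosity))
        table.append(row)
    return table


# Invented example records for a single gene: (source, zygosity, mutation type).
records = [
    ("tumor", "HE", "missense"),
    ("tumor", "HE", "nonsense"),
    ("tumor", "HO", "missense"),
    ("tumor", "HE", "splice_site"),
    ("cell_line", "HO", "nonsense"),
    ("cell_line", "HO", "whole_gene_deletion"),
    ("cell_line", "HE", "missense"),
]

counts = tally(records)
print(zygosity_table(counts))
```

A table built this way (one per gene and per comparison) is the kind of 2 × 2 input that Fisher's exact test takes; an analogous helper could build the MS versus NS table.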

Results

Six hundred seventy cancer-associated genes were analyzed in 142,961 samples (137,708 tumor samples and 5,253 cell lines). The analysis of the summarized data (Supplementary tables) revealed significant discrepancies in the frequencies of different mutation types between surgical samples and cell lines. In oncogenes, as expected, there are almost no mutations resulting in the lack of protein, and the proportions of missense heterozygous and missense homozygous mutations are retained in cell lines (exemplary data in Table 2: oncogene). In TSGs, we observed two distinct recurrent patterns: a significant increase in the rate of homozygous mutations in cell lines with the missense/nonsense proportion retained, or a simultaneous increase in the proportions of both homozygous and nonsense mutations in cell lines. TP53 is an exemplary gene following the first pattern. In tumor samples, the frequencies of heterozygous and homozygous mutations are comparable (both for missense and nonsense), while homozygous mutations dominate in cell lines (the missense/nonsense proportion was retained; data and results of the statistical analyses in Table 2: classical two hits). RB1, NF1, NOTCH1, and PTEN presented similar changes. The second pattern may be exemplified by the SMAD4 gene, for which the proportion of nonsense homozygous mutations significantly increases in cell lines, while that of other mutation types decreases. For CDKN2A and APC, the mutation proportions were already strongly shifted towards nonsense homozygous mutations in tumor samples; nevertheless, the further increase of this mutation type in cell lines, with the accompanying decrease of the other mutation types, was statistically significant. BRCA2, SOCS1, STK11, MSH6, and SMARCA4 followed this pattern as well (Table 2: three hits).
Furthermore, we observed a group of genes for which the proportion of nonsense homozygous mutations in tumor samples was similar to or higher than that found in cell lines for the previously mentioned genes. No further increase of this proportion was observed in cell lines; however, this may imply that such genes follow a similar pattern, but the process is already advanced in tumor samples (Table 2: three hits/final stage). Finally, we observed several less commonly analyzed genes which may potentially be classified as following the second pattern; however, the number of described samples was not large enough for a reliable analysis (Table 2: three hits/small group).

Table 2 A comparison of subgroups of homo/heterozygous mutation occurrence (HO/HE) and occurrence of mutations resulting in modified protein or lack of protein (MS/NS) in human cancer cell lines and surgical samples

Since the number of cases with a reported zygosity status was relatively low, we additionally analyzed the proportions of homo-/heterozygous and missense/nonsense mutations separately, to verify the observed discrepancies in larger groups. For most genes, the results of both analyses were concordant (data in Table 3). Two of the genes initially classified as following the first pattern (PTEN and RB1) showed a statistically significant increase in the proportion of nonsense mutations in cell lines in comparison to tumor samples. On the other hand, for two genes classified as following the second pattern (SOCS1 and MSH6), the increase in the proportion of nonsense mutations in cell lines was not observed. For the genes potentially following the second pattern, the analysis of the larger groups suggests an increase in the proportion of homozygous and nonsense mutations; however, the statistical analyses do not confirm its significance.

Table 3 A comparison of homo/heterozygous mutation occurrence (HO/HE) and occurrence of mutations resulting in modified protein or lack of protein (MS/NS) in human cancer cell lines and surgical samples

The analysis of microsatellite data from 810 cell lines and 720 primary samples revealed that the frequency of MSI was moderately higher in the former. High-frequency MSI (MSI-H) was detected in 66 cell lines and in 36 primary samples (8 vs. 5 %, p = 0.014). Any MSI (either MSI-H or low-frequency MSI, MSI-L) was detected in 106 cell lines and in 69 primary samples (13 vs. 9.6 %, p = 0.036) (Table 4).

Table 4 A comparison of microsatellite instability frequency in human cancer cell lines and surgical samples. Fisher’s exact test results
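
As an illustration of the test named in the caption, the following minimal sketch applies a two-sided Fisher's exact test to the MSI counts quoted in the text (66 of 810 cell lines and 36 of 720 primary samples with MSI-H; 106 and 69 with any MSI). It is not the code used in the study (the analyses were run in MATLAB and R); the function name `fisher_exact_two_sided` is hypothetical, and the sketch assumes the common two-sided definition that sums the probabilities of all tables no more likely than the observed one, so the resulting p values may differ slightly from those reported.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (the small relative tolerance guards against floating-point ties).
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(p_value, 1.0)


# Counts taken from the text: rows = (cell lines, primary samples),
# columns = (MSI detected, MSI not detected).
n_cell_lines, n_primary = 810, 720

p_msi_h = fisher_exact_two_sided(66, n_cell_lines - 66, 36, n_primary - 36)
p_any = fisher_exact_two_sided(106, n_cell_lines - 106, 69, n_primary - 69)

print(f"MSI-H:   {66 / n_cell_lines:.1%} vs. {36 / n_primary:.1%}, p = {p_msi_h:.3f}")
print(f"Any MSI: {106 / n_cell_lines:.1%} vs. {69 / n_primary:.1%}, p = {p_any:.3f}")
```

The sketch uses exact integer binomial coefficients from the standard library rather than a statistics package, which keeps it self-contained at the cost of speed for very large tables.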

Discussion

In general, it is well known that cancer cell lines cannot be seen as a direct representation of tumors; however, the extent of the differences between them may be underestimated. An in-depth analysis of mutation databases offers insight into this issue from another perspective. The present study showed that homozygous mutations of many tumor suppressor genes are significantly more frequent in cell lines than in tumor samples. Similarly, nonsense mutations of such genes occur more frequently in vitro than in vivo. The quasi-sufficiency hypothesis proposed by Berger et al. offers a partial explanation for these discrepancies [10]: it accounts for the higher incidence of single heterozygous mutations in tumor samples, but not for the preference of cell lines for nonsense mutations. PTEN is an exemplary gene with such characteristics. Preneoplastic cells require a heterozygous mutation of this gene (with its function partially retained) during the early stages of carcinogenesis, as cells without PTEN activity become senescent or die. Apparently, the hyperactivation of the PI3K pathway may be unfavorable at the early stages of carcinogenesis [16]. Most authors suggest that the complete elimination of PTEN requires a prior neutralization of the genes required for oncogene-induced senescence (e.g., TP53, CDKN2A) [17–19].

Both missense and nonsense mutations of the PTEN gene eliminate the phosphatase activity of the protein almost equally; thus, a gradual change from missense heterozygous to nonsense homozygous mutations would not be expected in this case [20]. On the other hand, genes such as BRCA2 and SOCS1 show more complex differences between surgical samples and cell lines: a significantly higher frequency of both homozygous and nonsense (i.e., resulting in the complete lack of protein) mutations in the latter (Table 2). For such genes, we propose the “three hit” model (Fig. 1a). At the initial stages of carcinogenesis, a missense TSG mutation is sufficient/optimal because it alters (yet does not completely eliminate) the protein function. Next, a nonsense mutation arises in the other allele. Finally, the allele with the missense mutation is deleted, resulting in the lack of protein. Other in vivo routes are also possible, e.g., the missense mutation may be directly converted into a nonsense one.

Fig. 1
The hypotheses potentially explaining the differences in mutational profiles. a In vivo transformation of missense into nonsense mutations; this process may occur through various mechanisms, for example, a missense mutation of one allele is followed by a nonsense mutation of the other allele and then by a deletion of the allele with the missense mutation. Missense mutations are marked as yellow spots, nonsense mutations as blue ones. b In vitro selection of tumor specimens which may give rise to stabilized cell lines. Obviously, the mutational status of a single gene may not determine the stabilization efficiency; however, the cumulative influence of all mutations may affect the probability of successful cell line stabilization. The color of the tumor represents the mutation type of a given gene (legend in the top right-hand corner), which is further reflected by the color of the cap of the respective culture flask. Stabilized cell lines are marked as yellow bottles with underlined labels; the others are marked as orange bottles with normal labels. c In vitro selection of cells within a tumor specimen which may give rise to a stabilized cell line. Again, within a heterogeneous tumor, the cells with certain molecular profiles may be more predisposed to stabilization as a cell line. The color of the cells within tumors represents the mutation type of a given gene (legend in the top right-hand corner), which is further reflected by the color of the cap of the respective culture flask. Stabilized cell lines are marked as yellow bottles with underlined labels; the others are marked as orange bottles with normal labels

These data may also be explained in an alternative way, by assuming that, among the multiple independent carcinogenic pathways, those involving nonsense mutations adapt better to in vitro conditions and allow for more efficient cell line stabilization. This may apply both to intratumoral heterogeneity and to differences between cases (Fig. 1b, c). In this scenario, cell lines would represent only a subgroup of the cells/cases observed in vivo, and the transformation into more advanced tumor stages is not accompanied by a missense-to-nonsense change. Nonetheless, nonsense mutations may hypothetically cause biological changes associated with the ease and effectiveness of cell line stabilization, e.g., the cells isolated from tumors with nonsense mutations may be more proliferative. CDKN2A is an exemplary gene whose molecular characteristics support the in vitro adaptation hypothesis.

These hypotheses may seem mutually exclusive; however, both may be partially responsible for the observed differences. Still, irrespective of the underlying mechanism, such discrepancies are an incentive to consider the respective genes as following the quasi-sufficiency hypothesis.

Finally, we inquired whether the defective DNA damage response and repair mechanisms might be responsible for the observed differences in mutational profiles. For that purpose, we compared the MSI detection rates in cell lines and primary samples. Although the differences were statistically significant, microsatellite instability may not be the sole explanation of the observed results, due to its generally low detection rates (Table 4). Nonetheless, it may be one of the aspects affecting the cell line stabilization effectiveness.

The database analysis provides an additional argument that average cells from a cell line are more advanced in tumorigenesis than average cells from the corresponding surgical sample; for example, the frequencies of detected TSG mutations are much higher in cell lines. It has to be emphasized, however, that cell lines are usually analyzed more thoroughly, which may contribute to this difference.

In conclusion, we report the preferential occurrence of nonsense mutations of TSGs in the most advanced cancer cells. Apparently, during the early stages of tumorigenic transformation, the complete elimination of many TSGs may be lethal or otherwise unfavorable and becomes possible only when the required cellular/molecular context is achieved. Our “three hits” hypothesis is a modification of the “continuum” model by Berger et al. [10]. Clearly, more data are required to confirm the potential transformation of missense into nonsense mutations during carcinogenesis. Alternatively, the presented data may be explained by the hypothesis that cell lines originating from tumors/cells with nonsense mutations are more easily stabilized than those with missense mutations. The differences in the mutational characteristics of TSGs between tumor samples and cell lines may indicate that tumors in vivo lack an appropriate in vitro representation, which is particularly important from the drug testing perspective.

This study was supported by the National Science Center Grant No. 2011/01/B/NZ4/07832. Calculations in this paper have been carried out within Upper Silesian Center for Scientific Computational Science and Engineering (POIG.02.03.01-24-099/13).