Introduction

Cancer cell clones evolve over the lifespan of a tumour[13]. The selective pressures driving this clonal evolution are myriad and may include microenvironmental factors, immune system surveillance, competition with other cancer and somatic cells, and selective killing of cancer cells by surgery, chemotherapy and radiation[29]. Two features of cancer portend intense natural selection among cancer cells. The first is that cancer cells (at least in the later stages of growth) experience a high rate of cell death[10]. The second is the greatly increased rate of mutation in cancer cells[11–16]. For example, a recent large-scale study identified mutations in 11% of the protein-coding genes examined across 756 cancer cell lines[17]. Many of these mutations, even if they change the protein sequence of the gene product, may be considered “passenger” mutations that do not contribute to oncogenesis[16] and are of no significance to the cancer cell[3, 12]. Indeed, mutations in non-essential genes may even be adaptive for the cancer cell if they shed costly metabolic processes irrelevant to its reproduction[3].

The high mutation rate and rapid cellular turnover may be expected to create an intense environment for natural selection, in which mutations arise and are tested for functional importance through competition with other cells. Eventually, many genes may be rendered nonfunctional by mutation, while the subset of genes important for the survival and multiplication of the cancer cells is preserved through constant selection for functional versions of these genes.

Evolutionary biologists have identified a number of methods for detecting molecular evidence of natural selection[18]. These so-called “tests of selection” attempt to differentiate neutral evolution (i.e. genetic drift) from Darwinian selection. One commonly used method compares the rates of synonymous and nonsynonymous base substitutions. This approach has the advantage of being robust to population growth[18], a confounding factor particularly important in the context of cancer cell growth. Synonymous base substitutions change the exonic base sequence but conserve the translated amino acid sequence (because of the degeneracy of the genetic code). In contrast, nonsynonymous base substitutions change both the base sequence and the translated amino acid sequence. A higher rate of synonymous than nonsynonymous substitution provides evidence that the sequence in question is or has been under natural selection to conserve the amino acid sequence (purifying selection). Less commonly, a sequence may exhibit a higher rate of nonsynonymous than synonymous substitution, indicating that it has been under natural selection to change the ancestral amino acid sequence (diversifying selection). Perhaps the best described example of this is the diversifying selection shaping the peptide-binding grooves of MHC class I molecules[19]. We might expect that the majority of selection pressures on cancer cells would take the form of purifying selection to maintain the function of essential genes. However, it is also possible that diversifying selection plays a role in cancer cell evolution, possibly by facilitating the exploitation of new microenvironments.
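The distinction between synonymous and nonsynonymous substitutions can be illustrated with a minimal Python sketch. It assumes the standard nuclear genetic code and in-frame codons. The function name `classify_substitution` is hypothetical and is not part of any software used in this study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# nuclear code; mitochondrial codes differ). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as identical, synonymous or nonsynonymous."""
    before, after = codon_before.upper(), codon_after.upper()
    if before == after:
        return "identical"
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"      # amino acid conserved
    return "nonsynonymous"       # amino acid (or stop signal) changed


if __name__ == "__main__":
    print(classify_substitution("GGT", "GGC"))  # both encode glycine
    print(classify_substitution("GAT", "GAA"))  # aspartate to glutamate
```

Third-position changes such as GGT to GGC are frequently synonymous, which is why the two classes of substitution can be compared within the same gene.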

Here we test the hypothesis that due to the high mutation rates and increased cell turnover in cancer cells, genes of importance to the survival of the cancer cell should show molecular evidence of natural selection. Furthermore, we predict that in the majority of cases this selection would be in the form of purifying selection.

Materials and methods

As an initial test of this hypothesis we obtained cancer-derived DNA sequences from GenBank using the search parameters “carcinoma expression library”, “cancer-associated transcript”, “tumour-associated transcript” and “Homo sapiens”. We did not attempt to obtain an exhaustive list of all available transcripts but rather sought a convenience sample of different genes where at least two different examples of the same gene sequence from cancer tissue could be obtained. We did not include animal model-derived sequences or experimental cell line sequences. To determine whether these genes show natural selection in non-cancerous tissues, GenBank was again used to find non-cancer versions of the same genes. In cases where we could not locate two non-cancer sequences among the GenBank entries, we isolated the relevant sequences from the NCBI reference sequence primary and alternate assemblies. The sequences used in this study are all publicly available from NCBI; the sequence references are given in Table 1.

Table 1 Gene sequences used in the analyses

Analyses were performed using the Molecular Evolutionary Genetics Analysis (MEGA) software, Version 5[20]. Following sequence alignment using the ClustalW method, the Nei-Gojobori Z-Test of Selection[21] was used to calculate the synonymous and nonsynonymous base substitution rates and the associated statistical probabilities. P-values of less than 0.05 were considered significant.
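For readers unfamiliar with the Nei-Gojobori approach, the following Python sketch illustrates how synonymous (dS) and nonsynonymous (dN) distances can be estimated for one pair of aligned coding sequences. It is an illustration only and is not the MEGA implementation. It assumes in-frame, gap-free sequences of equal length with no stop codons, counts changes to stop codons as nonsynonymous when counting sites, and applies the Jukes-Cantor correction (the proportion-based variant omits this step). The variance estimate needed for the Z statistic is omitted. All function names are hypothetical.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Synonymous sites s in one codon; nonsynonymous sites are 3 - s."""
    count = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                count += CODON_TABLE[mutant] == CODON_TABLE[codon]
    return count / 3


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn_total = nonsyn_total = n_paths = 0
    for path in permutations(positions):
        current, syn, nonsyn, valid = c1, 0, 0, True
        for pos in path:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":   # skip pathways through stop codons
                valid = False
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            syn_total, nonsyn_total, n_paths = syn_total + syn, nonsyn_total + nonsyn, n_paths + 1
    if n_paths == 0:
        return 0.0, 0.0
    return syn_total / n_paths, nonsyn_total / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    return float("nan") if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dS, dN) for two aligned, in-frame coding sequences."""
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1) - 2, 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2) - 2, 3)]
    # Potential sites are averaged over the two sequences.
    S = sum(synonymous_sites(c) for c in codons1 + codons2) / 2
    N = 3 * len(codons1) - S
    Sd = Nd = 0.0
    for c1, c2 in zip(codons1, codons2):
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    p_s = Sd / S if S else 0.0
    p_n = Nd / N if N else 0.0
    return jukes_cantor(p_s), jukes_cantor(p_n)
```

Under these assumptions, dN smaller than dS points toward purifying selection and dN larger than dS toward diversifying selection. Whether the difference is significant depends on the variance of dN − dS, which this sketch does not estimate.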

Results

A total of 46 cancer-derived genes represented by 139 sequences were identified (Table 1). No sequences were derived from propagated cell lines. However, we were unable to determine what proportion of the sequences came from primary versus metastatic tumours. Among the cancer-associated sequences, nine of the 46 genes showed evidence of purifying selection and one showed evidence of diversifying selection (Table 1). Six genes showed molecular evidence of selection only in cancer-associated sequences (all in the form of purifying selection), four genes showed evidence of selection only in non-cancer-associated sequences (three cases of purifying selection and one case of diversifying selection), and four genes showed molecular evidence of selection in both cancer and non-cancer-associated sequences (three cases of purifying selection and one case of diversifying selection; Table 1). Table 1 also gives the GenBank accession numbers for all sequences used, as well as sequence divergence estimates (p-distances) and the results of the Nei-Gojobori Z-tests of selection.

If signatures of selection become more common as mutations accumulate in a cancer-associated sequence, we might expect to see greater nucleotide divergence estimates in examples showing significant selection. To test this, we compared p-distances in the 10 examples showing molecular evidence of selection in the cancer associated sequences with the 36 examples not showing evidence of selection in the cancer associated sequences. The mean p-distance of sequences showing evidence of selection was 0.125, while the mean p-distance of sequences not showing evidence of selection was 0.082 (unpaired t-test, p=0.398).
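The two quantities used in this comparison can be sketched in Python as follows. The sketch assumes that the p-distance is the proportion of differing sites between two aligned sequences, with gapped positions ignored, and that the unpaired t-test is the pooled-variance (Student) form; the source does not state which variant was used. The function names are hypothetical, and the sketch stops at the t statistic and degrees of freedom rather than a p-value.

```python
import math
from statistics import mean, variance


def p_distance(seq1, seq2):
    """Proportion of compared sites at which two aligned sequences differ."""
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":   # assumption: ignore gapped positions
            continue
        compared += 1
        differences += a != b
    return differences / compared


def pooled_t_statistic(x, y):
    """Student's unpaired t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    standard_error = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / standard_error, df
```

In use, `x` would hold the p-distances of the genes showing selection and `y` those of the genes showing none. The resulting statistic is then referred to a t distribution with the returned degrees of freedom.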

Discussion

We describe a proof of principle test of a method of identifying molecular signatures of natural selection in cancer-derived gene sequences. We also show that in a sample of 46 genes the cancer and non-cancer derived sequences show different patterns of selection.

As a cancer grows and evolves and different genes come under selection pressure, natural selection may be expected to leave a record in the proportion of synonymous to nonsynonymous base substitutions, as we have discussed here. Even if a particular gene later becomes non-functional through further mutations, evidence of prior selection pressure would be expected to persist. Thus a list of genes showing molecular evidence of selection only in cancer cells could be considered to be those genes which have been important to the survival of the cancer cell up to that point in time. In essence, this provides us with a method to determine which genes have been integral to the survival of the cancer cell.

There are several potential weaknesses to our study. First, a different number of sequences were available for the various genes we examined. With a greater number of sequences we may expect a greater power to detect signatures of selection. To test such an effect we compared the mean number of sequences from genes which showed selection (3.17) to the mean number of sequences from genes which did not show selection (3.27). The difference was not statistically significant (p=0.134, unpaired t-test). Therefore, although this is a potential theoretical concern, we can find no evidence of this in our data.

Second, we do not have information about the geographic or racial origins of the individuals from whom the cancer and non-cancer gene sequences were derived. It is possible that increased variability noted for some genes could be due to these factors.

Third and perhaps most importantly, both the choice of model used to calculate dN/dS and the interpretation of the test are potentially controversial. The Nei-Gojobori method is perhaps less conservative than a maximum likelihood model, but at the same time, if the majority of sites in a protein evolve under purifying selection (as we might expect in a functionally essential gene in a tumour), the dN/dS statistic has reduced sensitivity to detect positive selection[22]. Moreover, dN/dS statistics may behave differently when applied to polymorphisms within a population than when applied to fixed mutations between species[23]. Whether cancer cells from the same tumour and/or from tumours from different individuals are sufficiently diverged to be considered analogous to different species[24] is a critical unanswered question. Because of these uncertainties, we decided to use the simple Nei-Gojobori statistic for this preliminary analysis. As major cancer sequencing initiatives begin producing whole genome sequences from paired cancer/normal samples from the same patient, this question will become more important. Further work should critically examine the optimal statistic to be used for these analyses.

Although we could not detect a statistically significant difference in the mean p-distances between cancer-associated sequences showing evidence of selection and those that did not, there was a trend toward greater p-distances among the sequences showing selection, and so our inability to demonstrate a difference may be a consequence of the limited sample size.

Parenthetically, the process postulated here, where relentless mutation in cancer cells results in either mutational inactivation of genes or purifying selection to maintain their function, gives a functional explanation for why more advanced cancers invariably show what pathologists refer to as “de-differentiation”, as Muller’s ratchet[25] removes all but the reproductively essential genes.

It will be obvious that the ability of gene sequences to display evidence of natural selection is based both on a high cancer cell mutation rate and an increased cancer cell proliferative rate which together provide the raw material on which selection can act. As these conditions likely are greater in more advanced cancers, we would expect to see greater molecular evidence of selection in later stage cancer cells. Indeed, comparison of early and later stage cancer cells could provide a roadmap of when particular genes experience selection pressure and therefore when these genes are important for tumorigenesis. Furthermore, because the molecular signatures of selection would be expected to persist for many generations of cancer cells, late stage cancers would be expected to contain a molecular record of genes conserved at essentially any stage of the clonal evolution of the cancer cell, even if that gene is no longer under selection pressure or even is no longer functional. By this line of reasoning, genes which are epigenetically silenced would be shielded from selection and may be expected to eventually be subject to loss of function mutations, even if they maintain molecular evidence of prior natural selection during tumorigenesis.

We caution that our results with regard to specific genes should be interpreted as preliminary only. Our sample was based only on publicly available sequences and encompassed a number of different malignancies, making any conclusions about gene function based on these findings premature. Furthermore, this approach may not distinguish between driver genes which promote oncogenesis and non-driver genes that are nevertheless essential for cancer cell growth and reproduction. However, previously described methods could be used to distinguish these[16, 17].

As new databases of cancer genomes become available[14, 17–27], a future direction for this work will be to apply these techniques to whole genome sequences of cancer cells. This could be performed at the level of the tumour as a whole, to look at genes important across a sample of tumours of the same type, or it could be applied to single cells to explore the genes of importance in particular microenvironments such as metastatic deposits. This approach, combined with oncogenetic reconstruction of cancer clonal lineages using the same sequencing data, could provide a powerful new tool to identify candidate genes of functional significance for potential targeted therapies, as well as new insights into the evolutionary mechanisms of cancer cell clonal evolution.

Conclusions

Genes may be under different selection pressures within a cancer as compared to normal tissues. In this paper we proposed a method to answer the question of which genes are important to a cancer cell. The high mutation rates and rapid cell division present in cancer suggest that functionally important genes will show evidence of selection. We could therefore, in an indirect manner, observe which genes a cancer cell needs to survive. The genes that are important could then form a list of possible targets for therapeutic intervention.