Background

Non-Hodgkin lymphoma (NHL) is a heterogeneous group of hematological malignancies that in aggregate constitutes the fifth leading cause of cancer mortality in the United States [1] and Canada [2]. NHL subtypes vary in presentation, expected survival, morbidity and response to treatment. Chromosomal translocations are so characteristic of NHL that many genes now known to be important in the development of cancer, such as BCL2 [3], were originally discovered because of their position at recurrent translocation breakpoints in NHL tumours.

During development and differentiation, the DNA of B- and T-cells undergoes double-stranded breaks that are necessary for the rearrangement of immunoglobulin genes. Genes functioning in double-stranded break repair control and repair these breaks, protecting the genome from molecular events that could lead to cancer. This study examined four genes with key roles in maintaining genome stability: the three components of the MRN complex (MRE11, RAD50 and NBS1) and the Bloom syndrome gene (BLM). We have previously shown an association between NHL and a genetic variant in H2AX, which encodes a histone involved in signalling the presence of double-stranded breaks [4]. The MRN complex forms foci at sites of double-stranded breaks induced by ionizing radiation or by immunoglobulin rearrangements during B- and T-cell development, where it senses DNA damage and initiates DNA repair [5-7].

The chromosome instability syndromes (reviewed in [8]) form a group of rare autosomal recessive diseases characterized by an increased risk of cancer. This group includes ataxia-telangiectasia (AT, OMIM 208900), Nijmegen breakage syndrome (NBS, OMIM 251260), Bloom syndrome (OMIM 210900) and Fanconi anemia (OMIM 227650). NBS carries an increased risk of lymphoid malignancies [9], particularly B-cell lymphoma [10, 11]. Some patients with an NBS-like phenotype have mutations in RAD50 [12]. Hypomorphic mutations in MRE11 result in an AT-like disorder (AT-LD). NBS and AT-LD share many features, including immunodeficiency and genome instability caused by a failure to activate cell cycle checkpoint pathways in a timely manner [13-16].

Mutations in NBS1 cause aplastic anemia and acute lymphoblastic leukemia [17, 18]. RAD50 variants have also been associated with an increased risk of sporadic [12], but not necessarily familial, breast cancer [19, 20]. MRE11 inactivation has been identified in colorectal cancer cell lines and primary tumours [21], suggesting that inactivation of the MRN complex could be a frequent event in cancer.

Bloom syndrome is also marked by a predisposition to cancer, particularly lymphoma and leukemia in young patients [22]. Although homozygous loss of Blm in mice leads to embryonic lethality, heterozygotes show an increased risk of neoplasia, with augmented T-cell tumourigenesis [23]. Haploinsufficiency of BLM is also supported by the increased risk of cancer in BLM heterozygotes of Ashkenazi Jewish descent [24], although this finding remains controversial [25]. These observations are consistent with BLM's role in the response to DNA damage [26], particularly during DNA replication stress [27].

While both Nbs1 [28] and Mre11 [29] null mutations are lethal in vertebrates, the hypermorphic Rad50S mutation causes hematopoietic stem cell failure, so that mice that do not die of lymphoma die of bone marrow attrition [30]. This highlights the delicate balance that MRN complex activity maintains in cell survival. The balance is further illustrated by the dosage sensitivity of this mutation and by the bidirectional phenotypic rescue seen in Rad50S/S Atm-/- mice [31], which led the authors to speculate that, whereas mutations causing gross chromosomal instability would have a wide array of outcomes, less severe mutations would primarily affect tissues derived from a limited number of precursor stem cells. Because the hematopoietic system is one such tissue, this reinforces the rationale for looking for variants in genes already known to be associated with severe genetic disorders: varying degrees of mutation severity may produce a spectrum of effects.

To systematically investigate the role of NBS1, MRE11, RAD50 and BLM in susceptibility to NHL, we re-sequenced these four genes to establish the spectrum of genetic variation in NHL cases and then genotyped selected variants in 797 NHL cases and 793 controls. Just as total inactivation of a gene and attenuation of its activity lead to different phenotypes in mice, we expected that, whereas complete inactivation of these genes leads to rare and severe syndromes, subtle variation in DNA repair genes could be relevant to NHL risk in the general population.

Methods

Study population

The methodology has been described previously [32, 33]. Informed consent was obtained as approved by the joint University of British Columbia/British Columbia Cancer Agency Research Ethics Board. All HIV-negative NHL cases diagnosed in British Columbia from March 2000 to February 2004, aged 20 to 79 years and residing in the Greater Vancouver Regional District or greater Victoria (Capital Regional District), were invited to participate. Cases were reviewed and coded according to the World Health Organization classification by an experienced lymphoma pathologist (RDG). Population controls were identified from the Client Registry of the British Columbia Ministry of Health and were frequency matched to cases by sex, age and area of residence in a 1:1 ratio. In total, 828 cases and 848 controls completed at least part of a study questionnaire; only subjects with DNA available were included in this study. Table 1 summarizes the characteristics of the 797 cases and 793 controls available for analysis.

Table 1 Characteristics of the Study Population.

DNA extraction and sequencing

Genomic DNA was extracted from whole blood (in 10% of cases from a mouthwash or saliva sample) using the PureGene DNA isolation kit (Gentra Systems) following the manufacturer's instructions. DNA was then quantified using PicoGreen (Molecular Probes) on a Victor2 fluorescence plate reader (Perkin-Elmer).

The genomic sequences for all genes were downloaded from the UCSC genome browser [34]. All coding and non-coding exons were sequenced, as well as 1000 base pairs upstream of the transcription start site. Conserved non-coding sequence regions (CNS regions) were identified using the VISTA genome browser [35]. The six most highly conserved CNS regions, each spanning at least 100 base pairs with at least 70% identity to the mouse and rat homologs, were also sequenced.
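The conservation filter above can be illustrated with a simplified sliding-window sketch. It assumes that a human-rodent pairwise alignment is already available as two equal-length gapped strings (a hypothetical input; VISTA computes its own alignments and its exact scoring may differ), scores every 100-column window by percent identity, and merges overlapping windows that reach 70%. The function name `conserved_regions` is illustrative only.

```python
def conserved_regions(aln_a, aln_b, window=100, min_identity=0.70):
    """Return merged (start, end) column intervals whose windows reach min_identity.

    aln_a, aln_b: equal-length aligned sequences ('-' marks a gap).
    A simplified VISTA-style criterion, not VISTA's exact implementation.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # prefix[i] = number of identical, non-gap columns in the first i columns
    prefix = [0]
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        prefix.append(prefix[-1] + (1 if a == b and a != "-" else 0))
    regions = []
    for start in range(len(aln_a) - window + 1):
        identity = (prefix[start + window] - prefix[start]) / window
        if identity >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping window: extend the current region
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Hypothetical alignment: 100 identical columns followed by 100 mismatches.
print(conserved_regions("A" * 200, "A" * 100 + "C" * 100))
```

Ranking the merged regions by identity and keeping the top six would approximate the selection of the six most highly conserved CNS regions; that ranking step is omitted here.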

Primers were selected for all amplicons using Primer3 [36]. The -21M13F (TGTAAAACGACGGCCAGT) and M13R (CAGGAAACAGCTATGAC) tails were added to the 5' ends of the forward and reverse PCR primers, respectively, to allow uniform sequencing conditions. PCR and sequencing reactions were carried out as previously described [37]. Primers and conditions used in PCR reactions are listed in Additional file 1. The quality of sequencing reads was assessed using Phred [38, 39], potential variants were identified with PolyPhred version 5 [40], and all sequences were assembled against reference sequences using Phrap [41] and viewed in Consed version 12 [42].
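Appending the sequencing tails is a simple string operation. The sketch below uses the two tail sequences quoted above; the helper name `add_m13_tails` and the example primers are hypothetical placeholders, not primers from Additional file 1.

```python
from typing import Tuple

# Tail sequences (5'->3') as quoted in the Methods.
M13F_21 = "TGTAAAACGACGGCCAGT"
M13R = "CAGGAAACAGCTATGAC"


def add_m13_tails(forward_primer: str, reverse_primer: str) -> Tuple[str, str]:
    """Prepend -21M13F to the forward primer and M13R to the reverse primer."""
    forward_primer, reverse_primer = forward_primer.upper(), reverse_primer.upper()
    for primer in (forward_primer, reverse_primer):
        if not primer or set(primer) - set("ACGT"):
            raise ValueError(f"unexpected primer sequence: {primer!r}")
    # Tails go on the 5' end, i.e. at the start of the written sequence.
    return M13F_21 + forward_primer, M13R + reverse_primer


# Hypothetical gene-specific primers, for illustration only.
fwd, rev = add_m13_tails("gcatcgatcgtacgatgc", "ttagcgatcgatgcatcg")
print(fwd)
print(rev)
```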

Haplotypes of variants with minor allele frequency (MAF) >5% in the sequence data were inferred using PHASE v2.1.1 [43, 44]. Four tagSNPs were selected for each gene using TagSNP, version 1.1 [45]. Three additional SNPs of potential functional relevance in NBS1 were also tested.

Genotyping

TaqMan® was used for all genotyping, with assays designed using the Assays-by-Design service (Applied Biosystems). Primers and probes used are listed in Additional file 2. Ten nanograms of each DNA sample were aliquoted into 384-well plates and dried down at room temperature. TaqMan reactions were carried out in 5 µL volumes according to the manufacturer's protocols. Fluorescence data were obtained on an ABI PRISM 7900 HT after 10 min at 95°C, followed by 40 cycles of 92°C for 15 s and 60°C for 1 min. Genotypes were assigned to individual samples using the SDS 2.2 software (Applied Biosystems).

Statistical Analyses

Statistical analyses were carried out as described previously [32]. Briefly, genotype frequencies in all controls were tested for deviation from Hardy-Weinberg equilibrium. Odds ratios (OR) and 95% confidence intervals (CI) were estimated by logistic regression in SPSS version 15, adjusting for sex, age group (20-49, 50-59, 60-69, 70+ years), residence (Vancouver or Victoria) and, when all cases and all controls were analyzed together, ethnicity (Caucasian, Asian, South Asian, Mixed, Unknown/Refused). Heterozygotes and rare homozygotes were combined for analysis when there were fewer than five rare homozygotes. Tests were not performed when the combined number of heterozygotes and rare homozygotes was less than five in either cases or controls. Tests for trend were conducted when there were at least five samples in each genotype category for both cases and controls. Multiple testing was corrected for using the false discovery rate (FDR) method [46]. Because nineteen markers were tested, the most significant marker had to have a p-value below 0.05/19 ≈ 0.0026 to be considered significant. Inferred haplotypes were analyzed as categorical variables and assessed for risk effects using R version 2.1.1 [47]; haplotypes with frequency <4.5% were combined into a "rare" category.
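Three of the rules above lend themselves to a short sketch: a Hardy-Weinberg test in controls, the genotype-grouping rules that decide which tests are run, and an FDR procedure whose first step yields the 0.05/19 ≈ 0.0026 threshold. This is a minimal illustration, not the SPSS or R code used in the study. It assumes a 1-df chi-square HWE test (the exact HWE test is not stated), assumes the Benjamini-Hochberg step-up procedure for the FDR method, assumes the "fewer than five rare homozygotes" rule applies if either cases or controls fall below five, and uses hypothetical function names.

```python
from math import erfc, sqrt


def hwe_chi2_p(n_hom_ref, n_het, n_hom_alt):
    """1-df chi-square test for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return None
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return None  # monomorphic marker: test undefined
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return erfc(sqrt(chi2 / 2))


def choose_tests(cases, controls, min_count=5):
    """cases, controls: (hom_ref, het, hom_alt) counts. Returns which tests to run."""
    carriers = [group[1] + group[2] for group in (cases, controls)]
    if min(carriers) < min_count:
        return {"model": None, "trend": False}  # too few carriers: no test
    # Assumption: combine het + rare hom if either group has < min_count rare homs
    rare_hom = min(cases[2], controls[2])
    model = "het+hom combined" if rare_hom < min_count else "genotypic"
    trend = all(n >= min_count for group in (cases, controls) for n in group)
    return {"model": model, "trend": trend}


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a reject flag per p-value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0  # largest rank k with p_(k) <= k * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# With 19 markers, the smallest p-value must be at most 0.05 / 19.
print(round(0.05 / 19, 4))
```

Under these assumptions, a best p-value of 0.004 among 19 markers would not survive correction, which matches the "suggestive" wording used for such results.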

Results

Re-sequencing for variant discovery

We sequenced DNA samples from 87 NHL cases to survey germline genetic variation in the NBS1, MRE11, RAD50 and BLM genes among NHL patients in our population. Recent methods [48] allow the number of unseen variants to be estimated using data from deep sequencing projects such as ENCODE [49]. By these estimates, sequencing 174 chromosomes in our population is expected to have revealed 99.99% of SNPs with a MAF of 1% or more and 76% of SNPs with a MAF of 0.5% or more. The samples were derived from 74 cases with B-cell NHL and 13 with T-cell NHL (see Additional file 3). The number of amplicons bi-directionally sequenced for each gene is shown in Table 2; in total, 63 amplicons were sequenced. On average, 91.6% of sample-amplicon combinations produced good-quality reads in both directions, and 96.6% produced good-quality reads in at least one direction.

Table 2 Gene statistics summary. For NBS1, seven SNPs were genotyped: four tagSNPs and three chosen for their potential functional relevance.

Re-sequencing revealed 114 variants (Additional file): 12 small deletions or insertions (10.5%), 73 transitions (64%) and 29 transversions (25.4%). Twenty-nine variants (25.4%) were in coding regions; 17 (58.6%) of these were non-synonymous, and 4 of the non-synonymous changes were ranked as "probably" or "possibly damaging" by PolyPhen [50]. Only one of these four, BLM_X13_(2603)_C/T, was observed more than once, with a MAF of 5.6%. Fifty-five variants (48%) were singletons, meaning that the minor allele was observed only once in this data set of 87 samples (174 chromosomes). Forty-one variants (36%) were common, with a MAF of at least 5%. Overall, 59% of variants had been previously described in dbSNP (build 128) [51]; their rs numbers are included in Additional file 4. Of the common polymorphisms (MAF ≥5%), 14% were novel.
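The transition/transversion split above follows the standard definition: a transition replaces a purine with a purine (A↔G) or a pyrimidine with a pyrimidine (C↔T), and any other single-base change is a transversion. The sketch below classifies an allele pair and re-derives the reported percentages from the counts. The function name is hypothetical, and insertion/deletion alleles are assumed to be written with "-" or with more than one base.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify an allele pair as 'transition', 'transversion' or 'indel'."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or "-" in (ref, alt):
        return "indel"
    bases = PURINES | PYRIMIDINES
    if ref == alt or ref not in bases or alt not in bases:
        raise ValueError(f"not a single-base substitution: {ref}/{alt}")
    pair = {ref, alt}
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


# Counts reported above for the 114 variants found by re-sequencing.
counts = {"indel": 12, "transition": 73, "transversion": 29}
total = sum(counts.values())
percentages = {kind: round(100 * n / total, 1) for kind, n in counts.items()}
print(total, percentages)
print(substitution_class("C", "T"))  # allele pair of BLM_X13_(2603)_C/T
```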

Overall, sequence variation was found at 34 of 12,352 nucleotides in coding regions (8 of 3,805 nucleotides in RAD50, 13 of 2,265 in NBS1, 2 of 2,127 in MRE11 and 11 of 4,155 in BLM) and at 95 of 17,257 nucleotides in non-coding regions (21 of 5,226 in RAD50, 32 of 3,777 in NBS1, 20 of 3,872 in MRE11 and 22 of 4,382 in BLM). The Ka/Ks value for these four genes together is 0.6 (0.56 for RAD50, 0.75 for NBS1, 0.50 for MRE11 and 0.54 for BLM), indicating moderate negative selection.
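The per-gene counts above can be checked and compared with a few lines of arithmetic. The sketch below re-derives the coding and non-coding totals and computes the fraction of sequenced positions that were variable in each gene. This is a simple density of variable sites, not nucleotide diversity (π) and not Ka/Ks, which would also require counts of synonymous and non-synonymous sites that are not listed here; the function names are hypothetical.

```python
# (variable positions, sequenced positions) per gene, as reported above
CODING = {"RAD50": (8, 3805), "NBS1": (13, 2265), "MRE11": (2, 2127), "BLM": (11, 4155)}
NONCODING = {"RAD50": (21, 5226), "NBS1": (32, 3777), "MRE11": (20, 3872), "BLM": (22, 4382)}


def totals(table):
    """Sum variable and sequenced positions across genes."""
    variable = sum(v for v, _ in table.values())
    sequenced = sum(n for _, n in table.values())
    return variable, sequenced


def variable_fraction_per_gene(*tables):
    """Fraction of sequenced positions that were variable, pooling the given tables."""
    result = {}
    for gene in tables[0]:
        variable = sum(t[gene][0] for t in tables)
        sequenced = sum(t[gene][1] for t in tables)
        result[gene] = variable / sequenced
    return result


print(totals(CODING), totals(NONCODING))
per_gene = variable_fraction_per_gene(CODING, NONCODING)
for gene, frac in sorted(per_gene.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {1000 * frac:.2f} variable sites per kb")
```

On these counts NBS1 has the highest density of variable sites (about 7.4 per kb), in line with the Discussion's observation that NBS1 is the most variable of the four genes.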

Linkage disequilibrium (LD) calculations were performed on the sequence data using Haploview v4.0 [52]; singletons were excluded from these calculations. r² values for pairwise combinations of SNPs in each gene are shown in Additional files 5, 6, 7 and 8.
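For two biallelic SNPs, r² is computed from haplotype frequencies as D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA·pB. The sketch below assumes phased haplotypes coded as strings of "0"/"1" (a hypothetical input format, for example derived from PHASE output), whereas Haploview estimates haplotype frequencies from unphased genotypes with an EM algorithm. The function name is illustrative.

```python
def pairwise_r2(haplotypes, i, j):
    """r^2 between SNP positions i and j from phased 0/1 haplotype strings.

    Returns None when either SNP is monomorphic (r^2 undefined).
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / denom


# Hypothetical haplotypes: the two SNPs always co-occur, so r^2 = 1.
print(pairwise_r2(["00", "11", "00", "11"], 0, 1))
```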

Genotyping

Haplotypes were inferred from the 41 common variants (MAF of at least 5%) in the sequence data using PHASE v2.1.1 [43, 44]. The number of haplotypes inferred for each gene is indicated in Table 2. Haplotype tagging SNPs (tagSNPs) were selected using TagSNP version 1.1 [45]. Nineteen variants were chosen for genotyping and are indicated in bold in Additional file 4.
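As a rough illustration of what tag selection does, the sketch below implements a generic greedy pairwise-r² tagger: it repeatedly picks the SNP that captures the most still-untagged SNPs at r² ≥ 0.8. This is a swapped-in, commonly used technique, not the algorithm implemented in TagSNP version 1.1; the threshold, function name and SNP identifiers are assumptions for illustration.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedy pairwise-r2 tagging.

    snps: list of SNP identifiers.
    r2: dict mapping (snp_a, snp_b) pairs to r2 (either order); missing pairs count as 0.
    Returns tags in the order they were picked.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")

    def ld(a, b):
        if a == b:
            return 1.0
        return r2.get((a, b), r2.get((b, a), 0.0))

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP capturing the most untagged SNPs (ties broken by sorted order).
        best = max(sorted(untagged), key=lambda s: sum(ld(s, t) >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if ld(best, t) >= threshold}
    return tags


# Hypothetical LD values for four SNPs.
example_r2 = {("snp1", "snp2"): 0.9, ("snp2", "snp3"): 0.85, ("snp1", "snp3"): 0.5}
print(greedy_tag_snps(["snp1", "snp2", "snp3", "snp4"], example_r2))
```

Because each pick always captures at least itself, the loop terminates; a haplotype-based method such as the one used in the study may choose a different set.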

The 19 selected SNPs were genotyped in 797 cases and 793 controls, with an average genotype call rate of 97.6%. Their MAFs, calculated using all 1590 samples, are listed in Additional file 2. In the 87 samples that were both sequenced and genotyped, genotypes from the two independent methods (sequencing and TaqMan) were fully concordant; no discrepancies were found. As a quality assurance measure, we also genotyped the 19 SNPs in DNA samples from five three-generation CEPH families (purchased from Coriell Cell Repositories, NJ, USA) and confirmed that the alleles segregated according to Mendelian inheritance.
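The Mendelian segregation check can be sketched per parent-child trio: a child's genotype is consistent if one allele can come from each parent. Genotypes are assumed to be two-letter strings such as "CT", with None for a missing call; the function name and example genotypes are hypothetical. In a three-generation CEPH pedigree, each child would be checked against both parents at every SNP.

```python
def mendelian_consistent(father, mother, child):
    """True if the child genotype can be formed from one allele of each parent.

    Genotypes are two-character strings (e.g. 'CT'); returns None if any call is missing.
    """
    if None in (father, mother, child):
        return None
    target = sorted(child)
    return any(sorted((a, b)) == target for a in father for b in mother)


# Hypothetical trios: (father, mother, child) genotypes at one SNP.
trios = [("CT", "CC", "CT"), ("CC", "CC", "CT"), ("CT", "CT", "TT"), ("CT", None, "CC")]
for trio in trios:
    print(trio, mendelian_consistent(*trio))
```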

NHL association tests

We compared all controls of European ancestry against all cases of European ancestry with NHL, with B-cell NHL, with T-cell NHL and with each major subtype individually. One variant, MRE11_5UP_(-1456)_C/T, was excluded from analysis because of deviation from Hardy-Weinberg equilibrium in controls. Results for the two most common subtypes, diffuse large B-cell lymphoma (DLBCL) and follicular lymphoma (FL), and results suggestive of association with marginal zone lymphoma/mucosa-associated lymphoid tissue lymphoma (MZ/MALT) are shown in Table 3; see Additional file 9 for all results. RAD50_IVS22(+24)_A/G showed a possible association with DLBCL (p-trend = 0.022) that was strong enough to influence the overall NHL analysis. RAD50_IVS7(-38)_C/T was suggestive of association with MZ/MALT, with an OR of 3.39 (95% CI: 1.48-7.75, p = 0.004).
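Because a 95% CI for an OR from logistic regression is usually a Wald interval on the log scale, the standard error and an approximate two-sided p-value can be recovered from a reported OR and its limits. The sketch below assumes a Wald-type interval (not stated in the text) and applies it to the MZ/MALT result for RAD50_IVS7(-38)_C/T; rounding of the published values makes the recovered p-value approximate. The function name is hypothetical.

```python
from math import erfc, log, sqrt
from statistics import NormalDist


def wald_p_from_ci(odds_ratio, lower, upper, level=0.95):
    """Approximate two-sided Wald p-value from an OR and its confidence limits."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se_log_or = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(odds_ratio) / se_log_or
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


p = wald_p_from_ci(3.39, 1.48, 7.75)
print(f"p = {p:.4f}")
```

The recovered value, about 0.004, agrees with the reported p-value, which is a quick consistency check on a published OR, CI and p-value triple.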

Table 3 Regression analysis in European samples for all SNPs in selected subtypes.

Analyses of all NHL were also performed separately in subjects of Asian and South Asian ethnicity (see Additional file 10). NBS1_3UTR_(+273)_G/A (rs1063053) gave an OR of 5.3 (95% CI: 1.023-27.579, p = 0.004) in samples of South Asian ethnicity.

Combined analyses of samples from all ethnicities, adjusted for ethnicity, were also performed (data not shown); some SNPs (usually the same as in the European-ancestry analysis) again gave results suggestive of association, but none reached p < 0.05 after correction for multiple testing. Because the ethnic diversity of our study population could mask a real signal, we focused on the subpopulation of European ancestry.

The haplotypes inferred from individual SNP genotypes were also tested for association with NHL using R version 2.1.1 (data not shown). No haplotype was more significantly associated with NHL than the individual SNPs forming that haplotype.

Discussion

RAD50, NBS1, MRE11 and BLM were re-sequenced in 87 NHL cases to characterize the variation in these genes in NHL cases in our population. All genes had similar numbers of variants and similar nucleotide diversity, albeit slightly greater for NBS1 (Table 2). All four genes showed evidence of negative selection, as indicated by Ka/Ks values of less than one (0.56 for all four genes combined), which we would expect for genes involved in such a conserved and critical process as DNA repair. The most variable gene, NBS1, also showed the lowest conservation.

Two SNPs in RAD50 were suggestive of association with specific NHL subtypes (Table 3). RAD50_IVS7(-38)_C/T was suggestive of association with MZ/MALT (p = 0.004). The low frequency of this allele (MAF 2.6%) and the low incidence of MZ/MALT (12% of our cases) make it difficult to implicate this marker conclusively in a single study. Interestingly, MZ lymphomas usually develop in tissue subjected to chronic antigenic stimulation, for example gastric MALT lymphoma, which arises as a result of chronic Helicobacter pylori infection. Such tissue, with persistent and accelerated lymphoid cell proliferation, may be uniquely susceptible to neoplastic transformation associated with faulty DNA repair. Our results may serve to highlight specific mechanistic hypotheses for further testing in other association studies or in in vitro functional studies. Mechanisms of tumourigenesis, and the basis for NHL susceptibility, may differ between NHL subtypes. Observations such as ours, if replicated, will help us understand the basis for the diversity of NHL types.

Consistent with most other studies [53-57], we did not find that variants in NBS1 conferred an increased risk of lymphoma, although some contradictory positive reports remain [58-61]. In contrast, non-synonymous mutations in NBS1 have been associated with acute lymphoblastic leukemia in German [17] and Polish [62] children. A study of haplotypic variation in NHL by Rollinson et al. [63] found no increased risk associated with haplotypes of NBS1 and RAD50; however, they observed a protective effect of the MRE11 variant rs601341 on FL and of an MRE11 haplotype on DLBCL. We did not sequence the part of intron 18 where rs601341 is located and so did not explicitly test this SNP. The difference between our results and those of Rollinson et al. could reflect a SNP-specific effect, differences between the populations studied, or both.

Although other studies of NHL susceptibility have examined the genes addressed here, most have relied on genotyping rare variants discovered in studies of the rare recessive syndromes discussed above. Genotyping was generally done by single-strand conformation polymorphism (SSCP) analysis [17, 53, 54, 56, 58, 61, 62] or by TaqMan [63]. One study [63] used public databases to collect information on SNPs in the regions of interest. However, systematic sequencing of germline DNA from patients with sporadic lymphoma to identify genetic variants had not previously been done. Our systematic characterization provides information on the variation present in these genes in individuals with NHL. Previous systematic investigation by our group of another double-stranded break repair gene, ATM, did not reveal any association between common ATM variants and NHL or its subtypes [32]. In contrast, a common SNP in the promoter region of H2AX showed a protective effect on NHL, and on FL in particular [4].

Limitations of our study include the histological heterogeneity of NHL, which comprises many subtypes, many of them rare. Identification of genetic susceptibility factors that differ between subtypes will be limited by the lack of adequate sample sizes for less common subtypes; because of this clinical diversity, we could draw the strongest conclusions only for DLBCL and FL. Our sample is also ethnically heterogeneous and therefore has reduced power to detect genetic factors that are present only in specific ethnic groups. Future replication of these results in large international consortia, such as the InterLymph Consortium [64], will help to overcome such limitations.

Conclusion

Although none of the genes in this study was significantly associated with NHL on its own, they could modify NHL risk in combination with other variants; larger studies would be required to detect such gene-gene interactions. Our observation of possible associations of SNPs in RAD50 with DLBCL and MZ/MALT lymphomas may contribute to the refinement of biological hypotheses for confirmation in larger association studies and functional studies. Mechanisms of tumourigenesis, and the basis for NHL susceptibility, probably differ between NHL subtypes. Specific observations such as these will help us understand the etiological basis for the diversity of NHL types.