Background

Mutations in BRCA1 and BRCA2 are found in a proportion of multiple-case breast cancer families. The particular mutations present differ from family to family, indicating the marked allelic heterogeneity of these genes. The only viable methods for identifying mutations in genes prone to such variation are sequencing or extensive DNA screening techniques, and mutation screening of these genes has become widespread despite the costs involved. Indeed, 890 and 975 non-protein-truncating mutations have been identified in BRCA1 and BRCA2 respectively, and this large number of variants makes it difficult to identify causal mutations. Many such variants may appear deleterious but may nevertheless be associated with disease only because they are close to a causal mutation, and not because they are deleterious in their own right. An appealing strategy is therefore to avoid the large number of comparisons required to test each variant separately and to use instead methods based on haplotypes, which are combinations of genetic variants or alleles (typically common polymorphisms) inherited together, or in phase, from a single parent. Haplotypes result from the phenomenon of linkage disequilibrium (LD), in which alleles at closely spaced markers do not segregate independently. Haplotypes can capture more information across genomic regions of interest than is available by examining single genetic markers one at a time [1]; however, generating haplotypes is difficult, usually requiring intensive laboratory effort or the collection and genotyping of several closely related relatives to infer phase. An alternative is to infer phase from genotype data using statistical techniques that have developed rapidly in the last few years, which is the approach adopted in this report.

Haplotypes are expected to play an important role in the fine mapping of complex diseases, since disease-affected individuals with common haplotypes may have recent shared ancestry of chromosomal segments that harbour disease-causing variants [2]. By identifying "disease haplotypes" that are the hallmarks of deleterious BRCA1 and BRCA2 mutations, it may be possible to identify carriers implicitly rather than by screening the entire gene. Some early work in this area has suggested that one haplotype in BRCA1 is over-represented in individuals carrying deleterious mutations [3], while another haplotype was associated with a 20% increased risk of breast cancer [4]. There is evidence that this approach is particularly useful for common polymorphisms of low penetrance, where results show that an association can be detected via haplotype methods using single nucleotide polymorphisms surrounding the functional allele even if the functional allele is not typed [5]. We explored extending this approach to detecting rare deleterious mutations, which are likely to have arisen more recently and to occur on extended, less common haplotypes.

Methods

Study Methods

The Australian Breast Cancer Family Study (ABCFS) [6, 7] is a population-based case-control-family study carried out in the metropolitan areas of Melbourne and Sydney between 1992 and 1999. Potential case patients were identified through the Victorian and New South Wales cancer registries. Women were eligible if they were aged between 18 and 59 years and had been diagnosed with a histologically confirmed first primary cancer of the breast, irrespective of family history of breast cancer. Controls were women with no previous breast cancer, randomly selected from the electoral roll (registration on which, and voting, is compulsory in Australia). Letters inviting participation were sent to both the attending doctor and the patient. Participation involved completing an interviewer-administered risk factor questionnaire and giving a blood sample. A detailed family history was recorded for all first-degree and second-degree relatives, with verification sought for all reports of cancers in other family members. All patients provided written informed consent for participation in all aspects of the study. Sampling of cases was stratified by age at onset, with over half being less than 40 years old. The majority of proband participants in the ABCFS are white women with northern or southern European ancestry. The protocols for the ABCFS have been approved by the Human Research Ethics Committee of the University of Melbourne. Details of the recruitment strategy, participation, and data collection methods have been described previously [6].

Molecular methods

The coding and exon-flanking sequences of BRCA1 and BRCA2 were screened as described in [7]. These analyses were performed by Myriad Genetic Laboratories, Inc. using full sequence analyses (BRC-Analysis), in-house DNA sequencing and 2D gel electrophoresis, which are fully described in [8, 9].

The criteria for defining deleterious mutations were those used by the Breast Information Core (BIC; http://research.nhgri.nih.gov/bic/) and Myriad Genetic Laboratories, Inc., and are described in more detail in [9].

Statistical methods

A smaller set of haplotype tagging SNPs (htSNPs) was identified to explain the diversity of common haplotypes [10] using the Haploview software [11]. Haploview was also used to calculate minor allele frequencies for BRCA1 and BRCA2 from the data. Haploview uses a two-marker EM algorithm (ignoring missing data) to estimate the maximum-likelihood values of the four gamete frequencies, from which the LD summary statistics D' (Lewontin's normalized coefficient [12]), LOD (a likelihood ratio test statistic) and r^2 (the squared pairwise correlation) are derived. Haplotype phase and an estimate of the population frequency distribution of the haplotypes are inferred using a standard EM algorithm, with a partition-ligation approach for blocks of more than 10 markers. A standard chi-squared statistic on 1 degree of freedom, with the corresponding p-value, for testing the assumption of Hardy-Weinberg equilibrium (HWE) was also generated for each variant using Haploview.
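To make the LD summary statistics concrete, the following is a minimal sketch of how D' and r^2 can be computed once the four gamete frequencies for a pair of biallelic markers are known. It is not Haploview's code; the function name `ld_statistics` and the argument naming (first letter for marker 1, second for marker 2, upper case for one allele and lower case for the other) are assumptions made for illustration, and D' is returned as an absolute value.

```python
def ld_statistics(p_ab, p_aB, p_Ab, p_AB):
    """Return (D, |D'|, r^2) from the four gamete frequencies of two
    biallelic markers. Hypothetical helper, not part of Haploview."""
    total = p_ab + p_aB + p_Ab + p_AB
    if abs(total - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")

    p_A = p_Ab + p_AB  # frequency of allele A at marker 1
    p_B = p_aB + p_AB  # frequency of allele B at marker 2

    # Raw disequilibrium: observed AB frequency minus its expectation
    # under independence of the two markers.
    d = p_AB - p_A * p_B

    # Lewontin's normalisation: divide by the largest |D| that the
    # allele frequencies allow, which depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the alleles at the two markers.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

The sketch illustrates why both statistics are usually reported: two markers with different allele frequencies can have D' equal to 1 (only three of the four gametes observed) while r^2 remains well below 1.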

To validate the calculations in Haploview, haplotypes of tag SNPs were also generated using the Bayesian simulation procedures incorporated in the PHASE software [13, 14], which implements a Markov chain Monte Carlo (MCMC) algorithm. One advantage of this alternative software is that it can be used to generate standard errors for estimates of the population haplotype frequencies. We compared those standard errors with those generated from the sampling variability of a proportion using the binomial distribution and found that they were indistinguishable, so we present the latter values, based on the Haploview output only.
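The binomial standard error mentioned above is the usual one for a proportion. A minimal sketch follows, under the assumption that each individual contributes two chromosomes to the sample; the function name and the numbers in the example are hypothetical and are not taken from the study data.

```python
import math

def binomial_se(freq, n_chromosomes):
    """Standard error of an estimated haplotype frequency, treating the
    sampled chromosomes as independent binomial draws (an assumption)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie between 0 and 1")
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    return math.sqrt(freq * (1.0 - freq) / n_chromosomes)

# Hypothetical example: a haplotype with estimated frequency 0.25
# in 100 individuals, i.e. 200 chromosomes.
example_se = binomial_se(0.25, 2 * 100)
```

Because the standard error scales with the inverse square root of the number of chromosomes, estimates in a small group are much less precise than those in a large one.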

Individuals were classified into two groups according to whether or not their DNA specimen results showed at least one deleterious mutation in BRCA1 and BRCA2. Estimated frequencies of haplotypes, as well as of individual htSNPs, were compared between individuals carrying and not carrying deleterious mutations using the permutation test (with 10,000 permutations) for association between SNP and case/control status that is implemented in Haploview. Haplotype distributions were also compared between groups with and without deleterious mutations by (i) using the permutation p-value for each single haplotype to generate a chi-squared statistic on 1 degree of freedom and then (ii) summing the chi-squared statistics to generate a combined statistic, which was then compared with the chi-squared distribution with the relevant number of degrees of freedom to generate a p-value. These p-values were compared with the p-value generated by PHASE for comparing the full haplotype distribution. We found that in all cases these procedures gave the same "omnibus" p-value to two decimal places, which provides an empirical validation of the procedure used in PHASE.
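Steps (i) and (ii) can be sketched as follows. This is an illustrative reconstruction using only the Python standard library, not the code used in the study: the function names are hypothetical, the degrees of freedom are left as an explicit argument because the text does not spell out how they were chosen, and each input p-value is assumed to lie strictly above 0 and at most 1.

```python
import math
from statistics import NormalDist

def chi2_1df_from_p(p):
    """Chi-squared value on 1 d.f. whose upper-tail probability is p.
    Uses the fact that a 1-d.f. chi-squared is a squared standard normal."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; sign is irrelevant
    return z * z

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with df degrees of
    freedom, via the series for the regularised lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def omnibus_p(p_values, df):
    """Sum the per-haplotype 1-d.f. statistics and refer the total to a
    chi-squared distribution with df degrees of freedom (df is an input)."""
    stat = sum(chi2_1df_from_p(p) for p in p_values)
    return stat, chi2_sf(stat, df)
```

A permutation p-value estimated as exactly zero would need to be replaced by a small positive bound (for example, one over the number of permutations) before being passed to `chi2_1df_from_p`.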

Results

Data were available for 680 participants for BRCA1 and 245 participants for BRCA2 (table 1).

Table 1 Breast cancer cases with DNA sequence data and those carrying deleterious mutations by age group.

Analysis was restricted to the 392 (BRCA1) and 179 (BRCA2) population-based individuals diagnosed with a first primary invasive breast cancer before 40 years of age for whom BRCA1 and BRCA2 had been sequenced. Some sequencing was performed for cases in other age groups, but the sample sizes were too small to warrant separate analysis. The total number of variants found in the coding regions of the BRCA1 and BRCA2 genes in cases under 40 was 22 (table 2) and 15 (table 3) respectively. Each of the deleterious mutations identified in our sample appeared only once, with the exception of 2800 del AAG, which occurred once in each of two study participants.

Table 2 Observed BRCA1 variants in breast cancer cases diagnosed before age 40 in the ABCFS.
Table 3 Observed BRCA2 variants in breast cancer cases diagnosed before age 40 in the ABCFS.

Tables 4 and 5 list the details of the variants selected as tagSNPs by Haploview's Tagger program for BRCA1 and BRCA2 respectively. These tables display the results of comparing the allele frequencies of these tagSNPs between individuals carrying and not carrying deleterious mutations. In both BRCA1 and BRCA2 there was some evidence that tagSNP allele frequencies differed according to deleterious mutation status. Genotype frequencies for some variants generated small p-values for the test of HWE, most likely because our analysis was restricted to breast cancer cases under age 40 years. The BRCA1 variant 3232A > G was found to occur more frequently in deleterious mutation carriers (p = 0.047), while the BRCA2 variant 1342A > C was found to occur less frequently in deleterious mutation carriers (p = 0.043).
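For readers unfamiliar with the HWE test referred to here, the following is a minimal sketch of the standard 1-d.f. chi-squared goodness-of-fit test on genotype counts. It is an illustration rather than Haploview's implementation; the function name and argument names are hypothetical.

```python
import math

def hwe_chi2_test(n_AA, n_Aa, n_aa):
    """Chi-squared test (1 d.f.) of Hardy-Weinberg equilibrium from the
    three genotype counts of a biallelic variant."""
    n = n_AA + n_Aa + n_aa
    if n <= 0:
        raise ValueError("at least one genotype must be observed")

    # Allele frequency of A estimated by gene counting.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under HWE: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper tail of a 1-d.f. chi-squared via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With small expected counts, as for rare variants, the chi-squared approximation is poor and an exact test is generally preferred.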

Table 4 Comparison of individual BRCA1 variant frequencies
Table 5 Comparison of individual BRCA2 variant frequencies

All four common BRCA1 variants used to form haplotypes occurred more frequently in deleterious mutation carriers than in the non-carrier group, but there was no evidence of a difference in the distribution between the two groups (p = 0.34). The opposite was true for BRCA2, where all four common variants were found to occur less frequently in deleterious mutation carriers than in non-carriers, but the evidence for a difference in the distribution between the two groups was weak (p = 0.16).

When comparing haplotype frequencies between the two groups (tables 6 and 7), there was very weak evidence that haplotype AGGT in BRCA1 was over-represented in individuals carrying deleterious mutations (p = 0.151). Overall, there was no evidence of a difference in haplotype distributions between deleterious mutation carriers and non-carriers in BRCA1 (6 d.f., p = 0.717). The most common haplotype in BRCA2, AATA, was also found to occur slightly more frequently in deleterious mutation carriers (p = 0.158). Again, there was no evidence of a difference in BRCA2 haplotype distributions between the two groups (6 d.f., p = 0.851).

Table 6 Comparison of individual BRCA1 haplotype frequencies
Table 7 Comparison of individual BRCA2 haplotype frequencies

Discussion

In this paper we used BRCA1 and BRCA2 sequence data from Australian breast cancer cases less than 40 years of age at the time of diagnosis to classify individuals according to their deleterious mutation status, and resolved haplotypes of common polymorphisms separately in the groups that did and did not carry deleterious mutations.

We found weak evidence that one haplotype of BRCA1 variants is over-represented in carriers of deleterious mutations. This haplotype contains the minor allele of the 3232A > G variant, which we found to be over-represented among deleterious mutation carriers. Other haplotypes containing the minor allele "G" also occurred more frequently in the deleterious mutation group.

In BRCA2 we found evidence that the population frequency of the most common haplotype in individuals carrying deleterious mutations was greater than 95%, when the corresponding frequency in those without deleterious mutations was only 65%. Individuals without this haplotype are unlikely to carry deleterious mutations but the predictive power of this haplotype for deleterious mutations is low since it occurs very frequently in those with no deleterious mutations (namely the vast majority of the population).
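The low predictive power can be illustrated with a rough Bayes calculation. This is a sketch under strong simplifying assumptions that are ours, not the study's: the haplotype frequencies of 95% and 65% are treated as the probabilities that a randomly chosen chromosome carries the haplotype in each group, and the prior proportion of carriers is taken as 11 of the 179 early-onset cases sequenced for BRCA2, which is not a general-population figure. The function name is hypothetical.

```python
def posterior_carrier_probability(prior, freq_in_carriers, freq_in_noncarriers):
    """Bayes' rule: probability of being a deleterious mutation carrier
    given that the haplotype is observed."""
    numerator = prior * freq_in_carriers
    return numerator / (numerator + (1.0 - prior) * freq_in_noncarriers)

# Assumed inputs (see the caveats in the text above).
prior = 11 / 179

# Probability of carrying a deleterious mutation given the common haplotype.
with_haplotype = posterior_carrier_probability(prior, 0.95, 0.65)

# Same calculation for a chromosome that lacks the common haplotype.
without_haplotype = posterior_carrier_probability(prior, 0.05, 0.35)
```

Under these assumptions, observing the haplotype raises the probability of carrying a deleterious mutation only from about 6% to about 9%, whereas its absence lowers that probability to about 1%, which is consistent with the qualitative argument above.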

The sample size of the deleterious mutation group was small for both genes, with only 13 and 11 individuals carrying deleterious mutations for BRCA1 and BRCA2 respectively. The power to detect differences in haplotype frequencies is therefore quite low, which might explain some of the high p-values obtained from permutation testing.

Our selection of tagSNPs for the BRCA1 gene shares one tagSNP with the selection of Osorio et al. [3] (who used 4427T > C as a tagSNP) and one with that of Cox et al. [4] (who used Q356R as a tagSNP). In Osorio et al., the class II haplotype occurs more frequently among BRCA1 mutation carriers. This haplotype is essentially characterized by the 4427C > T variant allele, which was used as a tagSNP in our study. We found that the minor allele occurred more frequently in deleterious mutation carriers than in non-carriers (32% vs 23%), but the evidence for this at the population level was weak (p = 0.25). Cox et al. found a slight increase in the risk of breast cancer with the Q356R polymorphism, contradicting an earlier result showing an inverse association [15]. We found that Q356R occurred more frequently among deleterious mutation carriers, but again the evidence at the population level was weak.

There have been several case-control studies seeking BRCA1 and BRCA2 variants associated with an increased risk of breast cancer. Freedman et al. [16] investigated whether common BRCA2 variants contribute to the more common forms of breast cancer in a large multiethnic cohort. Twenty-one tagging SNPs were selected to predict common BRCA2 haplotypes. A number of haplotypes were found to be associated with an increased risk of breast cancer, all of which could be attributed to a single marker (intron 24: rs206340) that was not selected as a tag SNP for analysis in our study. Freedman et al. [17] repeated a similar analysis for the BRCA1 gene. Specifically, they used 28 variants to define patterns of common variation (5 in common with the variants used in our study: Q356R, P871L, K1183R, S1613G and E1038G). They found no evidence of a significant association between common variation in BRCA1 and the risk of breast cancer.

The suggestive associations that we have observed do not imply a physical association on the same chromosome (as would be the case if the rare, deleterious mutation was in cis phase with a haplotype consisting of, for example, the minor alleles of several common variants) or a functional association (as might be the case even if the rare, deleterious mutation was in trans phase with a common variant haplotype, since it may still act to modify the penetrance of the disease causing variant). Establishing the phase of rare, deleterious mutations and the common variants we used to define haplotypes for both BRCA1 and BRCA2 would require either a much larger sample size than was available for this study, genetic data from extended pedigrees or expensive laboratory investigation.

Conclusions

We found some evidence that a single common BRCA1 variant occurs more frequently in deleterious mutation carriers. All four common variants used to form BRCA1 haplotypes are over-represented in deleterious mutation carriers, so the frequency of less common haplotypes is also greater in this group, but there is no evidence for this at the population level. In BRCA2, there is some evidence that a single common variant is under-represented in deleterious mutation carriers, so the most common BRCA2 haplotype occurs more frequently in this group. All four common variants used to form BRCA2 haplotypes occur less frequently in deleterious mutation carriers. We found no evidence of a difference in haplotype distributions between the two groups in either gene, which accords with previous research. These findings are unlikely to have implications for screening at the population level for BRCA1 or BRCA2.