Relevance of the MHC region for breast cancer susceptibility in Asians

Background Human leukocyte antigen (HLA) genes play critical roles in immune surveillance, an important defence against tumors. Imputing HLA genotypes from existing single-nucleotide polymorphism datasets is low-cost and efficient. We investigate the relevance of the major histocompatibility complex region in breast cancer susceptibility, using imputed class I and II HLA alleles, in 25,484 women of Asian ancestry. Methods A total of 12,901 breast cancer cases and 12,583 controls from 12 case–control studies were included in our pooled analysis. HLA imputation was performed using SNP2HLA on 10,886 quality-controlled variants within the 15–55 Mb region on chromosome 6. HLA alleles (n = 175) with info scores greater than 0.8 and frequencies greater than 0.01 were included (resolution at two-digit level: 71; four-digit level: 104). We studied the associations between HLA alleles and breast cancer risk using logistic regression, adjusting for population structure and age. Associations between HLA alleles and the risk of subtypes of breast cancer (ER-positive, ER-negative, HER2-positive, HER2-negative, early-stage, and late-stage) were examined. Results We did not observe associations between any HLA allele and breast cancer risk at P < 5e−8; the smallest p value was observed for HLA-C*12:03 (OR = 1.29, P = 1.08e−3). Ninety-five percent of the effect sizes (OR) observed were between 0.90 and 1.23. Similar results were observed when different subtypes of breast cancer were studied (95% of ORs were between 0.85 and 1.18). Conclusions No imputed HLA allele was associated with breast cancer risk in our large Asian study. Direct measurement of HLA gene expressions may be required to further explore the associations between HLA genes and breast cancer risk. Supplementary Information The online version contains supplementary material available at 10.1007/s12282-022-01366-w.


Introduction
The major histocompatibility complex (MHC) region in the human genome is gene dense, highly polymorphic, and known as a hotspot for disease associations (mainly autoimmune and infectious). Mapping to the short arm of chromosome 6, the MHC region includes human leukocyte antigen (HLA) genes and plays various critical roles in regulating immune response. One such function is immune surveillance, which is an important defense against tumors [1].
Many studies have examined HLA genotypes in the MHC region in relation to breast cancer susceptibility [2][3][4]. However, low statistical power due to the limited sample sizes in these studies has made it difficult to obtain consistent and convincing results. Nonetheless, in a large-scale GWAS study comprising 62,533 breast cancer cases and 60,976 controls of European ancestry, a variant within the MHC region Mikael Hartman and Jingmei Li equally contributed. (rs9257408, chr6: 28,926,220) was found to be significantly associated with the risk of breast malignancy [5]. Pathway analyses of a larger, follow-up GWAS study comprising over 200,000 individuals revealed that immune-related processes may underlie some of the observed associations with breast cancer susceptibility [6]. Traditional GWAS analyses do not go beyond the interrogation of individual single-nucleotide polymorphisms (SNPs). The MHC region is unique in that beyond SNPs, there is an additional layer of functional polymorphisms in the form of HLA genotypes [7]. Transforming SNP-based associations into HLA-allele level associations may thus yield more information and biological meaning behind the associations [8].
Probe-based genotyping commonly used in GWAS is suboptimal for querying variation in the MHC region due to its extensive polymorphic nature [9]. However, with the recent development of new computational tools and large reference panels, HLA genotypes can be inferred from SNPs that are in close proximity to classical HLA loci. Compared to serological typing and direct sequencing approaches, imputation is a less accurate method of obtaining HLA genotype calls. However, determining HLA genotypes in silico using readily available SNP datasets has the benefits of being low-cost and efficient in both time and labor. We aim to investigate the relevance of the MHC region in breast cancer susceptibility, using imputed HLA alleles, in 25,484 women of Asian ancestry.

Study population
We studied 25,484 women from 12 case-control studies participating in the Breast Cancer Association Consortium (BCAC). These women were genetically Asian with known breast cancer status. Our study includes a total of 12,901 breast cancer patients and 12,583 controls (Supplementary Table 1 and Supplementary Fig. 1). This study was approved by the ethics board of the Agency for Science, Technology and Research (ASTAR IRB Ref: 2020-154). Informed consent was obtained by the individual studies which contributed to BCAC.

Single nucleotide variants (SNPs)
DNA was genotyped using OncoArray (Illumina Infinium array) [6,10]. A data request application was made to BCAC to obtain all variants within the 15-55 Mb region (build 37) on chromosome 6 (concept 686); of which 15,412 variants were received. A total of 10,886 variants remained after excluding 181 variants with missing genotype data and 4345 variants with minor allele frequency (MAF) below 0.01. The total genotyping rate was 0.999.

Imputation of human leukocyte antigen (HLA) alleles
We imputed 273 HLA alleles (in dosage format; 94 classical 2-digit alleles and 179 classical 4-digit alleles.) using the SNP2HLA package (v1.0.3) with using a pre-built reference panel (available as part of the software) based on the data of pan-Asian subjects (including Han Chinese, Southeast Asian Malay, Tamil Indian ancestries, and Japanese) [11,12]. Imputation was done in batches of 100 individuals. The info score for each individual was calculated from all individuals, based on the ratio of empirical and expected variance in dosage. The frequency of HLA alleles present ranges from 0.0005 to 0.489 (Supplementary Table 2). HLA alleles (n = 175) with info scores greater than 0.8 and frequency greater than 0.01 were selected for association analysis [9].

Breast cancer risk factors
Information on breast cancer risk factors was collected at enrollment for nested case-control studies, or post breast cancer diagnosis for cases in case-control studies. Risk factors used in our study: number of first-degree family members with breast cancer (none, 1, ≥ 2), body mass index (BMI), age at menarche (years), parity, menopausal status (post-menopausal, pre-/peri-menopausal), oral contraceptive use (≥ 4 months, < 4 months), and hormone replacement therapy use (HRT; ≥ 3 months, < 3 months). Women were considered post-menopausal if their date of last menstruation was > 12 months prior to the date of breast cancer diagnosis for cases or enrollment for controls. Ethnicity was self-reported with the exception of Korean and Thai, where they were coded based on the study's country.

Statistical analysis
Pooled analysis was performed as the mean and standard deviation of the HLA alleles were similar across studies (Supplementary Figs. 2 and 3). To study the associations of SNPs and HLA alleles with breast cancer status, logistic regression was used with adjustment for age (at recruitment for controls, and at diagnosis for cases) and the first 15 principal components (PC) (i.e. population structure). The PCs were based on 22 chromosomes. Further adjustment for breast cancer risk factors (study, family history, BMI, parity, age at menarche, menopausal status, contraceptive use, and hormone replacement therapy use) was done for the study of HLA alleles with breast cancer status. To compare our results with published literature, we did a systematic review of studies on HLA and breast cancer risk in Asian patients (Supplementary Methods).

Ethnicity distribution in our study
A PC analysis (PCA) plot of genotyping data shows the distinction between the six major ethnicities present in our study (Fig. 1). Over half of all women were Chinese (n = 14,048), and were from seven of the twelve studies ( Table 1, Supplementary Table 3). The same PCA plot, but colored by study, showed that the spread seen in the Chinese is due to the country of recruitment ( Supplementary Fig. 1).
Indians were from multi-ethnic Singapore and Malaysia (Supplementary Table 3).

Characteristics of cases and controls
The median age at breast cancer diagnosis for cases (n = 12,901) was 52 years (interquartile range [IQR]: 45-60) and the median age at enrollment for controls (n = 12,583) was 51 years (IQR: 44-58) ( Table 1). The majority of all women were of normal BMI (18.5-24.9 kg/m 2 , 42% in cases, and 57% in controls). The majority of cases (68%) and controls (81%) had at least one child. Forty-three percent of cases were post-menopausal at the time of diagnosis, a similar proportion of post-menopausal controls (45%) were recruited. Older age at menarche was observed in some studies ( Table 4). The majority of breast cancer cases were ER-positive (n = 8558, 66%), HER2-negative (n = 3777, 45%), and stage II (n = 4597, 36%) (Supplementary Table 1).

Associations between SNPs and breast cancer risk
None of the associations between the 10,886 SNPs chromosome 6 (15-55 Mb) and breast cancer risk reached genome-wide significance (P < 5e−8) ( Fig. 2A). The smallest p-value observed was for SNP rs12663096 (odds ratio [OR] = 0.92, P = 5.01e−05). Effect sizes (odds ratios, OR) observed ranged from 0.77 to 1.25, where 95% of the effect sizes observed were between 0.92 and 1.07. Similar non-significant results were observed when subgroups of breast cancer were studied, with the smallest p-value observed for rs9784889 and HER2-negative breast cancer (OR = 0.889, P = 1.19e−5) (Figs. 2B-E and Fig. 3).

Associations between HLA alleles and breast cancer risk
Of the 273 imputed HLA alleles, 175 had info scores greater than 0.8 and frequencies greater than 0.01; seventy-one were classic 2-digits HLA alleles and 104 were classic 4-digits HLA alleles (Supplementary Table 5). Heatmap of the 71 classic 2-digits HLA alleles by study participants did not show obvious clustering of cases and controls ( Supplementary Fig. 4).
Of the 175 HLA alleles, the association between HLA-C*12:03 and breast cancer risk attained the smallest p-value (OR = 1.29, P = 1.08e-3) ( Fig. 2A). Effect sizes (OR) observed ranged from 0.87 to 1.29, where 95% of the effect sizes (OR) observed were between 0.90 and 1.23 (Supplementary Table 5). No appreciable change was observed after adjustments for study and breast cancer risk factors (age, body mass index number of first-degree family members with breast cancer, number of children, age at menarche, menopausal status, oral contraceptive use, and hormone replacement therapy use) (Supplementary Fig. 5A and Supplementary Table 56).
Three of 14 articles identified in the literature reported associations between classic HLA class I or class II alleles and breast cancer risk (Supplementary Methods). All studies used a case-control design and HLA ascertained from blood samples. Only Leong et al. reported significant associations between HLA-A*31 and breast cancer risk (Fisher's exact test P = 0.020) ( Table 2). Two other HLA alleles (HLA-A*26 and HLA-A*36) were reported to be correlated with metastasis (Table 2). We did not observe the same associations in our study; the associations between HLA-A*31and breast cancer risk (OR = 1.05, P = 0.324) and between HLA-A*26 and late-stage breast cancer (OR = 1.11, P = 0.260) were not significant. HLA-A*36 alleles were not imputed.  unfilled triangles denote HLA class. Logistic regression models were used and adjusted for the first 15 principle components for population stratification and age (at recruitment for controls; at diagnosis for cases). Info score: info score from HLA prediction by SNP2HLA  Table 5). The association between HLA-B*35:03 and early-stage breast cancer showed the largest effect size after additional adjustments for study and breast cancer risk factors (OR = 0.60, P = 0.040) (Supplementary Table 6).

Discussion
It was suggested that the MHC region is associated with breast cancer susceptibility studies of European populations [5,13]. In contrast to the European genome-wide association study, we observed no significant association (P < 5e−8) between variants in the MHC region (imputed 175 classic class I and II HLA alleles) and breast cancer risk in our Asian study of 12,901 breast cancer cases and 12,583 controls (smallest P = 1.08e−3, HLA-C*12:03) [5].
Gourley et al. [14] reported that while studies showed no consistent association between HLA class I type (HLA-A, HLA-B and HLA-C) and breast cancer risk, certain HLA class I types may be more common in specific disease subclasses. Zhao et al. [15] reported down-regulation of HLA class I genes (which comprise HLA-A, HLA-B, or HLA-C) on CD4(+) and on CD8(+) T lymphocytes in breast cancer patients when as compared with healthy controls. However, we did not observe any significant association between any of the inferred HLA class I alleles and breast cancer risk, nor with any of the breast cancer subtypes in our Asian study population. Compared to other cancers, MHC class I mutations are less associated with the expression of killer lymphocyte effector genes in breast cancer [13]. Hence, possible differences in expression may be missed when studying HLA alleles (not expression).
Our systematic review identified one Asian study examining the relationship between HLA class I alleles and breast cancer risk [16]. Leong et al. [17] identified HLA-A*31 (6.8% in cases and 0% in controls; Fisher's exact test, P = 0.020) as a risk allele for breast cancer in 59 invasive ductal carcinoma breast cancer patients and 77 controls without breast cancer in Malaysia. In a subset of Malaysian women in our study, HLA-A*31 was not associated with breast cancer risk (adjusted OR = 0.77, P = 0.166; data not shown). It should be noted that the Malaysia study determined HLA-A expression using the Biotest HLA-A Sequence-Specific primers (SSP), while only imputed HLA alleles were interrogated in our study [17].
While it is conceivable that HLA genes play critical roles in immune surveillance against breast cancers, there are many other studies that show no consistent link between HLA and breast cancer [18]. Results from our study are in agreement with the two Asian studies on HLA class II alleles in the systematic review which reported the lack of associations between HLA alleles and breast cancer risk [16,19]. Contrary to this, alleles from MHC class II sub-region were found to be associated with breast cancer risk in studies on populations of European ancestry (HLA-DQB1*03 [20], HLA-DRB1*10:01 [21] HLA-DRB1*11:01 [21], and HLA-DRB1*13 [20]) and Middle-Eastern populations (HLA-DQB1*02 [14,22,23], HLA-DRB1*03 [22], HLA-DQB1*06 [22], HLA-DRB1*07 [14], HLA-DRB1*12 [24], HLA-DRB1*13 [22], and HLA-DRB1*18:01 [25]) populations. It should be noted that previously published results have small sample sizes (largest sample size of 216 cases and 216 controls), which makes validation of the results challenging [16,17,19]. It is noteworthy that HLA alleles are exceedingly diverse, and may be confined to certain ethnic groups and geographical locations [26]. Indeed, we observed that by accounting for study populations, significant associations were attenuated and no longer significant, suggesting that correction for population stratification is essential in the testing of the associations. This is the largest study to date to examine the associations between the MHC region and breast cancer risk in Asians (12,901 breast cancer cases and 12,583 controls). However, there are several limitations worth noting. Validation of the imputed HLA loci was not performed by the direct typing of HLA loci. An average error rate of between three to six percent has been reported for the prediction of high-resolution (four-digit) HLA alleles using imputation software [27]. Due to weaker linkage disequilibrium between rare variants, the imputation of rare HLA alleles is challenging. Variants with MAF > 1% in were filtered out prior to HLA imputation, and only imputed HLA alleles with MAF > 1% were retained in our analyses, which resulted in the exclusion of many informative alleles. In addition, nonclassical HLA alleles were not studied, as the imputation software used imputes only classical HLA alleles (class I and class II). The accuracy of HLA imputation is dependent on the reference panel used [12]. The ethnic representativeness of the reference panel used for HLA loci imputation may limit the generalizability of our findings. While the chosen Pan-Asian reference panel is large, it may not represent the ethnic groups of our study population fully [27]. As there were no genome-wide significant results, we did not perform orthogonal experiments to verify the findings.
In conclusion, imputed class I and II HLA alleles were not associated with breast cancer risk in our Asian sample. Direct measurement of HLA gene expressions may be required to further explore the associations between HLA genes and breast cancer risk.
Acknowledgements We thank all the individuals who took part in these studies and all the researchers, clinicians, technicians and administrative staff who have enabled this work to be carried out. The ACP study wishes to thank the participants in the Thai Breast Cancer study. Special thanks also go to the Thai Ministry of Public Health (MOPH), doctors and nurses who helped with the data collection process. Finally 15HTM). The Northern California Breast Cancer Family Registry (NC-BCFR) was supported by grant U01CA164920 from the USA National Cancer Institute of the National Institutes of Health. The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the Breast Cancer Family Registry (BCFR), nor does mention of trade names, commercial products, or organizations imply endorsement by the USA Government or the BCFR. The NGOBCS was supported by the National Cancer Center Research and Development Fund (Japan). The SBCGS was supported primarily by NIH grants R01CA64277, R01CA148667, UMCA182910, and R37CA70867. Biological sample preparation was conducted the Survey and Biospecimen Shared Resource, which is supported by P30 CA68485. The scientific development and funding of this project were, in part, supported by the Genetic Associations and Mechanisms in Oncology (GAME-ON) Network U19 CA148065. SEBCS was supported by the BRL (Basic Research Laboratory) program through the National Research Foundation of Korea funded by the Ministry of Education, Science and Technology (2012-0000347). SGBCC is funded by the National Research Foundation Singapore, NUS start-up Grant, National University Cancer Institute Singapore (NCIS) Centre Grant, Breast Cancer Prevention Programme, Asian Breast Cancer Research Fund and the NMRC Clinician Scientist Award (SI Category). Population-based controls were from the Multi-Ethnic Cohort (MEC) funded by grants from the Ministry of Health, Singapore, National University of Singapore and National University Health System, Singapore. The TWBCS is supported by the Taiwan Biobank project of the Institute of Biomedical Sciences, Academia Sinica, Taiwan.
Availability of data and materials The genetic and clinical data used in this study were obtained via a data request (concept #686) to the Breast Cancer Association Consortium (BCAC). All data requests can be directed to the BCAC data access committee (http:// bcac. ccge. medsc hl. cam. ac. uk/ bcacd ata/).