Introduction

Leukemia is the most common cancer among children under 15 years of age, accounting for 32 % of all childhood malignancies [1]. In California, Hispanics have the highest reported annual age-adjusted childhood leukemia incidence (56.2 per million), followed by non-Hispanic whites, Asian/Pacific Islanders, and non-Hispanic blacks (44.6, 40.0, and 29.1 per million, respectively) [2]. Although acute lymphoblastic leukemia (ALL) is the most common subtype of childhood leukemia, comprising ~80 % of total disease [1], it is much rarer than most cancers in adults and consequently more difficult to study epidemiologically. The etiology of ALL in children is believed to be distinct from that in adults, due largely to the clearer role for early life exposures. However, only a few risk factors have been conclusively established; these include ionizing radiation, chemotherapeutic agents, and specific genetic abnormalities [3].

Recent studies from our group and others have found elevated childhood ALL risks associated with self-reported use of pesticides at home [4], household paint exposure [5], paternal smoking before conception [6, 7], and surrogate measures of exposure to motor vehicle exhaust [8–11]. In order to exert their effects, these potentially harmful xenobiotic (exogenous) chemicals must first gain entry into target cells and undergo cellular metabolic processes that alter their activity. Membrane transporters such as those encoded by the multiple drug resistance (ABCB1/MDR1) gene act as efflux pumps to expel compounds from the cell and are strategically expressed in regions of the body that act as epithelial barriers or perform excretory functions [12]. In addition, enzymes involved in phase I (bioactivation) and phase II (detoxification) metabolism maintain a critical balance of activation and inactivation of a wide range of chemical exposures of relevance to childhood ALL, including drugs, chemical carcinogens, insecticides, petroleum products, nitrosamines, polycyclic aromatic hydrocarbons, and other environmental pollutants [13].

In order to shed light on the role of genes involved in xenobiotic transport and metabolism, both alone and in conjunction with household chemical exposures, in childhood ALL risk, we utilized a haplotype-tagging approach to characterize genetic variation in a population-based study of 377 ALL case children and 448 control children in Northern and Central California. Furthermore, we examined whether effects of exposure to common household chemicals (including paints, solvents, pesticides, herbicides and tobacco smoke) previously linked to risk of childhood ALL or other childhood cancers were modified by these genetic variants.

Subjects and methods

Study population

The study was conducted among children participating in the Northern California Childhood Leukemia Study (NCCLS), an ongoing population-based case–control study conducted since 1995. The current study includes only those subjects for whom genotyping data were available. Enrollment and recruitment procedures have been described in detail previously [14]. Briefly, case children with incident childhood leukemia were ascertained via a rapid reporting system between the study office and participating hospitals. Control children identified from the state birth certificate records were individually matched to case children on date of birth, sex, maternal race, and child’s Hispanic ethnicity (having one or more parents reporting Hispanic ethnicity). Participation rates among eligible cases and controls were 87 and 86 %, respectively. Data on various potential risk factors were elicited from a parent (usually the mother) by trained interviewers using a structured questionnaire.

This study was reviewed and approved by institutional review committees at the University of California Berkeley, the California Department of Public Health (CDPH), and the participating hospitals. Written informed consent was obtained from all parent respondents.

DNA specimens

Buccal cytobrushes collected from 95 % of participating children (cases and controls) enrolled between 1995 and 2002 were processed within 48 h of collection by heating in the presence of 0.5 N NaOH. Isolated DNA was later re-purified using an automated DNA extraction system (AutoGen, Holliston, MA) and then whole-genome amplified using GenomePlex reagents (Rubicon Genomics, Ann Arbor, MI). When buccal cytobrush DNA was inadequate (26.6 % of subjects), DNA was isolated from dried bloodspot specimens (DBSs) collected at birth and archived by the Genetic Disease Screening Program of the CDPH. DBS specimens were available for 91 % of California-born participants (100 % of controls and 90 % of cases). After extraction (QIAamp 96 DNA Blood Kit, QIAGEN, Germany), these DNA samples were whole-genome amplified using REPLI-g reagents (QIAGEN). All DNA samples were quantitated using human-specific Alu-PCR to confirm a minimum level of amplifiable human DNA [15]. When analyzed using highly multiplexed GoldenGate genotyping (Illumina, San Diego, CA), whole-genome-amplified buccal cell DNA yields genotypes that are highly concordant with those from genomic DNA from peripheral blood [16]. We genotyped DNA specimens from both buccal cells and DBS for 9 subjects; genotype concordance between paired samples was 98.9 %.

Genotyping

Following a review of the literature and consensus among our investigative team, we selected 42 genes encoding xenobiotic transport and metabolism enzymes. Using HaploView software [17] in conjunction with reference single nucleotide polymorphism (SNP) data from the 30 Caucasian trios in the HapMap project (Release 19, Build 34, http://www.hapmap.org) and the 23 Hispanics in the SNP500Cancer project (http://www.snp500cancer.nci.nih.gov), we applied the method of Gabriel et al. [18] to select haplotype-tagging SNPs (htSNPs) that captured at least 80 % of diversity for common haplotypes (>5 % frequency) in either reference group. As Hispanics are a recently admixed ethnic group, and 42 % of our study population is Hispanic (at least one parent reporting Hispanic ethnicity), we placed special emphasis on capturing haplotype structures in Hispanics. To maximize capture of potential regulatory regions, we included in the SNP selection 10-kb stretches both up- and down-stream from the gene boundaries reported in the UCSC Genome Browser.

Genotyping of the selected SNPs was attempted in 385 ALL cases and 456 controls using a custom Illumina GoldenGate panel. We applied an Illumina GenCall quality threshold of 0.25, and SNP-wise and subject-wise call rate thresholds of ≥90 and ≥95 %, respectively. Genotypes for duplicate DNA specimens from 59 subjects showed 99.1 % concordance. Genotyped SNPs with an observed minor allele frequency <5 % (n = 13), or showing significant deviation from Hardy–Weinberg equilibrium (p < 0.01, n = 2), in both Hispanic controls and non-Hispanic controls were excluded. After applying these data quality thresholds, data for 250 SNPs in the 42 selected genes (Supplementary Table A) were available for 377 ALL cases and 448 controls.
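
The minor allele frequency and Hardy–Weinberg filters described above can be illustrated with a minimal sketch. The text does not say which Hardy–Weinberg test was used, so the 1-df Pearson chi-square test below is an assumption, as is the reading that a SNP is excluded only when it fails in both control groups. All function names are hypothetical, the genotype counts in the usage note are invented, and each group is assumed to contain at least one genotyped subject.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the less common allele, from genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P value of a 1-df Pearson chi-square test for Hardy-Weinberg equilibrium.

    Assumption: the source does not name its HWE test; this is one common choice.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expected count (monomorphic SNPs).
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Chi-square survival function with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def fails_qc(counts, maf_min=0.05, hwe_alpha=0.01):
    """True if one control group shows MAF < maf_min or HWE p < hwe_alpha."""
    return (minor_allele_frequency(*counts) < maf_min
            or hwe_chi_square_p(*counts) < hwe_alpha)


def exclude_snp(hispanic_counts, non_hispanic_counts):
    """Assumed reading of the rule: exclude only if both control groups fail."""
    return fails_qc(hispanic_counts) and fails_qc(non_hispanic_counts)
```

For example, with invented counts, `exclude_snp((98, 2, 0), (97, 3, 0))` flags a SNP that is rare in both control groups, whereas `exclude_snp((25, 50, 25), (98, 2, 0))` retains a SNP that is common and in equilibrium in one group.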

In addition, we determined deletion status for GSTM1 and GSTT1 using polymerase chain reaction methods described previously [19].

Household chemical exposure assessment

Details of data collection for self-reported household chemical use in early childhood have been published elsewhere [5, 20, 21]. Briefly, utilizing an in-home interview with a structured questionnaire, we asked parents whether paints, stains, or lacquers (collectively called “paints”); adhesives or petroleum products, such as paint thinner, spot remover, paint remover, glue, solvent, gasoline, kerosene, or lubricating oil (collectively called “solvents”); professional lawn service and the use of weed control products (collectively called “outdoor herbicides”); or professional pest control services, insect repellents, flea foggers, and products to control ants, flies, cockroaches, spiders, termites, and plant/tree insects (collectively called “indoor insecticides”); were ever used in the home during specific time windows from the 3 months prior to pregnancy through either the child’s 3rd birthday or diagnosis age (reference age among controls). For this analysis, we focused on time windows for which the chemicals showed significant main effects in our previous analyses [5, 20, 21]. Accordingly, for paints and solvents, we censored exposures at the time window preceding the reference date (e.g., from birth to 1 year if the case was diagnosed between 1 and 2 years of age). For outdoor herbicides and indoor insecticides, we limited to exposures before birth. We also ascertained whether there were any tobacco smokers in the house from birth through the child’s 3rd birthday. For the purpose of this analysis, all exposures were classified as “ever/never” during the specified time window. The subjects in the current study comprise a subset of subjects in our previous reports linking household chemicals to childhood leukemia risk. 
In the current study subset, we found the observed associations of these household chemicals with risk of childhood ALL were consistent with our previous reports: paint use (OR = 1.42, 95 % CI: 1.06–1.92), outdoor herbicides before birth (OR = 1.46, 95 % CI: 1.04–2.04), and indoor insecticides before birth (OR = 1.29, 95 % CI: 0.97–1.72). We previously found solvents to be associated with childhood acute myeloid leukemia, and household passive tobacco exposure after birth showed joint effects with paternal smoking on childhood ALL risk. We found no main effects of these two exposures in our study sample.
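
As a reminder of how an odds ratio and its 95 % confidence interval relate to ever/never exposure counts, the following is a minimal sketch of a crude (unadjusted) odds ratio with a Wald interval. It is a simplification and not the estimation method behind the figures above, which presumably came from regression models. The function name is hypothetical, and all four cell counts are assumed to be non-zero.

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    Returns (odds_ratio, ci_low, ci_high). z = 1.96 gives a 95 % interval.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, ci_low, ci_high
```

Calling `odds_ratio_wald_ci(20, 10, 10, 20)` on this invented table returns an odds ratio of 4.0 together with its interval bounds. An interval that excludes 1.0 corresponds to nominal significance at the 0.05 level.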

Statistical analysis

Using a set of 80 ancestry informative markers [22], we have previously calculated individual estimates of Amerindian, African, and European genetic ancestry [23] using structured association methods [24, 25]. We found no evidence of major confounding by estimated genetic ancestry (>10 %) over and above adjustment for self-reported race and ethnicity [23]. Thus, we used stratification or adjustment for the self-reported factors in our analyses.

As a preliminary step prior to haplotype analysis, we tested for potential interactions of individual htSNPs with Hispanic ethnicity in disease risk on a gene-by-gene basis using unconditional logistic regression and the likelihood ratio test at the 0.05 significance level, after adjusting for age, sex, and child’s race.

For haplotype analysis, we used a haplotype sliding window approach for the SNPs in each gene, as implemented in the haplo.stats package for R [26]. This approach examines sub-haplotypes using the full set of SNP data, with differently sized “windows” of adjacent alleles. This is an effective means of combining multi-locus data for Hispanics and non-Hispanics alike, as it is agnostic to differences in haplotype structure, provided no individual SNPs for a given gene have a strong differential effect by Hispanic ethnicity. Thus, if none of the SNPs in a given gene showed significant interaction with ethnicity at p ≤ 0.05, data for both ethnicities were combined for haplotype analyses; otherwise, the haplotype analysis for that gene was conducted separately for Hispanics and non-Hispanics. We used GrASP, a graphical tool [27], to display and visualize sliding window results (Supplementary Figure 1). Using haplotype trend regression [28], we estimated the magnitude of effect associated with risk haplotypes from the windows with the smallest global p values, collapsing haplotypes with <5 % frequency among controls into a “rare haplotypes” category. We tested the significance of potential interactions between self-reported household chemical exposures and risk haplotypes using the likelihood ratio test, focusing on haplotypes with significant main effects among all subjects (i.e., both Hispanics and non-Hispanics combined) in order to maximize statistical power to detect interaction effects. We report both nominal and false discovery rate (FDR)-adjusted p values for the interaction analysis, adjusted for the total number of interactions examined [29]. Lastly, for haplotype-chemical interactions significant at p FDR ≤ 0.05, we derived effect estimates for household chemicals by haplotype status, applying to each subject the haplotypes with the highest inferred probability.
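
To make the FDR adjustment concrete, the sketch below implements the Benjamini–Hochberg step-up procedure, one widely used way of computing FDR-adjusted p values. Whether this is the exact procedure of reference [29] is an assumption. The function name is hypothetical, and the p values in the usage note are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For instance, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` scales each nominal p value by the number of tests divided by its rank and then caps it at the adjusted value of the next-larger p value, so adjusted values never decrease as nominal values increase.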

Results

Due to the matched case–control design of the NCCLS, the distribution of age, gender, race, and ethnicity was comparable between the 377 ALL cases and the 448 controls (Table 1). The 42 xenobiotic transport and metabolism pathway genes we examined are listed in Supplementary Table A. We found that htSNPs in eight genes (ABCC1, CYP1A2, CYP1B1, CYP2B6, CYP3A5, GSS, IDH1, and UGT1A9) showed significant heterogeneity of effect between Hispanics and non-Hispanics (p ≤ 0.05); further analyses for these genes were stratified by ethnicity.

Table 1 Characteristics of childhood acute lymphoblastic leukemia cases and controls, NCCLS

Results for genes with significant (p ≤ 0.05) haplotype effects that persisted through increasingly larger windows are presented in Supplementary Figure 1. Haplotype trend regression results estimating the magnitudes of effect for haplotypes with the lowest multi-SNP p-value in sliding window analyses are shown in Table 2.

Table 2 Haplotype trend regression results: xenobiotic transport and metabolism genes in childhood acute lymphoblastic leukemia risk, NCCLS

Among all subjects, ABCB1, ARNT, CYP2C8, and GCLC showed significant associations that persisted through progressively larger SNP windows (Table 2). In ABCB1, haplotype G–A–G–T was associated with a significantly reduced risk (OR = 0.44, p = 0.015). Haplotypes G–G of ARNT and G–G–T–G of CYP2C8 were significantly associated with increased risks of childhood ALL (OR = 4.93 and p = 0.001, OR = 3.18 and p = 0.004, respectively). The observed significant global haplotype association of GCLC was attributed to a rare haplotype; no further analysis was performed for this gene.

Among non-Hispanics, CYP1A2 and CYP1B1 showed significant haplotype associations. Haplotype A–G of CYP1A2 was significantly associated with an increased risk (OR = 2.19, p = 0.005), while haplotype A–A of CYP1B1 was significantly associated with a decreased risk (OR = 0.11, p = 0.007). The observed significant global haplotype association of CYP2B6 was attributed to rare haplotypes. Among Hispanics, the two SNPs in IDH1 showed a haplotype association stronger than either SNP individually (global p = 0.008), and the C–C haplotype was significantly associated with an increased risk of ALL (OR = 6.12, p = 0.005).

Two of the most commonly studied xenobiotic metabolism genes to date in childhood ALL are the glutathione S transferase genes GSTM1 and GSTT1, whose principal variants are deletions [30]. The GSTM1 deletion showed significantly different effects by Hispanic ethnicity (p < 0.001): among Hispanics the deletion was associated with elevated risk (OR = 1.85, 95 % CI 1.19–2.88, p = 0.007) while among non-Hispanics, the association was in the opposite direction (OR = 0.62, 95 % CI 0.43–0.89, p = 0.010). In addition, there was no evidence of association for the GSTT1 deletion (p = 0.526).

Finally, we examined interactions between household chemical exposures of interest and xenobiotic gene variants. For this analysis, we focused on haplotypes that had at least 5 % frequency among controls and showed nominally significant main effects (global p ≤ 0.05) among Hispanics and non-Hispanics combined. We found significant interactions between CYP2C8 haplotype G–G–T–G and self-reported use of paints after birth (p FDR = 0.016), and between ABCB1 haplotype G–A–G–T and self-reported use of indoor insecticides before birth (p FDR = 0.035) (Table 3). As shown in Table 4, our analysis indicates that the risks of childhood ALL associated with use of paints and indoor insecticides vary by presence or absence of these haplotypes. The increased risk associated with paint use appears confined to those without CYP2C8 haplotype G–G–T–G (OR = 1.67, 95 % CI 1.21–2.30), while in the small subgroup with the G–G–T–G haplotype (5.9 % among controls), paint use appears to be associated with a non-significant reduced risk (OR = 0.45, 95 % CI = 0.20–1.02). Similarly, the increased risk associated with use of indoor insecticides appears to be limited to the small population with the ABCB1 G–A–G–T haplotype (13.6 % among controls, OR for indoor insecticides = 3.03, 95 % CI = 1.59–5.78), while among those without the G–A–G–T haplotype, the risk associated with indoor insecticide use was null (OR = 1.02, 95 % CI = 0.74–1.41).
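
The contrast between stratum-specific odds ratios can be illustrated with an approximate heterogeneity check computed from the reported estimates alone. This is a sketch and not the likelihood ratio test used in this study: it recovers each standard error from the reported 95 % CI, assumes the two strata are independent, and compares the log odds ratios with a z test. The function names are hypothetical, and only the ABCB1 figures quoted above are used.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover log(OR) and its standard error from a reported 95 % CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return math.log(odds_ratio), se


def heterogeneity_z_test(stratum_a, stratum_b):
    """Compare two independent stratum-specific odds ratios.

    Each stratum is (odds_ratio, ci_low, ci_high).
    Returns (ratio_of_odds_ratios, z_statistic, two_sided_p).
    """
    log_a, se_a = log_or_and_se(*stratum_a)
    log_b, se_b = log_or_and_se(*stratum_b)
    z_stat = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p value under the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return math.exp(log_a - log_b), z_stat, p_value


# Indoor insecticide odds ratios by ABCB1 G-A-G-T haplotype status, as reported above.
with_haplotype = (3.03, 1.59, 5.78)
without_haplotype = (1.02, 0.74, 1.41)
ratio, z_stat, p_value = heterogeneity_z_test(with_haplotype, without_haplotype)
```

Here `ratio` is the ratio of the two stratum-specific odds ratios (about 3), which summarizes the direction and rough size of the multiplicative interaction. Because it relies on rounded published values and ignores covariate adjustment, the resulting p value is indicative only and will not match the FDR-adjusted values in Table 3.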

Table 3 Interactions of household chemical exposures with xenobiotic metabolism and transport genes on childhood acute lymphoblastic leukemia risk
Table 4 Chemical by haplotype interaction analysis: effect sizes for childhood ALL risk

Discussion

In this population-based case–control study, we examined the risk of childhood ALL associated with several genes within the xenobiotic transport and metabolism pathways, utilizing a haplotype-tagging approach to maximize capture of genetic variation. We identified haplotypes of several genes that were significantly associated with childhood ALL, including ABCB1, ARNT, CYP2C8, CYP1A2, CYP1B1, and IDH1. In addition, we observed significant interactions of identified risk haplotypes with a number of self-reported household chemical exposures, including use of paints and indoor insecticides. Although confirmation is required, our findings provide evidence that genes involved in the xenobiotic transport and metabolism pathway may play a role in mediating risk of childhood ALL, and that the childhood ALL risks associated with various household chemical exposures may be modified by variation in these genes.

A haplotype of ABCB1, which encodes a membrane transporter of lipophilic compounds, was significantly associated with childhood ALL risk and showed significant interaction with indoor insecticides, mirroring an earlier finding utilizing different genetic variants in the same gene [31]. Our results indicate that the increased risk associated with use of indoor insecticides before birth is limited to subjects carrying the G–A–G–T haplotype. The SNPs in this risk haplotype are 21 kb from the nearest of the 3′ SNPs examined in our previous analysis, in which no significant haplotype main effect was observed [31]. This is in agreement with the current analysis, which also shows no main effect of haplotypes at the 3′ end of the gene.

We also found significant childhood ALL associations with haplotypes in three genes in the CYP gene family: CYP2C8, CYP1A2, and CYP1B1. The CYP2C8 gene product is involved in metabolism of numerous drugs and other compounds [32]. In addition to a significant association with childhood ALL, the risk haplotype for CYP2C8 showed significant interaction with self-reported household paint use, with the increased risk associated with paint use being limited largely to those without the CYP2C8 G–G–T–G haplotype. For the common haplotype in CYP1A2 (31.5 % frequency among controls), we found an elevated risk of childhood ALL. The SNPs composing this haplotype are outside the CYP1A2 coding region, the nearest (rs11854147) being 5.4 kb from the 3′ end. The CYP1A2 gene product metabolizes polycyclic aromatic hydrocarbons (PAHs, found in tobacco smoke and vehicle exhaust); in utero exposures to PAHs have been linked to chromosomal aberrations [33]. CYP1B1, for which we observed a significant haplotype association with childhood ALL risk, is also involved in metabolism of PAHs, as well as steroids [34].

The ARNT gene product is a key transporter of PAHs and other compounds, and a transcription inducer of xenobiotic metabolism genes including CYP1A1 and CYP1A2 [35] that metabolize PAHs. We identified a risk haplotype for ARNT that showed a markedly higher risk of childhood ALL. We also observed a strong haplotype association for IDH1; a somatic mutation in IDH1 has been linked to survival in adult glioblastoma and AML [36, 37]. The two SNPs we examined are downstream from and in strong LD with SNPs in the IDH1 coding region.

In gauging these results, consideration must be given to several factors. First, despite this study’s relatively large sample size compared to those of most previous candidate gene studies, the presence of genetic heterogeneity due to the ethnic and racial diversity of the California population may have influenced our ability to detect associations. Our SNP selection strategy included elements designed to maximize capture of genetic variation in Hispanics. We examined Hispanics separately from non-Hispanics where there was significant heterogeneity in between-group effects of individual SNPs. Although this approach may have limited our ability to detect associations in the population as a whole, we believe it was necessary given that genetic susceptibility may be different in Hispanics versus non-Hispanics due to the Hispanic population’s relatively recent genetic admixture [22]. Results that differ between Hispanics and non-Hispanics may be due to differences in allele frequency and/or haplotype structure or may reflect underlying differences in exposures that modulate the effects of genes. Regardless, if the results are not spurious, they represent potential risk loci, and we present them in either or both ethnic groups for replication and further follow-up. Whereas the entire study population yielded adequate power to detect modest effect sizes (81 % power for a log-additive OR = 1.40, minor allele frequency = 20 %), power was lower among Hispanics and non-Hispanics separately (44 and 59 %, respectively). In addition, the limited size of racial/ethnic sub-populations within the non-Hispanic group precluded further stratification of this group; as such, genetic heterogeneity among non-Hispanics might have obscured results. However, we found no evidence of strong confounding due to estimated genetic ancestry [23], minimizing concerns about the impact of population stratification on the results.

Two large genome-wide association studies on childhood ALL have been published to date (with 907 cases and 2,398 adult and child controls, and 317 cases and 17,958 adult controls, respectively) [38, 39]. Although these studies have identified a number of novel loci, no significant associations were observed for genes in the pathways we studied here. Null findings for these genes in the genome-wide studies may be due to stringent multiple testing adjustment (at the p ≤ 1 × 10⁻⁷ level) to account for the large number of individual variants under study. In contrast to the agnostic approach to discovery used in genome-wide studies, our study focused on relatively few genes representing key elements of the xenobiotic transport and metabolism pathways. We concede that results of our study may be due to chance and therefore must be replicated. However, the haplotype-tagging approach we adopted maximizes capture of total variation within each candidate gene and the haplotype analysis increases statistical power to detect associations over analyses of individual variants. Furthermore, although the haplotype-tagging approach does not pinpoint potential causal SNPs, it does localize risk-associated regions for further investigation such as fine-mapping.

In this study, we examined potential interactions of xenobiotic transport and metabolism genes with self-reported household chemical exposures early in childhood in the modulation of childhood ALL risk, focusing on haplotype findings observed for both ethnicities (Hispanics and non-Hispanics) combined, as the sizes of the individual ethnic groups were considered too small to permit adequately powered examinations of gene–environment interactions. Our observation that the increased risk associated with paint and indoor insecticide use [5, 20] was limited to specific subgroups defined by haplotypes of specific genes suggests that these genes work in concert with chemical use to modulate risk. Although we focused on a limited number of biologically plausible interactions according to a rigorous a priori analysis plan, we acknowledge that this analysis might be considered exploratory. As such, we report only those interactions that were significant after accounting for multiple hypothesis testing. We recognize that our total sample size (377 cases, 448 controls) may be insufficient to observe modest interaction effects with adequate statistical power. Furthermore, although the environmental chemical exposures we examined have been previously associated with childhood leukemia risk [5, 21], these measures are derived from maternal self-reports, which are prone to reporting errors and recall bias in that mothers of cases may recall exposures differently than mothers of controls. Since this study was population-based with participation rates of 86–87 % and biospecimen collection rates >95 % for interviewed subjects, it is unlikely the results presented here are driven by bias in selection or participation. Further studies with improved measures of chemical exposure are needed to confirm the interactions observed.

In summary, we set out to investigate the role of genes in the xenobiotic transport and metabolism pathway in risk of childhood ALL in greater depth and with larger sample size than previous candidate gene studies. We also sought to examine the putative joint effects of these genes with environmental chemical exposures for which we have observed significant main effects. Our results provide evidence that elements of the xenobiotic transport and metabolism pathway may be associated with childhood ALL, and that some of these elements interact with chemical exposures to modulate risk. This study does not address the potential effects of maternal genes, which may influence in utero susceptibility to chemical exposures. The associations and interactions identified should be considered targets for further investigation in studies with larger sample sizes, high-quality environmental exposure data, maternal genotype data, and finer coverage of SNPs in the identified associated regions.