Introduction

Understanding the contribution of environmental exposures to human disease is a major focus of environmental health sciences and epidemiology and has important public health implications. Environmental exposures are typically ascertained using a broad range of collection instruments such as questionnaires, national registries, personal monitoring devices, geographic information systems, and biomarkers of exposure, i.e., chemical concentrations measured in biological fluids, which are obtained in a prospective or retrospective manner. Many of these collection methods have been shown to be highly robust and reliable for certain exposures, particularly when ascertained prospectively; however, for many studies, there are real practical barriers to obtaining reliable measures of exposure using existing methods. This is particularly problematic for exposures with short half-lives, studies interested in specific time frames where prospective ascertainment and biosample collection are not possible and retrospective exposure data is unreliable due to long recall times and/or recall biases, and exposure measurement that does not directly measure exposure levels present in individuals, i.e., effective biological dose, such as exposure estimates derived from geographic information systems. Even for exposures that can be reliably obtained using existing methods, exploring alternative methods of ascertainment is worthwhile because they may also be reliable and could provide a more cost-effective measure of exposure than the existing instruments.

Recent work highlights the potential for epigenetics, i.e., mitotically heritable and reversible cellular information, to fill this gap by serving as a robust molecular biomarker of exposure. Most studies seeking to investigate the role of epigenetics in environmental exposure and human disease have focused on epigenetics as a mechanism for disease, i.e., an environmental exposure leads to an epigenetic change that causes disease (Fig. 1a). However, recent evidence suggests that exposure-related epigenetic changes may also serve as a proxy for exposure when investigating exposure-disease relationships in lieu of having actual prenatal exposure data (Fig. 1b). Thus, epigenetics may inform environmental health and epidemiology studies in two equally impactful ways (Fig. 1). The focus of this review is to summarize the evidence in humans that supports the potential for epigenetic signatures, i.e., patterns, to serve as biomarkers of environmental exposure (Fig. 1b).

Fig. 1 Framework describing how epigenetic marks can inform environmental health and disease-based research. a Epigenetics may provide a biological mechanism for environmental exposure associations with human disease. b Epigenetics may serve as a biomarker of environmental exposure

Epigenetic Modifications Are Environmentally Labile

Epigenetics provides a mechanism for cells, with the same static genetic code, to develop into functionally distinct cell types and to change their cellular program in response to their environment. There are many different types of epigenetic marks including DNA methylation, histone tail modifications, small non-coding RNAs, and higher-order chromatin/nuclear structures. A central property of epigenetic data is that it is reversible, making it an attractive biological mechanism to link environmental exposures to cell alterations. In fact, changes in DNA methylation (DNAm), non-coding RNAs, and histone tail modifications have all been shown to be associated with human environmental exposures across a wide range of domains including toxicant, social/behavioral, diet, and pharmacologic. DNA methylation has been shown to undergo both losses and gains in global and/or repetitive elements with exposure to metals [1–3], benzene [4], persistent organic pollutants [5, 6], particulate matter [7, 8], mycotoxin [9], endocrine-disrupting chemicals [10], lifestyle factors [11, 12], and inorganic arsenic [13, 14]. In addition, locus-specific differences in DNAm associated with exposure to nutrition [15–17], inorganic arsenic [18, 19], medications [20, 21], childhood abuse [22] and stress [23], socioeconomic status [24, 25], tobacco [26–32, 33•, 34, 35•, 36], polycyclic aromatic hydrocarbons [37, 38], infections [39, 40], and endocrine-disrupting chemicals [41, 42] have also been observed. Changes in small non-coding RNAs have been found to be associated with inorganic arsenic [43] and ozone [44] exposures. Global changes in histone tail modifications have been shown to be associated with exposure to metals [45] and particulate matter [46, 47].

While exposure associations have been found for many different types of epigenetic marks, the evidence presented below, supporting the potential utility of epigenetics as a biomarker of exposure, mainly converges on DNAm for several reasons. First, DNAm is the most widely studied epigenetic mark; thus, given the recent emergence of this field, it has accumulated the most evidence to date compared to other types of epigenetic data. Second, DNAm has the potential to retain exposure signatures years later since it is the only epigenetic mark with a clear mechanism for post-mitotic inheritance. It is possible that some locus-specific DNAm marks in the genome are relatively stable, such as those involved in tissue specificity, while others are more labile. Recent evidence has shown that inter-individual variation in DNAm signatures is relatively stable in adulthood [48]. Although much work remains to determine which environmentally induced DNAm changes are rapid and transient and which are more stable over time, DNAm does offer an inherent biological mechanism for cells to remember environmental exposure-related changes. Third, at the current time, there are practical limitations to considering histone tail modifications or RNA as biomarkers of exposure. Histone tail modification measurements require very large numbers of cells and rapid DNA-protein cross-linking that is not practical for most large population-scale studies. RNA can degrade quickly, and the quality of RNA in many existing biosample repositories is likely to be poor; therefore, its utility as a biomarker is also questionable. DNA methylation, on the other hand, is stable and can be reliably measured from biospecimens that have been stored for many decades. Finally, as described in more detail below and in Table 1, existing technologies to measure DNAm are reproducible, cost-efficient, and amenable to high-throughput processing.

Table 1 Overview of locus-specific epigenetic measurement tools

Ideal Properties of a Biomarker of Exposure

There is a growing body of literature showing that environmental exposures can influence the epigenome in humans. However, that alone does not merit its use as a biomarker. For an exposure biomarker to be useful it should (1) have a relatively long half-life that is suitable for a particular type of exposure and/or type of experiment, e.g., toxicants with acute effects may show DNAm signatures that are stable for hours or days, whereas chronic effects of exposure on DNAm may be present after many years; (2) show specificity; (3) be present in an accessible tissue and reflect exposure dose; (4) be able to classify individuals based on their exposure status; and (5) be relatively inexpensive to measure in a large number of samples. While this area of research is still in its infancy and considerable work remains, in Table 2 and the sections below, we summarize current findings that support the potential for DNAm to serve as a robust exposure biomarker, i.e., describe how it meets each of these criteria.

Table 2 Summary of evidence supporting DNA methylation as a biomarker of exposure

Exposure-Related DNA Methylation Changes Show Stability

The existence of a stable epigenetic signature of exposure could overcome existing limitations for exposures with short half-lives. For example, the half-life of cotinine and phthalate monoesters is less than 24 h, making past exposure difficult to assess, and cost-effective prospective studies are challenging since they are likely to require biospecimen collection at multiple time points. A biomarker of cumulative exposure would be beneficial for studies that seek to investigate the relationships between lifetime exposures and disease. For example, an epigenetic signature that reflects cumulative exposure to smoking may aid studies investigating the relationship between lifetime smoking exposure and lung cancer risk. Finally, epigenetic signatures of exposure may also serve as stable and practical biomarkers of rapid, acute changes related to environmental exposure, even if only present for hours or days after exposure. Several recent studies have shown that epigenetic changes associated with environmental exposures can be observed for months to years after exposure, offering a potential solution to this problem.

It is often difficult to study cumulative exposure because existing biomarkers for readily accessible tissues typically reflect short-term exposure levels. For example, lead in the blood has a half-life of 36 days; therefore, blood lead measurements reflect recent lead exposure. Bone lead reflects several years of exposure; therefore, cumulative lead exposure levels are currently obtained using bone scans, via K-X-ray fluorescence. Although these scans provide reliable estimates of several years' worth of lead exposure, they can be costly and burdensome to research participants. DNA methylation levels at long-interspersed element 1 (LINE-1) elements in the blood have been shown to be associated with patella bone lead levels, potentially offering a more attractive alternative measure of cumulative exposure for lead [49]. Similarly, studies on adults have shown that DNAm levels at specific smoking-associated loci reflect cigarette pack years and time since quitting [35•]. These findings provide support for the potential of DNAm to serve as a biomarker of cumulative exposure in adulthood.
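To make concrete why a blood measurement with a 36-day half-life says little about exposure from a year earlier, the following minimal Python sketch computes the fraction of an initial dose that remains over time. It assumes simple first-order (single-compartment) decay, which is a simplification introduced here for illustration and not a kinetic model taken from the cited studies; the function and constant names are hypothetical.

```python
def fraction_remaining(days_elapsed: float, half_life_days: float) -> float:
    """Fraction of an initial dose still present, assuming first-order decay."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 0.5 ** (days_elapsed / half_life_days)


# Half-life of lead in blood as quoted in the text above.
BLOOD_LEAD_HALF_LIFE_DAYS = 36.0

if __name__ == "__main__":
    for days in (36, 180, 365):
        frac = fraction_remaining(days, BLOOD_LEAD_HALF_LIFE_DAYS)
        print(f"after {days:>3} days: {frac:.4%} of the initial blood lead remains")
```

Under this assumption, less than 0.1 % of a single dose remains in blood after one year, which is why a marker that persists for years would be needed to capture cumulative exposure.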

In addition to showing associations in adulthood, DNA methylation changes related to prenatal exposures have been observed at birth and in early life. Joubert et al. first identified a 26-locus DNA methylation signature, in cord blood at birth, that is significantly associated with maternal plasma cotinine (a molecular biomarker of smoking) levels around gestational week 18 [50•]. This shows that smoking-related changes in DNAm can be observed about 22 weeks after the exposure window under investigation. DNAm levels at birth, from the cord blood or placenta, have also been associated with prenatal exposure to folate [17], inorganic arsenic [14, 19], phthalate and phenol during trimester 1 [41], medications [20, 21], caloric restriction [15], and xenoestrogens [10]. DNAm differences related to prenatal exposure have also been detected over much longer periods of time. For example, Breton et al. found significant associations between DNAm levels at 19 genomic loci obtained from the peripheral blood of children at age 12 and their prenatal exposure to smoking [36].

In addition to prenatal exposures, studies with DNA methylation measurements in adulthood have shown associations with early life exposures. For example, differences in DNAm at a few loci, detected using blood from adults over age 40, have been shown to be associated with exposure to abuse during childhood [22]. Preliminary studies have also shown that after adjustment for potential confounders, three classes of repetitive elements (Sat2, Alu, and LINE-1) among females and hundreds of specific genes in males [25] show DNAm differences in the adult peripheral blood that are related to their childhood socioeconomic status. Finally, exposure to stress in early childhood was found to be associated with site-specific changes in buccal cell DNAm in adolescents [23].

Two studies have directly assessed the persistence of prenatal smoking-associated DNAm changes, within an individual, over time. The first study took a candidate gene-based approach and found that DNAm changes in the AHRR gene related to prenatal smoking exposure are present at birth and are consistent with DNAm patterns in the same individuals at 18 months of age [28]. A second, larger, independent study examined DNAm patterns at five loci among individuals with a biospecimen collected at birth, age 7, and age 17. They found that DNAm patterns at all five sites were significantly different and consistent across samples obtained at birth and age 7 between the prenatally exposed and unexposed individuals [51••]. Furthermore, four out of five sites showed similar patterns of DNAm at age 17 [51••]. These findings remained significant even after adjusting for exposure to secondhand smoke and firsthand smoking during childhood and adolescence [51••].

Persistent changes in DNA methylation have also been observed for other exposures as well as for both child and adulthood exposure windows. For example, during acute exposure (with a high viral load) to human immunodeficiency virus (HIV), virus-specific CD8 positive T cells undergo loss of DNA methylation at the PD-1 locus [40]. This DNAm change has been shown to persist in chronic stages of infection, even after the viral load is reduced to undetectable levels using highly active antiretroviral therapy (HAART) [40].

Specificity of Epigenetic Signatures of Exposure

For an epigenetic signature to be a useful biomarker of exposure, it should be specific for a particular exposure under investigation. Additionally, it would be useful to have epigenetic biomarkers that are specific to different windows of exposure and/or provide information about changes in exposure status at different points in time.

Exposure-Specific Signatures

Given the recent emergence of this field, little work has been done to date to directly address the specificity of exposure-associated epigenetic signatures in humans. However, the lack of overlap among the specific loci and genomic regions identified by studies of different exposures provides some plausibility for the existence of exposure-specific genomic location differences. For example, none of the top ten CpG sites identified as associated with prenatal exposure to cadmium [52] were identified in studies of prenatal [27, 50•, 51••] or adult [31, 32] exposure to smoking. Importantly, since these studies measured DNAm using the Illumina Infinium 450 K BeadChip, they were capable of identifying the same sites.

Exposure Window-Specific Signatures

Site-specific differences in DNA methylation at birth related to prenatal exposure to famine were shown to differ, depending on the gestational timing of the exposure [15]. Initial work in adults suggests that DNA methylation signatures of exposure to smoking, i.e., sets of loci, may differ based on whether the individual is a current, previous, or never smoker. A genome-scale screen of DNAm, using adult peripheral blood samples, revealed 15 loci with significant differential methylation related to levels of cumulative exposure to tobacco smoke (in pack years), current exposure to active smoking, and former smokers, compared to individuals that never smoked [33•]. Among the group of former smokers, DNA methylation levels at three sites were significantly associated with time since quitting, with increasing amounts of DNAm related to longer times since quitting. Although these three sites also came up as associated with current smoking exposure, the direction of change in DNAm differed [33•]; thus, these three loci show window-specific differences in DNAm alterations.

Additional evidence supporting the potential for DNAm signatures to be specific to a particular exposure period comes from the lack of complete overlap between sites identified in prenatal versus adult exposure windows for smoking, even though the sample sizes for each exposure time frame are comparable. For example, two of the strongest genomic regions, in the F2RL3 and LRP5 genes, identified and replicated in adult studies as associated with smoking exposure [26, 31, 32, 33•, 35•] have not been identified in any of the newborn or childhood studies of prenatal smoking exposure [27, 36, 50•, 51••]. In fact, Markunas et al. recently compared genomic regions showing differential DNAm associations with smoking across multiple adult studies to genomic regions showing DNAm alterations associated with prenatal exposure to smoking. For four genes (AHRR, GNG12, GFI1, and CNTNAP2), smoking-related DNAm alterations were present in both the adult and prenatal exposure windows [27]. However, 11/17 and 6/10 genomic regions were restricted to the prenatal and adult exposure windows, respectively [27]. These findings highlight that some changes in DNAm may be common across different exposure windows; however, many appear to be specific for a particular exposure window.

Accurate Prediction of Exposure Using Epigenetic Measurements

A critical component of any biomarker is the ability to accurately classify samples into exposure categories and/or provide a quantitative measure of exposure for each sample. DNA methylation levels are quantitative in nature, with values ranging from 0 to 100 %; thus, they are inherently well suited to report quantitative measures of exposure. Empirical evidence for DNAm as a quantitative measure of exposure stems from recent work detailing dose-response relationships in adults with smoking exposure and DNAm levels at a locus in the F2RL3 gene [35•]. For current smokers, the authors observed a strong inverse relationship between DNAm levels and the average number of cigarettes smoked per day [35•]. Among former smokers, they report a dose-response relationship between methylation at F2RL3 and time since quitting smoking, up to about 20–25 years; individuals with longer times since smoking cessation had higher levels of DNAm [35•]. Studies have also shown dose-response associations between DNAm and cumulative measures of lifetime exposure to smoking [35•] and lead [49].

In addition to dose-response relationships, DNAm levels at smoking exposure-related CpG sites have been assessed for their ability to accurately classify samples into dichotomous exposure categories. A classifier built using DNA methylation values at four smoking-associated loci, measured 14 years after smoking cessation, on average, was able to distinguish former from never smokers with 71 % sensitivity and an area under the curve (AUC) value, a measure of classification accuracy, of 0.83 [53••]. In comparison, a predictive model built to classify former and never smokers using cotinine measurements, a widely used measure of smoking exposure, showed poorer performance with an AUC value of 0.47 [53••].
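For readers less familiar with these metrics, the short Python sketch below shows how sensitivity and AUC can be computed from classifier scores. It is an illustration only: the scores are invented, the function names are hypothetical, and it does not reproduce the classifier or data of the cited study. AUC is computed here as the probability that a randomly chosen exposed sample receives a higher score than a randomly chosen unexposed sample, with ties counted as one half.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity(scores_pos: Sequence[float], threshold: float) -> float:
    """Fraction of truly exposed samples scored at or above the threshold."""
    if not scores_pos:
        raise ValueError("scores_pos must not be empty")
    return sum(s >= threshold for s in scores_pos) / len(scores_pos)


# Invented classifier scores, for illustration only (not data from any study).
former_smokers = [0.9, 0.8, 0.7, 0.4]
never_smokers = [0.6, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print("AUC:", auc(former_smokers, never_smokers))
    print("sensitivity at 0.5:", sensitivity(former_smokers, 0.5))
```

On this scale, an AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so an AUC of 0.47 indicates essentially no ability to separate the two groups.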

Practical Considerations

Detectable in Accessible Tissues

Examination of affected tissues is important for studies that seek to provide a mechanistic link between an exposure and disease; however, it is not needed for biomarker purposes. In fact, for a biomarker to be useful, it should be detectable and should reflect exposure levels in a readily accessible tissue such as blood, saliva, or buccal mucosa. Since blood circulates throughout the body and comes into contact with all organ systems, it is plausible that exposures with various routes of entry may come into contact with blood. Studies have shown that inter-individual differences in DNAm are highly consistent across tissues [54, 55]. Although some of these inter-individual differences are likely to be driven by underlying genetic variation, there is evidence that some genomic regions with changes in DNAm related to differences in periconceptional nutrition status are also maintained across tissues [56]. Finally, the most compelling evidence that DNAm changes, with values that reflect exposure levels, can be detected in accessible tissues is empirical: all of the studies described in the preceding sections of this review were carried out using surrogate tissues including buccal mucosal cells [3, 23], placenta [10, 41], cord blood [5, 14, 15, 17, 19–21, 27, 28, 38, 50•, 51••, 52], saliva [42], and peripheral blood cells [2, 4, 6–9, 11–13, 16, 18, 22, 24–26, 29–32, 33•, 35•, 36, 37, 40, 45, 49, 51••].

Availability of Cost-Effective Epigenetic Measurement Tools

There are a large number of reproducible and accurate genome-wide, site-specific, and global assays available to measure DNA methylation, each with their own advantages and disadvantages with respect to cost and sample throughput [57–59]. It is likely that more expensive genome-wide approaches, costing hundreds to thousands of dollars per sample, will be needed to initially identify and define exposure-specific epigenomic signatures, i.e., sets of loci (Table 1). However, once these signatures have been established, there are a number of highly accurate, reproducible, and inexpensive technologies such as bisulfite pyrosequencing [60], MethyLight [61], and EpiTyper [58] available to measure a subset of genomic sites that comprise a particular exposure signature (Table 1). At tens of dollars per sample, these cost an order of magnitude less than genome-wide technologies and are well within (or below) the cost range of other commonly used exposure biomarkers such as cotinine and metal serum measurements (Table 1).

Challenges and Future Directions

We have highlighted several lines of evidence showing the promise of epigenetic signatures as biomarkers of exposure; however, there are also many challenges. Because DNA methylation patterns at some loci in the genome are cell type specific and tissues are heterogeneous, it is possible that differences in DNA methylation shown to be associated with environmental exposures may simply reflect shifts in the proportions of specific cell types in a given tissue [62]. For example, Kile et al. found that prenatal exposure to arsenic prior to 16 weeks of gestation was associated with an increase and decrease in cord blood CD8-positive and CD4-positive T cells, respectively [63]. Due to this concern, analytic methods [64–66] have recently emerged and are being widely applied in epigenomic studies to account for potential shifts in underlying cell proportions. While it is important to understand the underlying cause of observed differences in DNAm, from a biomarker perspective, removal of potential confounding by cell type may be less of an issue than it is for mechanistic studies. Even if the observed DNAm differences associated with exposures merely reflect shifts in underlying cell proportions, DNAm could still serve as a biomarker of exposure, i.e., it would be an accurate, quantitative measure of shifts in cell proportions that are related to exposure.
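The cell-composition argument can be illustrated with a small numerical sketch in Python. It assumes, as a simplification introduced here, that the DNAm level measured in a mixed tissue at one locus is the proportion-weighted average of the cell-type-specific levels. All values and names below are hypothetical and are not taken from the cited studies; the cell-type-specific methylation levels are held fixed so that only the proportions differ between the two groups.

```python
from typing import Mapping


def bulk_methylation(
    proportions: Mapping[str, float], cell_type_dnam: Mapping[str, float]
) -> float:
    """Proportion-weighted average of cell-type-specific methylation at one locus."""
    if set(proportions) != set(cell_type_dnam):
        raise ValueError("both mappings must cover the same cell types")
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type proportions must sum to 1")
    return sum(proportions[c] * cell_type_dnam[c] for c in proportions)


# Hypothetical methylation fraction of each cell type at a single locus.
# These are identical in both groups: no cell changes its own methylation.
LOCUS_DNAM = {"CD8T": 0.80, "CD4T": 0.20, "other": 0.50}

# Hypothetical cell proportions; the exposed group has more CD8-positive
# and fewer CD4-positive T cells.
UNEXPOSED = {"CD8T": 0.10, "CD4T": 0.30, "other": 0.60}
EXPOSED = {"CD8T": 0.20, "CD4T": 0.20, "other": 0.60}

if __name__ == "__main__":
    unexposed = bulk_methylation(UNEXPOSED, LOCUS_DNAM)
    exposed = bulk_methylation(EXPOSED, LOCUS_DNAM)
    print(f"unexposed bulk DNAm: {unexposed:.2f}")
    print(f"exposed bulk DNAm:   {exposed:.2f}")
    print(f"difference:          {exposed - unexposed:.2f}")
```

In this toy example the measured tissue-level DNAm differs between groups solely because of the shift in cell proportions, which is the scenario that cell-composition adjustment methods are intended to address and which, as argued above, would not by itself prevent the measurement from tracking exposure.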

It is also possible that epigenetic signatures of exposure may be influenced by genetic variation and/or gene-environment interactions [67]. Future studies to define epigenomic changes related to exposures should also consider potential genetic modifiers. This is likely to require unified population-scale datasets with genetic, environmental, and epigenetic data from the same individuals and development of new integrative statistical approaches.

Most of the evidence provided above in support of the potential for epigenetics to serve as a biomarker of exposure has focused on tobacco smoke. While it is likely that these findings would extend to other exposures, at least in some cases, studies designed to rigorously evaluate the extension of these properties to other environmental exposures are needed. Similarly, additional studies to investigate exposure specificity, persistence, and the predictive potential of epigenomic signatures of exposure are needed. These studies will likely involve close collaboration among researchers across several fields including epidemiology, basic science, environmental health sciences, biostatistics, and epigenomics. In fact, collaborative efforts are already underway to begin to relate “omics” data to early life exposures through The Human Early-Life Exposome (HELIX) project [68]. Finally, it is important to consider the potential ethical and legal impact of a biomarker of exposure, including epigenetic signatures, to determine whether there are likely to be any unintended consequences as a result of their use [69].

Conclusions

In this review, we present evidence showing the potential for epigenetic signatures to serve as biomarkers of environmental exposures. These findings highlight that DNA methylation fulfills several ideal biomarker criteria including long-term robust changes associated with environmental exposures, exposure specificity, dose-response relationships, the ability to accurately predict exposure status, detectability in accessible tissues, and existing technologies to measure DNAm that are accurate, reliable, and relatively inexpensive. Should future studies definitively show that epigenetic signatures can be used in lieu of actual exposure data, they could overcome several limitations with existing exposure ascertainment methods. This would enable new studies of the relationships between environmental exposures and human health that may otherwise not be possible.