Measurement of these key events is critical both to validate their roles in these and other AOPs, and to test agents for their ability to contribute to these key events, most of which are themselves adverse on the basis of their regulatory relevance. Available methods to measure these events are discussed below and summarized in Table 1. Standardized OECD guideline assays measure several key events and adverse outcomes, and limitations to those are noted where applicable. We also describe non-guideline methods to measure key events. Standardization and adoption into OECD-accepted guideline methods would advance the use of these assays and integration into chemicals testing. Very few of the assays are conducted in breast tissue, so additional validation is needed to ensure that results in other models can be generalized to breast.
There are several guideline assays for measuring breast cancer, DNA damage and mutation, and multiple non-guideline methods for measuring RONS, proliferation, inflammation, and epigenetic changes.
At least one method described for each key event has been applied in mammary glands or mammary cells. However, many of the assays specify a non-mammary tissue and even tissue-agnostic assays are not typically conducted in breast/mammary tissue. For chemical testing work in vitro, it is a priority to develop culture models that reflect the behavior of normal tissue (as opposed to cancer cells), and to establish the relevance of chosen models to the outcome of interest. For some key events, it may be important that models include multiple cell types (Morgan et al. 2020).
Non-guideline methods should be standardized for consistency. In particular, GI, proliferation, epigenetic changes, and chronic inflammation all have a strong temporal component, so methods should be developed to measure these key events at multiple time points, in addition to the other standard procedures used to improve reproducibility.
Breast cancer
The 2-year rodent carcinogenicity bioassay is the primary assay for breast cancer (Rudel et al. 2007). The assay is included in OECD Test Nos. 451 and 453 for carcinogenicity and combined toxicity and carcinogenicity (OECD 2009a, b), is also used by the US National Toxicology Program (NTP) (Chhabra et al. 1990) and the FDA (2007), and is referenced by the EPA (EPA 2005) in guidelines for risk assessments. Other assays from short term (2–4 weeks) and subchronic (90 day) to chronic (1 year) toxicity also call for the documentation of mammary tumors (FDA 2007; OECD 2018), so these assays could capture the early onset of tumors if the methods were sufficiently sensitive to detect small lesions (Makris 2011), and could be modified to report earlier key events like hyperplasia and inflammation. For example, evaluation of breast tissue for hyperplasia or proliferative epithelial lesions could be done at several early time points and could trigger later evaluations. Making these observations after BrdU injection, with staining of tissue sections to evaluate proliferation, could also increase the sensitivity of the assay.
Several characteristics of classic cancer bioassays limit the sensitivity of these assays to mammary gland carcinogens. First, assays do not require prenatal or early post-natal exposures for carcinogenicity testing. The NTP often starts exposures at 5–6 weeks of age, and OECD guidelines suggest (but do not require) that exposures begin after weaning and before 8 weeks of age. Since sensitivity to DNA-damaging agents appears to peak around or before week 7 (around puberty) (Imaoka et al. 2013), studies that start dosing later may have reduced sensitivity. Also, assays that initiate exposure in mature animals have diminished sensitivity to hormonally active agents that act during development to alter breast development and increase future susceptibility to cancer, such as estrogenic hormones, DDT, and dioxins (EPA 2005; Rudel et al. 2011). Second, carcinogenicity assay guidelines do not require the best methods for detecting tumors in mammary gland: whole mount preparations of mammary gland coupled with longitudinal sections (dorsoventral sections parallel to the body) of mammary gland for histology (Tucker et al. 2017). Palpation and transverse sections of mammary gland can easily miss tumors or lesions of interest. The NTP does specify these preferable methods for mammary gland analysis in reproductive toxicity guidelines (NTP 2011a), and an NTP workgroup recommends early-life and in utero dosing for cancer bioassays (Thayer and Foster 2007).
Two additional limitations may reduce the sensitivity of standard carcinogenicity assays. First, benign tumors are not always interpreted as an indicator of carcinogenicity, leading to a possible underestimation of risk. NTP and EPA guidance suggest that benign tumors provide additional weight of evidence if malignant tumors are also present or if studies suggest benign tumors can progress to malignancy. In a short-term study, proliferative epithelial lesions, hyperplasia, or benign tumors may indicate a need for a longer term study. Benign mammary tumors (fibroadenomas) almost always coincide with malignant tumors in mammary gland or other organs, and carcinomas sometimes grow from fibroadenomas (Rudel et al. 2007; Russo 2015), suggesting that benign tumors are an underutilized indicator of mammary carcinogenicity.
Finally, the dose-selection guidance in carcinogenicity testing typically calls for a high dose that is sufficiently toxic to suppress body weight (OECD 2009a). However, body weight in rodent cancer bioassays is positively correlated with mammary tumors, so toxicity-induced weight loss at higher doses may result in fewer mammary tumors. As a result, mammary tumors observed at the lower doses may be dismissed because of the apparent lack of a dose–response (Haseman et al. 1997; Rudel et al. 2007). This potential for suppression of mammary tumors by toxicity at high dose should be considered in weight of evidence determinations for carcinogenicity.
RONS
RONS are typically measured using fluorescent or other probes that react with RONS to change state, or by measuring the redox state of proteins or DNA (Dickinson and Chang 2011; Griendling et al. 2016; Wang et al. 2013). Optimal methods for RONS detection have high sensitivity, selectivity, and spatiotemporal resolution to distinguish transient and localized activity, but most methods lack one or more of these properties.
Molecular probes that indicate the presence of RONS vary in specificity and kinetics (Dickinson and Chang 2011; Griendling et al. 2016; Wang et al. 2013). Small molecule fluorescent probes can be applied to any tissue in vitro, but cannot be finely targeted to different cellular compartments. The non-selective probe DCFH was widely used in the past, but can produce false-positive signals and may not be optimal (Griendling et al. 2016). Electron paramagnetic resonance (EPR) spectroscopy provides the most direct and specific detection of free radicals, but requires specialized equipment. Fluorescent protein-based probes can be genetically engineered, expressed in vivo, and targeted to cellular compartments and specific cells. However, these probes are very sensitive to pH in the physiological range, so pH must be carefully controlled. Newer selective small molecule probes such as boronate-based molecules are being developed, but are not yet widely used. EPA’s ToxCast series of high-throughput in vitro assays (Thomas et al. 2019) includes an assay that measures mitochondrial membrane potential in combination with dihydroethidium, a qualitative measure of superoxide formation (Giuliano et al. 2010; Kalyanaraman et al. 2012).
Alternative methods involve the detection of redox-dependent changes to cellular constituents such as proteins, DNA, lipids, or glutathione (Dickinson and Chang 2011; Griendling et al. 2016; Wang et al. 2013). However, these methods cannot generally distinguish between the oxidative species behind the changes, and cannot provide good resolution for kinetics of oxidative activity.
These methods are readily applied to mammary cells. However, measurements in different cell types may vary based on differences in the expression of endogenous antioxidants (Kannan et al. 2014). A standard comparison of response across cell types would be useful.
DNA damage, GI, and mutation
DNA damage can be studied in isolated DNA, fixed cells, or living cells. Types of damage that can be detected include single- and double-strand breaks, nucleotide damage, complex damage, and chromosomal or telomere damage. The OECD test guideline for unscheduled DNA synthesis, Test No. 486 (OECD 1997b), detects nucleotide excision repair, so it will reflect the formation of bulky DNA adducts but not the majority of oxidative damage to nucleotides, which is typically repaired via the base excision repair pathway. This test is not recommended by some agencies, because it is not generalizable beyond the liver (EFSA Scientific Committee et al. 2017). The OECD test guideline alkaline comet assay, Test No. 489 (OECD 2016f), detects single- and double-strand breaks, including those arising from repair, as well as some (alkali-sensitive) nucleotide lesions, including some lesions from oxidative damage. OECD tests for chromosomal damage and micronuclei, Test Nos. 473, 475, 483, and 487, measure longer term effects of DNA damage, but these tests require the damaged cell to subsequently undergo replication (OECD 2016a, b, d, e). They can, therefore, reflect a wider range of sources of DNA damage, including changes in mitosis. While the comet assay Test No. 489 does not specify a target tissue, it is not typically performed in mammary cells (OECD 2016f), and the other guideline tests are never performed in mammary cells or tissues. Although Rube (2008) reports no difference in degree of DNA damage and repair kinetics in five tissues, expression of DNA repair proteins can vary between tissues (Gottlieb et al. 1997; Gurley and Kemp 2007; Sun et al. 2019), which could lead to a difference in damage observed at various time points. For genotoxicity testing, variations in transport and metabolism can also lead to differences between mammary gland and other tissues (Ding et al. 2014). These tests should, therefore, be generally validated for or performed in mammary gland to address risk in this tissue.
Many other (non-test guideline) techniques have been used to examine specific forms of DNA damage. The comet chip facilitates high-throughput comet assays (Ge et al. 2014). Double-strand breaks are commonly measured microscopically using fluorescently labeled antibodies to phosphorylated H2AX (γH2AX) or other labeled probes because of the significant risk attributed to breaks and the relative ease of detecting and quantifying them (Lorat et al. 2016; Nikitaki et al. 2016; Ojima et al. 2008; Rothkamm and Lobrich 2003). Measurement of single-strand break repair is less common but possible by labeling the single-strand break repair protein XRCC1 (Lorat et al. 2016; Nikitaki et al. 2016). Base lesions can also be detected using labeled probes for base excision repair enzymes, or by chemical methods such as mass spectrometry (Madugundu et al. 2014; Nikitaki et al. 2016; Ogawa et al. 2003; Ravanat et al. 2014). Refinements on these methods characterize complex or clustered damage, in which various forms of damage occur in close proximity on a DNA molecule (Lorat et al. 2016; Nikitaki et al. 2016). Some DNA-damaging agents act by directly binding to DNA to form adducts, the detection of which is indicative of the potential for mutation (Rundle 2006). EPA’s ToxCast uses a labeled antibody to p53 to indicate non-specific increases in DNA damage and stress (Giuliano et al. 2010). These methods can be or have been applied to mammary or breast cells (Al-Mayah et al. 2012, 2015; Dutta et al. 2014; Haegele et al. 1998; Hernandez et al. 2013; Jones et al. 2007; Kirshner et al. 2006; Redon et al. 2010; Snijders et al. 2012; Soler et al. 2009; Wang et al. 2015). Since results in other tissues may or may not be applicable to breast tissue, these methods should be performed in breast to best inform risk of breast cancer.
Certain challenges are common to all methods of detecting DNA damage. In the time required to initiate the detection method, some DNA may already be repaired, leading to undercounting of damage. On the other hand, apoptotic DSBs may be incorrectly included in a measurement of direct (non-apoptotic) induction of DSB damage unless controlled. All methods have difficulty distinguishing individual components of clustered lesions, and microscopic methods may undercount disparate breaks that are processed together in repair centers (Barnard et al. 2013). Methods that use isolated DNA (gel electrophoresis and analytical chemistry) are vulnerable to artifacts and must ensure that the DNA sample is protected from oxidative damage during extraction (Barnard et al. 2013; Pernot et al. 2012; Ravanat et al. 2014).
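The first challenge above, repair that occurs before detection begins, can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that lesions are repaired with simple first-order (single-exponential) kinetics and a known half-life. Real repair is often multiphasic and lesion-dependent, and neither the model nor the numbers come from the cited studies. The function names are hypothetical.

```python
def remaining_fraction(delay_min: float, half_life_min: float) -> float:
    """Fraction of initial lesions still unrepaired after a detection delay.

    ASSUMPTION: first-order repair with a single half-life (illustrative only).
    """
    if delay_min < 0 or half_life_min <= 0:
        raise ValueError("delay must be >= 0 and half-life must be > 0")
    return 0.5 ** (delay_min / half_life_min)


def back_extrapolate_initial_damage(
    observed: float, delay_min: float, half_life_min: float
) -> float:
    """Estimate damage at time zero from damage observed after a delay."""
    return observed / remaining_fraction(delay_min, half_life_min)


# Hypothetical numbers: 10 foci per cell seen 30 min after exposure,
# with an assumed repair half-life of 30 min.
observed_foci = 10.0
estimated_initial = back_extrapolate_initial_damage(observed_foci, 30.0, 30.0)
print(f"Estimated initial foci per cell: {estimated_initial:.1f}")
```

Under these assumptions, half of the initial lesions would already be repaired when measurement starts, so an uncorrected count would understate the induced damage by a factor of two.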
Finally, tests for mutations reveal past DNA damage that resulted in a heritable change. These include multiple guideline tests (OECD Ames Test No. 471 (OECD 1997a), Hprt or Xprt Test No. 476 (OECD 2016c), Transgenic Rodent Somatic and Germ Cell Gene Mutation Assays Test No. 488 (OECD 2013; White et al. 2019), and Thymidine kinase Test No. 490 (OECD 2016g)) and other assays such as the RaDR-GFP transgenic mouse that detects mutational errors in homologous recombination (Kiraly et al. 2015; Sukup-Jackson et al. 2014). One of the approved TG 488 transgenic mouse lines (Jakubczak et al. 1996) has been used to measure mutations in mammary gland in vivo, and the RaDR-GFP mouse and several TG 488 approved lines have been used for primary or established mammary cell lines (Sukup-Jackson et al. 2014; White et al. 2019). The other tests are limited to specific non-mammary tissues, and should be validated for relevance to mammary gland.
No validated protocols exist specifically to measure GI, but a wide range of the above methods are currently performed at various times after exposure to verify ongoing damage or mutations, including in mammary gland or breast cells (Al-Mayah et al. 2012, 2015; Jakubczak et al. 1996; Maxwell et al. 2008; Ponnaiya et al. 1997a, b; Snijders et al. 2012; Ullrich and Davis 1999; Yu et al. 2001). Since GI can be expressed in a variety of ways, a validated protocol should be developed that measures multiple outcomes and time points.
Proliferation and hyperplasia
Past cellular proliferation can be measured directly using labels that are incorporated into cells upon cell division (BrdU or cytoplasmic proliferation dyes) or indirectly by measuring a change in population size. Ongoing proliferation can be quantified by labeling a protein associated with the cell cycle (e.g., Ki67). Methods have been recently reviewed (Menyhart et al. 2016; Romar et al. 2016), and histopathological assessments for mammary gland hyperplasia in in vivo guideline toxicity studies are reviewed in Makris (2011).
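As a minimal sketch of how the two kinds of readout described above are commonly reduced to numbers, the Python below computes a labeling index (the fraction of counted cells positive for a marker such as BrdU or Ki67) and the number of population doublings implied by a change in cell count. The function names and the example counts are hypothetical and are not drawn from any of the cited studies.

```python
import math


def labeling_index(positive_cells: int, total_cells: int) -> float:
    """Fraction of counted cells that are positive for a proliferation marker."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= positive_cells <= total_cells:
        raise ValueError("positive_cells must be between 0 and total_cells")
    return positive_cells / total_cells


def population_doublings(initial_count: float, final_count: float) -> float:
    """Number of doublings implied by growth from initial_count to final_count."""
    if initial_count <= 0 or final_count <= 0:
        raise ValueError("cell counts must be positive")
    return math.log2(final_count / initial_count)


# Hypothetical counts, for illustration only.
control_index = labeling_index(positive_cells=30, total_cells=500)
treated_index = labeling_index(positive_cells=75, total_cells=500)
doublings = population_doublings(initial_count=1.0e5, final_count=4.0e5)

print(f"Labeling index, control: {control_index:.2%}")
print(f"Labeling index, treated: {treated_index:.2%}")
print(f"Population doublings: {doublings:.1f}")
```

A labeling index reflects the labeled fraction at the time of sampling, whereas population doublings integrate growth over the whole interval, so the two readouts need not move together.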
For in vitro assays, EPA’s ToxCast has a BrdU assay for proliferation in HepG2 (Giuliano et al. 2010) and an assay for estrogen-mediated proliferation in T47D breast cancer cells (Judson et al. 2015). Since many of the cells are proliferating in the sub-confluent cell-culture system, it is not clear how sensitive the assay is to increased proliferation. A large body of work has demonstrated that cancerous and non-cancerous cells grow differently in 3D culture systems and that the differences are not apparent in 2D. Furthermore, the effects of chemical exposure on cell growth and development may only be manifest in tissue that contains multiple cell types together. For example, mammary pre-adipocytes produce estradiol via aromatase and epithelial cells proliferate in response (Morgan et al. 2020). Finally, while many toxicity testing models use cancer cell lines, non-cancerous cells may respond differently. Thus, there is considerable need for investment to strengthen in vitro models.
Hyperplasia is measured histologically based on increased cell numbers leading to increased layers of cells and tissue depth (Collins 2018). Several modifications to guideline in vivo toxicity studies would enhance the detection of hyperplasia and could also detect altered mammary development. These include adding time points and more detailed protocols for evaluating mammary tissue, assessment in male as well as female mammary tissue, longitudinal rather than transverse sectioning, and BrdU injection and staining to better detect proliferation. These assessments could be added to 90-day, 28-day, and 14-day studies, and to studies with developmental exposure such as EPA’s pubertal and the OECD one-generation reproduction study (Makris 2011; Rudel et al. 2011).
In mammary gland, Ki67 or histology is commonly used to characterize proliferation both in vivo and in vitro, and other methods can be easily applied as well. Given the potential variation in endogenous and context-dependent proliferation between mammary and other tissues, findings in non-mammary tissues should be either validated or measured directly in mammary tissues.
Inflammation
Inflammation is commonly measured using cytokine mRNA or extracellular concentrations of cytokines like IL-6; expression of proteins including COX2 or iNOS; activation of key inflammatory signaling molecules such as MAP kinases and the transcription factors NF-κB and AP-1; and histological measures like leukocyte infiltration.
Typical assays to detect changes in protein concentration, phosphorylation, or localization include ELISA (El-Saghire et al. 2013; Partridge et al. 2010; Siva et al. 2014), Western Blot (Chai et al. 2013b; Ha et al. 2010), electrophoretic mobility shift assay or EMSA (Haase et al. 2003), or immunostaining (Chai et al. 2013b; Wang et al. 2015). Immunostaining can also detect infiltration of immune cells, a marker for inflammation (Ebrahimian et al. 2018; Monceau et al. 2013; Moravan et al. 2011). Changes in gene transcription are detected using PCR (Azimzadeh et al. 2011; Bouchet et al. 2013; Moravan et al. 2011; Snijders et al. 2012; Wang et al. 2014b). Other methods include histopathological examination of tissue for indicators of inflammation (Haddadi et al. 2017), or measurements of leukocyte adhesion (Arenas et al. 2006; Rodel et al. 2008).
EPA’s ToxCast has a range of assays measuring inflammation-related outcomes (Houck et al. 2009). All of these assays stimulate cells with inflammatory stimuli in addition to the test reagent, so it is possible (though not so reported) that these are more sensitive to reduction than induction of inflammatory pathways.
No specific test is recommended for chronic inflammation. As with GI, the above assays may be applied at various time points after exposure as an indicator of chronic response. We support the development of methods optimized for chronic inflammation. These methods would likely require long-term or repeated exposures. Additionally, since inflammation is a tissue response and not a cell response alone, such an assay would ideally capture tissue interactions.
The above methods are readily applied to mammary tissue (Barcellos-Hoff et al. 1994; Bouchard et al. 2013; Datta et al. 2012a; Snijders et al. 2012; Wang et al. 2015). Since it is not well established how inflammatory responses in a different tissue might apply to mammary tissue, validation studies should be conducted to establish the tissue variation in assay responses to inflammatory stressors, or studies should be conducted in mammary gland.
Epigenetic changes
Epigenetic changes occur and are measured in several different ways. Shifts in global methylation are measured using the [3H]dCTP extension assay (Koturbash et al. 2016), HPLC for 5mdC (Wang et al. 2014a), or quantitative immunoassay for 5-mC (Nzabarushimana et al. 2014). Gene-specific changes in methylation can be measured using MeDIP-on-chip (Hsu et al. 2015), methylation-sensitive/dependent restriction enzymes and qPCR (Nzabarushimana et al. 2014; Oakes et al. 2006a), and bisulfite sequencing and PCR (Wang et al. 2014a). Changes in miRNA expression are measured using qRT-PCR (Stankevicins et al. 2013; Szatmari et al. 2017), hybridization (e.g., microarray) (Aypar et al. 2011; Jacob et al. 2013), and sequencing (Mestdagh et al. 2014), while histone methylation is measured using ChIP (Prior et al. 2016) or immunohistochemistry (Kutanzi and Kovalchuk 2013).
The most consequential outcome of epigenetic changes is changes in gene expression, which can be measured using western blot (Wang et al. 2014a), qRT-PCR (Prior et al. 2016), or whole transcriptomics, e.g., RNA microarray and RNA-seq (Hrdlickova et al. 2017; Manzoni et al. 2018).
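For the qRT-PCR readout mentioned above, relative expression is often summarized with the comparative Ct (2^-ΔΔCt) calculation. The source does not state which quantification method the cited studies used, so the sketch below is an assumption for illustration. It also assumes roughly 100% amplification efficiency (product doubles each cycle) and a stable reference gene. The function name and Ct values are hypothetical.

```python
def fold_change_ddct(
    ct_target_treated: float,
    ct_ref_treated: float,
    ct_target_control: float,
    ct_ref_control: float,
) -> float:
    """Relative expression (treated vs. control) by the comparative Ct method.

    ASSUMPTIONS: ~100% amplification efficiency and a stably expressed
    reference gene. Lower Ct means more starting template.
    """
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between treated and control samples.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only.
fold = fold_change_ddct(
    ct_target_treated=24.0,
    ct_ref_treated=18.0,
    ct_target_control=26.0,
    ct_ref_control=18.0,
)
print(f"Fold change (treated / control): {fold:.1f}")
```

With these example values the target gene reaches threshold two cycles earlier in the treated sample after normalization, which corresponds to a four-fold increase under the stated assumptions.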
These methods can be and have been applied to breast and mammary gland (Kutanzi and Kovalchuk 2013; Loree et al. 2006; Luzhna et al. 2015; Stankevicins et al. 2013). Considering the tissue, sex, and temporal variability reported in epigenetic effects, experiments that differ in one of these contexts should not be extrapolated to mammary gland and instead the study should directly examine mammary tissue.