Over the last two decades, there has been rapid progress in the fields of nanoscience and nanotechnology. The utilisation of nanomaterials (NMs) has offered valuable technological advancements in sectors such as cosmetics, electronics and environmental remediation, as well as in the development of nanomedicines [1]. Unfortunately, however, the same nano-specific physicochemical properties that make these materials so unique and attractive have contributed to concerns about the potential hazards of NMs to human health [2]. These physicochemical properties include, but are not limited to, small size, high surface area to volume ratio, variation in shape, changes in melting point, solubility or dissolution rate, increased surface reactivity, varying electrical properties and potentially altered crystalline structure of the materials as compared to the bulk form [3]. It is now understood that exposure routes to NMs include ingestion, inhalation and dermal contact, with the intravenous route being important for intentional administration of nanomedicines [4]. Recently, a comprehensive battery of in chemico, in vitro, ex vivo and in vivo studies has demonstrated that NMs can vary in their ability to induce adverse effects including cytotoxicity [5, 6], inflammation [7, 8], autophagy [9, 10], cardiotoxicity [11,12,13,14,15], carcinogenicity [16, 17] and genotoxicity [18,19,20]. In particular, amongst these NM-induced hazards to human health, genotoxicity has attracted much attention due to its causal link to cancer and the potential for heritable mutations to cause birth defects [21].

The International Agency for Research on Cancer (IARC) categorizes materials and chemicals by their potential carcinogenicity according to the strength of scientific evidence [22]. Carcinogens are natural or synthetic materials that have the potential to cause cancer in living tissues by damaging DNA and chromosomes, inducing aneuploidy, or disrupting normal cellular metabolic processes. Recently, ten key characteristics have been identified, one or more of which are commonly exhibited by well-established human carcinogens. These characteristics provide the basis for an objective approach to identifying and organizing results from pertinent mechanistic studies. They are the abilities of a substance to (1) act as an electrophile either directly or after metabolic activation; (2) be genotoxic; (3) alter DNA repair or cause genomic instability; (4) induce epigenetic alterations; (5) induce oxidative stress; (6) induce chronic inflammation; (7) be immunosuppressive; (8) modulate receptor-mediated effects; (9) cause immortalization; and (10) alter cell proliferation, cell death, or nutrient supply [23].

The mechanisms of genotoxicity induced by NMs can be classified into primary (direct and indirect) and secondary damage [24]. In direct primary genotoxicity, the damage is elicited by direct interaction of NMs with the genetic content of the cell. This requires the NMs to enter the nucleus and interact directly with DNA. By definition, indirect primary genotoxicity does not require physical interaction of NMs with the DNA, but can be induced by oxidative stress or by interactions of mutagens with non-DNA targets, leading to damage of proteins involved in DNA replication, cell division, or the maintenance of DNA fidelity. The indirect mechanism of action (MoA) requires the NMs either to deplete antioxidants by promoting the production of reactive oxygen species (ROS), thus increasing oxidative damage, or to increase oxidative damage via mitochondrial activity. Secondary genotoxicity is defined as damage caused by ROS that phagocytes (predominantly macrophages and neutrophils) generate during NM-induced inflammation, resulting in downstream DNA damage in other cells [25, 26]. In vivo, chronic inflammation induced by bio-persistent NMs is associated with a greater risk of secondary genotoxicity, owing to the continual generation of ROS and reactive nitrogen species (RNS) that cause cell and tissue damage [27].

As touched upon, the physicochemical properties of NMs, including shape, size, dissolution, agglomeration state, chemical composition, specific surface area, crystal structure, surface morphology, coating and charge, will impact their interaction with biological surroundings and influence both their toxicokinetics in the body and their potential adverse effects. As discussed, ROS are key in NM-induced primary genotoxicity [28]. Many NMs, and especially carbon-based NMs, are able to generate ROS or RNS in a NM-surface dependent manner [29]. Carbon black-induced genotoxicity to the liver in vivo is likely caused by carbon NM-generated ROS [30]. Additionally, it is now well documented that the small size of such materials and their consequent large surface area significantly increase the potential for ROS formation. A larger surface area increases the number of electroactive sites on the NM and leaves them readily exposed. The augmented reaction with molecular oxygen results in the generation of hydrogen peroxide (H2O2) or superoxide anions (O2•−) [31], which can subsequently oxidize DNA, RNA and other molecules.
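
The link between small size and large relative surface area can be made concrete with a short calculation. The sketch below is illustrative only and assumes idealised, non-porous, non-agglomerated spherical particles, which real NMs rarely are; the function name is hypothetical. Under that assumption the surface-area-to-volume ratio equals 6 divided by the diameter, so it scales inversely with particle size.

```python
import math


def sphere_surface_to_volume(diameter_nm: float) -> float:
    """Surface-area-to-volume ratio (in nm^-1) of an ideal sphere."""
    radius = diameter_nm / 2.0
    surface = 4.0 * math.pi * radius ** 2
    volume = (4.0 / 3.0) * math.pi * radius ** 3
    return surface / volume  # algebraically equal to 6 / diameter_nm


# Compare a nanoscale particle with progressively larger ones.
for diameter in (10.0, 100.0, 1000.0):
    ratio = sphere_surface_to_volume(diameter)
    print(f"{diameter:7.0f} nm -> {ratio:.4f} nm^-1")
```

A tenfold reduction in diameter therefore gives a tenfold increase in surface area per unit volume of material, which is one reason why dose metrics based on surface area are often considered alongside mass.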

Due to the large and expanding number of NMs produced and utilized in various formats and their availability in different nanoforms (NFs) (varying in size, shape, coating etc.), it is recognised that alternative methods are needed to streamline the hazard and risk assessment processes, reducing the need to assess hazard on a case-by-case basis. Such processes will help to make risk assessment and innovation of NFs more financially and ethically viable, as well as more efficient [32]. Alternative approaches such as grouping are suggested by the European Chemicals Agency [33]. Grouping requires a combined demonstration of similarity of physicochemical properties (what they are), toxicokinetics (where they go) and hazards (what they do). Demonstrating similarity through grouping allows read-across from source substances with available hazard data to target substances for which toxicological data is lacking. The H2020 European project GRACIOUS has developed a Framework to support grouping and read-across of NFs [34]. The GRACIOUS Framework is underpinned by scientific hypotheses, which identify physicochemical descriptors relevant to grouping of NFs with predicted similar routes of exposure, toxicokinetics and hazard outcomes. Hypotheses for grouping are substantiated by Integrated Approaches to Testing and Assessment (IATA), which encourage analysis of existing information coupled with the generation of new information where needed to support a grouping decision. Each IATA consists of a series of decision nodes that identify the required information, based on the relevant route of exposure, physicochemical descriptors, toxicokinetics and hazards identified in the grouping hypothesis. These decision nodes are structured (e.g. [35]) to facilitate efficient decision-making (Fig. 1).
If the members of a group are sufficiently similar for the different descriptors identified by the decision nodes, then read-across for regulatory data gap filling can be conducted for a specific hazard endpoint. At least one member of the group (the source material) would require sufficient data to meet regulatory requirements for the hazard endpoint of interest, usually in the form of in vivo data. The Framework is designed to be sufficiently sensitive to distinguish between different NFs of the same NM that vary in subtle characteristics such as size, crystallinity, surface functionalisation or contaminants. At the same time, the Framework is sufficiently flexible to allow grouping or comparison of NMs of different chemical compositions, which can be useful during the early innovation stages. Case studies to demonstrate these applications have been completed and are being prepared for future publications.

Fig. 1. A generic IATA as used in the GRACIOUS Framework to test grouping hypotheses and thereby support grouping of NFs. The blue boxes are the decision nodes, which provide the questions to be addressed to streamline the information gathering. Each decision node is supported by a tiered testing strategy consisting of standard operating procedures (SOPs) where possible. Answering all of the decision nodes ‘yes’ results in moving down through the decision tree and acceptance of the hypothesis. If any of the answers is ‘no’, this results in rejection of the grouping hypothesis and exiting the decision tree to the right. The tiered testing strategy described in this short communication would support one decision node in such an IATA.

For each decision node, the user is provided with a tiered testing strategy, which identifies the most appropriate methods for gathering the relevant evidence (from existing literature and/or experimentation) required to allow an answer to be generated. For example, for a decision node which addresses the genotoxicity of the NFs within the potential group, the decision node could be worded either ‘Do candidate NFs induce genotoxicity?’ or ‘Are the NFs similar in their ability to induce genotoxicity?’, to which a ‘yes’ or ‘no’ answer could be generated by analysis of the data generated by the tiered testing strategy.
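
The yes/no logic of an IATA, as described for Fig. 1, can be sketched in a few lines of code. This is a minimal illustration and not part of the GRACIOUS Framework itself: the node wording (other than the genotoxicity question quoted above), the evidence keys and the function name `evaluate_iata` are all hypothetical, and in practice each answer would come from expert analysis of tiered testing data instead of a boolean lookup.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A decision node pairs a question with a function that answers it
# ('yes' = True, 'no' = False) from the evidence gathered so far.
DecisionNode = Tuple[str, Callable[[Dict[str, bool]], bool]]


def evaluate_iata(
    nodes: Sequence[DecisionNode], evidence: Dict[str, bool]
) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Walk the nodes in order; reject the hypothesis at the first 'no'."""
    trail: List[Tuple[str, bool]] = []
    for question, answer in nodes:
        result = answer(evidence)
        trail.append((question, result))
        if not result:
            return False, trail  # exit the decision tree: hypothesis rejected
    return True, trail  # every node answered 'yes': hypothesis accepted


# Hypothetical nodes and evidence keys, for illustration only.
example_nodes: List[DecisionNode] = [
    ("Are the NFs similar in size and shape?", lambda e: e["similar_physchem"]),
    ("Do the NFs reach the same target tissue?", lambda e: e["similar_toxicokinetics"]),
    ("Are the NFs similar in their ability to induce genotoxicity?",
     lambda e: e["similar_genotoxicity"]),
]

accepted, trail = evaluate_iata(
    example_nodes,
    {"similar_physchem": True, "similar_toxicokinetics": True,
     "similar_genotoxicity": False},
)
print("Hypothesis accepted:", accepted)
```

Returning the trail of answered questions alongside the verdict keeps a record of where in the tree a hypothesis was rejected, which is the information a user would need to refine the hypothesis.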

The assessment of genotoxicity is relevant for all exposure routes and for many target cell types, and therefore for all human health IATAs generated in the GRACIOUS project [35]. Here, we describe a simple three-tiered testing strategy to assess NF genotoxicity. The tiered testing strategy employs existing genotoxicity testing strategies and methodologies, but places them in a context to support grouping of NFs (Fig. 2). The strategy builds upon existing guidance for assessment of NF-induced genotoxicity [36,37,38] and incorporates OECD guidelines when available [39]. In addition, and importantly, previous EU projects (e.g. NanoGenoTox, NANoREG) have contributed to the adaptation of genotoxicity testing methods for NFs [40]. The strategy shown in Fig. 2 can be incorporated into any existing human health IATA to allow genotoxicity to be addressed as part of a grouping hypothesis (e.g. respirable, bio-persistent, rigid high aspect ratio nanomaterials (HARNs): following inhalation exposure and translocation of HARNs to the pleura, mesothelioma development can occur) [35]. In addition, the tiered testing strategy could be used in a new user-defined IATA to address a hypothesis not currently outlined by the GRACIOUS Framework, or it can be used as a stand-alone decision node where the user has a very specific grouping need.

Fig. 2. A simple three-tiered strategy for assessing NF-induced genotoxicity to support grouping of NFs.

For a stand-alone decision node, a simple hypothesis would be required, such as ‘NFs with X physicochemical characteristics, following exposure route of Y, would result in genotoxicity in tissues Z’. Acceptance of the hypothesis for each NF under investigation will support the formation of a group. Prior to use of the genotoxicity tiered testing strategy, it is crucial that the NFs are characterized, including a minimum of size distribution, shape, composition (including crystallinity) and surface coating [34]. Further characterization may be required for application of a pre-defined GRACIOUS grouping hypothesis (e.g. dissolution rate in biologically relevant fluids). In addition, information relating to use and exposure scenarios is needed to ensure that the physicochemical characterization of the NF is relevant to the specific exposure scenario (e.g. in the medium/form of exposure to the body) and the resultant target tissues. The characterization methodologies are described elsewhere [34]. Toxicokinetics information, if available, will also be required to establish which target organs and tissue or cell types are most appropriate for inclusion in the hypothesis. This information may not be required for all candidate NFs in the proposed group, but instead may be read across from the source NFs/non-NFs to the target NFs where needed. Together, the route of exposure and the toxicokinetics information help to identify relevant target cell types. Importantly for genotoxicity testing, careful consideration must also be given to selecting appropriate cell lines with a stable genetic background.

To build a read-across argument, all of the target NFs first require either Tier 1 or Tier 2 data to allow an initial assessment of similarity to the source. The same method from the same tier must be used for all NFs for a specific IATA decision node. The similarity assessment may be qualitative (expert judgement) or quantitative [40] and can be used to support read-across to fill the data gaps for the Tier 3 data. If Tier 1 data is utilized but is insufficient to support a grouping decision, for example due to variability in data, particle interference with an assay or missing data, then the user may move to Tier 2. Tier 1 data is often sufficient to support decision making relating to the potential safety of NFs during the early innovation stages. However, Tier 1 assays will only detect the primary genotoxicity potential of a NF. Most NFs have the potential to induce secondary genotoxicity, which requires the presence of immune cells (NFs of very soluble, non-toxic chemicals, e.g. nanosized NaCl, may be an exception). Users may therefore decide to use Tier 2 tests in addition to Tier 1 during innovation.
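
The conditions under which Tier 1 data can support a similarity assessment, and when a user would instead move to Tier 2, can be summarised as a simple check. The sketch below is an assumption-laden illustration: the record fields (`assay`, `result`, `assay_interference`, `high_variability`) and the function names are hypothetical, and judging variability or interference would in reality require expert evaluation of the raw data.

```python
from typing import Dict, Optional

# One record per NF describing its Tier 1 evidence. The field names are
# hypothetical and chosen only for this illustration.
Record = Dict[str, object]


def tier1_is_sufficient(records: Dict[str, Record]) -> bool:
    """Return True if the Tier 1 data can support a similarity assessment."""
    assays = {rec["assay"] for rec in records.values()}
    if len(assays) != 1:
        return False  # the same method must be used for all NFs
    for rec in records.values():
        if rec["result"] is None:      # missing data
            return False
        if rec["assay_interference"]:  # particle interference with the assay
            return False
        if rec["high_variability"]:    # data too variable to interpret
            return False
    return True


def next_step(records: Dict[str, Record]) -> str:
    """Suggest the next action based on the adequacy of the Tier 1 data."""
    return "assess similarity" if tier1_is_sufficient(records) else "move to Tier 2"


# Hypothetical example: NF_B interferes with the chosen assay.
example: Dict[str, Record] = {
    "NF_A": {"assay": "in vitro micronucleus test", "result": True,
             "assay_interference": False, "high_variability": False},
    "NF_B": {"assay": "in vitro micronucleus test", "result": True,
             "assay_interference": True, "high_variability": False},
}
print(next_step(example))
```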

As stated above, it is important to assess all NFs using the same assay. During the first tier of the proposed testing strategy for the assessment of NF genotoxicity, we recommend the selection of two tests: one for detection of gene mutations (i.e. the in vitro mammalian cell gene mutation test or the in vitro mammalian mouse lymphoma TK gene mutation assay) and another for chromosomal damage (i.e. the in vitro micronucleus test or the in vitro mammalian chromosomal aberration test) (Fig. 3). Additionally, DNA damage indicator assays such as the comet assay or the Histone H2AX phosphorylation test can be used for detection of NF-induced DNA strand breaks, but these assays alone are not currently sufficient or accepted for regulatory purposes. Finally, the Ames test is not recommended for NFs because bacteria have limited capacity to internalize NFs [37, 38] and because certain NFs have bactericidal properties. It is important that genotoxicity testing is conducted in parallel with cytotoxicity experiments to ensure that an appropriate dose range has been selected, as high levels of cell death can confound interpretation of genotoxicity data.
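
The Tier 1 selection rule described above (one gene mutation test plus one chromosomal damage test, with indicator assays as optional extras and the Ames test not recommended) can be expressed as a small validation routine. This is a sketch for illustration only; the set and function names are hypothetical, and the assay names are simply the labels used in the text.

```python
from typing import Iterable, List

GENE_MUTATION = {
    "in vitro mammalian cell gene mutation test",
    "in vitro mammalian mouse lymphoma TK gene mutation assay",
}
CHROMOSOMAL_DAMAGE = {
    "in vitro micronucleus test",
    "in vitro mammalian chromosomal aberration test",
}
# Supplementary indicator assays: allowed, but they do not satisfy either
# of the two required endpoint categories.
INDICATOR_ONLY = {"comet assay", "Histone H2AX phosphorylation test"}
NOT_RECOMMENDED = {"Ames test"}


def check_tier1_battery(selected: Iterable[str]) -> List[str]:
    """Return a list of problems with the selection (empty if acceptable)."""
    chosen = set(selected)
    problems: List[str] = []
    if not chosen & GENE_MUTATION:
        problems.append("no gene mutation test selected")
    if not chosen & CHROMOSOMAL_DAMAGE:
        problems.append("no chromosomal damage test selected")
    for assay in sorted(chosen & NOT_RECOMMENDED):
        problems.append(f"{assay} is not recommended for NFs")
    return problems


print(check_tier1_battery(["comet assay", "in vitro micronucleus test"]))
```

In this example the comet assay does not count towards the gene mutation endpoint, so the routine reports that a gene mutation test is still missing.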

Fig. 3. Tier 1 of the strategy for the assessment of NF-induced genotoxicity, based on utilisation of simple mono-culture systems and two genotoxicity tests to assess gene mutation and chromosomal damage.

Once the tiered testing strategy has been used to assess genotoxicity of the candidate group members, the target and source NFs which produce a positive result in the selected Tier 1 assays (as compared to a well-characterised benchmark control for the assay of choice) may be considered qualitatively similar and form a preliminary group. For any NFs which are negative in the Tier 1 assays, the grouping hypothesis is rejected and the NF exits the IATA.

Quantitative methods can then be applied to assess similarity in order to identify whether the genotoxicity potential and potency are sufficiently similar to support grouping for regulatory read-across (the MoA will also be important for grouping with respect to genotoxicity). A range of quantitative similarity methods has been described previously [40, 41]. Quantitative methods can either employ pairwise comparisons of NFs in terms of their genotoxicity for each specific assay, or all data for all assays and all NFs can be compared in a multi-comparison analysis using machine learning. The pairwise methods have been shown to be sufficiently robust to use for regulatory grouping and decision making, while the multi-comparison models are less consistent, although they provide mechanistic information that is potentially useful for research purposes. To conduct a similarity assessment, complete, high-quality data sets [42] are required. A traffic light system to score data completeness and quality has been generated in the GRACIOUS project and can be applied to data uploaded to databases such as eNanoMapper [43]. Furthermore, a review of the existing literature investigating the genotoxicity of NMs using the in vitro assays recommended in the GRACIOUS Tier 1 and 2 testing strategies has been provided as Additional file 1.
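
To illustrate what a pairwise comparison for a single assay might look like, the sketch below computes a distance between every pair of NFs and checks it against a cut-off. It is not one of the methods described in [40, 41]: the choice of Euclidean distance, the threshold, the function names and the response values are all assumptions made purely for illustration, and the numbers are placeholders, not experimental data.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Profile = List[float]  # one response value per tested concentration


def pairwise_distances(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], float]:
    """Euclidean distance between the response profiles of every NF pair."""
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


def all_pairs_similar(profiles: Dict[str, Profile], threshold: float) -> bool:
    """True if no pair of NFs differs by more than the chosen threshold."""
    return all(d <= threshold for d in pairwise_distances(profiles).values())


# Placeholder fold-change values over a shared concentration series
# (hypothetical numbers, for illustration only).
profiles = {
    "NF_A": [1.0, 1.5, 2.0],
    "NF_B": [1.0, 1.5, 2.0],
    "NF_C": [1.0, 4.5, 6.0],
}
distances = pairwise_distances(profiles)
for pair, distance in distances.items():
    print(pair, round(distance, 2))
print("All similar at threshold 1.0:", all_pairs_similar(profiles, threshold=1.0))
```

All profiles must be measured with the same assay and over the same concentration series for such a comparison to be meaningful, which mirrors the requirement that the same method be used for all NFs.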

The data from Tier 1 will be very important for allowing a better understanding of the mode of action of genotoxicity and for informing the selection of the most appropriate Tier 2 assays, if these are required. Crucially, the assays highlighted in Tier 1 do not provide information on secondary NF-mediated genotoxicity, hence the requirement to move to Tier 2 for a better understanding of secondary DNA damage induced by inflammation. Traditional in vitro DNA damage assessment for NFs has relied heavily on single-cell-type mono-cultures. However, in recent years various more complex, multi-cellular alternative methods have been developed for the assessment of certain toxicological endpoints; these permit or replicate the interactions between different cell types observed in vivo. These multi-cellular models can also be utilised for genotoxicity assessment with the aim of allowing better comprehension of secondary DNA damage, something that until recently was feasible only in vivo [19, 24, 26]. The development of more complex test systems aims to bridge gaps between in vitro and in vivo NF genotoxicity data.

Once again, testing in Tier 2 is based on the utilisation of two assays: the in vitro micronucleus test, plus either the comet assay or the Histone H2AX phosphorylation assay. Significantly, these experiments involve the use of more complex, physiologically relevant, multi-cellular test systems (Fig. 4). Where applicable and appropriate, advanced in vitro model protocols are based on SOPs developed in a second European Commission Horizon 2020 funded project, PATROLS (e.g. [44]). An additional clear advantage of the more complex Tier 2 models is that they allow for repeated exposure to NFs, which is not always possible with traditional 2D in vitro models. Tier 2 in vitro models, by their design (e.g. 3D spheroids), have longer viability and functionality, which can range from days to weeks. Depending on the organ of interest, the route of exposure and the material in question, this could be highly advantageous. As an example, in the liver, with the exception of nanomedicines, uptake quantities into the body are so low that any potential for “real” NF-induced hazard to humans is only likely following long-term repeated exposure. As mentioned above, it is also important to keep in mind that for Tier 2 test models the selection of appropriate cell types is crucial. As an example, primary hepatocytes do not proliferate in vitro, so they would not be suitable for mutagenicity testing; in contrast, proliferating cell line-based hepatocytes can be utilised for this purpose [44]. Finally, it stands to reason that co-culture models intended to predict NF-induced secondary genotoxicity in vivo should include immune cells.

Fig. 4. Tier 2 of the strategy for the assessment of NF-induced genotoxicity, based on utilisation of advanced co-culture in vitro systems.

Progression from Tier 1 to Tier 2 can also be used to strengthen a read-across argument by providing additional mechanistic evidence of similarity in more physiologically relevant models than the simple in vitro models included in Tier 1. The complex multi-cellular models employed at Tier 2 may be designed based on relevant adverse outcome pathways (AOPs), for example by focusing on the activation of specific key events (KEs) or by probing key event relationships (KERs, through inclusion of inhibitors etc.) [12, 15]. Demonstration of high levels of similarity in terms of MoA can provide support for the waiving of in vivo toxicity assays, where similarity is based largely on apical toxicity endpoints which may result from divergent MoA. Evidence of in vivo genotoxic consequences of NF exposure may still be required by regulators to validate the predictivity of the in vitro methods employed, as New Approach Methodologies (NAMs) to assess genotoxicity are still under development and validation. However, evidence of a high level of similarity between group members could support read-across of existing Tier 3 data from source group members to predict the genotoxic hazard for target group members. If such data is lacking, one member (potentially the worst-case example) can be selected for generation of Tier 3 data.
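
The final step described above, reading across existing Tier 3 data or otherwise nominating one group member for in vivo testing, can be sketched as follows. The field names (`has_tier3_data`, `potency`), the function name and the use of a single potency score to identify the worst case are hypothetical simplifications; the sketch also assumes that sufficient similarity between group members has already been established.

```python
from typing import Dict, List, Union

Member = Dict[str, Union[bool, float]]
Plan = Dict[str, Union[str, List[str]]]


def plan_tier3(members: Dict[str, Member]) -> Plan:
    """Decide between read-across and generating new Tier 3 data."""
    sources = sorted(name for name, m in members.items() if m["has_tier3_data"])
    if sources:
        targets = sorted(name for name in members if name not in sources)
        return {"action": "read_across", "sources": sources, "targets": targets}
    # No Tier 3 data in the group: nominate the member with the highest
    # (hypothetical) in vitro potency score as the worst-case candidate.
    worst_case = max(members, key=lambda name: members[name]["potency"])
    return {"action": "generate_tier3", "candidate": worst_case}


# Hypothetical group in which no member yet has Tier 3 (in vivo) data.
group = {
    "NF_A": {"has_tier3_data": False, "potency": 1.2},
    "NF_B": {"has_tier3_data": False, "potency": 3.4},
}
print(plan_tier3(group))
```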

In summary, in this short communication we propose a simple three-tiered testing approach for grouping of NFs based on their genotoxicity, in line with the GRACIOUS Framework. The testing strategy can be applied to gather evidence to determine whether different NFs are sufficiently similar with respect to their potential to induce genotoxicity to be grouped. Additionally, in the second tier of the testing strategy, a number of more complex multi-cellular models are suggested as methods to allow for a better understanding of secondary NF-induced DNA damage, something that until recently was feasible only in vivo.