Approaches to Reducing Animal Use for Acute Toxicity Testing: Retrospective Analyses of Pesticide Data

In this study, we considered whether acute oral toxicity hazard classifications for pesticide formulations and active ingredients (AIs) could be used to assign acute dermal toxicity hazard classifications using U.S. Environmental Protection Agency (EPA) and the United Nations Globally Harmonized System of Classification and Labelling of Chemicals (GHS) hazard categories. This retrospective analysis used highly curated acute toxicity data for 503 formulations and 297 AIs. Hazard classifications based on rat oral LD 50 values were compared to hazard classifications based on rat dermal LD 50 values for the same substance. The concordance of oral and dermal hazard classification was 62% for formulations and 64% for AIs using the EPA system and 71% for formulations and 55% for AIs using the GHS. Overprediction of dermal hazard was 38% for formulations and 32% for AIs using the EPA system and 28% for formulations and 41% for AIs using the GHS. Underprediction of dermal hazard was 1% for formulations and 3% for AIs using the EPA system and 1% for formulations and 3% for AIs using the GHS. While concordance overall was modest, the very low underprediction rates show that acute oral hazard categories are sufficiently protective for acute dermal hazard classification. Use of oral hazard data to also classify dermal hazard would obviate the need to perform acute dermal toxicity tests for classification and labeling and thereby reduce the number of animals used for acute systemic toxicity testing of pesticides.


Introduction
Dermal exposure to chemicals can occur during routine handling of chemicals or during accidental spills. Dermal exposure can contribute considerably to the internal dose of users exposed to hazardous substances [4], and in particular is an important source of internal dose for occupational chemical exposures [1,15,26]. For some types of chemicals, such as pesticides, the dermal route can be the most important route of exposure [12]. Because of this, the industrial hygiene community develops specific notations for substances expected to present a toxic hazard via dermal absorption [5]. Regulatory agencies use data from acute oral and dermal toxicity tests to determine the potential systemic toxicity of chemicals and chemical products following oral ingestion and topical exposure to the skin, respectively. LD 50 values from such tests, representing the dose expected to produce lethality in 50% of the animals tested, are used to assign substances to oral and dermal hazard categories. The hazard categories are then used to assign product packaging labels to caution workers and consumers about poisoning potential. Figures 1 and 2 show two systems for classifying substances for acute toxicity hazard. Figure 1 summarizes the hazard classification system used by the U.S. Environmental Protection Agency (EPA). EPA requires hazard labeling to be applied to pesticides with dermal or oral LD 50 values less than or equal to 5000 mg/kg [8]. The EPA hazard classification system assigns a chemical to one of four oral and dermal hazard categories according to its relevant LD 50 value, with each category associated with specific signal words and hazard statements that must be used in labeling that chemical. Dermal hazard categories are associated with specific recommendations for personal protective equipment to mitigate skin exposures. 
Figure 2 provides the requirements for labeling according to the United Nations Globally Harmonized System of Classification and Labelling of Chemicals (GHS) [30]. The GHS was established with the goal of harmonizing rules and regulations for chemical handling and labeling at the national and international levels. The GHS has five hazard categories that are associated with specific signal words and hazard statements to be used on product labels for chemicals with LD 50 values less than or equal to 5000 mg/kg. However, Category 5, which provides for classification of chemicals having LD 50 values greater than 2000 mg/kg but less than or equal to 5000 mg/kg, is optional. When Category 5 is not used, the GHS hazard categories provide hazard notations for chemicals with LD 50 values less than or equal to 2000 mg/kg. The GHS has been implemented in the European Union by Regulation No. 1272/2008 on classification, labelling and packaging of substances and mixtures, although the European Union does not use the optional Category 5 [11]. Some U.S. regulatory agencies have harmonized their classification systems with GHS. The U.S. Occupational Safety and Health Administration uses the same GHS categories as the European Union, omitting use of the optional Category 5 [25]. The U.S. Department of Transportation uses a packing group system that is consistent with GHS Categories 1 through 3 to determine appropriate packaging and labeling of poisonous materials during transport [3].
Fig. 2 notes [30]: NR, not required; NC, not classified. LD 50 dose range is not to scale. The shaded category is optional and was not used for the analyses herein. Chart is adapted from Seidle et al. [27].
Acute toxicity tests are the most commonly conducted product safety tests worldwide [13]. These tests can require large numbers of animals, and the animals used may experience significant pain and distress. Test methods for acute dermal systemic toxicity are described in test guidelines issued by EPA and the Organisation for Economic Co-operation and Development (OECD). The current EPA test guideline [6] and the previous OECD test guideline [17] recommend using a minimum of 20 animals for the main test. A revised OECD test guideline adopted in late 2017 uses a stepwise procedure that requires fewer than 10 animals per test [24]. Acute oral systemic toxicity test guidelines were updated years ago to minimize the number of animals used [7,19,20,22] and provide refinement by minimizing pain and distress [19]. Additionally, an OECD guidance document [23] provides information on using an in vitro test method to predict a starting dose, thereby further reducing animal use. Currently, acute oral tests typically use five to nine animals for main tests and five to six animals for limit tests [18].
While significant reductions in animal use for acute systemic toxicity testing have been achieved, there is a great deal of interest, for both efficiency and ethical considerations, in further reducing the number of animals used for this purpose. If acute oral toxicity data were found to be sufficient to classify pesticides for both oral and dermal hazards, the acute dermal toxicity test would then be unnecessary, thereby reducing the number of animals used for acute systemic toxicity testing. This paper describes an evaluation to determine whether acute oral toxicity classifications for pesticide formulations and active ingredients (AIs) can be used in lieu of acute dermal toxicity data for dermal hazard classification and labeling.

Materials and Methods
We collected acute oral and dermal LD 50 data for 503 pesticide formulations and 297 AIs. To eliminate the uncertainty associated with comparing results across species, we only used LD 50 data from acute oral and dermal tests that used rats. We obtained these data from the following sources:
• EPA Data Evaluation Reports
• EPA Reregistration Eligibility Decision documents
• Study reports submitted to fulfill EPA regulatory requirements (provided by EPA)
• Two peer-reviewed publications on acute toxicity testing of chemicals [2,16]
• Two publicly available online toxicity databases:
  a. Hazardous Substances Data Bank [31]
  b. European Chemicals Agency database [10]

Data Quality Evaluation
Data were evaluated for reliability using Klimisch categories [14]. Only LD 50 data with a reliability score of 1 (reliable without restriction) or 2 (reliable with restrictions) were used for our analyses. The exception was data from Creton et al. [2]; this reference indicated that the data included were reliable, but did not specify the methods used to determine reliability.

Categorization of Data
We assigned oral and dermal hazard classifications to the 503 pesticide formulations and 297 AIs according to the EPA and GHS classification systems using the respective oral and dermal LD 50 values. We adopted the implementation of GHS that does not use the optional Category 5. Thus, substances with LD 50 values greater than 2000 mg/kg were unclassified. If more than one LD 50 value was available for a substance (for example, if LD 50 values were reported for male rats, female rats, and both sexes combined), we used the lowest (i.e., most toxic) LD 50 value to categorize the substance. When an LD 50 was listed as greater than a specific value (e.g., greater than 2000 mg/kg), it was assigned a value just above the specific value (i.e., 2001 mg/kg for the preceding example) to assign the hazard category.
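The categorization rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names and the string format for reported values (for example ">2000") are assumptions, and the category cutoffs are taken from the published EPA and GHS schemes (Figures 1 and 2 are not reproduced in this text), so they should be checked against those sources before reuse.

```python
# Upper bounds (mg/kg, inclusive) per category, most toxic first.
# ASSUMPTION: cutoffs come from the published EPA and GHS schemes,
# not from the text of this paper.
EPA_BOUNDS = {
    "oral": [(50, "I"), (500, "II"), (5000, "III")],
    "dermal": [(200, "I"), (2000, "II"), (5000, "III")],
}
GHS_BOUNDS = {  # optional Category 5 is not used
    "oral": [(5, "1"), (50, "2"), (300, "3"), (2000, "4")],
    "dermal": [(50, "1"), (200, "2"), (1000, "3"), (2000, "4")],
}


def parse_ld50(reported):
    """Convert a reported LD50 such as '350' or '>2000' to a number.

    A 'greater than' result is set just above the stated limit,
    e.g. '>2000' becomes 2001.
    """
    text = str(reported).strip()
    if text.startswith(">"):
        return float(text[1:]) + 1
    return float(text)


def classify(ld50_values, route, system):
    """Assign a hazard category from all LD50 values for one substance.

    The lowest (most toxic) value is used. `route` is 'oral' or 'dermal';
    `system` is 'EPA' or 'GHS'.
    """
    lowest = min(parse_ld50(v) for v in ld50_values)
    if system == "EPA":
        bounds, fallback = EPA_BOUNDS[route], "IV"
    else:
        bounds, fallback = GHS_BOUNDS[route], "NC"  # NC = not classified
    for upper, category in bounds:
        if lowest <= upper:
            return category
    return fallback


# Hypothetical substance with male, female, and limit-test results.
print(classify(["350", "420", ">2000"], "oral", "GHS"))
```

Representing each system as an ordered list of inclusive upper bounds keeps the oral and dermal schemes in one lookup structure and makes the "above the last bound" case (EPA Category IV, or GHS not classified) an explicit fallback.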

Analyses
We calculated concordance, underprediction, and overprediction rates for the classification of dermal hazard on the basis of the oral hazard classification. In these calculations, we excluded 27 formulations and 64 AIs from EPA classification because of the uncertainty of their dermal hazard classifications. These substances were characterized on the basis of limit tests as having oral LD 50 values greater than 5000 mg/kg and dermal LD 50 values either greater than 2000 mg/kg or greater than 4000 mg/kg. Although the oral hazard classifications for these substances were unequivocal, their dermal classifications could be either EPA dermal hazard Category III (LD 50 greater than 2000 mg/kg but less than or equal to 5000 mg/kg) or Category IV (LD 50 greater than 5000 mg/kg). Thus, an accurate comparison of oral and dermal hazard was not possible because the highest doses tested for the two routes were not the same. The resulting subsets of 476 formulations and 233 AIs were used for the EPA classification analyses.
As noted previously, the GHS hazard classifications were determined with four hazard categories and a "not classified" category (i.e., LD 50 greater than 2000 mg/kg). No substances were excluded from the GHS classification analyses because a substance with a dermal LD 50 greater than 2000 mg/kg is not classified, even if the LD 50 is actually greater than 5000 mg/kg. Thus, the full data sets of 503 formulations and 297 AIs were used for the GHS analyses.
Figure 3 shows that, according to both the EPA and GHS hazard classifications, formulations and AIs in this data set were much more likely to be toxic via the oral route than via the dermal route, and formulations were less toxic than AIs.
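The rate calculation and the EPA exclusion rule described in the Analyses section can be sketched as follows. The function names and data layout are hypothetical, and the exclusion helper is one reading of the rule (oral Category IV combined with a dermal limit-test result below 5000 mg/kg). The example pairs are synthetic stand-ins that only reproduce the counts reported for AIs under the EPA system (150 concordant, 75 overpredicted, 8 underpredicted); they are not the study data.

```python
# Rank categories from most toxic (0) to least toxic.
EPA_RANK = {"I": 0, "II": 1, "III": 2, "IV": 3}
GHS_RANK = {"1": 0, "2": 1, "3": 2, "4": 3, "NC": 4}


def epa_dermal_ambiguous(oral_category, dermal_reported):
    """True when the pair should be left out of the EPA comparison.

    ASSUMPTION: this encodes the exclusion as 'oral Category IV and a
    dermal limit-test result such as ">2000" or ">4000"', where the
    dermal category could be either III or IV.
    """
    text = str(dermal_reported).strip()
    return (
        oral_category == "IV"
        and text.startswith(">")
        and float(text[1:]) < 5000
    )


def prediction_rates(pairs, rank):
    """Percent concordant, overpredicted, and underpredicted.

    `pairs` is a sequence of (oral_category, dermal_category). The oral
    category is treated as the prediction of the dermal category.
    """
    counts = {"concordant": 0, "overpredicted": 0, "underpredicted": 0}
    for oral, dermal in pairs:
        if rank[oral] == rank[dermal]:
            counts["concordant"] += 1
        elif rank[oral] < rank[dermal]:
            # Oral category is more severe than the dermal category.
            counts["overpredicted"] += 1
        else:
            counts["underpredicted"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no substance pairs to compare")
    return {name: round(100 * n / total) for name, n in counts.items()}


# Synthetic pairs matching the reported EPA counts for AIs (n = 233).
ai_pairs = [("IV", "IV")] * 150 + [("III", "IV")] * 75 + [("IV", "III")] * 8
print(prediction_rates(ai_pairs, EPA_RANK))
# {'concordant': 64, 'overpredicted': 32, 'underpredicted': 3}
```

Because the three rates are rounded independently, they need not sum to exactly 100%, which is also visible in the rounded values reported for formulations.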

EPA Hazard Categories
The majority of the formulations in our data set (70% [331/476]) were classified by the EPA system as Category IV for dermal hazard (dermal LD 50 greater than 5000 mg/kg). However, only 36% (173/476) of these substances were classified by the EPA system as Category IV for oral hazard (Fig. 3a). Similarly, 23% (54/233) of the AIs were classified by the EPA system as Category IV for dermal hazard, but only 12% (29/233) of these substances were classified as Category IV for oral hazard (Fig. 3b).
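The higher toxicity of the AIs is indicated by the lower proportion of AIs in the lowest toxicity categories, compared to the formulations. A higher proportion of formulations were classified by the EPA system as Category IV for oral hazard compared to the AIs (36% vs. 12%), and similarly, more formulations than AIs were classified as Category IV for dermal hazard (70% vs. 23%) (compare Figs. 3a and b).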

GHS Hazard Categories
Nearly all of the formulations in our data set (98% [494/503]) were not classified by the GHS for dermal hazard (dermal LD 50 greater than 2000 mg/kg) (Fig. 3c). As for the EPA system, a lower proportion of these substances (71% [355/503]) was not classified by GHS for oral hazard. Similarly, 86% (255/297) of the AIs were not classified by GHS for dermal hazard, but only 51% (151/297) of these substances were not classified for oral hazard (Fig. 3d).
As with the EPA system, the AIs as a group were classified by GHS as more toxic than the formulations. A higher proportion of formulations were not classified for oral hazard by GHS compared to the AIs (71% vs. 51%), and similarly, more formulations than AIs were not classified by GHS for dermal hazard (98% vs. 86%) (compare Figs. 3c and d).
Using the EPA system, the oral and dermal hazard classifications were concordant for 62% of the formulations (Fig. 4a) and 64% (150/233) of the AIs (Fig. 4b). The EPA oral hazard classifications overpredicted dermal hazard for 38% (179/476) of the formulations and 32% (75/233) of the AIs. The EPA oral hazard classifications underpredicted dermal hazard for 1% (4/476) of the formulations and 3% (8/233) of the AIs.

Discussion
Regulatory acceptance of any new alternative test method or approach requires that the new method or approach provide equivalent or better protection of human health [21,28]. We conducted the current analysis of the use of acute oral hazard classification to classify substances for dermal hazard primarily to determine if there are instances where the acute dermal hazard is greater than the acute oral hazard. When the acute dermal toxicity hazard is greater than the acute oral toxicity hazard, the dermal hazard would be underpredicted by the oral hazard classification if that were used for dermal hazard classification. However, if such cases are rare or nonexistent, the use of oral hazard classification to predict dermal hazard would be at least as protective as using dermal LD 50 data for this purpose because all of the predictions would either be concordant or would overpredict dermal hazard. Therefore, oral hazard categories could be used to determine dermal hazard classification without compromising public health, and product labeling could be based on the oral hazard without the need for an acute dermal systemic toxicity test.
Our analysis, which focused on the EPA hazard classification system and primarily included substances of interest to the EPA, found that using oral hazard classifications to predict dermal hazard classifications resulted in very few substances being underpredicted for dermal hazard. This is consistent with similar analyses conducted by other investigators [2,16,27]. Our reanalyses of these investigators' data, to the extent such data were available, provide further support for this conclusion.
Using a data set of 240 pesticide AIs, Creton et al. [2] showed that oral hazard underpredicted dermal hazard for only 0.8% (2/240) of the substances using the obsolete United Kingdom (UK) Chemicals (Hazard Information and Packaging for Supply) Regulations system [29]. The UK system was a four-category hazard classification system. Like GHS without the optional Category 5, the UK system did not classify substances with LD 50 values greater than 2000 mg/kg as hazards, but the UK LD 50 ranges for the other hazard categories were different from those used by the GHS. Because Creton et al. provided the dermal and oral LD 50 data for the AIs, we were able to analyze them using the GHS and the EPA system. For our analysis of these data using the EPA system, we excluded substances with oral LD 50 values greater than 5000 mg/kg and dermal LD 50 values greater than 2000 mg/kg, as we did for our analysis. The underprediction rates for both EPA and GHS categories were very similar to those of our data set, which includes the Creton et al. data (Table 1).
In an analysis of a different set of 337 pesticide AIs using the same four GHS hazard categories we used, Seidle et al. [27] obtained concordance and underprediction rates similar to our AI analysis. Because this paper did not include LD 50 data, we were not able to reanalyze their data to determine the concordance, underprediction, and overprediction rates for the classification of dermal hazard on the basis of the oral hazard classification using the EPA system.
Moore et al. [16] analyzed a broader data set of 335 substances, which included 110 pesticide AIs from Creton et al. [2] and 225 uncharacterized substances from the European Chemicals Agency database. Table 1 shows that concordance achieved by Moore et al. using GHS was much lower than ours and those of the other studies, and that their underprediction rates were higher. Our dataset consisted of only pesticide formulations and AIs. The higher underprediction and overprediction rates and lower concordance for the Moore et al. analysis could be due to inclusion of the uncharacterized substances, which comprised approximately 67% of their database. Again, because this paper did not include LD 50 data, we were not able to reanalyze this data set to determine concordance using the EPA classification system.
While underprediction of dermal hazard would pose a clear danger to users, overprediction of dermal hazard is also undesirable, as overuse of stringent hazard warnings has a desensitizing effect, ultimately causing users to disregard them. The categorization approaches for our data sets showed that dermal hazard might be overpredicted by the oral hazard classification for 38% (179/476) of formulations using the EPA system and 28% (142/503) of formulations using the GHS (Fig. 4a, c). Similarly, rates for overprediction of dermal hazard for AIs were 32% (75/233) with the EPA system and 41% (123/297) with GHS (Fig. 4b, d).
The consequences of underprediction are more severe. Underprediction of dermal hazard could lead to warning labels and protective equipment recommendations inadequate to protect exposed persons, resulting in increased public health risk. Underprediction of dermal hazard would also affect the caution and care with which users handle consumer products. However, because dermal hazard classifications for only a small proportion of formulations (1%) and AIs (3%) in our dataset were underpredicted by using oral hazard data, our analyses suggest that acute oral hazard would provide appropriate recommendations for personal protective equipment for all but a small number of substances. If acute oral hazard were used to predict acute dermal hazard, animal testing for acute dermal toxicity would be necessary for few substances.
Our analyses, along with the others discussed here, indicate that it may be feasible to greatly reduce the use of animals for acute dermal toxicity testing of pesticide formulations and AIs. Based on the EPA acute dermal toxicity test guideline [6], waiving the dermal acute toxicity test and using oral hazard classification to assign dermal hazard classification would reduce the number of animals by 10 animals per pesticide for a limit test and 20 animals per pesticide for a main test.
EPA has used an analysis similar to ours to support guidance for waiving all acute dermal LD 50 studies for pesticide formulations when acute oral LD 50 studies are available [9]. Because EPA receives hundreds of acute dermal submissions for formulations each year, this development has the potential to reduce animal use significantly for acute toxicity testing. Although EPA has not waived the dermal test requirement for AIs, the waiver for formulations has a much larger impact on animal savings because the vast majority of new data submissions support registrations for formulations rather than AIs.
Future efforts to further reduce animal use for acute toxicity testing of pesticide formulations and AIs should be directed towards developing approaches to identify the small number of substances that might be underpredicted by acute oral toxicity testing before dermal tests are performed. In silico investigations of route-specific bioavailability could assist in identifying those substances. For substances that are likely to be more toxic dermally than orally and must be tested using the acute dermal toxicity test, the OECD test guideline has recently been revised to use a stepwise procedure that requires fewer than 10 animals [24].

Disclaimer
This article may be the work product of an employee or group of employees of NIEHS, NIH, or other organizations. However, the statements, opinions or conclusions contained therein do not necessarily represent the statements, opinions, or conclusions of NIEHS, NIH, U.S. government, or other organizations. ILS staff do not represent NIEHS, the National Toxicology Program, or the official positions of any federal agency.

Table 1 (surviving fragment)
EPA underprediction rate: current analysis, 3% (8/233); Creton et al. [2], 4% (7/182); other studies, NA.
NA, data not available. "Current analysis" refers to the analysis described in this paper. Concordance and underprediction rates for Creton et al. were calculated by us from data provided in their paper. Concordance and underprediction rates for the other papers were as reported; we were not able to calculate EPA concordance and underprediction rates for these data sets because LD 50 data were not provided in the papers.