Key Points

There is a need for a uniform approach to causality assessment in DILI, which is expected to include novel methodology and/or an update of existing tools and to enhance reliability, reproducibility, and completeness of DILI assessment in clinical trials.

When establishing an adjudication committee, a minimum of three experienced experts in the field of clinical hepatology/DILI should be used to ensure clarity of decision making in the assessment.

Rechallenge and liver biopsy should be considered when the benefit for the patient of continued treatment with the suspect drug exceeds risk.

The establishment of a minimum data set for causality assessment in clinical trials and in the post-marketing environment is recommended to assist with the assessment and diagnosis of DILI.

1 Introduction

The IQ drug-induced liver injury (DILI) Initiative was launched in June 2016 within the International Consortium for Innovation and Quality in Pharmaceutical Development (also known as the IQ Consortium) to reach consensus and propose best practices on issues surrounding DILI. The IQ Consortium is a leading science-focused, not-for-profit organization addressing scientific and technical aspects of drug development and comprises over 30 pharmaceutical and biotechnology companies. The IQ DILI Initiative is an affiliate of the IQ Consortium, composed of 17 IQ member companies, focused on establishing best practices for monitoring, diagnosing, managing, and preventing DILI. This review paper is based on an extensive literature review and cross-pharmaceutical industry survey data; the consensus and findings were reached through carefully structured discussions among IQ DILI members as well as academic and regulatory experts.

Causality assessment for suspected DILI is a major challenge during drug development and following approval. Given the intricacies of determining whether a suspected drug caused a given liver injury, industry, regulators, and academia continue to explore this complex topic. The diagnosis of DILI is challenging given the multitude of clinical variables that must be considered, including the temporal relationship of the injury to the administration of the suspect drug, the clinical course of abnormal liver tests, the presence, absence, or potential interactions of concomitant medications, knowledge of the suspect drug’s propensity to cause hepatotoxicity and, perhaps most importantly, the elimination of alternate potential causes of liver injury [1, 2]. Seeking an expert opinion to diagnose DILI is the ‘gold-standard’ approach [3] used by pharmaceutical companies and regulatory agencies both to identify or confirm DILI cases and to assess a drug’s hepatotoxic potential, especially during clinical drug development. However, the lack of defined criteria, which leads to subjectivity in drawing definitive conclusions, and the occasional lack of expert consensus have been viewed as limitations [4]. Additional challenges in both the clinical trial and post-marketing settings include missing data and the collection of an appropriate minimal data set to enable adequate assessment of causality for DILI. Over the years, other attempts at standardized approaches and tools (e.g., RUCAM; Maria and Victorino method) have met with limited success and uptake and have been criticized, especially regarding their utility in the setting of clinical trials. A structured approach to causality assessment on an individual level is warranted in cases of suspected DILI.

The goal of this IQ DILI manuscript is to provide a review of existing industry best practices and recommendations for causality assessment in the setting of potential DILI. In collaboration with academic and government subject matter experts (SMEs) and the IQ DILI Causality Working Group (CWG), this paper was developed to address the following key objectives:

  • Understand and describe the current state of DILI causality assessment.

  • Evaluate the utility of new tools/methods/practice guidelines.

  • Recommend a proposal for a minimal data set needed to assess causality.

  • Define best practices for causality assessment as they relate to DILI.

  • Promote a more structured and universal approach to DILI causality assessment.

2 Strategy

The IQ DILI CWG applied several strategies to gain better insight into the current practice of causality assessment for DILI, including a review of the literature (ROL) and administration of a survey (hereafter called ‘the survey’). The data and information generated from the ROL and the survey were examined in a number of structured discussions during dedicated face-to-face (F2F) meetings with SMEs with academic and regulatory backgrounds. The ROL, survey, and interaction with SMEs assisted the CWG with the development of recommendations and guidance to industry, academia, and health authorities.

The survey was distributed to the IQ DILI member pharmaceutical companies and results were reviewed by the CWG. The survey provided information on up-to-date practices and trends of causality assessment in pharmaceutical companies. Advice and insights from leading SMEs amongst the CWG and external SMEs played an important role in gathering feedback.

2.1 Review of Literature (ROL)

An initial search was performed in PubMed and Google Scholar on “drug safety causal assessment for liver injury.” The search string was entered without quotes, so the terms were searched combinatorially. Additional ad hoc searches were conducted on related keywords (e.g., hepatotoxicity, drug toxicity, hepatic adverse events), principal investigators, and references cited within identified articles. Titles and abstracts were reviewed by the IQ DILI Secretariat to identify articles that focused on the key topics of describing or evaluating methods for DILI causality assessment. Forty-three manuscripts were selected for further review by the CWG, which determined whether they were relevant to the goals of this review and should therefore be retained. Thirty-two articles were recommended for additional review. Individual working group members reviewed assigned articles in depth and provided high-level written summaries to the team.

2.2 Survey

A blinded survey was developed by the CWG with the specific intent to collect information not available through a review of the scientific literature and regulatory guidance. The IQ DILI CWG created a 28-question data collection instrument (i.e., survey) to gain a better understanding of the assessment of causality for DILI in industry and regulatory sectors [Electronic Supplementary Materials #1 (ESM#1) Survey Causality Assessment]. The survey was sent to 14 IQ DILI pharmaceutical industry member companies. The survey, although not validated, collected information on membership, meeting frequency, and data needs of groups charged with both adjudicating individual cases of potential DILI and evaluating entire programs for DILI.

Each responding company selected a champion from within its organization to provide the responses to the survey. Champions were selected at the recommendation of an IQ board member and/or a senior member of the company, based on their leadership in evaluating DILI and knowledge of pharmacovigilance activities.

The survey focused on hepatic adjudication committees (HAC) and current practices for the assessment of causality for DILI and was designed to allow a mixture of quantitative and qualitative insights, primarily relying on multiple choice questions, though use of respondent weighting and Likert scales allowed some depth to responses. Some of the questions were mutually exclusive, so that if respondents answered a question one way, it precluded them from answering the following question. Some of the questions also allowed respondents to check more than one answer, while others asked for brief explanations in text (see instrument in ESM#1). Information drawn from multiple pharmaceutical companies was used to evaluate the different practices and to reach consensus on best practices that could potentially become the industry standard. The survey explored several aspects of HAC practices, including scope, selection of members including external experts and training, meeting frequency, data collection tools, review strategy, and internal review of unblinded data.

Pharmaceutical companies’ respondents were also asked to provide a charter and/or DILI case report form (CRF), if available, for their HAC reflecting any current approach.

2.2.1 Survey Data Consolidation and Analysis

The IQ administrative office blinded all the survey results and reviewed them carefully for antitrust compliance before sharing the blinded results with the working group. The survey was structured to obtain descriptive data in response to the questions. Therefore, the data were analyzed using descriptive statistics. To describe the results in a constructive manner, the data were categorized based on the area of interest and practices of causality assessments.

The survey was completed by 13 respondents (77% of which identified themselves as large-size companies [> 10,000 employees]) of which 70% had five or more new drug applications (NDAs) yearly. One company provided separate responses for two divisions of the organization. Sixty-nine percent of the survey respondents were from clinical R&D with the majority from medical safety, pharmacovigilance/post-marketing. Seventy-seven percent were physicians (not specified as hepatologists), and 23% were hepatologists. Each survey was completed by a different group of individuals at the respective pharmaceutical company, and each company decided who would complete the survey.

3 Current Practices/Tools Administered for DILI Assessment

3.1 Literature Review

Causality assessment, especially of severe liver injuries, is a critical determinant in informing hepatotoxicity risk in both clinical development and post-marketing experience. Diagnosis of DILI remains challenging, since it is often a diagnosis of exclusion. The 2009 US Food and Drug Administration (FDA) guideline on pre-marketing risk assessment of DILI in clinical trials [5] provides guidance; however, it does not provide necessary details or ‘standards’ for how causality assessment should be conducted beyond the determination of Hy’s Law.

There is strong agreement that clinical and lab analyses to assess acute liver injury should be consistent with guidance from major regulators such as the US FDA guidance on pre-marketing risk assessment of DILI in clinical trials [6]. This risk assessment conceptually includes causality assessment to rule out other causes of laboratory changes; this is inherently qualitative and may be performed by drug development teams or external (to a company) experts. Quantitative methods and expert opinion, when used together, provide a sound start in determining whether a causal relationship may exist between a medication and liver injury. However, these commonly used approaches have shortcomings. Additionally, the clinical trial and post-marketing settings (where there is no regulatory guidance) present distinct challenges and therefore one approach may not fit both scenarios.

3.1.1 Current Practice Tools

Expert opinion, while having some pitfalls, remains the standard for assessing causality for DILI in clinical development. Specialists in the field of hepatology with specific DILI expertise provide their opinions after reviewing and evaluating available data.

The Drug-Induced Liver Injury Network (DILIN), internal industry adjudication committees, and HACs are groups that use a standardized expert opinion approach. The DILIN, established by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), focuses only on liver injury induced by marketed drugs and by complementary and alternative medicines (CAM). This group has developed standardized and rigorous processes to identify and characterize DILI cases [3, 4, 7]. While a detailed discussion is beyond the scope of this manuscript, briefly, the DILIN approach to causality assessment uses a structured consensus expert opinion based on the DILIN scoring system as noted in ESM#2 [3, 4, 8].

When evaluating large sets of data such as clinical trial data, the eDISH (Evaluation of Drug-Induced Serious Hepatotoxicity) tool can be used to provide a graphical representation of alanine aminotransferase (ALT) and total bilirubin (TBL) values, expressed as multiples of the upper limit of normal (ULN), on a log scale [9]. eDISH can be used to help reviewers identify potential cases meeting established laboratory criteria for suspected DILI. Assessment of causality in an individual patient requires review of both lab data and the clinical details of the case.

The Council for International Organizations of Medical Sciences (CIOMS) scale, also called the Roussel Uclaf Causality Assessment Method (RUCAM), was developed in 1989 to assist with causality assessment of individual cases. It is based on a quantitative scoring system and was validated against cases with positive rechallenge [10]. Calculation of the R value (i.e., the observed ALT value over ALT ULN, divided by the alkaline phosphatase (ALP) observed value over ALP ULN) is an initial step in determining the type of liver injury in the RUCAM scale. One major limitation of this calculation is its dependence on the availability and timing of lab samples. The RUCAM scale has other limitations, including poor inter- and intra-observer reproducibility, and it was mainly developed to assess cases originating from post-marketing sources [11]. Updates to RUCAM published in 2016 added questions to reduce inter-observer and intra-observer variability and to address the role of herbal therapies [12]. However, this updated RUCAM is still pending validation as it was not intended for clinical trial causality assessment. Companies have adopted RUCAM because no other standard exists. When the expert consensus opinion in the DILIN network was compared with RUCAM, there was discordance 31% of the time [13]. The utility of causality assessment scales is complex and depends on the weight assigned to the criteria included in the specific scale. The utility may be reduced if incorrect weights have been allocated. For both methods, RUCAM and the expert opinion approach, the main challenges are to differentiate the causality categories “unlikely versus possible” and “possible versus probable” [3]. A review of these challenges is beyond the scope of this manuscript.
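The R value described above can be illustrated with a short sketch. The formula follows the definition given in the text; the pattern cut-offs (R ≥ 5 hepatocellular, R ≤ 2 cholestatic, otherwise mixed) are the conventional CIOMS thresholds and are not stated in this manuscript, so they should be read as an assumption. The function names are hypothetical.

```python
def r_value(alt: float, alt_uln: float, alp: float, alp_uln: float) -> float:
    """R = (ALT / ALT ULN) / (ALP / ALP ULN), as defined in the text."""
    if alt_uln <= 0 or alp_uln <= 0 or alp <= 0:
        raise ValueError("ULNs and ALP must be positive")
    return (alt / alt_uln) / (alp / alp_uln)


def injury_pattern(r: float) -> str:
    """Classify the biochemical pattern of injury.

    Assumption: conventional CIOMS cut-offs, not taken from this manuscript.
    """
    if r >= 5:
        return "hepatocellular"
    if r <= 2:
        return "cholestatic"
    return "mixed"


# Example: ALT 400 U/L (ULN 40) and ALP 100 U/L (ULN 100) give R = 10
print(injury_pattern(r_value(alt=400, alt_uln=40, alp=100, alp_uln=100)))
```

Because ALT and ALP change over the course of an injury, the result depends on which sample is used, which is the timing limitation noted above.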

Benefits and barriers of current approaches are outlined in Table 1 below.

Table 1 Summary of current causality methods for assessing cases of DILI

3.2 Survey Results

Survey results on the evaluation of suspected/confirmed DILI showed that:

  • All companies used expert opinion for causality assessment;

  • Eight companies (66% of respondents) used R-value or similar objective measures to categorize the type of suspected DILI;

  • Inconsistency in the approach to causality assessment appeared to be common, with ten companies (77% of respondents) indicating that they did not use a pre-defined liver-specific causality assessment algorithm or a published instrument to identify DILI cases;

  • Most respondents were satisfied using the RUCAM and DILIN assessment methods;

  • Eleven companies (85% of respondents) reported that internal pharmacovigilance staff were involved in or led the evaluation of suspected DILI cases.

3.3 Recommendations

  1. There is a need for a uniform approach, novel tool and/or update of existing tools to address reliability, reproducibility, and completeness of DILI assessment in both clinical trial and post-marketing settings.

  2. Expert hepatologist opinion is currently the best approach to assess a drug’s causal contribution to liver injury and should continue to be the primary approach in causality assessment in clinical development.

  3. The use of an adjudication committee/expert opinion methodology such as the DILIN expert opinion process is currently believed to be the most effective approach to DILI assessment, and therefore should currently be the primary method used for causality assessment.

  4. A need exists for continued collaboration among stakeholders to share findings and ‘brainstorm’ best approaches to DILI causality assessment such as minimum data elements, biomarkers, rechallenge, liver biopsy, and data mining.

  5. Clinical trials adjudication should be performed by blinded and independent experts. The adjudications should be performed on an ongoing basis and the unblinded reports sent to a safety committee for review.

4 Challenges Observed in DILI Assessment

4.1 Literature

Many challenges exist in the assessment of DILI, including the varying clinical presentation of DILI and the lack of tools to establish a definitive diagnosis. Unlike many other medical conditions, no single test or biochemical signal exists to establish a definitive diagnosis. This diagnostic dilemma is heightened by the fact that DILI can mimic virtually all known forms of non–drug-induced acute and chronic liver disease [11, 20]. DILI has no specific diagnostic clinical presentation, laboratory test or biomarker, or histological pattern. Signs and symptoms of DILI vary with the pattern and severity of injury, which vary with the drug and the individual patient. Similarly, DILI may present a wide variety of histologic patterns depending upon the drug and the host. Assigning causality to a drug is a meticulous process that requires carefully linking administration of a drug to onset of disease on the one hand and excluding competing causes of liver disease on the other [21]. Drug rechallenge can often provide definitive answers in the assessment of causality; however, concerns about the risk of recurrence of severe DILI, which may result in death or require liver transplantation, prevent investigators from considering a rechallenge with the suspect drug [22].

Key challenges that physicians and investigators face to diagnose DILI include the following:

  • DILI lacks broadly useful or widely accepted objective diagnostic tests (biomarkers) and thus the diagnosis depends largely on clinical acumen;

  • DILI may resemble essentially any acute or chronic liver disease;

  • DILI is rare, which limits systematic clinical experience;

  • There is no gold standard for the verification of DILI;

  • The diagnosis is heavily dependent on exclusion of other causes of liver injury;

  • Polypharmacy and the presence of comorbidities or intercurrent disease (especially underlying chronic liver disease) impede or complicate the diagnosis of DILI;

  • Lack of a systematic approach for data collection, analysis, and specific clinical presentation of DILI cases further limits continuous learning.

4.2 Survey Results

The results of the survey revealed that different laboratory criteria are used amongst pharmaceutical companies. While 85% of companies used a combination of ALT or aspartate aminotransferase (AST) > 3 × ULN and TBL > 2 × ULN with/without ALP > 2 × ULN, 46% used ‘other’ criteria or approaches. These include:

  • Symptoms of hepatic injury plus biochemistry results;

  • Use of PT/INR (prothrombin time and international normalized ratio) plus ascites and/or encephalopathy;

  • Assessment was dependent on the protocol, product, population, and phase of study or life cycle;

  • No official company position on the approach to assessment;

  • No cross-therapeutic standard.
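The most commonly reported laboratory combination (ALT or AST > 3 × ULN together with TBL > 2 × ULN) can be expressed as a simple screening check. The sketch below is illustrative only: the class and function names are hypothetical, and because the survey wording leaves open how ALP enters the criterion ("with/without ALP > 2 × ULN"), the ALP flag is reported separately rather than folded into the combined result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabResult:
    value: float  # observed value
    uln: float    # upper limit of normal, same units as value

    def x_uln(self) -> float:
        """Observed value expressed as a multiple of ULN."""
        if self.uln <= 0:
            raise ValueError("ULN must be positive")
        return self.value / self.uln


def screen_liver_labs(alt: LabResult, ast: LabResult, tbl: LabResult,
                      alp: Optional[LabResult] = None) -> dict:
    """Flag the combination described in the survey results."""
    aminotransferase_flag = max(alt.x_uln(), ast.x_uln()) > 3  # ALT or AST > 3 x ULN
    bilirubin_flag = tbl.x_uln() > 2                           # TBL > 2 x ULN
    # ALP is reported but not used in the combined criterion (assumption).
    alp_flag = None if alp is None else alp.x_uln() > 2
    return {
        "aminotransferase_gt_3x_uln": aminotransferase_flag,
        "tbl_gt_2x_uln": bilirubin_flag,
        "alp_gt_2x_uln": alp_flag,
        "meets_combined_criterion": aminotransferase_flag and bilirubin_flag,
    }
```

A flag from such a check only identifies a case for review; as discussed throughout this section, causality still requires expert evaluation of the clinical details.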

The variety and range of laboratory criteria have created a considerable challenge in DILI causality assessment. Data collection is similarly inconsistent: only approximately half of respondents use a CRF to collect information pertaining to DILI. Respondents indicated that frequently used assessment tools include the DILIN framework, RUCAM, and expert opinion.

4.3 Recommendations

  1. There is a need to review existing approaches and definitions of DILI across industry, academia, and regulators and to develop comprehensive and better predictive methods for the identification and diagnosis of DILI.

  2. A standardized, minimum dataset for collection of data within clinical trials may assist with subsequent assessment of causality and diagnosis of DILI.

  3. When considering the complexity associated with the assessment and diagnosis of DILI, a case-by-case evaluation is recommended.

5 Rechallenge and Liver Biopsy in Clinical Development

5.1 Rechallenge

Drug rechallenge (re-administration of a drug suspected to have caused DILI) in the causality assessment of DILI remains controversial. The decision to rechallenge should consider the potential risk of fulminant liver failure, and the totality of risks and potential benefits should be carefully evaluated. The FDA guidance on DILI, as well as others, have suggested that rechallenge should be avoided, especially in subjects that have demonstrated significant aminotransferase elevations (> 5 × ULN) [5, 22]. Rechallenge is associated in some cases with serious adverse events including death, liver transplantation, hospitalization, and jaundice, particularly in DILI patients with hypersensitivity features [23].

Despite the known risks of rechallenge, a positive rechallenge of a drug suspected of inducing DILI provides convincing evidence for the causality of the implicated drug in liver injury and is weighted heavily in causality assessment tools such as RUCAM [10, 24]. Recent clinical trials in oncology have allowed for the re-administration of certain drugs (e.g., pazopanib) after suspected DILI in controlled clinical trial settings with frequent liver safety monitoring [25]. Re-administration in these cases may allow for a clear causality assessment of the implicated drug and provide a framework to better understand the role of liver adaptation as opposed to recurrence after an initial DILI event. This may be especially important during the early stages of clinical development when the overall liver safety profile of a potential new therapeutic is unknown. Rechallenge may therefore be appropriate during clinical development when the benefit to the patient of continued treatment with the suspect drug exceeds the risk and where there are no alternative treatment options. Rechallenge could also inform the overall liver safety profile of the drug when evaluated in a controlled manner in a clinical trial setting with appropriate safety monitoring [26, 27]. However, as per FDA guidance on DILI, a negative rechallenge does not necessarily indicate that the reaction was not caused by the drug; it may instead be the result of liver adaptation to the drug [5]. The characteristics of the drug, the type of drug reaction, and the classification of DILI should also guide the consideration of rechallenge.

The severity (i.e., jaundice, symptoms of hepatitis) of the suspected DILI is typically important with regard to decisions concerning rechallenge, but not in all cases [28, 29]. In addition, patient characteristics such as advanced age, female sex, alcohol use, substance abuse, HLA markers (e.g., HLA-B*57:01, HLA-DQB1*0201) if known and applicable [30], and concomitant medications [31] should be carefully weighed when considering rechallenge, but may not be applicable in all cases. Finally, the lack of alternative therapies in a patient displaying clear benefit should be considered in terms of the risks associated with discontinuation of the study medication and the ensuing change to the benefit/risk profile for that particular patient or patient group. Irrespective of the rationale for rechallenge, re-administration of the implicated drug must be considered carefully, the benefit/risk and treatment alternatives must be clearly communicated to the patient, and the patient must be closely monitored.

A positive drug rechallenge is defined as an ALT level of ≥ 3–5 × ULN after re-administration of the suspect drug, in a patient with normal baseline ALT. Typically, this occurs more rapidly than the initial episode of DILI [25, 28, 29]. However, this definition does not account for individuals with pre-existing liver disease who have baseline elevated ALT levels. Due to the obesity epidemic, more patients are entering clinical trials with elevated ALT levels due to associated metabolic conditions and nonalcoholic fatty liver disease [32]. As such, multiples of the patient’s baseline ALT level instead of multiples of ULN may better represent a positive rechallenge in this situation, although this needs to be further evaluated and the cut-offs for positive rechallenge defined.
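The two ways of defining a positive rechallenge discussed above can be contrasted in a small sketch. It is illustrative only: the function and parameter names are hypothetical, the default ULN multiple of 3 is simply the lower end of the 3–5 × ULN range quoted in the text, and the baseline multiple has no default because the text states that these cut-offs still need to be defined.

```python
from typing import Optional


def is_positive_rechallenge(alt: float, uln: float, baseline_alt: float,
                            uln_multiple: float = 3.0,
                            baseline_multiple: Optional[float] = None) -> bool:
    """Return True if post-rechallenge ALT meets the chosen threshold.

    Normal baseline (baseline_alt <= uln): compare ALT with a multiple of ULN.
    Elevated baseline: compare ALT with a multiple of the patient's baseline.
    The baseline multiple must be supplied by the caller, since no cut-off
    has been established (hypothetical parameter).
    """
    if uln <= 0 or baseline_alt <= 0:
        raise ValueError("ULN and baseline ALT must be positive")
    if baseline_alt <= uln:
        return alt >= uln_multiple * uln
    if baseline_multiple is None:
        raise ValueError("no established cut-off for elevated baseline; "
                         "supply baseline_multiple explicitly")
    return alt >= baseline_multiple * baseline_alt


# Normal baseline, ULN 40 U/L: ALT 130 U/L is >= 3 x ULN but not >= 5 x ULN
print(is_positive_rechallenge(alt=130, uln=40, baseline_alt=30))
print(is_positive_rechallenge(alt=130, uln=40, baseline_alt=30, uln_multiple=5.0))
```

Requiring the caller to pass the baseline multiple keeps the sketch from implying a cut-off that has not been validated.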

Since the decision to rechallenge must be made carefully based on numerous factors, the dose and dosing regimen for rechallenge, the optimal frequency of follow-up monitoring, and criteria for drug discontinuation, for example, cannot be universally applied across all studies. Rechallenge dose and dosing regimen may need to be adjusted according to factors such as phase of drug development, established liver safety of the drug, and emerging data. Pazopanib, a tyrosine kinase inhibitor effective for both renal cell carcinoma and soft tissue sarcoma, was associated with ALT elevations of > 8 × ULN in 5% of patients in clinical trials [25]. Rechallenge was attempted in select patients using the following criteria: (1) there was clear clinical benefit from pazopanib, (2) there was a positive dechallenge evidenced by a reduction in ALT to 2.5 × ULN, and (3) there was an absence of a hypersensitivity reaction. In this example, most patients were rechallenged with a reduced dose and were monitored weekly for 2 months with liver biochemical tests [33]. While this monitoring interval for rechallenge may not be applicable to all drugs in development, it appears to be reasonable and thus is recommended by the IQ DILI Initiative. Of course, the mechanism of action of the drug, pattern of known liver injury if applicable, clinical severity, timing of onset, as well as other factors, all need to be taken into consideration when determining the monitoring interval. It should be remembered that if DILI occurs upon rechallenge, patients with chronic liver disease may have an increased incidence of morbidity and mortality compared with those with a healthy liver and normal baseline liver biochemical tests [34,35,36,37]. Finally, a recent study from DILIN found that non-liver-related comorbidities adversely impact the likelihood of survival within 6 months of the occurrence of a suspected DILI [38]. Thus, while studies need to confirm this finding in patients with suspected DILI being rechallenged, it seems prudent to consider comorbidities when evaluating the risk/benefit decision to rechallenge. Further data to support the appropriate interval of monitoring are needed from clinical trials, from clinicians reporting results of rechallenges to Health Authorities (e.g., FDA MedWatch, EudraVigilance [39]), and from prospective national and international DILI registries.

In the survey administered to industry participants on the topic of rechallenge, respondents indicated the following:

  • Seven companies (58%) would allow rechallenge, five companies (42%) would not;

  • Twelve companies (92%) indicated that rechallenge data or liver biopsy/histology data were typically missing.

5.2 Recommendations

  1. Rechallenge should be considered when the patient’s benefit for continued treatment with the suspect drug exceeds risk.

  2. The benefit/risk of the rechallenge must be communicated to the patient, efforts should be made to ensure that this is clearly understood by the patient, and re-consent of the patient should be obtained.

  3. The severity of the suspected DILI, including risk factors for potentially fatal drug rechallenge outcome, patient and drug characteristics, dose and dosing regimen for rechallenge, the optimal frequency of follow-up monitoring, and criteria for drug discontinuation, should all be considered on an individual basis and not universally applied to all drug development programs.

  4. Data known about the severity, timing of onset, and pattern of injury should inform the monitoring duration and frequency. However, while the optimal frequency of follow-up monitoring with liver tests has not been established, monitoring weekly for 2 months should be considered.

  5. In patients with pre-existing liver disease, multiples of the patient’s baseline ALT level instead of multiples of ULN may better represent a positive rechallenge, although this needs to be further evaluated and validated.

5.3 Liver Biopsy as Part of Causality Assessment: To Biopsy or Not To Biopsy?—That is the Question

Results from the IQ DILI survey revealed that 46.2% (6) of companies may utilize a liver biopsy for follow-up evaluation for patients with suspected DILI, with the caveat that it is dependent upon the protocol. However, in this same survey, 92.3% (12) of companies stated that liver biopsy/histology data was typically missing for DILI assessment. The importance of obtaining histology in the evaluation of a suspected DILI was ranked 6.4 on a scale of 1–10 (10 being most important). Responses were varied with companies ranking the value of obtaining a biopsy from as low as 2 to as high as 8. Finally, 36.4% (4) of companies stated that they utilize histology results as part of their rechallenge decision-making process.

Reasons for recommending liver biopsy in suspected DILI may include (1) identification of alternative diagnosis, (2) identification or confirmation of previously undiagnosed chronic liver disease (e.g., cirrhosis or nonalcoholic steatohepatitis [NASH]), (3) confirmation of autoimmune features that might respond to steroid treatment, (4) assessment of tissue damage that may estimate prognosis, and (5) identification of features that support the diagnosis of DILI.

The issue of whether to obtain a liver biopsy to aid in the diagnosis of a suspected DILI case is a subject of ongoing debate. A liver biopsy is an invasive procedure that carries inherent risks (e.g., pain, bleeding, and gallbladder or bile duct perforation) [40]. This must be weighed against the potential useful information to be obtained from evaluation of liver tissue. Typically, histologic results are nonspecific, and have little impact in establishing the diagnosis of DILI or in changing the clinical assessment [41, 42]. In fact, liver biopsy results rarely alter clinical assessment. In a study published by DILIN of 249 suspected DILI cases, the five most common histologic patterns of injury (acute and chronic hepatitis, acute and chronic cholestasis, and cholestatic hepatitis), observed in liver biopsy samples obtained from cases subsequently confirmed to have DILI, did not have any distinguishing characteristics from cases in which non-DILI diagnoses were felt to be more likely [42]. Also, if liver-related blood tests return to baseline or near baseline after the study drug is discontinued, suggesting a recovery of liver injury, there is no clinical indication for a liver biopsy.

On the other hand, histologic evaluation of liver tissue is the only way to characterize the pattern, severity and distribution of hepatic injury. This information may be useful in supporting or refuting the diagnosis of DILI as the etiology of liver-related blood test abnormalities. Histologic findings may also predict outcome. In a meta-analysis of 570 case reports of DILI, it was found that patients who had histologic eosinophilic infiltrates were statistically less likely to have a fatal outcome compared with patients without these independent characteristics [41]. An analysis of 461 liver biopsy samples from the Spanish DILI database revealed that patients with hepatocellular necrosis had a higher incidence of death than those with cholestatic or mixed cholestatic/hepatocellular damage on biopsy [43]. Histologic findings from the DILIN experience that correlated with a good outcome included granulomas and eosinophils, whereas multiacinar or bridging necrosis and ductular reaction were associated with a poor outcome [42].

Histologically, DILI can mimic virtually any type of liver disease. Therefore, biopsy results must be used in combination with all other factors to assess causality. While there are no histologic findings considered to be diagnostic of DILI, there are many histologic characteristics that are suggestive of DILI. These include but are not limited to microvesicular steatosis, demarcated perivenular necrosis, minimal hepatitis with canalicular cholestasis, poorly developed portal inflammatory reaction, eosinophil infiltration, and epithelioid-cell granuloma [44,45,46].

Situations in which the benefits of obtaining a liver biopsy may outweigh the risks include the need to identify lesions that could have prognostic significance; the need to characterize injury patterns from a new drug or a new class of drugs not previously associated with DILI; the need to define the etiology of prolonged elevations in liver tests; worsening of liver-related blood tests during a clinical trial in a subject who had baseline liver-related blood test abnormalities; the need to differentiate disease progression from suspected DILI; and the onset of clinically important liver events, even against a background of normal aminotransferases and total bilirubin. Examples of clinically important liver events include the new onset of ascites, encephalopathy, or variceal bleeding in a patient with normal aminotransferases and undiagnosed cirrhosis. In such instances, a liver biopsy is useful as it may reveal previously undiagnosed cirrhosis or non-cirrhotic causes of portal hypertension (e.g., nodular regenerative hyperplasia) that can occur despite normal liver blood tests [47].

It is important to be aware that DILI can also be associated with high antinuclear antibody (ANA) and anti-smooth muscle antibody (ASMA) titers, as well as high immunoglobulin G (IgG) levels, as drugs are known potential triggers for idiopathic autoimmune hepatitis (AIH) [48, 49]. In addition, there is a subset of AIH known as drug-induced AIH (DI-AIH), in which patients have pre-existing undiagnosed low-grade disease and/or a genetic predisposition to AIH that becomes overt after being triggered by a drug [48, 50]. Although a considerable degree of histologic overlap exists between all of these forms of autoimmune liver disease, a liver biopsy may reveal characteristics that can distinguish AIH from DILI [51]. Histologic assessment should be done by a hepatopathologist with expertise in distinguishing features that may suggest DILI versus an alternative etiology. Finally, it should be recognized that even though certain histopathological features have been identified that might distinguish AIH from DILI, the differentiation remains challenging and may not be possible even with hepatic histology in hand.

It is important to underscore that when a liver biopsy is done, whether as part of causality assessment or as part of clinical endpoint efficacy assessment, evaluation of histologic results should occur as soon as possible after the procedure has been performed. If unusual or unanticipated findings occur, an external blinded liver and liver pathology safety group should evaluate the finding and data should be unblinded if determined necessary. In this manner, unexpected or unusual histologic findings can be evaluated promptly as part of the safety monitoring.

In conclusion, liver biopsy is not routinely recommended in the evaluation of a suspected DILI case, but may be an important tool in specific instances as detailed above. A liver biopsy should be considered as a final diagnostic approach in instances of diagnostic dilemmas, where the results may provide information that may change the course of treatment or prognosis.

5.4 Recommendations

  1. Liver biopsy should be considered when the patient’s benefit for continued treatment with the suspect drug exceeds risk.

  2. Biopsy histology results must be used in combination with all other factors to assess causality.

  3. Liver biopsy and histological assessment should be considered when it is important to distinguish AIH from DILI.

  4. Histologic assessment should be performed by an expert hepatopathologist.

  5. Evaluation of liver biopsy histology should occur at the time of or within a few days of the procedure. If unusual or unanticipated findings occur, an external blinded safety group should evaluate the findings and data should be unblinded if determined necessary. In this manner unexpected or unusual histologic findings can be evaluated promptly as part of safety monitoring.

6 Best Practices Adjudication Committee

Expert panels can help to better assess individual cases of suspected DILI or entire programs for the potential for DILI. There is a gap in regulatory guidance and published literature on best practices for constituting and optimizing engagement with these committees. The selection of participants, the types of data and analyses, and the background information supplied are explored in this section.

6.1 Literature

There is no regulatory guidance for clinical trials to guide the optimum composition, structure, and frequency of expert adjudication committees used to assess either individual suspected cases or to conduct aggregate assessments of the potential causal relationship between investigational/marketed products and DILI. The closest analog in the literature and regulatory guidance is for Data Monitoring Committees, but the applicability of this approach to DILI evaluation is less than optimal since a free dialogue between sponsor companies and external experts is desirable, particularly since data available for any individual compound in development is often sparse. Generally, identification of DILI cases within pharmaceutical companies resides within the pharmacovigilance/medical safety group for marketed products that have completed clinical development and within the clinical development and safety group for investigational products. Such an approach lacks the consistency, expertise, and experience across the organization to be ideal, especially for complex situations when patients may have underlying conditions, including liver disease, that require input from experts.

In an attempt to address this issue, some companies have borrowed concepts from data monitoring committees for general safety issues, but those are not optimized for DILI assessments, often lacking the relevant expertise [8, 52], resulting in the need to set up a free-standing Hepatic Adjudication Committee (HAC).

6.2 Survey Results

The IQ-DILI CWG survey collected information on membership, meeting frequency and data needs of groups charged with adjudicating both individual cases of potential DILI and with evaluating entire programs for DILI. Several themes emerged from the IQ DILI survey that may be helpful to companies in the process of forming an appropriate group of experts.

  • When several cases of suspected/confirmed DILI occur in a program, the HAC that evaluates the signal further is formed by internal experts in 54% of respondents (7 companies), by external experts in 54% (7 companies), and by a combination of both in 23% (3 companies). Companies rely upon individuals, rather than a HAC, in the remaining cases: three companies (37.5%) used an internal expert, one (12.5%) an external expert, two (25%) used both internal and external experts, and two (25%) used ‘other’.

  • For the evaluation of suspected DILI cases during a clinical trial, the majority of respondents stated that this was conducted by the internal pharmacovigilance (PV) staff, followed, in approximately half of the responding companies, by an internal or external HAC. In other companies, the adjudication was conducted by an external expert.

  • When a potential hepatotoxicity signal is identified, it is most common for an ad-hoc committee to be formed and to function for the entire company, rather than a standing committee.

  • The scope of these committees includes both clinical studies and post-marketing cases.

  • Panels are multifunctional, most commonly including hepatology/medical, pharmacovigilance/safety, clinical research, and toxicology specialties. It is far less common to include members from chemistry, regulatory, epidemiology, or clinical pharmacology specialties in the assessments. The inclusion of these functions may vary by lifecycle of the product.

  • When external experts are used in a HAC, the most important characteristics considered are recognized expertise in DILI, publications in the field of hepatology, and formal hepatology training rather than formal pharmacovigilance experience or internal expertise gaps.

  • In instances when data are blinded to treatment arm (i.e., placebo and/or several ascending dose cohorts), a staged unblinding may be needed by those not involved in the study.

6.3 Recommendations

  1. Adjudication committees are a necessary component in causality assessment since expert opinion remains the standard in clinical development.

  2. A minimum of three experienced experts in the field of clinical hepatology/DILI should be used to ensure clarity of decision making in the assessment.

  3. Data reviewed should include case details, diagnostic workup to include assessments in the minimum dataset, and available hepatic adverse event clinical data from the entire program.

  4. Committees should be informed as to the index of suspicion contained in a summary of pre-clinical data.

7 Missing Data/Minimum Dataset

7.1 Literature

Major difficulties arise in DILI causality assessment when clinical data about the case are incomplete [52, 53]. In one study of 97 published case reports that attributed liver injury to a specific drug, it was found that the case reports lacked substantial information important in determining the cause of the injury [54]. This study, as well as several other publications, supports the need for a more standardized approach to the reporting of DILI. Agarwal et al. [54] reported that cases lacked clinical information as well as important laboratory information such as bilirubin, ALP levels, and testing for viral hepatitis. One approach would be to develop a checklist of minimal elements considered essential for diagnosis and causality assessment of cases of DILI.
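The checklist idea can be illustrated with a minimal sketch. The element names below are hypothetical and chosen only to echo the data types mentioned in this section (bilirubin, ALP, viral hepatitis testing); they do not reproduce the minimum data set in ESM#3, and a real checklist would need to be adapted to the study population.

```python
from typing import Dict, List, Optional

# Hypothetical checklist for illustration only; it does not reproduce ESM#3.
REQUIRED_ELEMENTS: List[str] = [
    "alt",
    "alp",
    "total_bilirubin",
    "viral_hepatitis_serology",
    "drug_start_date",
    "event_onset_date",
    "concomitant_medications",
]


def missing_elements(case: Dict[str, Optional[object]]) -> List[str]:
    """Return the checklist elements that are absent or blank in a case record.

    An empty list of concomitant medications is treated as a recorded
    answer ("none taken"), not as missing data.
    """
    missing = []
    for name in REQUIRED_ELEMENTS:
        value = case.get(name)
        if value is None or value == "":
            missing.append(name)
    return missing


def is_assessable(case: Dict[str, Optional[object]]) -> bool:
    """A case is considered assessable here only if nothing is missing."""
    return not missing_elements(case)


example_case = {
    "alt": 412,
    "alp": 130,
    "total_bilirubin": None,          # not measured
    "drug_start_date": "2020-01-05",
    "event_onset_date": "2020-02-01",
    "concomitant_medications": [],    # recorded as none
}
print(missing_elements(example_case))
```

In this sketch a case with any missing element is flagged for follow-up rather than discarded, which is one possible design choice among several.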

Although experts in the field recommend a list of minimal clinical and laboratory data that are essential for DILI diagnosis, such suggestions have not been widely adopted [55]. In fact, as of today, there is no unified approach for data collection across the industry even within clinical development.

7.2 Survey Results

The survey collected information on data requirements for DILI cases. Questions addressed what additional follow-up is required for patients with suspected DILI, which types of data are typically missing for DILI assessment, and how respondents rank the data that are deemed essential for DILI assessment. The survey also asked if a DILI CRF was available for use. Two respondents shared DILI CRF pages.

  • Seven companies (54%) have a DILI CRF, while six companies (46%) do not. All respondents (13) reported follow-up methods: seven companies (54%) used LIVERTOX® to identify alternative causes; no other hepatotoxicity databases were listed.

  • 100% of respondents (13 companies) indicated that genetic testing was typically missing for DILI assessment.

  • Additional information that was marked as frequently missing included immunology parameters (85%, 11 companies), diagnostic imaging (54%, 7 companies), hepatitis serology (46%, 6 companies), liver synthetic parameters (39%, 5 companies), details on confounders (39%, 5 companies), sufficient serial laboratory information (31%, 4 companies), and complete medical history (31%, 4 companies).

7.3 Recommendations

  1. Implementation of a standardized minimum dataset is primarily for the evaluation and confirmation of DILI.

  2. A minimum dataset for clinical trials using a standardized protocol or CRF will help to proactively guide investigators, enabling an aggregate assessment of patients’ data with regard to suspect cases. It provides the potential opportunity to combine data from different centers or even countries. As DILI is a relatively rare event, such approaches would allow for faster collection of data and more vigorous scientific analysis. This will also facilitate research for promising biomarkers, ensuring that the associated clinical data are available to test the reliability of these biomarkers.

  3. See ESM#3 for a minimum data set. This suggested data set was developed to balance considerations of feasibility and completeness with a goal to ensure a more complete set of data to evaluate cases. The list is based on review of previously published suggestions and discussions within the CWG and with external SMEs and will need to be adapted based on study population and other known risks of DILI [6, 52,53,54,55,56,57].

8 Nuances of Causality Assessment In Post-Marketing Setting

8.1 Literature

As in clinical development, the assessment of post-marketing DILI cases relies on expert and clinical judgment. One caveat is the scarcity of information in post-marketing cases. There is a consistent consensus that causality assessment requires a reliable, objective, and reproducible means of assessing DILI causality, reflected in a high-quality standard and a preferred approach and tool [3, 11, 57,58,59,60,61]. Previous groups have attempted to identify data elements that are necessary for causality assessment but not always necessary for reporting or supporting cases of DILI [58].

Although causality assessment for DILI in the post-marketing setting is challenging because of limited available information, three main categories of methods are used: probabilistic methods, algorithmic scales, and expert judgment [60].

8.1.1 Probabilistic Methods

Probabilistic methods are primarily based on Bayes’ theorem and require a probability estimate for causality. The key components required include a prior estimate, key findings in the case, and background information. The advantage of this type of method is that it has predictive value; it is promising but not yet validated. The limitations are that the method is complex and that the incidence of the specific adverse drug reaction (ADR) is required [60].
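The Bayesian logic can be sketched in its odds form: posterior odds equal prior odds multiplied by the likelihood ratio of each case finding. This is a generic illustration, not a published DILI instrument. The function name, the prior, and the likelihood ratios are hypothetical, and the sketch assumes the findings are conditionally independent, which may not hold in practice.

```python
from math import prod
from typing import Iterable


def posterior_probability(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Update a prior probability of drug causation with case findings.

    prior             -- prior probability that the drug caused the injury
                         (e.g., derived from background ADR incidence)
    likelihood_ratios -- one ratio per finding:
                         P(finding | drug cause) / P(finding | other cause)
    Findings are assumed conditionally independent (a simplifying assumption).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: prior of 0.20, two findings favoring the drug.
p = posterior_probability(0.20, [4.0, 2.0])
print(round(p, 3))
```

With these illustrative inputs the prior odds of 0.25 are multiplied by 8, giving posterior odds of 2 and a posterior probability of two thirds. The dependence on the prior is what makes a reliable estimate of ADR incidence necessary.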

8.1.2 Algorithmic Scales

Algorithmic methods comprise a set of queries with defined scores; causality is derived from the sequence of questions. There are several algorithms that have been defined and used over the years [60].
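The general structure of such a scale can be sketched as follows. The questions and point values are purely illustrative and do not reproduce any published instrument. The score bands mirror the commonly cited interpretation of RUCAM totals, which is stated here as an assumption because this section does not give them.

```python
from typing import Dict, Tuple

# Illustrative questions and points only; NOT an actual published scale.
ITEM_SCORES: Dict[str, Dict[str, int]] = {
    "onset_time_compatible": {"yes": 2, "no": 0},
    "improves_after_withdrawal": {"yes": 2, "no": 0, "unknown": 0},
    "alternative_cause_excluded": {"yes": 2, "no": -3},
    "known_hepatotoxin": {"yes": 2, "no": 0},
    "positive_rechallenge": {"yes": 3, "no": 0, "not_done": 0},
}

# Assumed bands (lower bound, label), checked from highest to lowest.
BANDS = [(9, "highly probable"), (6, "probable"), (3, "possible"), (1, "unlikely")]


def score_case(answers: Dict[str, str]) -> Tuple[int, str]:
    """Sum the points for each answered question and map the total to a band."""
    total = 0
    for question, options in ITEM_SCORES.items():
        answer = answers[question]   # KeyError if a question is unanswered
        total += options[answer]     # KeyError if the answer is not allowed
    for lower_bound, label in BANDS:
        if total >= lower_bound:
            return total, label
    return total, "excluded"


answers = {
    "onset_time_compatible": "yes",
    "improves_after_withdrawal": "unknown",
    "alternative_cause_excluded": "no",
    "known_hepatotoxin": "yes",
    "positive_rechallenge": "not_done",
}
print(score_case(answers))
```

Requiring an explicit answer to every question, including "unknown" or "not_done", keeps missing information visible instead of silently scoring it as zero.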

8.1.3 Expert Opinion

The 5-point (category) DILIN likelihood causality scale [3] uses both a percentage figure and descriptive legal terminology to grade cases as definite, highly likely, probable, possible, or unlikely and has been described in more detail in the Current Practices/Tools Administered for DILI Assessment section of this paper (Appendix B). A quorum for panel meetings may require all members including the Chair.
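The pairing of a percentage figure with a descriptive category can be sketched as a simple lookup. The percentage cut-offs below are those commonly published for the DILIN scale (definite >95%, highly likely 75–95%, probable 50–74%, possible 25–49%, unlikely <25%). They are not stated in this section and should be treated as an assumption to be checked against the cited source [3]. The function name is hypothetical.

```python
def dilin_category(likelihood_pct: float) -> str:
    """Map an expert's estimated likelihood (0-100%) to a descriptive category.

    Cut-offs are assumed from the commonly published DILIN scale and are
    not taken from this section.
    """
    if not 0.0 <= likelihood_pct <= 100.0:
        raise ValueError("likelihood_pct must be between 0 and 100")
    if likelihood_pct > 95.0:
        return "definite"
    if likelihood_pct >= 75.0:
        return "highly likely"
    if likelihood_pct >= 50.0:
        return "probable"
    if likelihood_pct >= 25.0:
        return "possible"
    return "unlikely"


for pct in (98, 80, 60, 30, 10):
    print(pct, dilin_category(pct))
```

The mapping only labels a judgment that the expert panel has already made; it does not replace the structured review that produces the percentage.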

Expert opinion continues to be an appropriate method to evaluate post-marketing liver toxicity adverse events.

8.2 Survey Results

In the post-marketing setting, the survey showed that

  • 90% of respondents (11 companies) used external expert review;

  • 75% (9 companies) used internal expert review;

  • 100% (12 companies) had a set internal review process;

  • Of the 6 companies that responded to the use of novel computer algorithms, all 6 reported that they were not using a novel computer algorithm.

8.3 Recommendations

  1. IQ-DILI recommends the establishment of a minimum data set for causality assessment in the post-marketing environment. A standardized assessment is recommended, inclusive of spontaneous reporting benefiting from a standardized questionnaire. The recommended data set is summarized in the table of ESM#3: Minimum Data Set for Causality Assessment.

  2. The capture of longitudinal data would allow for some assessment of compatibility with drug-induced toxicity; otherwise events will be deemed to have insufficient data.

  3. For post-marketing cases that do not meet the minimum data elements, requests for additional follow-up information should be made using standard efforts and due diligence.

  4. In ascertaining the optimal causality assessment approach for post-marketing, the recommended guidance is to obtain a consensus causality assessment for DILI, which is preferable to reporting individual assessments without consensus [8].

9 New Approaches

9.1 Literature

New methods combine the use of advanced science and technology, as well as an analysis of large datasets with the ultimate goal of improving causality assessment of DILI. The DILIN's robust database “offers the opportunity to allow the computer to lead the way” [62].

For example, the German national flagship program Virtual Liver Network (VLN) bridges investigations from the subcellular level to patient and healthy volunteer studies in an integrated workflow to generate validated computer models of human liver physiology. The VLN is also researching new data mining technology that can assist with improving already existing standard tools available for causality assessment [63]. In addition, the American Medical Association (AMA) has proposed a baseline policy to guide AMA engagement regarding augmented intelligence [64]. Digital tools have been noted to give “providers a truly holistic view of patient health and function through new data flows” [65]. Importantly, Artificial Intelligence (AI) has the potential to enhance causality assessment although the ‘human factor’ remains a vital component of assessing a patient for DILI and interpreting results from multiple sources.

Another tool in pilot testing is the eRUCAM, an electronic RUCAM that uses electronic medical record data in a population of patients receiving medication associated with hepatotoxicity [66]. The results of the study suggest that it is feasible to create an automated causality assessment algorithm with reasonable concordance between manual RUCAM and eRUCAM scoring. The authors also suggest that refining the seven RUCAM criteria may be beneficial.

A modified Hy’s law known as Hy’s Law n-R has also been developed in the post-marketing setting. When the n-R criteria for Hy’s Law were applied to DILI recognition, they provided the best balance of sensitivity and specificity for the prediction of liver failure [67]. Robles-Diaz et al. recommended that the modified Hy’s Law be examined and tested by regulators.
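The arithmetic behind the n-R criteria can be sketched as follows. The definition used here is an assumption based on how the n-R ratio is commonly described (the higher of ALT or AST, expressed as multiples of the upper limit of normal, ULN, divided by ALP as multiples of ULN, with a ratio of at least 5 combined with total bilirubin above 2 × ULN). These thresholds are not given in this section and should be verified against the cited source [67]. Function and parameter names are hypothetical.

```python
def new_r(alt_x_uln: float, ast_x_uln: float, alp_x_uln: float) -> float:
    """Compute the n-R ratio from values expressed as multiples of ULN.

    Assumed definition: max(ALT, AST) / ALP, all as multiples of ULN.
    """
    if alp_x_uln <= 0:
        raise ValueError("ALP (x ULN) must be positive")
    return max(alt_x_uln, ast_x_uln) / alp_x_uln


def meets_nr_hys_law(alt_x_uln: float, ast_x_uln: float,
                     alp_x_uln: float, tbl_x_uln: float) -> bool:
    """Check the assumed n-R Hy's law criteria: n-R >= 5 and bilirubin > 2 x ULN."""
    return new_r(alt_x_uln, ast_x_uln, alp_x_uln) >= 5.0 and tbl_x_uln > 2.0


# Hypothetical case: AST exceeds ALT, so AST drives the ratio.
print(new_r(8.0, 12.0, 1.5))                   # ratio of 12 / 1.5
print(meets_nr_hys_law(8.0, 12.0, 1.5, 2.5))   # both criteria met
```

Using whichever aminotransferase is higher is the feature that distinguishes this ratio from an ALT-only calculation in the sketch.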

Furthermore, genetic testing has identified HLA alleles that increase the risk of idiosyncratic reactions and has strengthened the concept of a pathophysiological predisposition for some types of DILI. As a next step, diagnostic tools are required that assess this immunological risk [56]. Other potential tools being developed are in silico algorithms that allow modelling of various parameters to extrapolate the risk of DILI in vivo [66].

Biomarkers are also a potential tool for DILI diagnosis. From a set of biomarkers investigated by the Innovative Medicines Initiative (IMI) Safer and Faster Evidence-based Translation Consortium (SAFE-T) and Predictive Safety Testing Consortium (PSTC), a subset of biomarkers has recently received regulatory support from both the European Medicines Agency (EMA) and the FDA for more systematic use in an exploratory development setting [68]. This set of biomarkers may ultimately enable full qualification of the most promising among them. Another new marker in testing is monocyte-derived hepatocyte-like (MH) cells from the affected patient to distinguish DILI from other liver injuries and to identify the responsible drug among several concomitant therapies [69]. Once these markers are validated in well controlled trials, regulators can incorporate them into existing guidelines.

In 2019, the European Association for the Study of the Liver (EASL) published new clinical practice guidelines for DILI [70]. The recommendations from the CWG are complementary to and aligned with the recommendations of the EASL practice guidelines in the setting of potential DILI and approaches to causality assessment. It should be acknowledged that DILI can never be confirmed by registry analyses.

9.2 Survey Results

Respondents were asked to indicate whether another assessment method was used by their company, how effective that method is, and how it differs from the other available methods, with as much description or detail as possible. Respondents were also asked to assign a weight as to the importance of specific methods when evaluating DILI (‘novel computer algorithms’ and ‘other’ were included as choices).

  • None of the companies that responded (6) reported using novel computer algorithms when evaluating DILI in clinical trials.

  • 85% of respondents (11 companies) reported that they will not be able to make data available for validating new methodologies due to the complexities of having patient consents, having finalized data available, and having anonymized patient-level data.

Further information on novel methods used to evaluate DILI risk during drug development was collected as part of a survey conducted by the IQ DILI Non-Clinical Working Group and will be published as part of a separate paper.

9.3 Recommendations

Newer and yet-unvalidated tools are under development. The following research areas should be considered to enhance causality assessment:

  1. Novel algorithms;

  2. New biomarkers;

  3. Genome sequencing.

Many approaches can be brought to bear to facilitate advancement of some of these tools.

  • Implementation of minimum datasets in clinical trials to ensure robust and consistent availability of data in confirmed and refuted cases of suspected DILI.

  • A collection of cases adjudicated by expert opinion to serve as test and validation datasets for new tools, including artificial intelligence solutions.

  • Robust data mining from maturing registries and cross-industry clinical trial databases.

  • Interactions and sharing of best practices among CIOMS working groups, academia, and developers of new tools such as the eRUCAM and the n-R formula.

10 Conclusions

The ROL, survey, and SME feedback helped provide a better understanding of the current landscape of DILI causality assessment. Areas of particular interest included current practice tools, nuances, and challenges of causality assessment during clinical development (including the use of rechallenge and liver biopsy to improve causality assessment), and potential novel approaches to causality assessment. DILI may manifest with different clinical phenotypes. Another layer of complexity is the variation in baseline liver enzymes amongst individuals due to underlying liver disease [33]. The survey and literature review provided a renewed understanding of the complexity and challenges of assigning DILI causality as well as the use of varying approaches and the lack of a standard approach. The development of a consistent framework for clinicians, investigators, industry, and regulatory agencies to evaluate drug hepatotoxicity across various settings is timely and could be indispensable. Combining the use of advanced science and technology with robust data mining is necessary. Maturing registries and programs around the world, such as DILIN and the VLN, and data from electronic medical records play an important role in gathering information to potentially pool and share. The use of advanced technology, including novel algorithms, better diagnostic instruments, new biomarkers, and a minimum dataset, are approaches that may require additional attention. Similarly, a validated process for causality assessment for DILI in clinical trials is an area that could be further developed, as no such tools currently exist for clinical trials. RUCAM was not developed for use in clinical trials. The CWG did consider the possibility of creating a causality assessment tool for clinical trials; however, the development of such a validated tool was not deemed feasible at the time. Determining the source of the data for evaluation was also difficult. In this regard, potential partnerships with different organizations may be helpful in determining next steps for creating a causality assessment tool for use in clinical trials.

The survey provided trends; however, harmonization of causality assessments in DILI does not appear to be present. This review provides recommendations including possible future research opportunities and remaining knowledge gaps. Three areas have been identified as those likely to enhance consistency and standardization of DILI causality assessment. These include (1) development of a standardized CRF containing a minimum dataset, (2) utilization of formal adjudication committees, and (3) updating of current assessment tools or development of novel tools. In conclusion, a variety of working groups and scientific technologies, including novel biomarkers and computerized algorithms, have made progress in pursuing new tools to assist with causality assessment in the DILI setting. Continued progress is necessary, and joining of resources to optimize and advance innovative tools would be beneficial to the scientific, pharmaceutical, and regulatory communities.