INTRODUCTION

Dissolution tests are used to guide the development of new formulations, monitor the quality of drug products, assess the potential impact of post-approval changes on product performance, and, in some cases, predict the in vivo performance of the drug product. It is often necessary to collect dissolution data at multiple time points to adequately characterize the in vitro performance of the drug product more precisely than the point estimate approach (1).

The resulting dissolution profiles of the product or products under different test conditions (e.g., media pH) can then be compared using model-independent or model-dependent methods (2). The model-independent similarity factor (f 2) approach is a relatively simple and widely accepted method for comparing dissolution profiles. In fact, many regulatory authorities require the use of the f 2 test for this purpose. However, the rules and criteria associated with the application of this test are not harmonized on a global basis.

While the majority of guidance documents do not differentiate similarity assessment for various dosage forms such as immediate release (IR) versus modified release (MR), there are minor areas where application of f 2 may differ (e.g., criteria for coefficient of variation and definition of “early time point”). This article examines the rules and criteria for the f 2 similarity test that are published by a number of influential health authorities worldwide, and is applicable to both IR and MR dosage forms. Comparing and contrasting differences in the rules and criteria used to demonstrate similarity enables the pharmaceutical industry and regulatory authorities to establish scientifically relevant expectations to improve global harmonization of dissolution similarity requirements.

Regulatory Landscape

While regulatory authorities in the European Union (EU) and United States of America (US) have historically been at the forefront of dissolution guidance, recent trends indicate a proliferation of tailored dissolution similarity requirements from regulatory authorities around the world. This results in divergent dissolution profile requirements and a significant amount of redundant work that does not help to ensure product safety or efficacy.

The f 2 similarity factor approach is recommended by many global regulatory authorities as a means to demonstrate dissolution similarity. This approach is favored because it is relatively easy to use, the f 2 value is easy to calculate, and a clear acceptance criterion for profile similarity (i.e., f 2 ≥ 50) has been established (2). An f 2 value of 50 corresponds to an average difference of 10% at all specified time points (3). In the US, for example, the Food and Drug Administration (FDA) recommends that a dissolution profile comparison be performed under identical conditions for the product before and after some formulation changes. The FDA’s Scale-Up and Post-Approval Changes (SUPAC) guidance defines the types of change, such as changes to components and composition, site and scale of manufacturing, manufacturing process, and equipment (4). Other factors should also be considered, including the Biopharmaceutics Classification System (BCS) designation and the therapeutic index of the drug.

The European Medicines Agency (EMA) guideline on the investigation of bioequivalence states that “If a product has been reformulated from the formulation initially approved or the manufacturing method has been modified in ways that may impact on the bioavailability, an in vivo bioequivalence study is required, unless otherwise justified (5).” This guideline also recommends the use of the f 2 test.

The Japanese guidance for bioequivalence recommends that the mean results of the test and reference formulations be compared in two ways: by the absolute difference and by applying the f 2 mathematical equation used by the FDA (6). The Japanese guidance documents also require screening experiments to determine the reference lot and the dissolution media to be used when comparing the dissolution performance of the test and reference formulations.

While the majority of the published literature focuses on the f 2 requirements in major markets (2,7–10), there is limited literature comparing other global markets. Each country determines how to apply and enforce regulations to ensure the safety and efficacy of pharmaceutical products for sale in its jurisdiction, and the regulatory authorities in these markets are asking for more information about dissolution methods and dissolution profile comparisons. In general, the f 2 test is an acceptable approach for assessing the similarity of product quality and performance characteristics after post-approval changes. Likewise, the f 2 test provides an opportunity to obtain a waiver of in vivo bioequivalence studies for additional dosage strengths under certain biowaiver criteria specified in the appropriate guidelines (5,11,12). In the EU and US, multi-media dissolution testing and f 2 comparison may be accepted in lieu of bioequivalence studies under certain conditions.

METHODS

While the f 2 test is generally accepted for demonstrating dissolution profile similarity on a global basis, subtle country-to-country differences with respect to how this test should be applied can affect how the dissolution experiments are performed and these regulatory differences can even affect the overall conclusion from these experiments. For our comparative analysis of dissolution similarity requirements, we compared and contrasted regulations from 14 global markets. Our analysis includes a discussion of bioequivalence guidelines from major markets (i.e., Australia, Canada, EU, Japan, and US) as well as markets where a high level of regulatory scrutiny has been observed during review of recent regulatory applications (i.e., Brazil, China, India, Korea, Mexico, Russia, South Africa, Thailand, and Turkey). The objective of this review is to compare the regulatory guidelines and expectations established by different health authorities for demonstrating dissolution profile similarity. The markets that were considered in this review, respective regulatory health authorities, and links to their websites are shown in Table I.

Table I Countries and Regulatory Authorities Considered in this Review

This study compares the following aspects of the similarity factor approach across global dissolution requirements:

  • f 2 criteria for demonstrating similarity

  • Criteria for exemptions from f 2 comparisons

  • Minimum number of time points required for an f 2 calculation

  • Determination of the last time point for an f 2 calculation

  • Coefficient of variation criteria

Criteria for this comparison were taken from guidelines published by the various regulatory agencies, as well as recent experience with regulatory applications. Before comparing these aspects, however, it is important to briefly review the general approach used for comparative dissolution studies as well as the fundamentals of the statistical approaches that are available for comparing dissolution profiles.

Comparative Dissolution Methods

To successfully bridge formulation and manufacturing process related changes in the preapproval or post-approval space, most regulatory agencies recommend that the f 2 assessment be conducted with a specified number of reference (prechange) and test (postchange) drug product lots. In Japan and Korea, for instance, three prechange production batches are tested and the batch with the intermediate dissolution rate is selected as the reference lot; likewise, three postchange production batches are tested and the batch with the intermediate dissolution rate is selected as the test lot. Dissolution profiles of the reference and test products are generated with a validated dissolution method using the medium described in the regulatory application as well as two additional media, for example:

  • 0.1 N HCl or simulated gastric fluid without enzymes

  • pH 4.5 acetate buffer

  • pH 6.8 phosphate buffer or simulated intestinal fluid without enzymes

The purpose of testing the product in these three media is to assess its dissolution performance across the physiologically relevant pH range. In cases where multiple time points and multiple media testing are required, special consideration should be given to media selection. For example, the use of water as a dissolution medium and dissolution media outside the physiologically relevant pH range may require justification.

Statistical Considerations

Dissolution profiles may be considered similar by virtue of (1) overall profile similarity and (2) similarity at every dissolution sample time point. The dissolution profile comparison can be conducted using model-independent or model-dependent statistical methods.

In 1996, Moore and Flanner proposed two indices, or fit factors, to compare dissolution profiles in a pairwise fashion (13). These indices are known as the difference factor (f 1) and the similarity factor (f 2). To accurately compare two profiles using these fit factors, the dissolution results should be obtained at a sufficient number of time points to adequately characterize the shape of the dissolution profiles. Because the mean dissolution profiles are compared using these fit factors, the variability associated with the dissolution results of the individual dosage forms at each time point must also meet certain regulatory criteria.

The f 1 factor calculates the percent difference between the two dissolution profiles at each time point and is a measurement of the relative error between the two profiles:

$$ {f}_1=\left(\frac{{\displaystyle {\sum}_{t=1}^n\left|{R}_t-{T}_t\right|}}{{\displaystyle {\sum}_{t=1}^n{R}_t}}\right)\times 100 $$

where n is the number of time points, R t is the mean dissolution value for the reference product at time t, and T t is the mean dissolution value for the test product at that same time point. The f 1 value is equal to zero when the test and reference profiles are identical and increases as the two profiles become less similar.

The f 2 factor is a logarithmic reciprocal square root transformation of the sum of squared error and is a measurement of the similarity in the percent dissolution between the two profiles:

$$ {f}_2=50\times { \log}_{10}\left[\frac{100}{\sqrt{1+\frac{{\displaystyle {\sum}_{t=1}^n{\left({R}_t-{T}_t\right)}^2}}{n}}}\right] $$

The f 2 value is equal to 100 when the test and reference profiles are identical and decreases, on a logarithmic scale, as the two profiles become less similar.
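The two fit factors can be sketched in a few lines of Python. This is a minimal illustration of the equations above, not a validated implementation; the function names and the example values are hypothetical and are not taken from any guideline or dataset.

```python
import math


def f1(reference, test):
    """Difference factor: summed absolute difference relative to the reference sum."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must share the same, non-empty set of time points")
    return 100.0 * sum(abs(r - t) for r, t in zip(reference, test)) / sum(reference)


def f2(reference, test):
    """Similarity factor computed from mean % dissolved at n common time points."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must share the same, non-empty set of time points")
    n = len(reference)
    mean_squared_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_squared_diff))


# Hypothetical mean % dissolved; the test profile is 10 points lower throughout.
reference = [40.0, 60.0, 75.0, 88.0]
test = [30.0, 50.0, 65.0, 78.0]
print(round(f1(reference, test), 1))  # 15.2
print(round(f2(reference, test), 1))  # 49.9
```

A uniform difference of 10% at every time point gives an f 2 value just under 50, which is consistent with the usual interpretation of the acceptance criterion.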

f 2 Criteria for Demonstrating Similarity

According to the guidelines issued by the 14 regulatory authorities evaluated in this study, f 1 values up to 15 (0–15) and f 2 values greater than 50 (50–100) ensure the “sameness” or “equivalence” of the two profiles (1,2). Values less than 50 may be acceptable if justified (5,14).

Statistical Methods When the Variability Is Large

If the variability associated with the individual dissolution results at one or more time points for either the reference or test batch does not meet the criteria specified by the regulatory authority (typically ≤20% RSD at early time points and ≤10% at later time points), calculation of the f 2 statistic is not recommended and alternative statistical procedures should be used.

One alternative proposed by Shah et al. (15) is to use bootstrapping to calculate a lower bound for f 2. Bootstrapping is the practice of estimating properties of an estimator (such as its variance) by measuring those properties when randomly sampling from an approximating distribution. One standard choice for an approximating distribution is the empirical distribution function of the observed data. Similarity can be claimed when the 95% lower bound for f 2 is greater than or equal to 50. The result obtained will be biased low, which makes it a conservative estimate. This means that the lower bound for f 2 will be less than 50 more often than intended for cases where the differences in the true dissolution profiles would yield f 2 values close to 50.
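The idea can be sketched as a simple percentile bootstrap. The code below is an illustration under stated assumptions, not the exact procedure of reference (15): individual dosage units are resampled with replacement from the empirical data of each batch, f 2 is recomputed from the resampled mean profiles, and the 5th percentile is taken as the lower bound. The function names, the number of resamples, and the example data are hypothetical.

```python
import math
import random


def f2(reference_means, test_means):
    """Similarity factor from mean % dissolved at common time points."""
    n = len(reference_means)
    mean_squared_diff = sum(
        (r - t) ** 2 for r, t in zip(reference_means, test_means)
    ) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_squared_diff))


def mean_profile(units):
    """Column-wise mean over a list of per-unit dissolution profiles."""
    return [sum(column) / len(units) for column in zip(*units)]


def bootstrap_f2_lower_bound(ref_units, test_units, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap lower bound for f2 (assumed variant, see lead-in)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample whole units with replacement, independently for each batch.
        ref_sample = [rng.choice(ref_units) for _ in ref_units]
        test_sample = [rng.choice(test_units) for _ in test_units]
        estimates.append(f2(mean_profile(ref_sample), mean_profile(test_sample)))
    estimates.sort()
    index = min(int(alpha * n_boot), n_boot - 1)
    return estimates[index]


# Hypothetical per-unit results (% dissolved at three time points), not real data.
# A regulatory comparison would use 12 units per batch.
ref_units = [[38, 61, 84], [42, 63, 86], [40, 59, 85], [41, 62, 83]]
test_units = [[33, 54, 79], [35, 57, 80], [31, 55, 78], [34, 56, 81]]
lower_bound = bootstrap_f2_lower_bound(ref_units, test_units)
print(lower_bound >= 50.0)  # True would support a similarity claim
```

Fixing the random seed keeps the result reproducible, which matters when the lower bound falls close to 50.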

An alternative to the bootstrap f 2 procedure is to use the concept of similarity testing and to apply the two one-sided t-tests (TOST) approach at each dissolution time point. This approach requires defining a criterion for similarity a priori with respect to the maximum acceptable difference between the two mean dissolution profiles. By default, this generally describes a similarity region of ±10%. A confidence interval (typically 90%) is then constructed about the mean difference at each dissolution time point. If each of the calculated confidence intervals lies entirely within the similarity region, a claim of similarity can be supported.

This approach has the benefits of not needing an assumption of equal variances in the dissolution data for the reference and test batches, and it is not constrained by the amount of variability present. However, this method will almost always result in a claim of non-similarity if the variability is too large. It is also unclear what the conclusion should be in cases where one or more of the confidence intervals does not lie entirely within the similarity region.
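A minimal sketch of the interval check at each time point is given below. It assumes a Welch-type standard error (no equal-variance assumption) and leaves the one-sided critical t value to the caller, because the appropriate degrees of freedom depend on the data; the function name, the `t_crit` parameter, and the example values are hypothetical.

```python
import math
import statistics


def tost_per_time_point(ref_units, test_units, t_crit, limit=10.0):
    """Return (lower, upper, within_limit) for the mean difference at each time point.

    ref_units / test_units: lists of per-unit profiles (% dissolved).
    t_crit: upper 5% critical t value chosen by the caller (assumption), which
            yields a 90% two-sided confidence interval.
    limit: half-width of the similarity region in % dissolved.
    """
    results = []
    for ref_col, test_col in zip(zip(*ref_units), zip(*test_units)):
        diff = statistics.mean(test_col) - statistics.mean(ref_col)
        # Standard error without assuming equal variances in the two batches.
        std_err = math.sqrt(
            statistics.variance(ref_col) / len(ref_col)
            + statistics.variance(test_col) / len(test_col)
        )
        lower = diff - t_crit * std_err
        upper = diff + t_crit * std_err
        results.append((lower, upper, -limit <= lower and upper <= limit))
    return results


# Hypothetical data: three units per batch, two time points.
ref_units = [[50, 80], [52, 82], [48, 78]]
test_units = [[51, 81], [53, 83], [49, 79]]
intervals = tost_per_time_point(ref_units, test_units, t_crit=2.132)
print(all(within for _, _, within in intervals))  # True: every interval is inside ±10%
```

Because the interval widens with the standard error, highly variable data will rarely satisfy the check, which matches the limitation noted above.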

Other Statistical Approaches

Other approaches for comparing dissolution profiles are allowed by regulatory authorities in some countries as long as they are justified. For example, the model-independent multivariate confidence interval method for comparing the dissolution curves is explicitly mentioned in the FDA guidance on Dissolution Testing of Immediate Release Solid Oral Dosage Forms (14). This method uses the Mahalanobis Distance between the mean dissolution profiles in n-dimensional space where n is the number of dissolution time points in the data set. The test and reference samples can be considered to have similar profiles if the upper limit of the confidence interval calculated between the reference and test sample is less than or equal to the similarity limits derived from testing multiple reference batches.

Many articles suggest fitting mathematical models to the dissolution curves for each unit tested. It is recommended to adopt a model with not more than three parameters (such as a linear, quadratic, logarithmic, or Weibull model). A multivariate statistical distance (MSD) and its confidence interval are then calculated between the mean of the parameter estimates obtained from the test and reference samples. This is compared to a similarity region defined by looking at the MSD between the parameter estimates obtained from multiple reference batches.

RESULTS

The focus of this study is to compare and contrast the regulatory requirements associated with the application of the f 2 similarity assessment across global markets.

For similarity assessments in all markets, testing must be conducted under identical conditions using 12 dosage units for both test and reference products. Dissolution profile similarity testing and any conclusions drawn from the results (e.g., the products are similar or a biowaiver is justified) can be considered valid only if the dissolution profile is satisfactorily characterized using a sufficient number of time points. According to EMA and FDA guidelines, it is not necessary to compare the dissolution profiles of very rapidly dissolving dosage forms as long as the test and reference products are more than 85% dissolved within 15 min in the specified dissolution media (4,5). For this reason, a 15-min time point should be included when testing rapidly dissolving dosage forms. The subsequent sections discuss in more detail the areas where regulatory divergence was noted.

Criteria for Exemptions from f 2 Comparisons

When the active pharmaceutical ingredient is highly soluble across the physiologically relevant range of pH and the dosage form exhibits very rapid dissolution, it may not be necessary to compare dissolution profiles. The definition of “very rapid dissolution” varies according to country regulatory guidance as shown in Table II.

Table II Similarities and Differences in Criteria for Exemptions from f 2 Comparisons

The majority of guidelines state that dissolution profile comparisons are unnecessary when the test and reference batches are more than 85% dissolved within 15 min.
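The most common form of this exemption can be expressed as a one-line check. The sketch below assumes the "more than 85% within 15 min" wording stated above; the function name and threshold parameter are hypothetical, and individual guidelines may add conditions (for example, a variability limit at 15 min) that are not modeled here.

```python
def exempt_from_f2(ref_mean_at_15_min, test_mean_at_15_min, threshold=85.0):
    """True when both products are more than `threshold` % dissolved at 15 min."""
    return ref_mean_at_15_min > threshold and test_mean_at_15_min > threshold


# Hypothetical mean % dissolved at 15 min.
print(exempt_from_f2(92.0, 88.5))  # True: both products exceed 85%
print(exempt_from_f2(92.0, 83.0))  # False: the test product does not
```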

Minimum Number of Time Points

A minimum of three time points (zero excluded) is generally required for the calculation of f 2 values. The selected time points must be the same for the test and reference products. It should be noted that more than three time points may be required to adequately characterize the shape of the dissolution profiles. The EMA guideline (5) suggests that sampling should occur at least every 15 min for immediate-release products and that more frequent sampling is recommended during the period of greatest change in the dissolution profile. This guideline also states that sampling at 5 or 10 min intervals may be necessary to adequately characterize the dissolution profiles of rapidly dissolving products, where dissolution is essentially complete within 30 min. Therefore, it may be necessary to perform some preliminary studies to determine the most appropriate time points to be used with each dissolution medium during the definitive studies with the test and reference batches. The similarities and differences in the minimum number of time points required for f 2 calculation are summarized in Table III.

Table III Similarities and Differences in the Minimum Number of Time Points Required for an f 2 Calculation

The time points for dissolution testing could be spaced at regular intervals or adjusted to better characterize the dissolution profiles. In some cases, guidelines recommend the appropriate time points. For an extended-release dosage form, the selection of time points should be based on the shape of the specific dissolution profile and not on specified time points for all drug products. It would be useful to perform some preliminary experiments to determine the sampling time points that adequately characterize the dissolution profiles before the initiation of the comparative dissolution testing.

The importance of time point selection to avoid biasing the f 2 results is illustrated in the example dataset provided in Table IV, which was adapted from a workshop given by the World Health Organization (WHO) (16). In this example, it is assumed that the same dissolution results are obtained when the protocols include either six or four time points. If all of the dissolution results obtained at the six time points (i.e., 10, 15, 20, 30, 45, and 60 min) are included in the f 2 calculation, an f 2 value of 47 is obtained. As a result, the overall conclusion is that the dissolution profiles for the test and reference products are not similar (f 2  < 50). This conclusion should be contrasted with that from the protocol where samples are taken at 15, 30, 45, and 60 min. The f 2 value obtained using the results from these four time points is equal to 57, resulting in the overall conclusion that the two curves are similar. This example demonstrates that the sampling time points must be sufficiently spaced to appropriately characterize the curve and to comply with the guidance that only one time point should be considered after 85% dissolution of both the test and reference products (4). In the latter example, the choice of time points used leads to a different and potentially incorrect conclusion as the time points used did not adequately characterize the dissolution profile over the steepest part of the curves.

Table IV Example Data Showing the Importance of Time Point Selection

Last Time Point

The regulatory guidelines for some countries specify that the dissolution results from only one measurement (i.e., time point) should be considered after 85% dissolution of the product. However, the rule for determining the last time point varies from one country to another as shown in Table V.

Table V Similarities and Differences in Determination of the Last Time Point for an f 2 Calculation

Similar to the time point selection example, the dataset shown in Table VI, which was adapted from a workshop given by the WHO (16), illustrates the importance of minor differences in how the last time point is determined. If the last time point allowed is when both the test and reference products reach 85% dissolution, all data up to 45 min may be considered, and an f 2 value of 53 is obtained. For the same data set, if the last time point allowed is when either the reference or test product reaches 85% dissolution, data up to the 20 min time point may be considered and an f 2 value of 48 is obtained. The example provided in Table VI shows that the same dataset will result in a different overall conclusion when the different global criteria are applied.
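The effect of the two cut-off rules can be reproduced with a short sketch. The values below are hypothetical and are not the Table VI data; the function names and the `rule` parameter are assumptions made for illustration.

```python
import math


def f2(reference, test):
    """Similarity factor from mean % dissolved at common time points."""
    n = len(reference)
    mean_squared_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_squared_diff))


def truncate_profiles(times, reference, test, rule="both", threshold=85.0):
    """Keep time points up to and including the first one that satisfies the rule.

    rule="both":   both products have reached the threshold.
    rule="either": at least one product has reached the threshold.
    """
    if rule not in ("both", "either"):
        raise ValueError("rule must be 'both' or 'either'")
    cut = len(times)  # keep everything if the threshold is never reached
    for i, (r, t) in enumerate(zip(reference, test)):
        ref_done, test_done = r >= threshold, t >= threshold
        reached = (ref_done and test_done) if rule == "both" else (ref_done or test_done)
        if reached:
            cut = i + 1
            break
    return times[:cut], reference[:cut], test[:cut]


# Hypothetical mean % dissolved (illustrative only).
times = [10, 15, 20, 30, 45, 60]
reference = [35.0, 55.0, 86.0, 92.0, 96.0, 98.0]
test = [25.0, 44.0, 75.0, 88.0, 94.0, 97.0]

for rule in ("either", "both"):
    kept_times, ref_kept, test_kept = truncate_profiles(times, reference, test, rule)
    print(rule, kept_times[-1], round(f2(ref_kept, test_kept), 1))
```

With these illustrative values the "either" rule stops at 20 min and gives an f 2 value below 50, whereas the "both" rule stops at 30 min and gives a value above 50, so the conclusion changes with the rule applied.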

Table VI Example Data Showing the Importance of Determining the Last Time Point for Calculation of f 2

Coefficient of Variation Criteria

In general, the guidelines for immediate-release products state that the coefficient of variation (%CV) for the individual dissolution results should be not more than 20% at the earlier time points and not more than 10% at other time points. However, the guidelines from many countries do not clearly define what constitutes an “early” time point for either immediate-release products or modified-release products. For immediate-release products, for example, it may be appropriate to define all time points of 15 min or less as early time points. For modified-release products, however, the shape of the dissolution profile must be taken into account when defining an early time point.

It is important to point out that time, per se, is not the key variable to use to define what constitutes an “early” time point. In general, the coefficient of variation changes as a function of percent dissolved and not necessarily as a function of time. For example, time points up to several hours could be considered as “early” time points for an extended-release dosage form, while 15 min might be a reasonable cut-off for an immediate-release dosage form.
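A variability check of this kind might be sketched as follows. The limits of 20% and 10% follow the general criteria described above, while the 15-min cut-off for "early" time points is only one possible assumption suited to an immediate-release product; the function names, parameters, and example data are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) of individual dissolution results."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def variability_acceptable(times, units, early_cutoff=15,
                           early_limit=20.0, late_limit=10.0):
    """Check %CV at every time point against an early and a later limit.

    times: sampling times in minutes.
    units: list of per-unit profiles (% dissolved at each time).
    early_cutoff: times at or before this value count as "early" (assumption).
    """
    for time, column in zip(times, zip(*units)):
        limit = early_limit if time <= early_cutoff else late_limit
        if cv_percent(column) > limit:
            return False
    return True


# Hypothetical results for three units at 10 and 30 min.
times = [10, 30]
units = [[22, 80], [28, 82], [25, 78]]
print(variability_acceptable(times, units))                  # True
print(variability_acceptable(times, units, early_cutoff=5))  # False
```

The second call shows how the outcome depends on the definition of an early time point: the 12% CV at 10 min passes a 20% limit but fails a 10% limit.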

The global requirements related to variability are provided in Table VII. It is important to highlight that the same data set may meet the criteria established by some countries but not those established by others.

Table VII Similarities and Differences in Coefficient of Variation Criteria

DISCUSSION

Based on the assessment of the selected countries’ dissolution requirements, it is recommended to methodically apply the local requirements in the following categories to ensure regulatory compliance:

  • Selection of dissolution media

  • Adequate selection and number of batches

  • Appropriate number of dosage units

  • Suitable time points

  • Appropriate determination of the last time point

While the overall f 2 acceptance criterion and the number of dosage units from the test and reference batches that must be tested are harmonized across the global guidance documents evaluated in this review, the authors propose that representatives from the global regulatory bodies and pharmaceutical industry perform a thorough evaluation of the following criteria with the ultimate goal of reaching harmonized guidance for both immediate-release and modified-release oral dosage forms:

  1. Criteria for exemptions from f 2 comparisons: Most countries recommend that where more than 85% of the drug is dissolved for both the test and reference products within 15 min, dissolution profiles may be accepted as similar without further mathematical evaluation. However, countries such as Brazil also require that the coefficient of variation at the 15-min time point not exceed 10%. A dialog is required to harmonize these rules and acceptance criteria.

  2. Minimum number of time points: Most countries recommend that f 2 calculations be based on a minimum of three time points. Some markets require “suitably spaced” time points or “adequate sampling.” Still others require a minimum of five time points and further specify where on the dissolution curve the points must fall. The minimum number of time points required for f 2 calculations and how they should be selected for immediate-release and modified-release products should be harmonized.

  3. Last time point to include in an f 2 calculation: Significant differences in how the last time point is determined were noted. Some countries require the reference drug to reach 85% dissolution, others require both reference and test drugs to reach 85% dissolution, while others are even less specific. Because this criterion has the potential to influence the overall conclusion of the comparative dissolution assessment, a globally harmonized criterion is required.

  4. Coefficient of variation: Most countries recommend that the coefficient of variation for the individual dissolution results should not exceed 20% at “early” time points and should not exceed 10% at subsequent time points. In some countries, the coefficient of variation should not exceed 15% at any time point. In addition, the current guidelines provide unclear or conflicting information as to what constitutes an “early” time point. Furthermore, it seems to us that the acceptance criterion for the coefficient of variation should be based on the mean dissolution results for a particular product rather than on the same set of time points for all products. A harmonized approach is required because the rules and acceptance criteria associated with the variability of the individual dissolution results often determine when the f 2 test can be used to compare dissolution profiles.

The authors advocate a dialog between industry and the regulators to identify ways to minimize the divergence in regulatory expectations, as this would facilitate patient access to the medicines they need. One path forward would be to bring this topic to the International Conference on Harmonisation (ICH). As a first step toward harmonization, the authors recommend the rules and criteria shown in Table VIII.

Table VIII Recommended f 2 Harmonized Criteria

CONCLUSIONS

The regulatory landscape is ever-changing and a comprehensive review of the current guidance documents should be undertaken when applying the similarity factor approach to comparing in vitro dissolution profiles. As detailed in this study, there is considerable global variance in the determination of equivalence using the similarity factor approach. These differences in expectations create a complex regulatory landscape for the pharmaceutical industry, leading to potential confusion, errors, and delays in the delivery of safe and efficacious medicines to patients. At the same time, this complexity increases the cost of medicines and does not help to ensure patient safety. Ultimately, we believe that a scientific and regulatory dialog is needed to harmonize the rules and acceptance criteria associated with the f 2 test. To that end, we have included an initial proposal in this paper.