Key Summary Points

Filgotinib is a once-daily oral Janus kinase 1 preferential inhibitor, approved for the treatment of rheumatoid arthritis in Europe, the UK and Japan, and for ulcerative colitis treatment in Europe, and is under investigation for the treatment of Crohn’s disease.

Preclinical studies of filgotinib in rats and dogs demonstrated histopathological lesions (germ cell depletion/degeneration and/or tubular vacuolation) in testes with correlating findings in epididymides; the highest exposure level with no adverse effects on the testis in dogs (the most sensitive species) was twofold above the exposure seen in humans taking filgotinib 200 mg once daily, the highest dose evaluated in phase 3 studies.

The randomized, double-blind, placebo-controlled, phase 2 MANTA and MANTA-RAy studies were designed, upon consultation with global regulatory authorities, to evaluate the impact (if any) of filgotinib 200 mg on semen parameters in participants with active inflammatory bowel disease (IBD) and rheumatic diseases, respectively, and are the first large studies to assess the effects of an advanced therapy for IBD and rheumatic diseases on semen parameters.

Despite several challenges, such as stringent selection criteria (active inflammatory disease, no history of reproductive health issues, semen parameters ≥ 5th percentile of World Health Organization reference values) with the potential to slow recruitment, and logistical complexities (e.g. standardization of assessments of semen parameters across multiple, international sites), a unique and robust trial programme was successfully executed.

The primary endpoint (combined across both trials) was the proportion of participants with a ≥ 50% reduction from baseline in sperm concentration after 13 weeks of treatment with filgotinib 200 mg versus placebo; overall conclusions, however, will be based on the totality of the data, including secondary and exploratory endpoints.

Introduction

Chronic inflammatory diseases are increasing in prevalence [1, 2], probably owing to changes in social, environmental and lifestyle factors [3]. The classic signs, symptoms and complications of chronic inflammatory conditions such as inflammatory bowel disease (IBD) and rheumatoid arthritis (RA) are well known [4, 5]. Less widely appreciated are the negative impacts of chronic inflammatory conditions on sexual and reproductive health [6,7,8,9,10,11,12,13,14,15,16].

Despite advances in the management of chronic inflammatory diseases, especially with regard to the introduction of biologic agents, significant unmet treatment needs still remain in terms of rates and durability of response, tolerability and safety [17,18,19]. Janus kinase (JAK) inhibitors have been investigated and approved as treatments for inflammatory diseases owing to the wide range of cytokine pathways regulated via JAK/signal transducer and activator of transcription signalling [20]. Filgotinib is an orally administered, adenosine triphosphate-competitive, reversible inhibitor of the JAK family, with preferential inhibitory activity against JAK1 [21]. After oral administration, filgotinib is quickly metabolized by carboxylesterase 2 to GS-829845, which has similar properties of JAK1 preferential inhibition, but a longer half-life than filgotinib [22, 23]. Consequently, GS-829845 exposure is 16–20-fold higher than filgotinib exposure, and thus has the greatest contribution to the overall pharmacodynamic effects of filgotinib treatment [22].

Filgotinib (100 mg or 200 mg once daily) is approved in Europe and the UK as monotherapy, or in combination with methotrexate, for the treatment of moderately to severely active RA in patients with inadequate response or intolerance to one or more disease-modifying anti-rheumatic drugs (DMARDs) [24], and in Japan for the treatment of RA (including prevention of structural joint damage) in patients who have had inadequate response to conventional therapies. Filgotinib has also recently been approved in Europe for the treatment of moderately to severely active ulcerative colitis (UC), on the basis of positive findings from the multicentre, randomized, double-blind, placebo-controlled phase 2b/3 SELECTION study [25]. Filgotinib is also under investigation in patients with Crohn’s disease (CD) in a global, phase 3 clinical trial programme (the DIVERSITY1 study; EudraCT number 2016-001367-36; ClinicalTrials.gov identifier NCT02914561).

Alongside clinical trials, the development of new therapeutic compounds requires extensive in vitro and in vivo testing. Evaluation of potential testicular effects is a standard component of this process and is routinely assessed during repeat-dose toxicity studies. Preclinical, chronic toxicity studies in rats and dogs demonstrated that male reproductive organs were affected by filgotinib, but not by its primary metabolite, GS-829845 [26]. These effects appeared at different exposure levels across species, with dogs being the most sensitive [26]. The observed lesions comprised germ cell depletion/degeneration and/or tubular vacuolation in testes with correlating findings in epididymides (reduced sperm count and/or cell debris) at high doses of filgotinib [26]. In rats, similar changes were observed which were associated with reduced male fertility in reproduction studies [24]. At the ‘no observed adverse effect level’ (NOAEL) in dogs, defined as the highest drug exposure level that produced no adverse effects, the exposure was twofold above the exposure seen in humans taking filgotinib 200 mg once daily [24], the highest clinical dose evaluated in phase 3 studies [25, 27].

Semen parameters are among very few clinical parameters that can be used to reliably monitor testicular function, along with serum testosterone and gonadotropin (e.g. follicle stimulating hormone [FSH] and luteinizing hormone [LH]) measurements [28]. In evaluating parameters related to testicular function, it is important to account for the known latency period of several months (reflecting the approximately 13-week time course of one spermatogenic cycle) between testicular injury and detection of its effect on semen parameters [29]. In addition, there is a limited ability to interpret changes from baseline in semen parameters in terms of their effects on fertility [28]. Other measures of relevance to findings with filgotinib in animal studies, such as pregnancy rates to assess fertility, or evaluations of testicular histology, are neither practical nor feasible for use in clinical trials [28]. Semen parameters are thus the main outcome measures recommended for trials evaluating testicular toxicity and/or fertility in men, and US Food and Drug Administration (FDA) guidance is provided on the conduct of such trials [28].

On the basis of the aforementioned findings with filgotinib in animals, and upon consultation with global regulatory authorities, two international, randomized, double-blind, placebo-controlled, parallel-group phase 2 studies (MANTA [EudraCT number 2017-000402-38; ClinicalTrials.gov identifier NCT03201445] and MANTA-RAy [EudraCT number 2018-003933-14; ClinicalTrials.gov identifier NCT03926195]) were designed to evaluate the impact (if any) of filgotinib 200 mg on semen parameters in participants with IBD and rheumatic diseases, respectively. These trials were designed so that data for the week-13 primary endpoint could be pooled for analysis, thus encompassing a wide range of participants with inflammatory diseases.

Here we present the rationale and methodology for the MANTA and MANTA-RAy studies, with the goal of providing appropriate context for interpretation of their forthcoming results, as well as the practical insights gained during the design of these trials.

Rationale for Patient Selection

Recent (finalized October 2018) FDA guidelines for the evaluation of testicular toxicity during drug development recommend that, to the extent feasible, participants should be representative of the population(s) for whom the drug is intended [28]. This guidance is especially pertinent to chronic inflammatory conditions, which are known to affect the male reproductive system [6, 7, 14]. Effects of filgotinib on semen parameters in healthy individuals may therefore not sufficiently reflect those in patients with inflammatory diseases. Accordingly, the MANTA and MANTA-RAy studies included participants with IBD and rheumatic diseases, respectively, with inadequate response (or intolerance) to previous inflammatory-disease-specific treatment.

General Eligibility Criteria

Both the MANTA and MANTA-RAy studies included men aged 21–65 years with active inflammatory disease (Table 1). Full inclusion and exclusion criteria for these studies are provided in Tables S1–S4 in the electronic supplementary material. In MANTA, participants were required to have documented UC (minimum extent of 15 cm from the anal verge) or CD lasting at least 4 months, and to meet the criteria for moderately to severely active IBD at (or in the 90 days before) screening. In MANTA-RAy, participants had to meet specific diagnostic criteria for RA, psoriatic arthritis (PsA), ankylosing spondylitis (AS) or non-radiographic axial spondyloarthritis (nrAxSpA), for at least 12 weeks before screening, and criteria for active disease during the screening period (Table 1).

Table 1 Key inclusion criteria for the MANTA and MANTA-RAy studies (see Tables S1–S4 in the electronic supplementary material for full inclusion and exclusion criteria)

Individuals were excluded from MANTA and MANTA-RAy if they had previously documented problems with male reproductive health (e.g. primary or secondary hypogonadism) or a previous diagnosis of reduced fertility. Choice of eligibility criteria for baseline semen parameters was primarily based on FDA guidance, which recommends values equal to or exceeding the generally accepted 5th percentile of World Health Organization (WHO) 2010 reference values [30]. The eligibility criteria used in the MANTA and MANTA-RAy studies were semen volume ≥ 1.5 mL, total sperm per ejaculate ≥ 39 million, sperm concentration ≥ 15 million/mL, sperm total motility ≥ 40% and normal sperm morphology ≥ 30%. These were based on the mean of two separate semen samples taken during the 45-day screening period (further procedural details are provided later). Motility assessments were based on the proportion of sperm with any sort of movement (total motility), rather than the proportion of sperm with forward movement (progressive motility).
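The semen-parameter eligibility check described above can be sketched in a few lines of Python. This is a minimal illustration only, not the studies' actual screening software: the dictionary keys and function name are hypothetical, and it assumes that the mean of the two screening samples is taken separately for each parameter.

```python
from statistics import mean

# Minimum values taken from the eligibility criteria listed in the text.
# The key names are hypothetical labels chosen for this sketch.
THRESHOLDS = {
    "volume_ml": 1.5,
    "total_sperm_million": 39.0,
    "concentration_million_per_ml": 15.0,
    "total_motility_pct": 40.0,
    "normal_morphology_pct": 30.0,
}


def meets_semen_criteria(sample_1, sample_2):
    """Return True if the per-parameter mean of two samples meets every threshold."""
    return all(
        mean([sample_1[key], sample_2[key]]) >= minimum
        for key, minimum in THRESHOLDS.items()
    )
```

Averaging before comparing means that a single low sample does not by itself exclude a candidate, which is in keeping with the natural within-person variation discussed later in the article.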

The aforementioned reference values are based on data from ‘fertile’ men (defined as those with partners who had a time to pregnancy of ≤ 12 months) and do not represent a ‘normal’ reference range per se, but were developed as a tool for use in conjunction with clinical data to contribute to clinical evaluations of semen quality and prospects for fertility [30]. Exceptions to the WHO 2010 criteria in the MANTA and MANTA-RAy trials were total sperm motility, with a 40% cut-off (vs progressive motility, with a 32% cut-off in the WHO 2010 criteria [31]), and normal morphology, with a 30% cut-off (vs 4% in the WHO 2010 criteria [31]), which were based instead on WHO 1992 criteria [32, 33]. These exceptions reflect the fact that development of WHO 1992 criteria (vs WHO 2010 criteria) was based more on studies of testicular toxicity/sperm safety (vs studies in fertile men) [33], making them potentially more suitable in this instance.

Exclusion of Potentially Sperm-Modifying Medications

While data on the effects of anti-inflammatory agents on semen parameters are limited, sulfasalazine has been shown to have a clear negative effect on semen parameters [16]. Conversely, studies investigating exposure to tumour necrosis factor inhibitors in patients with AS have indicated possible improvements in semen parameters with these drugs [16]. Consistent with FDA guidance, use of these (and other) potential sperm-modifying agents was prohibited before and during the MANTA and MANTA-RAy studies owing to their potential confounding effects on the trial results [28].

Other Medication Restrictions

Potential sperm-modifying effects have also been implicated for other medications, but data are so far inconclusive [16]. Exclusion of all such drugs in the MANTA trial, on the basis of ‘possible’ sperm-modifying effects, would have been untenable from a recruitment perspective. In the MANTA trial, concomitant use of oral 5-aminosalicylic acid compounds, azathioprine, 6-mercaptopurine, methotrexate and corticosteroids (dose equivalent to ≤ 20 mg/day of prednisone) was therefore allowed, provided the prescribed dose was stable for at least 4 weeks before randomization and up to, at minimum, the time of primary endpoint assessment (methotrexate required a stable dose for 26 weeks). In the MANTA-RAy trial, concomitant use of methotrexate (≤ 25 mg/week), leflunomide, hydroxychloroquine, chloroquine and apremilast was allowed, also with the requirement of stable dosing at least 4 weeks before randomization and up to, at a minimum, primary endpoint assessment.

Effective IBD and rheumatic disease therapies were not discontinued for the purposes of inclusion in these trials. The full lists of medications that were restricted during these trials are presented in Tables S5 and S6 in the electronic supplementary material.

Logistical Considerations

IBD occurs during the peak reproductive years. It is associated with a higher frequency, versus the general population, of both reproductive (e.g. reduced sperm quality [6], increased risk of erectile dysfunction [7]) and psychological issues (e.g. reduced libido [7], impaired body image [8], anxiety and depression [9]). All of these factors may combine to affect sexual and reproductive health. Nevertheless, fecundity in patients with IBD is not necessarily reduced versus the general population [10], highlighting that overall fertility is a complex product of both male and female sexual and reproductive health [10]. Voluntary childlessness, however, is more common in patients with IBD than in the general population, possibly owing to heightened concern about adverse reproductive outcomes that, although legitimate, may be disproportionate to the available evidence [10, 11].

Rheumatic disease can also affect sexual function and reproduction in men, with sexual dysfunction reported in a higher proportion of men with rheumatological conditions compared with healthy controls [12,13,14]. In addition, the recently conducted multicentre, cross-sectional iFAME-Fertility study found that participants diagnosed with inflammatory arthritis before and during their peak reproductive age had a lower fertility rate, a higher childlessness rate and an increased frequency of fertility problems, compared with those who were diagnosed after their peak reproductive age [15]. Importantly, some treatments for RA (and IBD) may also affect sexual and reproductive health [16].

In light of the aforementioned considerations, it was anticipated that the highly restrictive (IBD and reproductive) eligibility criteria, together with the restriction of some widely used medications, particularly sulfasalazine, could substantially limit the recruitment pool. These factors contributed to the decision to use an international, multicentre study design, to ensure that enrolment rates were sufficient to complete the trials in a timely manner.

It was also anticipated during the design of the MANTA and MANTA-RAy trials that patients with inflammatory diseases, who often have negative perceptions concerning fertility, may be hesitant to participate in trials evaluating semen parameters, which are often mistakenly equated to measures of fertility and sexual performance [11, 34]. With regard to the latter point, materials were provided to help normalize and simplify the requirement for a semen sample, to educate study staff and patients on men’s reproductive health to enable accurate and comfortable conversations, and to help study staff and patients understand the objectives and value of the MANTA and MANTA-RAy studies.

Study Design Rationale

Both MANTA and MANTA-RAy were conducted in accordance with the principles of the Declaration of Helsinki, International Council for Harmonisation guidelines, or with the laws and regulations of the country in which the research was conducted, whichever afforded the greatest protection to study participants. Trial protocols and subsequent amendments were approved by the relevant institutional review board at each participating site. All patients provided written informed consent before enrolment. Both studies included a 45-day screening period for assessment of baseline parameters, followed by randomization (1:1) to receive either filgotinib 200 mg or placebo orally, once daily for 13 weeks (allowing pooled analysis of the primary endpoint), after which disease-specific considerations required some deviations in the designs of these trials (Fig. 1). In both trials, randomization was stratified according to disease type (UC vs CD in MANTA; RA vs spondyloarthritis [PsA, AS or nrAxSpA] in MANTA-RAy), baseline sperm concentration (15–25 million/mL vs > 25–50 million/mL vs > 50 million/mL) and methotrexate use (yes vs no).
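As an illustration of the stratification factors just described, the following Python sketch maps a participant to a stratum and enumerates the strata for each trial. The function names and string labels are hypothetical, the handling of values exactly at 25 and 50 million/mL follows the band boundaries as written in the text, and the allocation procedure itself (for example, block sizes) is not described here and is deliberately not modelled.

```python
from itertools import product

# Disease-type strata per trial, as stated in the text.
DISEASE_TYPES = {
    "MANTA": ("UC", "CD"),
    "MANTA-RAy": ("RA", "spondyloarthritis"),
}
BANDS = ("15-25", ">25-50", ">50")  # baseline sperm concentration, million/mL


def concentration_band(baseline_conc):
    """Map a baseline sperm concentration (million/mL) to its stratification band."""
    if baseline_conc < 15:
        raise ValueError("below the eligibility threshold of 15 million/mL")
    if baseline_conc <= 25:
        return "15-25"
    if baseline_conc <= 50:
        return ">25-50"
    return ">50"


def stratum(disease_type, baseline_conc, on_methotrexate):
    """Return the stratum as a (disease type, concentration band, methotrexate) tuple."""
    return (disease_type, concentration_band(baseline_conc), bool(on_methotrexate))


def all_strata(trial):
    """List every possible stratum for one trial (2 x 3 x 2 = 12)."""
    return list(product(DISEASE_TYPES[trial], BANDS, (True, False)))
```

For example, `stratum("UC", 42.0, False)` returns `("UC", ">25-50", False)`, and each trial has 12 possible strata under this scheme.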

Fig. 1 MANTA and MANTA-RAy study designs. ^a Study drug was discontinued upon entry into the monitoring phase (standard of care was initiated [or continued] in MANTA-RAy). Reversibility for a participant was defined as all semen parameters^b that qualified the participant for entry into the monitoring phase returning to > 50% of baseline. ^b Reduced semen parameters were defined as a ≥ 50% decrease in sperm concentration, and/or motility, and/or morphology compared with baseline. ^c Disease worsening for UC in MANTA was defined as an increase of ≥ 3 points in pMCS (to a score of ≥ 5) from week 13, on two consecutive visits, or an increase to a score of 9 (from a week 13 score of > 6) on two consecutive visits; disease worsening for CD in MANTA was defined as an increase of ≥ 100 points in CDAI score from the week 13 visit on two consecutive visits, and a score of ≥ 220 on two consecutive visits. ^d In MANTA, participants who had disease worsening^c between weeks 13 and 26 while on open-label filgotinib discontinued the study and completed an ET visit, and had a safety visit 30 days after the last dose of study drug. ^e Response for UC in MANTA was defined as a reduction of ≥ 2 points in pMCS compared with the baseline visit. Response for CD in MANTA was defined as a reduction of ≥ 100 points in total CDAI score compared with the baseline visit; in patients with a total CDAI score of ≥ 220 (but ≤ 250) at the baseline visit, response was defined as a CDAI score of < 150. ^f In MANTA, non-responders^e at week 26 discontinued the study and had only a safety visit 30 days after the last dose of study drug. ^g In MANTA, participants who had disease worsening^c during the LTE (receiving open-label filgotinib or blinded study drug [filgotinib or placebo]) discontinued and completed an ET visit, followed by a safety follow-up visit 30 days after last study drug dose; participants receiving open-label filgotinib or blinded study drug who had reduced semen parameters^b (assessed every 13 weeks) during the LTE entered the monitoring phase. ^h Response for rheumatic diseases in MANTA-RAy was defined as an improvement of ≥ 20% in PhGADA score compared with day 1. ^i In MANTA-RAy, participants who were receiving open-label filgotinib or standard of care and required prohibited concomitant treatment discontinued the study and completed an ET visit. Safety visits occurred 30 days after the last study drug dose. CD Crohn’s disease, CDAI Crohn’s Disease Activity Index, ET early termination, FIL orally administered filgotinib, IBD inflammatory bowel disease, LTE long-term extension, PhGADA Physician’s Global Assessment of Disease Activity, pMCS partial Mayo Clinic Score, QD once daily, UC ulcerative colitis

FDA guidelines recommend that semen analyses for the assessment of drugs intended for chronic use should be conducted at baseline, and at the end of each of two consecutive 13-week treatment periods (equivalent to two full spermatogenic cycles), and at least 13 weeks after study drug discontinuation [28]. MANTA and MANTA-RAy were designed to accommodate these requirements, up to the point where they conflicted with ethical considerations in this population, namely the need to switch to an alternative effective therapy if the subject is not responding to treatment after 13 weeks, especially with respect to the prevention of irreversible, inflammatory disease-related damage.

The double-blind treatment phase of the MANTA study consisted of two consecutive parts (A and B), each comprising a 13-week treatment period (Fig. 1). Stable dosing of permitted concomitant medications was allowed during part A, with dosing changes allowed thereafter (within specified limits). Participants who had a pre-specified decrease in one or more semen parameters from baseline (≥ 50% decrease in sperm concentration, and/or total motility and/or normal morphology) at the week 13 study visit, or any semen parameter assessment visit thereafter, discontinued study drug and entered a monitoring phase, during which they were offered investigator-selected standard of care therapy as per protocol, as long as it did not have known effects on semen parameters. During the monitoring phase, the reduced semen parameter (or parameters) that qualified the patient for entry into the monitoring phase was evaluated every 13 weeks for up to 52 weeks, or until reversibility was observed, defined as the return of that semen parameter to greater than 50% of the baseline value. Participants who did not have a pre-specified decrease in one or more semen parameters from baseline at the end of part A (week 13), and who were also disease responders, continued blinded study treatment in part B (weeks 13–26), while those who were disease non-responders were switched to open-label filgotinib 200 mg. Participants with disease worsening in part B (weeks 13–26) while on blinded study treatment were switched to open-label filgotinib 200 mg, and those with disease worsening while on open-label filgotinib 200 mg were discontinued. Responders at week 26 (blinded or open-label filgotinib) who had not had a pre-specified decrease from baseline in one or more semen parameters continued in the long-term extension (LTE). Treatment in the LTE could be continued for up to 195 weeks if participants continued to meet the LTE entry criteria (assessed every 13 weeks).
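The pre-specified decrease and reversibility definitions used in this paragraph can be expressed as two small checks. The Python sketch below is an illustration under stated assumptions: the parameter names and function names are hypothetical, each value is taken to be the visit-level result for that parameter, and the studies' actual data-handling rules are not reproduced.

```python
# The three semen parameters that can trigger entry into the monitoring phase.
# These key names are hypothetical labels for this sketch.
MONITORED = ("concentration", "total_motility", "normal_morphology")


def reduced_parameters(baseline, current):
    """Return the parameters showing a >= 50% decrease from baseline."""
    return [key for key in MONITORED if current[key] <= 0.5 * baseline[key]]


def enters_monitoring(baseline, current):
    """A participant enters the monitoring phase if any parameter is reduced."""
    return bool(reduced_parameters(baseline, current))


def is_reversed(baseline, current, qualifying):
    """Reversibility: every qualifying parameter is back above 50% of baseline."""
    return all(current[key] > 0.5 * baseline[key] for key in qualifying)
```

Note the asymmetry at the boundary: a value of exactly 50% of baseline counts as a decrease (≥ 50% reduction) but not as reversal (which requires a return to greater than 50% of baseline).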

The MANTA-RAy study was designed to be similar to MANTA up to week 13, when the primary endpoint (proportion of participants with a ≥ 50% decrease from baseline in sperm concentration) was assessed, allowing for pooling of data from the two studies at the week-13 time point (Fig. 1). Beyond this time point, the key difference (vs MANTA) was that the MANTA-RAy study no longer provided a placebo comparator, owing to the potential for irreversible tissue damage in untreated participants with RA. Thus, at the end of the 13-week double-blind treatment phase in MANTA-RAy, all disease non-responders, and disease responders originally assigned to placebo, were switched to standard of care therapy, and disease responders originally assigned to filgotinib were switched to open-label filgotinib 200 mg. Participants in this ‘extension phase’ could continue treatment up to week 156. As with the MANTA study, participants with pre-specified decreases in semen parameters at any semen assessment visit discontinued study drug and entered a monitoring phase (Fig. 1).

Rationale for Chosen Assessments

A summary of study procedures and assessments performed in the 13-week double-blind treatment phases of the MANTA (part A) and MANTA-RAy studies is provided in Table 2. Procedures and assessments for the MANTA (part B) double-blind phase, MANTA open-label, monitoring and LTE phases, and MANTA-RAy extension and monitoring phases are provided in Tables S7–S10 in the electronic supplementary material.

Table 2 MANTA (part A) and MANTA-RAy procedures and assessments during the double-blind treatment phase

Inflammatory Disease Measures

Physician- and patient-reported outcome measures, as well as lower gastrointestinal investigations in participants with IBD, were performed at screening to determine disease activity status (Tables 1 and 2). In MANTA, partial Mayo Clinic Scores and Crohn’s Disease Activity Index scores were assessed in participants with UC and CD, respectively, to evaluate disease response to treatment (Table 1 and Fig. 1). In MANTA-RAy, Physician Global Assessment of Disease Activity scores were assessed at screening, day 1 and week 13 to determine participants’ rheumatic disease activity status (Table 1 and Fig. 1). These are well-established inflammatory disease measures that have been used in previous clinical trials of filgotinib [25, 27, 35].

Semen Analysis Methods and Logistics

There is substantial natural variation in an individual’s sperm concentrations and other semen parameters over time [36], independent of known modifiers such as fever [37] and certain anti-inflammatory medications [16]. Thus, sperm concentrations may occasionally fall below the 5th percentile of WHO 2010 reference values, even in individuals who might typically produce values above this threshold [36]. The collection of two separate semen samples for each evaluation of semen parameters, and the requirement for minimum and maximum abstinence periods between ejaculations, substantially increased the logistical complexity of the MANTA and MANTA-RAy studies, with study sites (or associated fertility clinics) and technicians all required to be available within 1 h of a sample being provided. However, this requirement was necessary to minimize intra-individual variation in semen parameters, optimize screening accuracy and ensure differences between treatment groups could be detected if present.

In both MANTA and MANTA-RAy, semen samples were collected at screening and then every 13 weeks throughout each phase of each study, as well as at early termination visits. For every assessment stage where semen parameters were measured, two separate semen samples were collected within 14 days of each other. Each semen sample had to be collected within an ejaculation-free period of at least 48 h and no greater than 7 days. In select instances where the semen sample was found to be non-assessable or invalid (e.g. collection without adherence to ejaculation-free periods; intercurrent illness or dehydration at time of collection; incomplete capture of semen sample; and/or sample processing or semen analysis that deviated from standardized procedures), a third semen sample was to be collected within 14 days of the date the previous sample was collected.
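The timing rules for sample collection lend themselves to a simple validity check. The following Python sketch is illustrative only: the function names are hypothetical, and treating the 48-h, 7-day and 14-day limits as inclusive is an assumption made for this example.

```python
from datetime import datetime, timedelta

MIN_ABSTINENCE = timedelta(hours=48)  # ejaculation-free period of at least 48 h
MAX_ABSTINENCE = timedelta(days=7)    # and no greater than 7 days
MAX_PAIR_GAP = timedelta(days=14)     # two samples within 14 days of each other


def abstinence_ok(last_ejaculation, collection):
    """Check that the ejaculation-free period before collection is within limits."""
    gap = collection - last_ejaculation
    return MIN_ABSTINENCE <= gap <= MAX_ABSTINENCE


def pair_ok(first_collection, second_collection):
    """Check that two samples for one assessment were collected within 14 days."""
    return abs(second_collection - first_collection) <= MAX_PAIR_GAP
```

A sample failing `abstinence_ok` would fall under the non-assessable or invalid category described above, in which case a third sample was to be collected.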

FDA guidelines recommend that the collection and handling of semen samples should be standardized for all sites, and that a single central laboratory should process and analyse all semen samples for the purposes of consistency and quality assurance [28]. However, as described, an international, multicentre approach was employed in both the MANTA and MANTA-RAy studies to address potential recruitment difficulties and ensure timely completion of the studies. The requirement for fresh semen samples precluded centralized laboratory assessments of semen parameters (except for sperm morphology), particularly motility, which must be measured within 1 h of sample collection.

To ensure consistency across study sites, stringent standardization processes (with central oversight by staff at Tulane University, New Orleans, LA, USA) were applied. Suitable candidate technicians for training were selected through live instruction and written tests of proficiency in the assessment of semen parameters. Technicians who successfully passed the training programme were issued a Certificate of Completion from Tulane University, followed by immediate post-training evaluations of technicians’ proficiency at their ‘home’ laboratory, and follow-up assessments to ‘requalify’ technicians every 6 months. Timely refresher calls for troubleshooting of any issues that arose, and quality control (QC) review at Tulane University of all semen analysis data to verify completion of all fields as well as technicians’ calculations, were also used to maintain consistency. In addition, staining and reading of sperm morphology smears (prepared at the study site or fertility clinic using fresh semen samples) were performed at Tulane University. Study sites and fertility clinics also had the option to use qualified ‘travelling technicians’ provided by a vendor, as either a primary option for semen analysis or as a backup option.

Inclusion of Sex Hormone Assessments

Hormone measures are routinely included in the clinic in addition to assessments of semen parameters, to obtain a more granular picture of testicular health, and (as already described) are among the few clinical parameters that can be used to reliably monitor testicular function [28]. Although no firm evidence of hormone abnormality indicative of testicular injury was observed with filgotinib treatment in the preclinical animal studies, blood samples were nevertheless collected for the measurement of sex hormone levels (FSH, LH, inhibin B and total testosterone) in the MANTA and MANTA-RAy studies (Table 2 and Tables S7–S10 in the electronic supplementary material).

Routine Safety Assessments

Safety evaluations (adverse events, haematology and serum chemistry, vital signs, body weight and symptom-directed medical examinations) and assessments of concomitant medication use occurred at every study visit. Electrocardiograms and further laboratory measures (urinalysis, lipids) were performed at predefined time points throughout each study phase (Table 2 and Tables S7–S10 in the electronic supplementary material).

Additional Safety Assessments

In addition to routine safety evaluations, an independent committee periodically reviewed and adjudicated all potential major adverse cardiovascular events (cardiovascular death, myocardial infarction and stroke) and thromboembolic events (arterial thrombosis and venous thrombosis) in a blinded manner. It is noteworthy that identification of any such signals by the independent committee would lead to early interruption and unblinding of the MANTA and MANTA-RAy trials.

Rationale for Sample Size, Endpoints and Their Analysis

Sperm concentration is considered the most reliable quantifiable parameter for providing information about effects on testicular function [28], and may be the most important stand-alone indicator of such effects [33]. However, there are no well-established differences that would be clinically relevant between treatment arms in terms of the proportions of participants with a reduction of at least 50% from baseline in sperm concentration (primary endpoint) [28]. Analyses for MANTA and MANTA-RAy were thus not powered to detect predefined differences between treatment arms for the primary endpoint. In recognition of the fact that no single semen parameter is adequate for determining drug effects on testicular function, and in line with FDA guidance, an array of additional secondary and exploratory endpoints were also evaluated, with conclusions regarding the effects of filgotinib on testicular function to be determined using the totality of the data across all endpoints [28].

Primary Endpoint

The primary endpoint in the MANTA and MANTA-RAy trials was the proportion of participants, pooled across both trials, with a ≥ 50% decrease from baseline in sperm concentration at week 13 (equivalent to one full spermatogenic cycle). This is a surrogate biomarker of testicular function that has been used in numerous trials and is the FDA-recommended primary endpoint [28, 38].
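To make the primary endpoint definition concrete, the short Python sketch below computes, for each treatment arm, the proportion of participants with a ≥ 50% decrease from baseline in sperm concentration in a pooled data set. It is a sketch of the endpoint definition only, not the studies' statistical analysis: the record layout and function names are hypothetical, and no real trial data are implied.

```python
from collections import defaultdict


def has_50pct_decrease(baseline, week13):
    """True if week-13 sperm concentration is at most 50% of the baseline value."""
    return week13 <= 0.5 * baseline


def proportion_by_arm(records):
    """Proportion with a >= 50% decrease per arm.

    `records` is an iterable of (arm, baseline, week13) tuples pooled across
    both trials; concentrations are in million/mL.
    """
    counts = defaultdict(lambda: [0, 0])  # arm -> [events, participants]
    for arm, baseline, week13 in records:
        counts[arm][0] += has_50pct_decrease(baseline, week13)
        counts[arm][1] += 1
    return {arm: events / total for arm, (events, total) in counts.items()}
```

Because the endpoint is defined relative to each participant's own baseline, pooling across the two trials only requires that baseline and week-13 values were obtained in the same way in both.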

It is worth noting that a 50% reduction from baseline in sperm concentration, which is within the range of natural variation [36], will not necessarily result in absolute values that are below 15 million/mL (i.e. the 5th percentile of WHO 2010 reference values [31]). For example, an individual with a baseline sperm concentration of 50 million/mL will still have a sperm concentration of 25 million/mL after a 50% reduction, thus still exceeding this threshold value. Conversely, reductions in sperm concentration of less than 50% may lead to values below 15 million/mL in some individuals. In addition, high intra-individual variation in sperm concentrations may lead to regression towards the mean, that is, individuals with high ‘outlier’ values at baseline are more likely to have a 50% decrease at week 13 by chance alone, owing to the tendency of measurements at future points to be closer to the mean. Negative trends in semen parameters in placebo-treated individuals in clinical trials have been partly attributed to regression towards the mean [39], an effect likely to be enhanced by initial selection of individuals whose semen parameters met minimum threshold values.
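The threshold arithmetic and the regression-towards-the-mean argument above can be illustrated with a minimal simulation sketch. It is not part of the MANTA or MANTA-RAy analyses. The log-normal model, the median of 60 million/mL, the between- and within-individual standard deviations, and all function names are hypothetical choices made only for illustration, and no treatment effect is simulated.

```python
import math
import random

WHO_5TH_PERCENTILE = 15.0  # million/mL, WHO 2010 reference value cited in the text


def after_reduction(baseline, fraction):
    """Concentration remaining after a fractional reduction from baseline."""
    return baseline * (1.0 - fraction)


def simulate(n=20000, seed=0, median=60.0, between_sd=0.6, within_sd=0.5):
    """Simulate (baseline, week 13) pairs with NO treatment effect.

    Assumed model (hypothetical): each individual has a stable true
    log-concentration, and each measurement adds independent
    intra-individual noise on the log scale.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_log = math.log(median) + rng.gauss(0.0, between_sd)
        baseline = math.exp(true_log + rng.gauss(0.0, within_sd))
        week13 = math.exp(true_log + rng.gauss(0.0, within_sd))
        pairs.append((baseline, week13))
    return pairs


def prop_halved(pairs):
    """Proportion of pairs with a >= 50% decrease from baseline."""
    return sum(w <= 0.5 * b for b, w in pairs) / len(pairs)


def top_decile(pairs):
    """The 10% of individuals with the highest baseline values."""
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    return ranked[: len(ranked) // 10]


if __name__ == "__main__":
    # A 50% reduction from 50 million/mL stays above the threshold,
    # whereas a 30% reduction from 20 million/mL falls below it.
    print(after_reduction(50.0, 0.5), after_reduction(20.0, 0.3))
    cohort = simulate()
    print("all participants:", round(prop_halved(cohort), 3))
    print("highest baseline decile:", round(prop_halved(top_decile(cohort)), 3))
```

Under these assumed parameters, the subgroup with the highest baseline values shows a larger share of apparent ≥ 50% decreases than the full cohort, even though the simulated underlying concentrations do not change between visits.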

Secondary Endpoints

The proportion of participants with at least a 50% decrease from baseline in sperm concentration at week 26 was assessed as a key secondary endpoint. Additional secondary endpoints relating to semen parameters, evaluated at weeks 13 and 26, included changes from baseline in sperm concentration, sperm total motility, sperm morphology, total sperm count and ejaculate volume. It should be reiterated that only the MANTA study included a placebo comparator for week 26 outcomes (Fig. 1), for reasons already described.

Exploratory Endpoints

In participants with at least a 50% decrease from baseline in sperm concentration, and/or motility and/or morphology at the week 13 or week 26 semen assessment visits in each trial, or at any of the semen assessment visits occurring every 13 weeks during the MANTA LTE or MANTA-RAy extension phases, the reversibility of these effects was evaluated every 13 weeks during a 52-week monitoring phase (Fig. 1). Additional exploratory endpoints included effects on sex hormones (FSH, LH, inhibin B and total testosterone) at weeks 13 and 26.

Evaluations of the pharmacokinetics of filgotinib and its metabolite (GS-829845) as well as the long-term safety and tolerability of filgotinib (assessed in the MANTA LTE and MANTA-RAy extension phases—Fig. 1) were also included as exploratory endpoints.

Statistical Analysis

As noted earlier, there are no well-established differences in semen parameters that would be considered clinically relevant between treatment arms [28]. This is reflected in the methods of data presentation and analysis, which were not based on a hypothesis-driven assessment of minimum clinically important differences between treatments [28].

Each trial could recruit up to 125 participants per arm. The overall goal was to have a combined total of at least 200 evaluable participants for the pooled primary endpoint analysis of week-13 data from both MANTA and MANTA-RAy. This sample size is adequate for estimating cumulative distribution curves and producing a reasonably narrow 95% confidence interval (CI) width for the difference between treatment groups in the proportion of participants with at least a 50% decrease from baseline in sperm concentration. For the primary endpoint, a cumulative distribution plot for the percentage change from baseline in sperm concentration at week 13 was constructed, with the x-axis displaying percentage changes from baseline (range − 100% to the maximal observed increase), and the y-axis displaying the proportion of participants with a percentage change in sperm concentration equal to or less than the corresponding x-axis value. Proportions of participants experiencing at least a 50% decrease from baseline in sperm concentration were calculated for each treatment group. Differences between the filgotinib and placebo groups were calculated together with associated 95% CIs.
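The quantities described above (percentage change from baseline, the cumulative distribution curve, and the between-group difference in proportions with a 95% CI) can be sketched as follows. This is an illustrative sketch only: the text does not state which CI method was used, so the Wald (normal approximation) interval here is an assumption, and all function names and input values are hypothetical.

```python
import math
from bisect import bisect_right


def percent_change(baseline, week13):
    """Percentage change from baseline (-100 means a fall to zero)."""
    return 100.0 * (week13 - baseline) / baseline


def ecdf(changes):
    """Points (x, proportion of participants with change <= x)."""
    xs = sorted(changes)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]


def prop_with_decrease(changes, cutoff=-50.0):
    """Proportion with at least a 50% decrease (change <= -50%)."""
    return sum(c <= cutoff for c in changes) / len(changes)


def diff_with_wald_ci(p1, n1, p2, n2, z=1.959964):
    """Difference in proportions with a Wald 95% CI.

    The Wald interval is an assumption made for this sketch; it is not
    confirmed as the method used in the trials.
    """
    diff = p1 - p2
    se = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return diff, diff - z * se, diff + z * se


if __name__ == "__main__":
    # Hypothetical (baseline, week 13) concentrations in million/mL.
    pairs = [(50.0, 25.0), (40.0, 44.0), (80.0, 30.0), (30.0, 27.0)]
    changes = [percent_change(b, w) for b, w in pairs]
    print(ecdf(changes))
    print(prop_with_decrease(changes))
    # Hypothetical planning case: 100 evaluable participants per arm,
    # 10% of each arm meeting the endpoint definition.
    print(diff_with_wald_ci(0.10, 100, 0.10, 100))
```

In the hypothetical planning case of 100 evaluable participants per arm with 10% of each arm meeting the endpoint definition, the Wald interval extends roughly 8 percentage points either side of the estimated difference, which gives a sense of what a "reasonably narrow" CI could look like at this sample size.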

For secondary endpoints, continuous variables were summarized using descriptive statistics, while categorical variables were summarized by the number/percentage of participants meeting endpoint definitions. For exploratory analyses of reversibility, numbers and percentages (point and cumulative data) of participants achieving reversibility in semen parameters were displayed by semen parameter and monitoring phase visit.

Discussion

Here we report the rationale and methodology for the phase 2 MANTA and MANTA-RAy studies, which were designed to elucidate the clinical relevance (if any) in man of observed effects of filgotinib on semen parameters in preclinical animal studies [24, 40].

A significant challenge for the MANTA and MANTA-RAy studies was the need to include patients with active disease who were also willing to participate in the trials. There may be a hesitancy to be involved in trials of this nature because semen parameter measures may be mistakenly viewed as equivalent to measures of fertility and sexual performance, and steps were therefore taken to allay such fears. Exclusion of individuals with reproductive health or fertility issues, and with baseline semen parameters below the 5th percentile of the WHO reference values, likely also reduced the pool of eligible individuals, and may have selected for a population that is less representative of individuals with inflammatory diseases who would receive filgotinib.

FDA criteria recommend that drugs intended for chronic use should be administered for 26 weeks (two full spermatogenic cycles), compared with 13 weeks for drugs intended for shorter-term use. Ideally, both 13-week phases should be performed under randomized, blinded conditions with stable dosing of permitted concomitant medications. However, ethical considerations—such as the need for participants to be able to switch to a more effective therapy to avoid disease-related damage—demanded greater flexibility from week 13 onwards for participants who did not respond to study treatment. This added a challenging layer of complexity to the current trials, which would also apply to any randomized trial that aims to determine the effects of a drug on semen parameters in patients with certain chronic inflammatory conditions.

The use of a multicentre approach to the MANTA and MANTA-RAy trials helped to ensure that the minimum recommended number of participants could be achieved in a timely manner despite barriers to recruitment. However, this meant that assessments of semen parameters (except for morphology) could not be centralized. Thus, rigorous processes (e.g. stringent technician selection, initial training and follow-up training, use of travelling technicians and centralized QC methods, all with strict central oversight) were implemented to ensure that semen parameter measures were standardized across the study centres, which spanned numerous countries.

Although semen parameters are used as surrogate markers for gonadal function and are a key component of clinical explorations of fertility, negative effects on semen parameters do not necessarily equate to reduced fertility. Because conception involves two partners, fertility assessment should account for factors such as the overall health and fertility of the female partner, which also contribute to the chances of natural conception [41, 42]. IBD is associated with an increased frequency of several sexual health issues that may impede conception, including erectile dysfunction, reduced libido, impaired body image, and anxiety, which probably contribute to negative impacts on reproductive health [7,8,9]. Sexual function and reproduction can similarly be impaired in patients with rheumatic diseases [12,13,14]. In addition, inflammation itself may directly impair hormonal production and sperm quality in patients with inflammatory diseases [6, 14, 43,44,45]. It is reasonable to expect that successful treatment of inflammatory diseases may alleviate some of these factors to improve overall fertility.

Sexual health and reproductive health have distinct components, but are also inherently interlinked [46]. For example, concerns about the effects of diseases or treatments on reproductive health may have emotional and psychological consequences that affect sexual health. Conversely, diseases or treatments that affect aspects of sexual health, such as sexual function, may affect reproductive health. A recent post hoc analysis of data from the phase 3 FINCH trials investigating filgotinib for the treatment of RA reported that baseline disease activity negatively impacted sexual function in both male and female patients, and that active treatment with filgotinib or adalimumab resulted in early and sustained improvements from baseline in sexual function [47]. The net effects of anti-inflammatory drugs on components of fertility must therefore be carefully considered in this population.

Conclusions

In summary, despite the complexities associated with the investigation of potential drug effects on semen parameters in patients with chronic inflammatory diseases, unique and robust trial designs were achieved for the MANTA and MANTA-RAy studies. This landmark trial programme, including more than 200 participants, is the first large-scale, placebo-controlled evaluation of the potential impacts of an advanced therapy for immune-mediated inflammatory conditions on semen parameters. These studies may inform, and provide a template for, future evaluations, and offer a more holistic view of this under-studied and challenging aspect of living with a chronic inflammatory disease.