Background

Interstitial lung diseases in children (chILD) are a large and heterogeneous group of rare and chronic conditions, often related to inflammatory and sometimes to fibrosing disease processes [1]. They mainly affect the lung parenchyma, cause severe morbidity in a relevant proportion of affected individuals, and have a mortality of about 15% [2].

To date, no proven anti-inflammatory or anti-fibrotic treatments are available for these conditions, as no prospective trials on the efficacy and safety of treatments in chILD have ever been performed [3]. The few pharmacological options used are based on anecdotal experience and small case collections. For several decades, hydroxychloroquine (HCQ) or chloroquine have been the drugs most commonly used besides systemic glucocorticosteroids [4].

HCQ can inhibit the production of inflammatory cytokines (e.g. IL-1, IL-6, TNFα and IFNγ) and the degradation of intracellular cargo via the autophagy pathway [5]. It can interfere with aberrantly produced proteins in cells affected by pathogenic variants in the genes for surfactant protein C [6], ABCA3 [7], COPA [8] and others. These proteins are degraded via the lysosomal pathway or may be presented as autoantigens and drive undesirable inflammatory and pro-fibrotic immune responses [5]. This interference might explain the favourable clinical responses to HCQ or chloroquine reported in cases and small series of children with interstitial lung disease [4]. As these drugs are often given for many years and can cause severe side effects, there is an urgent need for evidence [9, 10]. Therefore, the Europe-wide project chILD-EU initiated a randomized phase 2a study of HCQ in chILD, evaluating the efficacy and safety of the mid-term use of HCQ [2].

In this study we focused on the efficacy and safety of hydroxychloroquine (HCQ) in patients with chILD and a lung histology pattern of chronic pneumonitis of infancy, non-specific interstitial pneumonitis (NSIP), desquamative interstitial pneumonitis, microscopic alveolar proteinosis or cholesterol pneumonitis, pulmonary hemosiderosis, follicular bronchiolitis and lymphocytic interstitial pneumonitis, as well as on chILD caused by mutations in SFTPC, ABCA3, NKX2.1, TBX4, or COPA.

Methods

Trial design and participants

This study was a prospective, multicentre, 1:1 randomized, double-blind, placebo-controlled parallel-group/crossover phase 2 clinical trial. The study design was previously described in detail [11]. In summary, subjects with a chronic (≥ 3 weeks’ duration) diffuse parenchymal lung disease who were eligible for treatment with HCQ (START arm) or withdrawal of HCQ (STOP arm) were asked to participate. We assessed inclusion and exclusion criteria (Tables 2 and 3 in [11]), obtained written consent and, for logistic reasons (i.e. personalized preparation of weight-adapted study medication), randomized each subject after the screening evaluation (Fig. 1). In the START arm, subjects were allocated to 4 weeks of placebo (group A) or HCQ (group B; receiving 10 mg/kg bodyweight/d during the first week, then 6.5 mg/kg/d orally in the evening). Subjects from group A were then switched to HCQ for 4 weeks (group C), while group B remained on HCQ for another 4 weeks (group D). In the STOP arm, subjects who had already been taking HCQ for at least 3 months were randomized into parallel groups treated with HCQ at the dose they were already on (group E) or with placebo (i.e. withdrawal of HCQ, group F). After 12 weeks all subjects stopped medication and moved into open observation for another 12 weeks (groups G and H, Fig. 1). Each subject could participate in each arm only once; the arms could be entered in any sequence (see Fig. 1 for the study scheme).
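The weight-based schedule described for group B can be written down as simple arithmetic. The following is only a sketch of that arithmetic, not dosing guidance; the function name `daily_hcq_dose_mg` is hypothetical, and the reading of "first week" as study days 1 to 7 is an assumption not stated in the text.

```python
def daily_hcq_dose_mg(weight_kg: float, study_day: int) -> float:
    """Illustrative daily HCQ dose (mg) for group B of the START arm.

    Assumption: "first week" means study days 1-7 (1-indexed).
    Rates are taken from the text: 10 mg/kg/d, then 6.5 mg/kg/d.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    if study_day < 1:
        raise ValueError("study_day is 1-indexed")
    mg_per_kg = 10.0 if study_day <= 7 else 6.5
    return mg_per_kg * weight_kg


if __name__ == "__main__":
    # Hypothetical 20 kg child, first and second week.
    print(daily_hcq_dose_mg(20, 1), daily_hcq_dose_mg(20, 8))
```

In the trial itself, the study medication was prepared individually for each subject, so this calculation only illustrates the schedule.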

Fig. 1 Flow diagram (CONSORT) and trial design

Subjects had to be clinically stable between the screening and baseline visits. All subjects were included in the registry and the diagnosis was verified by a structured peer review process [2]. We included seven children with ABCA3 deficiency, six with surfactant protein C deficiency, two with NKX2.1 deficiency, three with COPA syndrome and one each with TBX4 deficiency and fibrosing filamin A deficiency. Four subjects without genetic proof of a lung disease had an NSIP histologic pattern, and one subject each had the histological pattern of pulmonary alveolar proteinosis, pleuroparenchymal fibroelastosis, and idiopathic desquamative interstitial pneumonitis. One case each of chronic tachypnea of infancy, nodular lymphoid hyperplasia of the lung, fibrosing hyper IgG4-syndrome, and sarcoidosis was also included. Two subjects were diagnosed with idiopathic pulmonary hemosiderosis (IPH) and two others suffered from chronic diffuse parenchymal lung disease that could not be further characterized. The active study centers that included patients were the University Children's Hospitals at Munich, Hannover, Essen, Frankfurt, Tübingen and Bochum.

Outcomes

The primary study endpoint was the presence or absence of a response to treatment. A responder was defined as a subject who had a predefined change in oxygenation at rest and in calm wakefulness. Oxygenation was assessed by measurement of the transcutaneous O2-saturation by pulse oximetry, the respiratory support level necessary to achieve this saturation and the respiratory rate. The levels of respiratory support were defined as invasive ventilation, non-invasive ventilation, high-flow O2 nasal cannula, low flow O2 by prongs/mask and room air.

In subjects who were on low flow oxygen, O2-saturation was measured after O2 withdrawal for at least 5 min. In patients included in the START arm who were off oxygen or on low flow oxygen at study entry, response was defined as an increase of oxygen saturation by ≥ 5% and/or a decrease of respiratory rate at rest by ≥ 20% compared to baseline, assessed under room air conditions. In patients with a higher level of respiratory support at the time of inclusion, response was defined as a sustained decrease of the respiratory support compared to baseline. For STOP patients, O2-saturation had to decrease by ≥ 5% or the respiratory rate had to increase by ≥ 20%, assessed under room air conditions, or the subject needed an increased level of respiratory support.
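The responder rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not the trial's analysis code: the names `Support`, `Visit` and `is_responder` are hypothetical, the 5% saturation threshold is read as absolute percentage points, the 20% respiratory rate threshold as relative to baseline, and the requirement that a decrease in support be "sustained" is not modelled.

```python
from dataclasses import dataclass
from enum import IntEnum


class Support(IntEnum):
    """Respiratory support levels from the text, ordered by intensity."""
    ROOM_AIR = 0
    LOW_FLOW = 1
    HIGH_FLOW = 2
    NON_INVASIVE = 3
    INVASIVE = 4


@dataclass
class Visit:
    spo2: float        # O2-saturation in room air, in %
    resp_rate: float   # breaths per minute at rest
    support: Support


def is_responder(arm: str, baseline: Visit, follow_up: Visit,
                 spo2_threshold: float = 5.0,
                 rr_threshold: float = 0.20) -> bool:
    """Classify a subject as responder for the START or STOP arm.

    Assumptions: SpO2 change is in absolute percentage points and the
    respiratory rate change is relative to the baseline value.
    """
    spo2_change = follow_up.spo2 - baseline.spo2
    rr_rel_change = (follow_up.resp_rate - baseline.resp_rate) / baseline.resp_rate
    if arm == "START":
        if baseline.support <= Support.LOW_FLOW:
            # Improvement: saturation up and/or respiratory rate down.
            return spo2_change >= spo2_threshold or rr_rel_change <= -rr_threshold
        # Higher support at inclusion: response is a lower support level.
        return follow_up.support < baseline.support
    if arm == "STOP":
        # Withdrawal: response is a deterioration after stopping HCQ.
        return (spo2_change <= -spo2_threshold
                or rr_rel_change >= rr_threshold
                or follow_up.support > baseline.support)
    raise ValueError("arm must be 'START' or 'STOP'")
```

Passing `spo2_threshold=3.0` gives the modified responder definition used as a secondary endpoint.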

Secondary endpoints were exploratory and included, among others, a modified responder definition (a change in oxygenation of 3%), changes in O2-saturation in room air, respiratory rate, health related quality of life (HrQoL) [12], BMI percentile, pulmonary function [13] and 6-min walk test (6MWT) distance.

Safety monitoring included adverse events (AEs), clinical laboratory values (differential blood count, glutamic oxaloacetic transaminase (GOT), glutamate-pyruvate transaminase (GPT), gamma glutamyl transpeptidase (gGT), creatinine, lactate dehydrogenase (LDH), potassium, creatine kinase, blood glucose levels), HCQ steady-state drug level [14], electrocardiography, echocardiography and repeated ophthalmological examinations.

Statistical methods

As the study was exploratory, there was no formal sample size calculation. All randomized subjects were included in the intention-to-treat (ITT) analysis, which was defined as the primary analysis population. Statistical sensitivity analyses were planned for the combined analysis of all patients in the START and STOP arms. However, depending on the actual recruitment structure, the assumption of independence of subjects participating in both study parts might not be fully justified. Subjects receiving at least one dose of study drug defined the safety population. Data are given as mean and standard deviation or frequency of events. Changes with treatment were calculated and compared between placebo and HCQ groups. The groups are defined in Fig. 1. Continuous variables were compared by unpaired or paired t-tests, responder frequencies by Fisher's exact test or McNemar's test. Bonferroni corrections were applied when a variable was used repeatedly; a level of P < 0.05 was considered significant. To estimate the magnitude of the treatment effects under HCQ or placebo, we calculated odds ratios for independent responders, kappa coefficients for dependent responders, and effect sizes for continuous variables, each with 95% confidence intervals. Effect sizes were defined as the changes from baseline under both treatments divided by the pooled standard deviation of the changes from baseline.
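Two of the quantities named above can be sketched with the Python standard library. The sketch assumes that the effect size is the difference between the mean changes from baseline under HCQ and under placebo, divided by the pooled standard deviation (a Cohen's d style measure), and that Fisher's exact test is two-sided; the function names are hypothetical and the original analysis software is not stated in the text.

```python
from math import comb, sqrt
from statistics import mean, variance


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def effect_size(changes_hcq: list, changes_placebo: list) -> float:
    """Difference in mean change from baseline over the pooled SD.

    Assumption: pooled SD weights the sample variances by n - 1.
    Each group needs at least two observations.
    """
    n1, n2 = len(changes_hcq), len(changes_placebo)
    pooled_var = ((n1 - 1) * variance(changes_hcq)
                  + (n2 - 1) * variance(changes_placebo)) / (n1 + n2 - 2)
    return (mean(changes_hcq) - mean(changes_placebo)) / sqrt(pooled_var)
```

When neither group contains a responder, only one table is compatible with the margins, so the Fisher p-value is 1 regardless of group sizes.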

Results

Enrollment and baseline characteristics of the subjects

Thirty-five subjects were assessed for eligibility, 26 for the HCQ START arm and 9 for the HCQ STOP arm (Fig. 1). There were five screening failures and one drop out before drug intake. On study medication, another two subjects ended the trial prematurely, one in the START and one in the STOP arm. Only four subjects were included in both arms; we considered these subjects as independent individuals. The trial was terminated after 3.8 years of recruitment, following a temporary interruption due to a competent authority inspection. The interruption was associated with losses of time and resources of more than one and a half years, leaving insufficient capacity to continue thereafter. The baseline data of the subjects included did not differ between the groups and were characteristic of children affected by interstitial lung disease (Table 1).

Table 1 Baseline data

Outcome: efficacy results

The primary endpoint, the presence or absence of a response to the treatment, did not differ between placebo and HCQ groups (Table 2). In the START arm there were no responders to placebo treatment (group A), nor to HCQ in the parallel group (group B). After switching from placebo to HCQ, three responders were noted (group C). Combining the two HCQ treatment groups B and C did not change the result. We obtained similar results in the STOP arm: there were no responders to placebo treatment (= withdrawal of HCQ) (group F), nor to HCQ treatment (= continuation of HCQ) in the parallel group (group E). After open label observation (= no medication, no med.; = withdrawal of HCQ) (group G), one responder was noted (Table 2). To increase the sensitivity, we explored an adapted responder definition. Based on the minimal important difference for O2-saturation, we used a 3% threshold for change in oxygenation. Again, we observed no differences (Table 2). To describe the size of the treatment effects obtained, we calculated the odds ratios of the responders under placebo and under HCQ; these were around one, negative or could not be calculated, as there were zero responders.

Table 2 Number of responders to treatment

For all the continuous variables, we calculated the changes for the different treatment groups (Table 3). Absolute changes in O2-saturation, respiratory rate, HrQoL and in pulmonary function or exercise tests were not significantly different with treatment, neither in the parallel group (A vs. B), the paired (A vs. C) nor the combined (A vs. B + C) comparisons for the START arm, nor in the STOP arm. Of interest, in the START arm the BMI percentile dropped with HCQ treatment, at a borderline level of significance (Table 3). Thus, we did not observe significant differences with the interventions.

Table 3 Absolute changes from baseline of secondary outcomes

Sensitivity analysis

In an exploratory sensitivity analysis, we combined all treatment periods with placebo and all periods with HCQ from START with those from STOP, the latter multiplied by -1 to account for withdrawal (Table 4), and treated the data from START and STOP as independent. Again, we did not identify consistent treatment effects of HCQ for the primary and the secondary endpoints (responder (MID), O2-saturation, respiratory rate in room air, and FVC absolute change). Significant decreases of HrQoL (assessed as total score) and of BMI percentile were noted in the HCQ-treated groups; effect sizes were again small (Table 4).

Table 4 Changes observed from START and those from STOP treatment groups were combined to explore maximum number of treatment effects

Outcome: safety

Adherence to the study medication was 91% in both the START and STOP arms and did not differ between placebo and HCQ treatment. In general, the study drug was well tolerated. Adverse events were observed in almost all subjects (Table 5). These were primarily gastrointestinal or respiratory infections. There were no differences in the frequency of AEs between placebo and HCQ groups. During the entire study, we observed only one serious adverse event. This occurred in the placebo group in a sick infant on non-invasive respiratory support who had to be intubated due to an intercurrent respiratory infection. The event resolved completely. Overall, the AEs were characterized by the morbidity of the study population and the known side effect spectrum of the study drug. HCQ whole blood levels, measured at the end of the study in the START subjects, did not differ from baseline levels in patients who were to discontinue HCQ (mean dose 6 mg/kg body-weight) (Table 1). This suggested that a steady state was achieved in blood. Of interest, intra-individual values were rather constant, whereas inter-individual levels varied considerably.

Table 5 Adverse events during the study in the safety population. Given are numbers of subjects with events and number of events (absolute/% of total)

Discussion

In this double-blind, randomized controlled exploratory phase 2 trial in paediatric patients with chILD, we evaluated the efficacy and safety of the use of HCQ. The primary outcome was change in oxygenation, determined from O2-saturation at room air, respiratory rate or a change in respiratory support. The results of these and the other key secondary endpoints did not differ between HCQ and placebo treatment periods. Adherence to the treatment was good, and the drug was well tolerated and appeared safe.

The authors were aware that this investigator-initiated study in a group of ultra-rare conditions might have difficulties recruiting subjects, even in centers specialized in treating such conditions. Therefore, we classified the study as exploratory and developed a design that allowed the inclusion of many potential participants by closely aligning study procedures with everyday patient management. To treat all participants with active drug, we implemented a switch from placebo to HCQ for all subjects. Similarly, a controlled withdrawal of HCQ was ensured in all. In an exploratory statistical sensitivity analysis, we combined all observed effects in treatment and placebo periods to maximize contrasts. Although the drug had been widely used in children, the execution of the study was monitored very closely. With all these measures we took as many precautions as possible to optimize the study design, the execution of the research and the validity of the study results.

After a routine inspection by the authorities and the identification of recoverable findings, the study was temporarily suspended. Whereas we duly addressed all issues raised on the patient and center level, including shortcomings in the storage of study medication and documents, documentation logs and consenting procedures, structural improvements beyond the liability of the sponsor-delegated person would take longer. These involved the University hospital's overall study structure and included defects in sponsor oversight arising from non-uniform SOPs, structural control deficits, non-systematic electronic case-report form user right management, a missing risk analysis plan, and insufficient change control management. The study had already been recruiting for almost 4 years and 35 subjects had been included. In particular as the COVID-19 pandemic spread, we decided to close the trial due to insufficient capacities to continue. All data were extensively and carefully reviewed by monitors on site and, where feasible, remotely. Additionally, we assessed data completeness and internal consistency by central monitoring before database closure. On this basis, we classified the quality of the trial and of the data obtained as well suited for analysis.

The dichotomous primary endpoint is appropriately expressed as the odds ratio of the responders under placebo and under HCQ. In both the START and STOP arms, there were no responders in the HCQ and placebo groups. Thus, no ratio could be calculated. In an exploratory analysis we reduced the threshold for response by using a 3% change in O2-saturation. With this definition, odds ratios around 1 and a small kappa coefficient could be calculated in the START arm, not supporting a treatment response to HCQ. To increase the study power as much as possible, we combined all treatment groups, i.e. all 27 “HCQ treatments” and the 16 “placebo treatments” (Table 4). Nevertheless, odds ratios and kappa coefficients of responders defined by protocol or MID definition, as well as the effect sizes of the relevant secondary outcomes lung function and quality of life, were all marginal, most often spanning zero and clinically negligible.
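Why no ratio can be calculated with zero responders becomes clear from the formula itself. The sketch below uses the common Wald (log odds ratio) confidence interval, which is an assumption, since the text does not name the interval method; the function name `odds_ratio_ci` and all counts in the example are hypothetical and are not trial data.

```python
from math import exp, log, sqrt
from typing import Optional, Tuple


def odds_ratio_ci(resp_hcq: int, nonresp_hcq: int,
                  resp_placebo: int, nonresp_placebo: int,
                  z: float = 1.96) -> Optional[Tuple[float, float, float]]:
    """Odds ratio with a Wald 95% confidence interval.

    Returns None when any cell is zero: the ratio or its standard
    error then involves a division by zero and is not defined.
    """
    cells = (resp_hcq, nonresp_hcq, resp_placebo, nonresp_placebo)
    if min(cells) == 0:
        return None
    odds_ratio = (resp_hcq * nonresp_placebo) / (nonresp_hcq * resp_placebo)
    se_log_or = sqrt(sum(1 / c for c in cells))
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_ci(0, 10, 0, 10))  # no responders in either group
    print(odds_ratio_ci(2, 8, 2, 8))    # equal response in both groups
```

With small groups the interval around an odds ratio of 1 is very wide, which is consistent with the intervals "spanning zero" on the log scale described above.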

Regarding the response rates, we had hypothesized that about 70% of the subjects would respond to HCQ and 35% to placebo [15]. These assumptions were based on our comprehensive literature review, which identified 85 patients treated with HCQ between 1984 and 2013 who were found to have a 41% response rate [4]. However, it must be considered that in those publications “response” was primarily a clinical impression and not defined precisely. To complicate matters, other medications such as systemic steroids were often started at the same time as HCQ. Only 16 patients were treated exclusively with HCQ and of these, 88% (14 patients) responded [4]. Such a high response rate might be due in part to a publication bias for positive studies and is very likely further skewed by uncontrolled treatment conditions, undefined response criteria and retrospective analyses. However, if such a high response rate were real, it would be very unlikely to have been missed in this study, as only 32 patients would be needed to detect the treatment difference in a post-study calculation using a power of 80% and an alpha level of 5%. With all this in mind, we must be aware that there is a chance of incorrectly accepting the null hypothesis and falsely rating this treatment as negative.
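A post-study sample size calculation of this kind can be sketched with a standard formula for comparing two independent proportions. The text does not state which method was used, so everything here is an assumption: a two-sided test, equal group sizes, the pooled-variance normal approximation with the Fleiss continuity correction, and response rates of 88% under HCQ versus 35% under placebo. The function name `n_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.80,
                continuity_correction: bool = True) -> int:
    """Subjects per group needed to detect p1 vs p2 (two-sided test).

    Assumption: normal approximation for two independent proportions,
    optionally with the Fleiss continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    if continuity_correction:
        n = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n)


if __name__ == "__main__":
    per_group = n_per_group(0.88, 0.35)
    print(per_group, 2 * per_group)
```

Under these particular assumptions the formula gives 16 subjects per group, i.e. 32 in total, which agrees with the figure quoted above; the agreement does not establish that this was the calculation actually performed.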

In addition to the limitations listed, further issues need to be considered. First, based on a Delphi process involving chILD experts [3], we chose a treatment duration of four weeks to determine the response to the study drug. However, there were no additional responders after 8 weeks of HCQ treatment either (group D; data not shown). Longer-term treatments could be investigated in future trials. Second, the wide range of chILD diagnoses reported to respond to HCQ [4] and included in the study could mask strong responses in certain conditions. However, we did not identify response clusters in molecularly or histologically defined chILD sub-entities (data not shown). Whereas lumping approaches to assess drug effects are common practice in adults with interstitial lung disease [16], a gene- and mutation-specific treatment of patients based on strong in vitro evidence was very successful in cystic fibrosis [17, 18]. Unfortunately, to date there is no relevant in vitro test for HCQ linking it to lung disease [5]. In ABCA3 deficiency, an important chILD subgroup, this approach has been shown to be effective for some compounds [19, 20].

Currently, an industry-sponsored phase 3 trial of nintedanib in children with fibrosing chILD (ClinicalTrials.gov: NCT04093024) is ongoing, aiming to include at least 30 patients [21]. This illustrates the extraordinary logistic effort and financial power necessary to recruit such a relatively small number of subjects in this condition.

Conclusions

For the first time, this study has generated controlled evidence on the effect size of HCQ treatment in chILD. Disappointingly, and considering the many precautions indicated above, we suggest that the past optimistic appraisal of HCQ in chILD needs to be revised. Whenever it is prescribed to children, its efficacy should be assessed repeatedly and quantitatively, the length of treatment should be limited to reasonable periods, and the patient is best followed in a chILD register for future data aggregation [2].