Introduction

Niemann–Pick disease Type C (NPC) is a devastating, rare neurodegenerative disease characterised by a defect that severely impedes cellular lipid trafficking [1]. It is inherited in an autosomal recessive manner, and individuals with NPC have mutations in one of two genes, NPC1 or NPC2; approximately 95% of affected individuals have mutations in NPC1 [1]. As a result, cholesterol and sphingolipids accumulate within the endosomal/lysosomal system, degrading the central nervous system (CNS) and causing a diverse range of neurological symptoms that depend on the patient’s age at onset. These may include cerebellar ataxia, dysarthria, dysphagia, cataplexy, seizures, dystonia, vertical gaze palsy and progressive dementia, with death by 8–25 years of age [2].

The exact prevalence of NPC disease is difficult to calculate due to inadequate clinical awareness as well as the relative complexity of biochemical testing. However, it has been estimated to be 1 case per 100,000 live births [3]. The severe disabilities caused by NPC, particularly during the later stages of the disease, affect a patient’s entire family, and optimal disease management requires highly specialised healthcare within a multidisciplinary care setting. Although NPC is not yet curable, knowledge of its pathogenesis has increased several-fold since the characterisation of the NPC1 and NPC2 genes. The focus of therapy remains symptom management, while advances are made in identifying effective disease-modifying treatments and investigational therapies.

The goal of the research into potential treatments for NPC is to develop drugs that are safe, effective and accessible to all members of the community. However, because NPC is an ultra-rare disease with considerable variability, designing and defining clinical trial inclusion criteria and endpoints can be challenging. Following a series of multidisciplinary discussions that culminated in an interactive workshop held at the Niemann Pick UK (NPUK) Annual Conference in 2019, with input from patients, clinicians, researchers, and industry representatives, it was agreed that there was a pressing need to develop a consensus on the use of existing NPC clinical severity scales in routine clinical practice and clinical trials. By determining such consensus, assessments across the world could be standardised to establish comparable data sets and demonstrate treatment efficacy through meaningful outcome measures.

Several scales have been developed and published over the past two decades but, essentially, all are based on a four-domain scale initially developed by Iturriaga et al. [4] (see Table 1). The present study aimed to establish consensus on the use of the clinical NPC severity scales listed in Table 1 in three different settings: routine clinical practice, clinical trial enrolment and clinical trial assessment. A Delphi method of consensus development was used to integrate anonymised perspectives from a group of international clinical experts with expertise in treating both paediatric and adult NPC patients and utilising scales to determine NPC severity. The Delphi method has proven to be a reliable measurement instrument to derive the opinion of a group of experts and evaluate the extent of agreement and to resolve any disagreement on a topic [5]. It has been widely used to establish a consensus across a range of subject areas. The study was coordinated as an iterative process of three surveys, with the questions in each round based on the previous round’s results.

Table 1 Six clinical NPC severity scales under investigation

The objectives of this study were to build consensus among international experts in the field of NPC on: (i) the preferred clinical scale(s) for assessing NPC severity; and (ii) the most suitable NPC severity scale to be used within each of the following three settings: routine clinical practice, clinical trial enrolment and clinical trial assessment.

Methods

Study design

The Delphi technique is a reliable measurement instrument for developing novel concepts and setting the course of future-orientated research [6]. It assesses the opinion of a group of experts to gauge their levels of agreement and to resolve disagreement on an issue [5] and has been used successfully across a range of subject areas to gain a clinical consensus [7,8,9]. A Delphi study was carried out to gain a clinical consensus on six existing NPC clinical severity scales (see Table 1) that can be used within the following three settings: routine clinical practice, clinical trial inclusion criteria and clinical trial endpoints. A summary of the six severity scales and how they have been used in clinical practice and trials to date was shared with participants for their reference. Twenty experts were invited by email to participate, and nineteen experts active in NPC paediatric and adult research and treatment took part. All were known to be competent in English, and all materials, including the surveys, were provided in English.

The Delphi process in this study was iterative and comprised three rounds. Participants were sent a link to an electronic survey for each round. Ahead of the first round, participants received two documents: 1) Summary of the six existing clinical severity scales and 2) Clinical trials summary (see “Appendices”). Round 1 aimed to gather opinions on the use of the six severity scales and the key domains that should be measured in each clinical setting. Rounds 2 and 3 aimed to gain consensus on these opinions. Ahead of Round 2, participants received the summary of the opinions revealed in Round 1. Anonymity was maintained for participants. Panel members were not made aware of the other panel members, except for MP, a co-author and panel member, and participant identifiers were removed from the summary of opinions given to participants ahead of Round 2. This is an important consideration in Delphi studies because it allows individuals to express their opinions freely and openly. However, the results of Round 2 were not shared ahead of Round 3 to avoid influencing the responses.

Round 1

In Round 1, 16 specialists took part in a nine-question survey. Each of the nine questions comprised two parts: (a) a multiple-choice question and (b) a free-text question that asked for reasoning, further insight or a recommendation based on their answer to part (a). The first round aimed to gather opinions on the six severity scales and domains that should be assessed in routine clinical practice, clinical trial inclusion criteria and clinical trial endpoints.

Round 2

In Round 2, 16 specialists, 11 of whom took part in Round 1, participated in an eleven-question survey. Participants were asked to independently rate nine statements using a 5-point Likert scale ('strongly agree', 'agree', 'neither agree nor disagree', 'disagree', 'strongly disagree'). The final two questions of the survey were free-text questions about the NPC severity scales. Consensus was defined as agreement or neutrality among at least 70% of the participants.

Round 3

In Round 3, 19 experts took part in a six-question survey, which used the same 5-point Likert scale as in Round 2. The aim of this final round was to gain consensus on what should be recommended based on opinions from Rounds 1 and 2. Consensus was defined in the same way as in Round 2.

Three survey rounds are considered optimal when trying to reach consensus [10]. They also allow the free-text question responses in Rounds 1 and 2 to be incorporated into Rounds 2 and 3, respectively. All surveys were administered using SurveyMonkey and survey links were distributed via email.

Consensus definition

Consensus was defined as at least 70% of participants selecting 'strongly agree', 'agree' or 'neither agree nor disagree' on the Likert scale questions in Rounds 2 and 3. This level of agreement has been considered sufficient in several previous Delphi studies [11, 12]. Neutrality was counted towards consensus because the purpose was to identify the severity scales that the clinical community would accept for international consistency. A neutral response was therefore taken to imply that the individual would not oppose the community adopting the scale in question and would be willing to use it.
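The consensus rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the set of response labels and the example responses are assumptions made for this sketch and are not taken from the study materials or data.

```python
# Hypothetical helper: the names and the integer-percentage threshold are
# assumptions for illustration, not part of the study materials.
SUPPORTIVE = {"strongly agree", "agree", "neither agree nor disagree"}


def reaches_consensus(responses, threshold_pct=70):
    """Return True if at least threshold_pct% of responses agree or are neutral."""
    if not responses:
        raise ValueError("at least one response is required")
    supportive = sum(1 for r in responses if r.strip().lower() in SUPPORTIVE)
    # Integer comparison avoids floating-point rounding at the 70% boundary.
    return 100 * supportive >= threshold_pct * len(responses)


# Made-up responses from a 16-person panel (not study data).
example = (
    ["strongly agree"] * 5
    + ["agree"] * 4
    + ["neither agree nor disagree"] * 3
    + ["disagree"] * 4
)
print(reaches_consensus(example))
```

In this made-up example, 12 of 16 responses are supportive or neutral, so the statement would meet the threshold; counting neutral responses as acceptance is the design choice that the paragraph above justifies.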

Core working group

The core working group was formed from key stakeholders who agreed to be involved at the NPUK annual conference in 2019. The group comprises representatives of the patient community (TM, a parent of affected NPC children and an experienced international patient advocate and leader, and WE, a parent of an affected child who also has previous experience of conducting clinical surveys and consensus development); an internationally recognised NPC clinician, MP; an internationally recognised NPC researcher who co-developed an approach to NPC patient stratification, FP [16]; a pharmaceutical industry expert in clinical outcomes, CG; and a medical communications expert, JP.

Survey development

The initial survey development involved the definition of a research question and development of the questions to be used in Round 1, based on the study team’s expertise and a review of the literature. This initial development was carried out by the Core Working Group. To meet the study objectives, the survey was split into three sections. The first round included questions to establish opinions on the most useful NPC severity scales and domains measured in each clinical setting and the second and third round aimed to gain consensus on the opinions gathered in Round 1.

Expert panel recruitment

In Delphi studies, the minimum number of participants sufficient for achieving a consensus has been debated. Recent literature suggests that larger sample sizes can deliver diminishing returns concerning the validity of the findings and that small panels of similarly trained experts in a specialist field provide stable results to support effective decision-making [13,14,15]. In a specialist rare disease area, such as NPC, reaching a prescribed minimum target poses a challenge due to the limited total potential pool of qualified participants. Nonetheless, 20 international specialists from Europe, the United States, Australia and South America were invited to complete the Delphi study, of whom 19 agreed to participate. The professional community in NPC is very small, given the rarity of the disease, so the authors of the existing clinical severity scales who are still practising as NPC clinicians were also invited to take part. The participants were identified by Dr William Evans, Chair of NPUK, ratified by the Core Working Group as key specialists in NPC around the world, and invited via email to participate in this Delphi study. Dr Marc Patterson, as the only Core Working Group member who is also a practising NPC clinical specialist, also took part in the Delphi panel.

Results

Participants

Each survey round of this Delphi study comprised a representative panel of clinical experts (the Expert Panel) treating both paediatric and adult NPC patients, from seven different countries: United States of America (n = 6), United Kingdom (n = 5), Germany (n = 3), Spain (n = 2), Brazil (n = 1), France (n = 1) and Australia (n = 1). A little more than half (58%) of the study participants were paediatric specialists.

Round 1

In Round 1, consensus was reached amongst the 16 international experts on the five most important domains to be measured to assess NPC clinical severity in the context of all three clinical settings (routine clinical practice, trial enrolment and clinical trial outcome measures). These were ambulation, cognition, fine motor, speech and swallowing. Although these are the five domains captured in the 5-domain NPCCSS, the group was far from unanimous on whether a single scale should be used across all of the clinical settings. Nonetheless, the 5-domain NPCCSS was among the highest-ranked for preferred use within all three settings: it was the top choice of 43.75% of participants for routine clinical use (versus 18.75% each for the 17-domain NPCCSS, Disease specific disability scale and Functional disability scale); of 37.5% for trial enrolment (second to the more granular 17-domain NPCCSS, chosen by 43.75% of participants); and of 50% for clinical trial outcome measures (followed by the 17-domain NPCCSS, preferred by 31.25% of participants). The most divisive question of the survey concerned the adoption of a single severity scale in all scenarios, with some responses supportive of the consistency and optimisation of a single scale used globally while others suggested that a single scale would be too reductive. Based on the Round 1 results, detailed in Table 2, the second round asked participants to rate statements on a Likert scale.
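With a panel of 16, each Round 1 percentage corresponds to a whole number of respondents. The short sketch below makes that conversion explicit. The function name and dictionary labels are assumptions for illustration; the counts are derived here from the reported percentages and panel size rather than quoted from the study tables, and the 18.75% figure is read as applying to each of the three other scales.

```python
from fractions import Fraction

PANEL_SIZE = 16  # Round 1 respondents, as stated in the text


def respondents(percent, panel_size=PANEL_SIZE):
    """Convert a reported percentage (given as a string) into a respondent count."""
    # Fraction keeps the arithmetic exact, so 43.75% of 16 is exactly 7.
    return Fraction(percent) / 100 * panel_size


# Percentages as reported for Round 1; the labels are shorthand for this sketch.
reported = {
    "routine practice, 5-domain NPCCSS": "43.75",
    "routine practice, each of three other scales": "18.75",
    "trial enrolment, 5-domain NPCCSS": "37.5",
    "trial enrolment, 17-domain NPCCSS": "43.75",
    "trial outcomes, 5-domain NPCCSS": "50",
    "trial outcomes, 17-domain NPCCSS": "31.25",
}

for label, pct in reported.items():
    print(f"{label}: {pct}% of {PANEL_SIZE} = {respondents(pct)} respondents")
```

Under this reading, the top choice in any setting was selected by at most half of the panel, which is consistent with the statement that the group was far from unanimous.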

Table 2 Responses to statements included in Round 1 (16 respondents)

Round 2

In Round 2 consensus was achieved amongst 16 of the experts for six of the nine statements (see Table 3). The panel of experts agreed that it was ‘desirable’ (81%) and ‘achievable’ (75%) to determine a single, standardised NPC clinical severity scale for routine clinical practice and clinical research on a global scale within the scope of the existing scales. Further, 100% of respondents agreed that a clinical paper recommending which NPC clinical severity scale should be used in each clinical setting would be valuable to the international clinical and patient community. Consensus was also reached on the statement that the domains measured in the 5-domain scale provided an accurate clinical understanding of NPC severity in clinical practice and trials (87%) and, if there was only one international scale recommended for use evaluating the disease, it would be the 5-domain NPCCSS (81%).

Table 3 Responses to statements included in Round 2 (16 respondents)

Two further statements narrowly missed consensus, each reaching 69% against the 70% threshold. These related to whether it was essential to measure all 17 domains during a clinical trial and whether the 5-domain scale satisfies the requirements for use in all clinical settings. The final statement on which consensus was not reached related to the feasibility and need to develop a novel NPC clinical severity scale that satisfies requirements for use in all clinical settings.

The key themes of the responses about a new, universal NPC clinical severity scale (Question 10) included a need to incorporate quality of life measures, age/subtype-dependent items (such as epilepsy and cataplexy in late infantile-juvenile) and a video of patient performance during a 9-Hole Peg Test (9HPT) and 8-min walk test. When asked for recommendations to implement a more uniform approach to the use of NPC severity scales, participants suggested a published systematic review of the current scales, a published expert consensus, the inclusion of biochemical markers and neuroimaging, and giving more agency to each patient (such as an app to fill in regularly) to help doctors achieve personalised treatment. The key insights from the open-ended questions in Round 2 are summarised in Table 3.

Round 3

In Round 3, consensus was reached on five out of the six statements (see Table 4). Although consensus (81%) was achieved during Round 2 that the 5-domain NPCCSS scale was the preferred scale for routine clinical practice and trials, the suggested recommendation in Round 3 that this be positioned as the first-choice scale in routine clinical practice did not quite reach consensus (68%). However, the panel of 19 experts agreed that the 17-domain NPCCSS scale should be recommended as the first choice to assess the severity of NPC in clinical trial settings, with the domains listed in the 5-domain scale prioritised as the primary endpoints (74%). Furthermore, 74% of respondents agreed that there is no need for a new universal scale for all settings to be developed. The panel did agree, however, that resources or training on how to apply the NPCCSS (17- and 5-domains) should be developed and provided to clinicians working in NPC (89%). Further, 84% agreed that the consensus paper should be reviewed every five years to ensure that recommendations remain accurate.
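Because the panel size differed between rounds, the 70% threshold corresponds to a different minimum number of respondents in each round. The sketch below works this out with integer arithmetic. The function names are hypothetical, and the respondent counts are inferred from the panel sizes and the rounded percentages reported in the text; they are not quoted from the study tables.

```python
def min_respondents_for_consensus(panel_size, threshold_pct=70):
    """Smallest respondent count meeting the threshold (ceiling division)."""
    return -(-threshold_pct * panel_size // 100)


def rounded_pct(count, panel_size):
    """Percentage of the panel, rounded to the nearest whole number."""
    return round(100 * count / panel_size)


for panel_size in (16, 19):  # Rounds 1 and 2, then Round 3
    needed = min_respondents_for_consensus(panel_size)
    print(
        f"panel of {panel_size}: {needed} needed "
        f"({rounded_pct(needed, panel_size)}%), "
        f"{needed - 1} falls short ({rounded_pct(needed - 1, panel_size)}%)"
    )
```

On these assumptions, the 68% and 74% results in Round 3 are consistent with 13 and 14 of 19 respondents respectively, so the routine clinical practice statement would have fallen one respondent short of the threshold.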

Table 4 Responses to statements included in Round 3 (19 respondents)

Discussion

This Delphi study achieved consensus during Round 2 that the domains measured in the 5-domain NPCCSS scale provided an accurate clinical understanding of NPC severity. If there was only one international scale recommended for use in routine clinical practice, the respondents would recommend use of the 5-domain NPCCSS scale. Although this statement achieved consensus in Round 2, amongst a panel of 16 NPC specialists who completed the first two rounds, it did not quite reach consensus in Round 3 from a panel of 19 experts.

In Round 1, respondents highlighted the 5-domain NPCCSS scale as simple, accurate and quick to administer and complete in a routine clinical examination, and noted that its simplicity was valuable for multi-centre trials to support reproducibility and reliability across sites. Further, it was noted that the domains measured in the 5-domain scale are present in nearly all cases of NPC as the disease develops, unlike hearing loss and seizures, which are typically present in only a small percentage of patients. Respondents also noted that the domains measured in the 17-domain scale posed several challenges. For example, memory is difficult to separate from the cognition domain, and measuring changes in the eye movement domain can be problematic.

However, the 5-domain scale was seen as insufficient for evaluation of specific subsets of patients, such as those with mainly psychiatric involvement or experiencing seizures. Moreover, answers in Round 1 stressed the importance of the granularity of scores and the comprehensiveness provided by the 17-domain NPCCSS scale, in capturing the progression of late-onset patients with a slowly progressing disease, as well as for measuring change and baseline assessment in clinical trials. This likely led to the 74% consensus in Question 2 of Round 3 that the 17-domain NPCCSS should be the first-choice severity scale in clinical trial settings.

Given these insights, the Core Working Group recommends that the 17-domain NPCCSS is used as the preferred scale to assess NPC severity across clinical trial enrolment and trial outcome measures. However, the domains listed in the 5-domain scale (ambulation, cognition, fine motor, speech and swallowing) should take precedence as primary endpoints as they are the most relevant to describe neurological disease progression and quality of life [16]. As supported by the experts in Round 1, use of the 5-domain NPCCSS is recommended in multi-centre trials to support reproducibility and reliability of results across multiple trial sites. Lastly, the Core Working Group recommends that the 5-domain NPCCSS scale is used within routine clinical practice to assess the clinical severity of NPC patients. These recommendations provide greater global consistency and optimisation of both the 17- and 5-domain NPCCSS scales, whilst not becoming too reductive, which was noted as important by respondents in Round 1.

The Core Working Group also recommends that resources or training on the NPCCSS scales (17- and 5-domains) should be developed and provided to clinicians working with NPC patients to optimise the standardisation of their application. Further, it is advised that this consensus paper should be reviewed every five years to ensure that the recommendations remain accurate.

This Delphi study gathered consensus on the use of six existing NPC clinical severity scales, the findings of which have enabled the research team to make several significant recommendations and identify areas for further development. Drawing on an international panel of NPC clinicians who treat both paediatric and adult NPC patients, views were gathered from a select yet representative panel of experienced experts in the field. However, the rarity of NPC disease means that there is a limited global community of NPC specialists. As a result, the size and composition of the expert panel may reduce the generalisability of the results. Future international consensus work should ensure that the panel’s composition represents the global NPC community and, if necessary, provide materials translated into participants’ first languages to reduce potential bias. Nonetheless, the final sample size (16 participants in Rounds 1 and 2 and 19 participants in Round 3) was greater than the broadly accepted sufficient panel size of 10–15 [17]. Given the global scale upon which this field operates, the Delphi consensus method, which can be conducted quickly and online, was an appropriate tool for collecting responses. In addition to identifying the areas of consensus, the study highlighted areas where there is less certainty in the field, such as balancing the need for greater consistency of a single, global multi-domain scale with the concern of becoming too reductive.

While a strength of the study was its ability to access an international network of specialists in the field of NPC research and treatment, some of the participants were those who developed the clinical severity scales under evaluation. The strong opinions of these participants may therefore have introduced some response bias. Further, it is acknowledged that the concept of ‘consensus’ is fairly fluid. While we have consensus, there are still experts among the group who strongly disagree with the recommendations and hold these views firmly. Given the small size of the expert community, research is unlikely ever to reach consensus across all statements. However, the fact that 19 out of 20 invited participants took part in the Delphi study highlights both the perceived importance of this piece of work to the NPC community and the influential role that patient groups can have in bringing together stakeholders for such projects. According to guidance from the National Institute for Health Research (NIHR) Health Technology, the Delphi technique typically results in a 20% dropout rate over the three rounds of consensus development. In this study, there were no dropouts in any of the three rounds, which supports the validity of our recommendations.

A key limitation of this study is that it does not offer definitive guidance, as consensus in Round 2 on the 5-domain NPCCSS as the preferred scale for routine clinical practice did not reach final consensus in Round 3. This may be a result of nuances in question phrasing or of the use of a 5-point Likert scale; a 9- or 10-point scale in future studies may provide a more sensitive measure from which to draw more nuanced conclusions.

However, the insights obtained were adequate to make several reliable recommendations. As a result, this consensus might facilitate a platform to enable standardisation of data capture and agreement on use for outcome measures.

We believe this study can help to inform and position future discussion around the use of the existing NPC clinical severity scales in clinical practice and trials. As more data, including genomic data, for NPC become available, the findings will become even more important and there may be a need to reconsider which parameters are most important and whether the preferred scales should be amended accordingly. Similarly, outcomes of ongoing trials of disease-modifying therapies for NPC will drive the need to identify the most appropriate clinical severity scale for determining drug efficacy.

Conclusion

Within this Delphi study, experts confirmed that there was no need for a new universal scale for all settings to be developed. However, they highlighted a need to strike a balance between greater optimisation of a global, single multi-domain scale and it becoming too reductive when choosing between the six existing scales. Although consensus was achieved in Round 2 on the 5-domain NPCCSS as the preferred scale for routine clinical practice, this did not achieve a final consensus in Round 3. Given the small size of the expert community, research is unlikely to ever reach consensus across all statements. However, several meaningful recommendations could be drawn from the study. In line with the consensus achieved in Round 3, this study recommends the use of the 17-domain NPCCSS scale across clinical trial settings, but the five domains measured in the 5-domain scale should be prioritised as primary endpoints. Further, this study recommends the use of the 5-domain NPCCSS scale in routine clinical practice. The findings also indicate a need to develop educational and training materials on how to apply the NPCCSS (17- and 5-domains) for clinicians working in NPC.