Phantom limb pain (PLP) is a subtype of post-amputation pain with prevalence between 17 and 88% by one meta-analysis study [1] and lifetime prevalence between 76 and 87% by another recent systematic review [2]. A wide range of treatment methods for the condition have been proposed over the years [3, 4]; however, reviews and surveys have repeatedly stressed that there is little evidence from appropriate randomized controlled trials (RCTs) to guide treatment choice, be it pharmacological or non-pharmacological [5,6,7,8]. Although experts’ recommendations seem to converge on a handful of approaches (including graded motor imagery, mirror therapy, and amitriptyline), they are based on limited evidence with mixed results [9]. Consequently, the optimal treatment for PLP remains a challenge to this day, in part undoubtedly due to the uncertainty on its pathophysiology [10].

Recent systematic reviews have also highlighted how the vast majority of RCTs do not meet the necessary criteria and present flawed design, conduct, analysis, and/or reporting [5, 8]. Low-quality RCTs make replicability and meta-analysis harder to achieve, limiting the extent of the evidence in support of treatment recommendations. Evidence-based tools such as the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) [11] and Consolidated Standards of Reporting Trials (CONSORT) [12] guidelines have become instrumental in helping researchers to develop high-quality protocols and outcome reports respectively: these are great antidotes against compromised scientific evidence. Yet, the risk for selective reporting of outcome and analysis persists [13, 14]. To this end, the Journal of the American Medical Association (JAMA) published a statistical analysis plan (SAP) guidance document in 2017 containing a checklist of minimum items to include when reporting details of the statistical analysis of RCTs [15]. To ensure the much-needed high-quality evidence in the field of PLP research, not only protocols, but also SAPs should be published.

Phantom motor execution (PME) has been found to reduce PLP in case studies and in a single-arm clinical trial [16,17,18,19]. This therapy is currently investigated in an international, double-blind, randomized controlled clinical trial which aims at confirming the previous results [20]. The protocol of the RCT has been previously published [20], and the current article presents the detailed SAP adhering to the checklist provided by the JAMA guidelines [15]. Here, we describe the pre-specified statistical analysis principles and procedures to be followed by the statisticians responsible for analyzing the trial data.

Statistical analysis plan

Introduction and objectives

Synopsis of the trial background

PME promoted by myoelectric pattern recognition, virtual/augmented reality (VR/AR), and serious gaming is investigated in an international, double-blind, randomized controlled clinical trial [20], for which this SAP was created. Specifically, PME is compared to phantom motor imagery (PMI), which is implemented by keeping everything identical as in the experimental intervention (same device, VR/AR environments, and interaction with therapists), except that phantom movements are only imagined and not executed.

Study hypothesis

The working hypothesis of PME is that the purposeful activation of the affected motor circuitry will dissociate the pathological relationship between sensorimotor and pain processing resulting in PLP [19]. Because PMI does not engage as much of the sensorimotor neural circuitry as PME, the latter is hypothesized to be more likely to reduce PLP, framing the investigation as a superiority trial.

Study objectives

The primary objective of the study is to examine whether 15 sessions of PME can induce a greater PLP relief, compared to PMI. The secondary objectives are to examine whether 15 sessions of PME provide a greater improvement in different aspects related to PLP compared to PMI. Aspects considered by the secondary objectives are other measures of PLP such as pain duration, pain intensity as measured by other metrics, and the patient’s own impression about the effect of treatment. Another important aspect investigated in the trial is the long-term retention of treatment benefits, and therefore, another secondary objective is to assess whether PME brings a greater positive change in all the variables (both primary and secondary) between baseline and follow-up timepoints (1, 3, and 6 months post-treatment).

Trial methods

Trial design

The trial protocol was previously published [20] and is briefly described here. Sixty-seven subjects with upper or lower limb amputations were planned to take part in this study. Subjects were assigned randomly to PME and PMI treatments (2:1 proportion). The study consists of baseline assessment, 15 treatment sessions of 2 h each, and three follow-up interviews at 1, 3, and 6 months post-treatment. The design is double-blinded as the patients were informed that the treatment received, regardless of which, has been shown effective in previous studies and were therefore unaware to be assigned to the active control arm. The evaluators did not take part in providing treatment (evaluator and therapist were different persons), making the study a double-blinded. In addition, the data analysis will be performed by a person different than the therapist, the evaluator, or the person that randomized the subjects.


The experimental intervention consists of PME decoded via myoelectric pattern recognition and promoted via serious gaming in virtual and augmented reality. The active comparator consists of PMI in which the participants imagine performing the movements visualized in the virtual environments instead of executing them. In PMI, myoelectric activity is used to monitor that the subjects do not produce muscular contractions but only imagine the movements. Conversely, myoelectric activity is used to drive the actions taking place in the virtual environments in PME.


Participants that meet the inclusion criteria (see protocol [20]) and signed the informed consent were assigned to the experimental or active control group in proportion 2:1, according to the optimal allocation scheme of minimization. Initial description and methods of minimization were introduced independently by Taves [21] in 1974 and Pocock and Simon [22] in 1975. In short, the minimization algorithm was chosen because it allows dynamic allocation of the subjects, in which each allocation is influenced by the current state of overall treatment balances. Specifically, the minimization schemes ensure that allocation balance is maintained by giving higher allocation probabilities to interventions selected in favor of reducing total imbalance. The randomization strategy was chosen to reduce the likelihood of baseline imbalance of potentially prognostic covariates between the two treatment groups and to maximize the amount of information collected on the experimental treatment. The following minimization factors were considered: level of amputation (upper and lower), baseline PLP based on the Numeric Rating Scale (NRS) (low 1–4 and high 5–10), and investigation site (eight centers). The minimization process is conducted using the opensource desktop application MinimPy [23] operated by the monitor of the clinical trial. Patients were randomized as they were enrolled. Specifically, a therapist at the investigation site was responsible for evaluating the eligibility of the subject by carrying out all the assessments included in visit 0. If the patient was deemed eligible, the therapist would assign a subject ID, which also identifies the investigation site, and communicate it to the monitor of the study together with information regarding the abovementioned minimization factors (level of amputation and the NRS value of PLP). The monitor would then input these values in the software and get as result the allocation group. The allocated treatment would then be communicated back to the therapist responsible for the treatment.

Sample size

This confirmatory clinical trial builds on the results of three previous studies [16,17,18]. Sample size calculation was informed by a previous one-armed clinical investigation of PME [17], where pain decrease in PRI was found to be 51% relative mean improvement with an absolute mean improvement of 9.6 (SD 8.1) and effect size of 1.18. In the one-armed trial, the PRI was computed using the pain descriptors as in the Short-Form McGill Pain Questionnaire [24] and scored individually using the present pain intensity scale [25] (scale none to excruciating, 0 to 5), thus giving a score with range 0 to 75. In the present study, we used the classic version of the Short-Form McGill Pain Questionnaire, which scores the single pain descriptors on a scale from 0 to 3 (none to severe), thus reducing the range to 0–45. The absolute mean improvement on the new scale would then become 5.75 and maintaining the same effect size yields an SD of 4.89. Considering that in this trial we had two groups and some improvement for the control group was expected, we chose a difference of 4 between the two groups (which would grant a reduction above 30%) to be adequate and still representative of a clinically meaningful change [26]. To find this difference between the two randomized groups (in proportion 1:2) in the primary outcome measure, with power of 80% resulting from a two-sided Fisher’s non-parametric permutation test at 5% significance level, at least 60 participants are required. Considering a drop-out rate of 10%, which is akin to what was found in the previous clinical investigation, at least 66 patients are expected to be randomized in total. Sample size calculation was carried out using SAS 9.2 PROC.


This trial uses a superiority hypothesis testing framework between treatment groups for all outcomes.

Timing of outcome assessments

See the trial protocol [18] and Table 8 in the Appendix for further details of all outcome measures and their assessment timings.

Statistical interim analysis and stopping guidance

There are no planned interim analyses for efficacy and no planned interim assessment for futility. The monitor of the study will provide stopping guidance should the necessity arise due to adverse events. The trial will be stopped after 60 subjects have completed the study.

Timing of final analysis

The final analyses and unblinding of the data will take place only after the database is locked, and SAP is signed. The first part containing primary analyses and all the data relative to the treatment (collected between visit 0 and visit 15) will be locked after all subjects have completed the treatment phase. The second part of the database containing all the data relative to the follow-up assessments will be locked after all subjects have completed the last follow-up visit.

Statistical analysis principles

Confidence intervals and p-values

The main results will be presented as mean differences with 95% confidence intervals between the two groups together with effect sizes and p-values. All significance tests will be two-sided and conducted at the 5% level.

Multiple comparisons and multiplicity

For the confirmative analyses, the fixed sequential method will be used for controlling type I error, as indicated in Dmitrienko et al. [27]. This theoretical framework will be applied for primary analysis and important secondary analyses. If a test gives a significant result at the 5% significance level, the total test mass will be transferred to the following number in the test sequence until a non-significant result is achieved. All the significant tests before the first non-significant test will be considered confirmative. If the primary analysis will be non-significant, then no analysis will be confirmative.

Adherence and protocol deviations

A protocol violation in eligibility is defined as when a patient was randomized but did no longer qualify for the study according to the eligibility criteria (defined in the study protocol [20]). In addition, patients who withdrew their consent after randomization will be excluded from further follow-up and analysis. A protocol deviation from the assigned intervention can occur when a patient is assigned to either PME or PMI, but at any time during the study, the participant did not receive the prescribed intervention resulting in fewer than 15 sessions in total and/or gaps longer than two consecutively skipped sessions. Successful protocol adherence is defined as successfully completing the baseline assessment and the 15 treatment sessions without any protocol violation in eligibility, protocol deviation, and/or withdrawal of consent. Patients lost to follow-up are patients that completed the treatment without any major protocol violation or deviation but that did not complete the follow-up.

Analysis populations

  • The intent-to-treat (ITT) population: The ITT population will be all patients randomized except patients withdrawing consent for the use of data or patients with a protocol violation concerning eligibility.

  • The full analysis set (FAS): The full analysis set will be all subjects in the ITT population without imputation.

  • Per-protocol (PP) population: All randomized subjects with no major protocol violations and deviations will be included in the per-protocol (PP) population.

Overall protocol adherence, violations, and deviations for the entire clinical trial will be summarized in the CONSORT flowchart presenting the patient inclusion for ITT and PP analyses (see Fig. 1 for the template). Detailed results of treatment allocation and adherence per investigational site will also be presented using Table 1 in the Appendix.

Fig. 1
figure 1

CONSORT flowchart. Abbreviations: PME phantom motor execution, PMI phantom motor imagery

The ITT population, FAS, and the PP population will be specified in detail at the Clean file meeting before the database lock and before breaking the code. The number of subjects included in each of the ITT and PP populations will be summarized for each treatment group and overall. The number and percentage of subjects randomized and treated will be presented. Subjects who completed the study and subjects who withdrew from the study prematurely will also be presented with a breakdown of the reasons for withdrawal by treatment group for the ITT and PP populations.

Trial population

Screening data and recruitment

We will report the number of screened patients who met study inclusion criteria, the number of excluded patients, and the number of participants included in the final analyses. The flow of trial participants will be displayed in a CONSORT diagram. The diagram will present the level of withdrawal (e.g., whether participants withdrew from intervention and/or from follow-up) and the timing and reasons of withdrawal/lost to follow-up data will be presented separately.

Baseline characteristics

A complete list of all the outcome measures assessed during baseline (visit 0) will be presented similarly to Table 8 in the Appendix, while the list of baseline variables will be reported similarly in Table 3 in the Appendix. Additionally, the level of amputation (whether it is an upper or lower limb amputation), investigational site, and level of PLP (high if ≥ 5, or low) were the minimization factors considered for the randomization. Categorical data will be summarized by numbers and percentages. Continuous data will be summarized by mean, standard deviation (SD), median, Q1, Q3, minimum, and maximum. Demographics and baseline characteristics will be summarized by treatment group for the ITT and PP populations. Tests of statistical significance will be performed for baseline characteristics but not included in the baseline tables in the article reporting the results; rather, the clinical importance of any imbalance will be noted.

Prior and concomitant medications

The use of concomitant medications is allowed provided that at the time of inclusion, the patient has stable consumption for at least 1 month before entering the study and any pain reduction potentially attributable to the drug occurred at least 3 months before entering the study. Medication intake is thus considered as a baseline variable but also monitored throughout the study as an efficacy variable. Prior and concomitant medication at baseline will be summarized by higher level Anatomical Therapeutic Classification (ATC) group and generic term for each treatment group for the ITT population.

Outcome variables

Table 8 in the Appendix summarizes the schedule of data collection for the trial.

Primary outcome variable

The primary efficacy variable is the Pain Rating Index (PRI) (Fig. 2). The PRI is a continuous variable computed as the sum of the scores for all descriptors of the Short Form of the McGill Pain Questionnaire (SF-MPQ). The primary outcome of the study is the change in PRI between baseline (visit 0) and end of treatment (visit 15).

Fig. 2
figure 2

Example of subjects’ profile of Pain Rating Index (PRI) over time by treatment group. The data used for this plot are fictitious

Secondary outcome variables

The secondary efficacy variables are other measures related to PLP and are listed below. As we are following the theory of fixed sequential multiple testing, the variables will be tested in the order in which they appear in the list below. All the variables apart from the Patients’ Global Impression of Change are assessed as changes between baseline and the specified timepoint.

  • Patients’ Global Impression of Change measured at visit 15 (single score [1–7])

  • PRI change between baseline and follow-up timepoints (1, 3, and 6 months)

  • PRI defined as a dichotomous variable indicating whether the patient has experienced a clinically meaningful reduction in pain (achieved with a decrease 50% or more [28] of the PRI)

  • From the Questionnaire for Phantom Limb Pain (Q-PLP):

    • Weighted Pain Distribution Index (range [1–5]): sum of the scores given to pain ratings during the day, weighted by the amount of time spent in the respective level of pain [16]

    • PLP intensity (NRS, [0–10])

Outcomes will be evaluated as changes between baseline and end of treatment and baseline and follow-up timepoints, if not otherwise stated.

Exploratory outcome variables

  • EuroQol-5D-5L total score (mean value of the score given to the 5 items, [1–5])

  • Pain Catastrophizing Scale Short Form total score (sum of the individual scores [0–24])

  • Pain Disability Index (sum of the scores given to each item [0–70])

  • Patient Health Questionnaire total score (sum of the individual scores [0–6])

  • Pain Self-Efficacy Questionnaire total score (sum of the individual scores [0–12])

  • Health Care Climate Questionnaire (HCCQ) total score (sum of the individual scores)

  • Expectations for Complementary and Alternative Medicine Treatments Short Form (EXPECT-SF) total score

  • Opinion About Treatment (OAT), total score [3–18]

  • From the Questionnaire for Phantom Limb Pain (Q-PLP):

    • PLP intensity on Present Pain Intensity (PPI) scale, categorical, 6 levels

    • Telescoping (1 most proximal–6 most distal)

    • Pain locations (1 most proximal–6 most distal, multiple selection possible)

    • Stump pain, NRS, [0–10]

    • Medication consumption (improved or not)

    • Pain interference with work, NRS, [0–10]

    • Pain interference with daily life activities (NRS, [0–10])

    • Pain interference with sleep, NRS, [0–10]

    • Pain qualities from the McGill Questionnaire

    • Pain frequency (improved or not)

    • Exponential decay model of trial-by-trial improvement

Outcomes will be evaluated as changes between baseline and end of treatment and baseline and follow-up timepoints, if not otherwise stated.

Safety variables

Safety variables for the trial are treatment-emergent adverse events (that is, events either not present at baseline or present at baseline but of increased severity) and any serious adverse event (SAE).

All variables are measured at visit 0 (baseline), visit 15 (end of treatment), and follow-up assessments, if not otherwise specified. Adverse events will be reported as cumulative percentages at the end of the trial by treatment group.


General statistical methodology

All the main analyses will be performed on the ITT or FAS population between the two randomized groups. Complementary analyses will be performed on the PP population. For the confirmatory analyses, the fixed sequential test method for adjustment for multiplicity will be applied started with primary analysis and followed by the secondary analyses as numbered above. All other analyses will be exploratory. For comparison between the two randomized groups, Fisher’s non-parametric permutation test will be used for continuous variables, Fisher’s exact test for dichotomous variables, Mantel-Haenszel chi-square test for ordered categorical variables, and Pearson chi-square test for non-ordered categorical variables. Adjusted analyses will be performed with analysis of covariance for continuous outcome variables and with logistic regression for dichotomous outcome variables.

The main results will be the mean differences with 95% CI between the two groups, effect sizes, and p-values. For analyses of changes within groups, Fisher’s non-parametric permutation test for paired observation will be used for continuous variables and the Sign test for dichotomous and ordered categorical variables. All correlations will be performed with Spearman’s correlation coefficient.

All statistical tests will be two sided and conducted at the 5% significance level. The distribution of continuous variables will be given as mean, SD, median, Q1, Q3, minimum, and maximum, and distribution of categorical variables will be given as numbers and percentages. All major changes from the statistical analysis plan will be specified by subdividing into before and after database lock and unblinding. Normality of the data will be assessed inspecting the histograms.

Analysis of the primary outcome variable

The main analyses will be performed using the ITT population and it will be unadjusted. The comparison of change in PRI between baseline (visit 0) and end of treatment (visit 15) between the two treatment groups will be carried out with a two-sided Fisher’s non-parametric permutation test on significance level 0.05. We chose to use a non-parametric test for the main analysis as it does not rely on the assumption that the data are normally distributed, which we cannot ensure a priori. Missing data will be imputed with stochastic imputation using 50 datasets; it should be noted that this method of imputation differs from the use of last observation carried forward (LOCF) proposed in our previous work [20], as advised by our senior statistician consultant and reviewers. The difference in change in PRI with 95% CI will be given together with effect size and p-value.

The following sensitivity analyses will be performed for the primary analysis:

  1. 1.

    The robustness of the estimate of treatment effect will be assessed by adjusting for the baseline characteristics used for randomization (level of PLP at baseline (NRS), baseline PRI, level of amputation). For adjusted comparison between two groups, the analysis of covariance (ANCOVA) will be used with intervention/control as the independent variable and the baseline characteristics used for randomization as covariates. The effect estimates will be presented as group difference with 95% confidence intervals. This analysis will be conducted on the ITT population using stochastic imputation to handle missing data.

  2. 2.

    The impact of the imputation method will be assessed by running the same analysis as point 1 using LOCF for missing data imputation.

  3. 3.

    The impact of missing data will be assessed by running the unadjusted comparison on the full analysis set (FAS) using the same statistical methods as in primary analyses without imputing missing data.

A complementary analysis on the primary variable will also be performed on the PP population.Changes within groups will also be performed but with lower evidence value.

Analysis of secondary outcome variables

All secondary outcome variables will be analyzed between the two randomized groups according to the statistical methods presented in the section general statistical methodology, both on the ITT population using stochastic imputation and on the PP population. Within-group comparison will also be performed on the ITT population.

Analysis of exploratory outcomes

All exploratory outcome variables will be analyzed between the two randomized groups according to the statistical methods presented in the section general statistical methodology, on the ITT population using stochastic imputation. Within-group comparison will also be performed on the same population. Prosthetic usage data will be descriptively analyzed.

Subgroup analysis

The effect on the primary outcome will be analyzed also in subpopulations defined by: level of amputation, cause of amputation and gender. Subgroup analyses will be carried out using the approach followed for the main unadjusted comparison.

Missing data

The percentage and absolute withdrawal of participants lost to follow-up will be reported for each study arm in the CONSORT flowchart and reasons for missing data will be documented.

Safety analyses

Only treatment-emergent AEs will be included in the summaries for the safety population. The number and percentage of patients experiencing SAE will be presented for each treatment arm.

Software details

MATLAB version 9.10.0 (or above) will be used for all analyses [29].


The present article describes the analysis principles and specific statistical procedures chosen for analyzing the primary, secondary, and exploratory outcomes of an international, double-blind, randomized controlled clinical trial on the use of phantom motor execution as a treatment for phantom limb pain [20]. The SAP has been developed in accordance with the JAMA guidelines [15] and the main objective is to make it publicly available before unblinding, in order to minimize the risk of outcome reporting bias.

The statisticians and clinical investigators who developed the plan were blinded to the treatment allocation and treatment-related study results and will remain blinded until the database is locked for final data extraction and analysis. The trial is registered at with registration ID NCT03112928 and began recruiting patients in May 2017. The final follow-up visit is expected to be conducted in August 2021. The data is expected to be ready for analysis and unblinded in December 2021.