Introduction

Upper extremity (UE) impairment is one of the most common motor impairments in injuries to the central nervous system. The functionality of the hand is closely related to the object that must be used. Gripping requires motor control for reaching, grasping, and manipulation, which requires correct somatosensory information from the skin’s mechanoreceptors [1, 2].

Manual ability refers to the subject’s attempt to perform a particular activity. Manual ability and the performance of dexterity tasks require both gross and fine hand motions and coordination [3]. Manipulative dexterity and the ability to develop fine movements are directly related to participation in activities of daily living. An adequate assessment to quantify the ability of subjects during interaction with objects can predict independence in basic and instrumental activities of daily life [2].

The Nine Hole Peg Test (NHPT) was developed to measure finger dexterity as an assessment of fine manual dexterity. It was originally introduced by Kellor et al. [4] in 1971 in an official publication of the American Occupational Therapy Association. The NHPT consists of a rectangular base with a small container and nine holes, together with nine pegs. The subject must insert the nine pegs, one by one, into the holes as quickly as possible and then remove them, one by one, and return them to the container. The NHPT records the time that the subject takes to complete the task, which is performed with each hand separately [5]. The NHPT is one of the main manipulative skill assessment instruments, offering information on the motor aspects and levels of strategy that affect the functionality of the subjects’ hands [1]. It is quick and easy to administer across all ages, and it has demonstrated good and comparable test–retest reliability, as well as concurrent and known-group validity. For this reason, the NHPT was recommended for inclusion in the motor battery of the National Institutes of Health Toolbox [6, 7]. Although it has been used extensively, the measurement properties of this test have not been investigated in all clinical conditions, for example, in patients with cerebral palsy (CP).

CP is defined as a group of developmental disorders of posture and movement that is due to non-progressive lesions of the brain in the fetal or infantile stages [8]. CP is the most common cause of neurological disability in childhood, presenting a stable incidence worldwide of 2.11 per 1000 live births [9]. CP can be classified according to muscle tone as spastic, dyskinetic, ataxic, and mixed. Spastic CP is the most frequent, representing between 85 and 90% of cases [10], and can occur unilaterally (one-third of spastic CP), or bilaterally (two-thirds of spastic CP) [10, 11]. The presence of muscle weakness in CP can lead to limitations in range of motion, precise timing, power, and gross and fine motor skills, such as hand manipulation skills [12]. More than half of children diagnosed with CP experience various upper limb problems of different severity and heterogeneity. Children with CP usually have difficulties performing manual activities such as grasping, releasing, or manipulating objects, which is crucial in the performance of many activities of daily living [12]. Hand function problems in children with CP are often associated with problems of motor control, active range of motion, grip strength, and the persistence of primitive grasp reflex [12, 13]. This dysfunction of the hand interferes with its use and limits the child’s ability to perform activities of daily living, communication and social contact, which is one of the main challenges of CP [14]. The impact of CP on a child’s hand functioning may be formalized through the theoretical framework of the International Classification of Functioning, Disability and Health (ICF) [15].

According to the ICF, CP may affect three separate but related domains of functioning: body functions and structures (body domain), activities (individual domain), and participation (social domain). Body functions include the physiological or psychological functions of the different body systems. Body structures refer to the anatomic parts of the body (e.g., organs, limbs, and their components). By definition, CP is a consequence of early brain lesions that may affect the corticospinal tract. CP may impact the hand and its components (e.g., muscles, joints, and bones), as well as several body functions (e.g., muscle strength, control of rapid coordinated movements, touch-pressure detection, and the recognition of common objects and shapes). CP may also limit the ICF domain of activities, which refers to the ability to execute an essential task or activity of daily living (e.g., eating, drinking, grooming, or dressing) [16].

A correct assessment of fine manual dexterity in children with CP may be essential when planning treatment and developing more effective movement strategies. The manual strategies used differ depending on the size of the object: children manipulate larger objects with different strategies from those needed for fine finger dexterity, which requires more complex movements [17]. The NHPT is considered the gold standard test in multiple sclerosis because of its clinical utility and excellent psychometric properties in terms of reliability and discriminant, concurrent, and ecological validity [18]. In addition, it has been used to assess the efficacy of some interventions in other conditions such as CP or stroke [19, 20]. However, to the best of our knowledge, no studies have examined the reliability and measurement error of the NHPT in CP.

Accordingly, the objective of the present work was to assess the intra-rater inter-session reliability and agreement of the NHPT in CP with spastic-type hemiparesis.

Materials and methods

This manuscript follows the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) [21], and the completed checklist is available in Supplementary File 1. We did not follow the COSMIN guidelines for reporting studies on measurement properties because the only available instruction relates to patient-reported outcome measures (PROMs), which is not the case for the NHPT [22]. Moreover, according to Gagnier et al. [23], the GRRAS was the most relevant guideline for the development of the COSMIN reporting guidelines and can be applied to studies on the reliability and agreement of non-PROMs.

Participants

Subjects with unilateral CP were recruited from the Associations of ASPACE (Association of People with Cerebral Palsy) of Extremadura. All subjects had to meet the following inclusion criteria: a diagnosis of CP with unilateral topography and spastic symptoms; spasticity ≤ 2 in the pronators and flexors of the wrists and fingers on the modified Ashworth scale; level of manual function ≤ III in the Manual Ability Classification System (MACS); gross motor function level ≤ III on the Gross Motor Function Classification System; and sufficient cognitive ability to follow and understand instructions, defined as a score ≥ 20 on the Mini-Mental or the Mini-Mental Test for children. Subjects were excluded if they had CP with a topography other than hemiparesis, dystonia or hypotonia, complete tactile anesthesia in the affected hand, recent changes in medication, or a hemianopia-type visual deficit.

Instrumentation

The commercial plastic version of the NHPT (Smith and Nephew) was used, as it seems to be the most commonly used and recommended [19]. Importantly, there is no significant difference in task time between the plastic version and the original wooden square [24]. The NHPT consists of a rectangular base and 9 pegs (7 mm in diameter, 32 mm in length). The base is divided into two parts: one is the container in which the pegs are stored, and the other is a pegboard with 9 holes (10 mm in diameter, 15 mm deep) into which the pegs are inserted.

Procedure

The protocol consisted of two evaluations on two different days, with an interval of 5 to 7 days between assessments [25,26,27]. All of the evaluations were carried out between March and May 2021 by the same therapist. The analysis and interpretation of the data were carried out by two expert therapists other than the evaluator.

First, all of the subjects were evaluated using the MACS, which systematically assesses how children with CP use their hands when handling objects in daily activities. Its levels are distinguished by the child’s ability to manipulate objects and the need for assistance or adaptations. The levels range from I to V, where level I indicates that the child “manipulates objects successfully” and level V that the child “does not manipulate objects” [28]. Then, the NHPT was carried out according to the original instructions established by Mathiowetz et al. [5]. Each subject was familiarized with the test through one practice trial, followed by the independent assessment of both upper limbs, starting with the non-affected UE. The NHPT was repeated identically on the second day of assessment.

Statistical analysis

The sample size was calculated by adopting a 95% confidence level, expecting an intraclass correlation coefficient (ICC) of 0.80, and accepting a confidence interval width of 0.30 for the ICC, resulting in a minimum sample of 24 individuals [29].
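As an illustration only, the calculation above can be sketched with a widely used approximation for ICC precision (Bonett's formula). Whether this is the exact method of reference [29] is an assumption, as are the function name `icc_sample_size` and the choice of k = 2 measurements per subject (the two sessions). Under these assumptions the sketch reproduces the reported minimum of 24 participants.

```python
import math
from statistics import NormalDist


def icc_sample_size(expected_icc, ci_width, k=2, confidence=0.95):
    """Approximate number of subjects needed for an ICC confidence
    interval of the desired total width (Bonett-style approximation).

    expected_icc: anticipated ICC (0.80 in the text)
    ci_width:     accepted total width of the confidence interval (0.30)
    k:            measurements per subject (assumed 2 sessions)
    """
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    numerator = 8 * z**2 * (1 - expected_icc) ** 2 * (1 + (k - 1) * expected_icc) ** 2
    denominator = k * (k - 1) * ci_width**2
    return math.ceil(numerator / denominator + 1)


n_required = icc_sample_size(expected_icc=0.80, ci_width=0.30, k=2)
print(n_required)  # 24
```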

Statistical analysis was carried out using SPSS software for Windows (SPSS Inc., Chicago, IL, USA; version 26.0). The significance level was set to 0.05 for all tests. Participants who did not attend the second evaluation were excluded. There were no missing data for those who completed both assessments. Normality was assessed using the Shapiro–Wilk test. Based on the results of the test and inspection of the histograms for each variable, the variables on the affected side were considered not to follow a normal distribution. Participant data and their respective scores in both assessments of the NHPT were described by the mean, standard deviation, frequency, and percentage. The ICC3,1, with its 95% confidence interval, was used to analyze the intra-rater inter-session reliability of the NHPT, using a mixed-effects model and absolute agreement [30, 31]. The ICC value ranges from 0 to 1 and is classified as poor (< 0.40), moderate (0.40–0.70), good (0.70–0.90), or excellent (> 0.90) [32].
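For readers without access to SPSS, a single-measure, absolute-agreement ICC can be sketched from the mean squares of a two-way ANOVA. This is a minimal illustration and not the SPSS routine used in the study; the function name `icc_single_absolute` and the example times are hypothetical, and the confidence interval is not computed here.

```python
def icc_single_absolute(scores):
    """Single-measure, absolute-agreement ICC from a two-way ANOVA.

    scores: list of rows, one per subject, each holding the k repeated
            measurements (e.g. [session_1_time, session_2_time]).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects, sessions, total, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: session variance stays in the denominator
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical NHPT times in seconds (session 1, session 2), not study data
example_times = [[30.1, 29.4], [45.0, 47.2], [120.5, 115.0], [62.3, 64.0], [38.8, 37.9]]
print(round(icc_single_absolute(example_times), 3))
```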

Measurement error was determined by estimating the standard error of measurement (SEM = SD × √(1 − ICC)), where SD is the standard deviation of the scores from all subjects and ICC is the intra-rater inter-session reliability coefficient, and the minimum detectable change (MDC = 1.96 × √2 × SEM) [33].
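The two formulas can be applied directly, as in the short sketch below. The input values (an SD of 10 s and an ICC of 0.96) are hypothetical and chosen only to show the arithmetic; they are not taken from the study data, and the function names are illustrative.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def minimum_detectable_change(sem):
    """MDC at the 95% level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


# Hypothetical inputs, not study data
sem = standard_error_of_measurement(sd=10.0, icc=0.96)
mdc = minimum_detectable_change(sem)
print(f"SEM = {sem:.2f} s, MDC = {mdc:.2f} s")  # SEM = 2.00 s, MDC = 5.54 s
```

Because the MDC is a fixed multiple of the SEM (about 2.77), a change smaller than roughly 2.8 SEMs cannot be distinguished from measurement error at the 95% level.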

The agreement between NHPT repetitions was verified by the Bland–Altman method [34, 35]. The presence of significant bias was tested with a one-sample t-test applied to the difference between measurements. A significant result on this test (p < 0.05) indicates systematic bias, and therefore a lack of agreement between the measures. Once agreement was confirmed for the affected and non-affected UE, a Bland–Altman graph displaying the mean difference and the 95% limits of agreement (LOA) was made for each UE. Finally, proportional bias was verified by a linear regression using the difference between NHPT scores as the dependent variable and the mean scores as the independent variable. Proportional bias is identified if the model is significant (p < 0.05).
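The quantities behind this agreement analysis can be sketched in a few lines. The function name `bland_altman`, the sign convention (session 1 minus session 2), and the example times are assumptions made for illustration. The sketch returns the t statistic and the regression slope only; the corresponding p-values, which the study obtained in SPSS, would come from a t distribution with n − 1 degrees of freedom and from the regression model test.

```python
import math
from statistics import mean, stdev


def bland_altman(session1, session2):
    """Bias, 95% limits of agreement, t statistic for the bias, and the
    slope of the differences regressed on the pairwise means."""
    # Assumed sign convention: session 1 minus session 2
    diffs = [a - b for a, b in zip(session1, session2)]
    means = [(a + b) / 2 for a, b in zip(session1, session2)]
    n = len(diffs)

    bias = mean(diffs)
    sd_diff = stdev(diffs)                      # sample SD of the differences
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)
    t_stat = bias / (sd_diff / math.sqrt(n))    # one-sample t-test against 0

    # Least-squares slope of difference on mean (proportional bias check)
    mean_of_means = mean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    sxy = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx

    return {"bias": bias, "loa": loa, "t": t_stat, "slope": slope}


# Hypothetical NHPT times in seconds, not study data
result = bland_altman([30.1, 45.0, 120.5, 62.3, 38.8], [29.4, 47.2, 115.0, 64.0, 37.9])
print(result)
```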

Results

In total, 32 subjects were screened, of which 5 did not attend the assessment due to COVID-19 isolation. Therefore, they were excluded from the study, leaving a total of 27 participants. Participant features are described in Table 1. The NHPT scores of the participants as a function of their MACS level are presented in Fig. 1.

Table 1 Participant features
Fig. 1 Nine Hole Peg Test (NHPT) scores for the affected and non-affected upper extremity in both assessments (NHPT-1 and NHPT-2) as a function of the level of manual function according to the Manual Ability Classification System (MACS)

The intra-rater inter-session reliability was excellent for both UEs: the non-affected side presented an ICC of 0.94 (95% CI = 0.86 to 0.97) and the affected side an ICC of 0.96 (95% CI = 0.91 to 0.98) (Table 2). For the non-affected UE, the SEM was 2 s, which represents 6% of the mean time observed for this side, with an MDC of 4 s (13%). For the affected UE, the SEM was 19 s, corresponding to 13% of the mean observed for the affected side, with an MDC of 12 s (8%) (Table 2).

Table 2 Intra-observer inter-session reliability of the Nine Hole Peg Test

According to the Bland–Altman method (Fig. 2), there is no evidence of bias between the repetitions for the NHPT score for the non-affected UE (t = 1.051, p = 0.30) nor for the affected UE (t =  − 0.983, p = 0.34). For the non-affected UE, the mean difference is − 0.77 s with an LOA varying between − 8.31 and 6.76 (Fig. 2). For the affected UE, the mean difference was − 7.14 s, with an LOA ranging from − 81.19 to 66.90. No significant proportional bias has been observed for the non-affected UE (F = 0.230, p = 0.64) or the affected UE (F = 1.196, p = 0.28).

Fig. 2 Bland–Altman plots comparing results between sessions of measurements (intra-rater reliability) for the non-affected (a) and affected upper extremity (b). Bias (red line) and limits of agreement (green lines) are shown for Nine Hole Peg Test. The mean is plotted on the x-axis, and the difference between sessions (mean of the differences) is plotted on the y-axis (mean difference ± 1.96 SD). The best fit linear regression line is represented by the black line without any significant proportional bias for the non-affected (F = 0.230, p = 0.64) and affected upper extremity (F = 1.196, p = 0.28)

Discussion

The purpose of the current study was to evaluate the intra-rater inter-session reliability and agreement of the NHPT in spastic unilateral CP. Since it is common to use the test to detect changes in the functionality of the UE, determining the potential of the test would help us to verify the function of the UE with confidence and consequently the level of independence in activities of daily living.

The reliability parameters of a test express how well patients can be distinguished from each other despite the presence of measurement errors [36]. The excellent ICC results found show that the NHPT is a reliable tool for assessing hand dexterity in patients with CP. These findings are in line with other NHPT studies in different pathologies and in healthy subjects. Excellent reliability of the NHPT has also been demonstrated for patients with multiple sclerosis [37], in whom both intra- and inter-rater reliability presented an ICC greater than 0.9. Moreover, excellent test–retest reliability was reported for healthy children aged between 4 and 19 years, with an ICC = 0.98 for the dominant hand and an ICC = 0.96 for the non-dominant hand [38].

In addition to the excellent reliability, it has also been possible to provide the parameters of measurement error that aid in the interpretation and use of the NHPT in practice with CP patients. The SEM indicates how far test scores can be expected to vary across repeated measurements [36]. In the assessed sample, repetition of the NHPT in patients with CP is associated with errors of up to 13% of the total score when the patient remains stable. Furthermore, given the MDC values found, changes greater than 12 s in the NHPT of the affected UE are necessary to conclude that there is a real change in the patient’s manual dexterity rather than a measurement error. For the non-affected UE, this change must exceed 4 s.

In Parkinson’s disease, Earhart et al. [39] showed an MDC of 2.6 s for the dominant hand and 1.3 s for the non-dominant hand. In multiple sclerosis, Lamers et al. [40] obtained an MDC of 4.38 s for the dominant hand and 7.46 s for the non-dominant hand. Moreover, in subjects with stroke, Chen et al. [41] described an MDC of 6.8 s for the non-affected UE and 32.8 s for the affected UE. Compared with our results, we can suggest that greater errors can be expected on the affected side in neurological conditions of unilateral involvement, while the error may be greater on the non-dominant side in conditions that affect individuals bilaterally.

Concerning the agreement assessed here, the analysis and Bland–Altman graphs indicate that there is no significant bias in the repetition of the NHPT; that is, repetition leads to neither an improvement nor a deterioration in the manual dexterity test score. Furthermore, no proportional bias was confirmed, which argues against greater errors in the CP patients with the most impaired manual dexterity.

There is no cutoff value in the literature that indicates ideal or acceptable values for measurement errors. The ICC, SEM, and MDC values and the distribution pattern in the Bland–Altman graphs can be used and interpreted according to the specificities and particularities of interest, whether in clinical practice, in comparison studies, or in interpreting the efficacy of interventions that use NHPT as a variable of interest to assess patients with CP.

It should be considered that all of the properties obtained in the present study apply exclusively when the NHPT is performed with the same method applied here, which followed the methodology of Mathiowetz et al. [5]. In our study, the subject was familiarized with one practice of the test; we consider it necessary to practice the test at least once, to determine with certainty whether the subject has understood the test instructions and to become familiar with it. We believe that one practice of the test was sufficient and that there was no learning effect. Other authors suggest performing several measurements on each hand and taking the lowest and highest score from each side [42]. They assume that this would eliminate the variability, considering the better functions of the dominant hand and the better test results due to improvements from repeated practice [42]. Conversely, Feys et al. [19] considered reliability to be influenced by the effects of learning with repeated administrations. They question the repeated practice of the test, since the subjects would be more trained and the scores would be better; in addition, the assessment would take a longer time, which increases the fatigue of the subjects. Therefore, there is no absolute agreement between researchers on how to obtain maximum reliability.

Besides the questions about NHPT repetitions, Johansson and Häger [43] proposed some modifications to the test to optimize its reliability in post-stroke patients with spasticity, developing the standardized NHPT (S-NHPT). They proposed a double pegboard, replacing the container with another board with holes in which the pegs would already be inserted. Moreover, they established the order of the task, making it necessary to pick the pegs from the lateral pegboard, transport them to the holes of the medial pegboard, and return them all to their initial position. On the one hand, grasping was facilitated because the pegs were already inserted in the board, which requires less fine control of the hand and may allow people with moderate disabilities to perform the test. On the other hand, they point out that the complexity was increased, since the task places greater demands on attention and memory.

We should recognize the limitations of the present study. First, the age range is relatively large, including patients between 4 and 28 years of age. Poole et al. [38] reported scores by age range in healthy children, stating that younger children took longer to complete the NHPT, that the time was similar between 10 and 16 years, and that it decreased considerably after 16 years. They also found that females were quicker to complete the NHPT assessment. It is not known whether the same occurs in patients with CP, but a much larger sample would be necessary to assess the measurement properties in subgroups of age and sex. In addition, selecting the sample through non-probability sampling and with too much homogeneity could be a threat to external validity and force good reliability. For this reason, we decided to keep the sample more heterogeneous and balanced for age and sex, respectively.

We restricted the type, including only subjects with spastic unilateral CP, because the spasticity could affect the reliability differently, as observed for patients with stroke [41]. Given the great diversity of CP types and the need to unify measurement tools that are accessible to all types and severity, reliability should be tested for all types of CP in further studies. Future studies may also assess these measurement properties in clusters of distinct ages (such as infant school-aged children, adolescents and adults) or MACS levels.

However, based on our results, it is possible to conclude that the use of the NHPT in patients with spastic unilateral CP is supported by excellent intra-rater inter-session reliability values, small proportions of measurement errors, and the absence of significant bias due to repetition of the test.