During the last four decades, more than 20 assessments for upper limb apraxia have been published. While some of the assessments were developed as a diagnostic tool within the clinical setting, other assessments were primarily developed for scientific purposes. One reason for the high number of different assessments is that apraxia is a very heterogeneous syndrome and many assessments capture merely single aspects of apraxia (e.g., either imitation of gestures or object use) that can be affected differentially [3, 34, 52, 55, 58]. This especially applies to assessments primarily developed for research purposes, which are often focused on the specific apraxic impairment under investigation. In contrast, apraxia assessments used for the daily clinical routine need to provide a high diagnostic sensitivity, which can usually only be achieved when many apraxic symptoms are concurrently assessed. Taking into account that apraxia is often accompanied by aphasia [17, 44], tests for clinical application should focus on test items that use objects or gestures rather than language as the trigger for actions because in patients with comorbid aphasia it is difficult to differentiate whether the motor deficits observed after verbal instructions are primarily due to the apraxic or aphasic (e.g., reduced language comprehension) impairment. Moreover, a tool used in a clinical environment to test for upper limb apraxia often necessitates a quick and easy application, and should hence require as few test items as possible. Finally, it is important that the psychometric properties of clinical apraxia tests are available with reliable cut-off values.
Accordingly, we performed a literature search to identify all tests developed for diagnosing upper limb apraxia that have been published so far. The literature search targeted the period from January 1965 until April 2011. The following keywords were entered into the medical search engine Medline (PubMed): “apraxia” AND “assessment” OR “test”. In addition, the reference list of all relevant articles was reviewed for further references. Table 1 summarizes all apraxia assessments identified by means of the search. In particular, Table 1 provides information on the stimulus types used to trigger the actions and the involved processing routes (semantic, structural; see below). Table 1 also indicates whether cut-off values for an apraxic impairment are provided, and whether psychometric test properties have been assessed. Furthermore, details on the test duration, the required test material as well as on the population in which the test was initially validated are provided. Finally, it is indicated whether an item analysis or rather an item reduction was carried out, and whether, in addition to gestures of the upper limbs, bucco-facial gestures were also included in the test.
Table 1 Summary of the published assessments developed for diagnosing upper limb apraxia (until April 2011; note that tests are ordered by publication year)
As a detailed description and evaluation of all published assessments would go beyond the scope of this article, we defined a priori criteria to select assessments for a more detailed appraisal. As described above, apraxic impairments affecting meaningless gestures and apraxic deficits related to meaningful gestures both have a negative impact on the independent performance of ADLs and on the outcome of (stroke) rehabilitation. According to previously described models of apraxia (see Footnote 1) [14, 23, 55], these deficits represent impairments of the structural (for meaningless gestures) and the semantic (for meaningful gestures) processing route. Meaningful gestures that are recognized after initial processing are processed along the semantic route: information about the gesture is retrieved from the so-called action semantic system, which enables the activation of all required movement elements as a whole. Processing along the structural route, in contrast, allows for activation of single movement elements only. This notion has, in part, been proposed by previous apraxia models [14, 55]. The structural route is based on visuo-motor conversion mechanisms that directly transfer the visual analyses into motor programs without accessing semantic information. Thus, disturbed processing in both the structural and the semantic processing routes affects ADL performance and neurorehabilitation in apraxic patients. Therefore, we considered it relevant that a clinical test assesses impairments of both the structural and the semantic processing route, as otherwise patients with deficits leading to impairments of motor functions relevant for daily living might be overlooked. Moreover, for a diagnostic tool, we consider it indispensable that clear cut-off values are provided according to which the patient’s test performance can be classified as either normal or impaired.
Hence, only apraxia tests that comprise both items tapping the structural and the semantic route (i.e., including meaningless and meaningful items) and that also provide defined cut-off values will be described in more detail. Based on these criteria, eight assessments were selected (see lines highlighted in dark grey in Table 1). Note that one of these assessments was not primarily developed for assessing upper limb apraxia but for diagnosing Alzheimer’s disease, where upper limb apraxia represents one of the early symptoms due to atrophy of the parieto-temporal cortex [19]. Accordingly, in a population of healthy elderly people and patients with Alzheimer’s disease, the cut-off value of this test was chosen in a way to ensure that patients with Alzheimer’s disease were identified as reliably as possible (high sensitivity) and that the healthy elderly people without cognitive impairments should be reliably classified as not suffering from Alzheimer’s disease (high specificity). Importantly, this test was not applied to patients with (left-hemisphere) stroke.
The remaining seven tests, which examine both the structural and semantic processing routes and report cut-off values, were explicitly developed as diagnostic tools for detecting upper limb apraxia in stroke patients and thus will be discussed in detail. For clarity, we differentiate between short screening tests for symptoms of apraxia, tests for a clinical diagnosis of apraxia, and comprehensive test batteries, which might be used for scientific purposes, but which are too time-consuming for every-day clinical routine.
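The a priori selection rule described above (items tapping both the structural and the semantic route, plus defined cut-off values) can be sketched as a simple filter. This is only an illustration: the `Assessment` record, its field names, and the three example entries are hypothetical and do not reproduce rows of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Hypothetical record for one apraxia test (not a row of Table 1)."""
    name: str
    taps_structural_route: bool  # includes meaningless items
    taps_semantic_route: bool    # includes meaningful items
    provides_cutoff: bool        # defined cut-off values are reported


def meets_selection_criteria(a: Assessment) -> bool:
    """Both processing routes must be covered and a cut-off must exist."""
    return (a.taps_structural_route
            and a.taps_semantic_route
            and a.provides_cutoff)


# Invented entries, used only to exercise the rule.
candidates = [
    Assessment("Hypothetical test A", True, True, True),
    Assessment("Hypothetical test B", False, True, True),   # no meaningless items
    Assessment("Hypothetical test C", True, True, False),   # no cut-off values
]

selected = [a.name for a in candidates if meets_selection_criteria(a)]
print(selected)
```

Only the entry that satisfies all three conditions is retained, which mirrors how tests covering a single route or lacking cut-off values were excluded from the detailed appraisal.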
Short screening tests for upper limb apraxia
Two short apraxia screenings [68, 71] that can be administered at the bedside fulfilled the selection criteria described above.
Apraxia screen of TULIA (AST)
The AST (Apraxia screen of TULIA) by Vanbellingen et al. [68] is based on a more comprehensive test procedure of the same study group (Test for Upper-Limb Apraxia, TULIA; see below) and was constructed via an item reduction of the original test procedure. By means of item reduction, the test was shortened from the original 48 items to 12 items. A high specificity (93%) as well as a high sensitivity (88%) was achieved with the cut-off values determined by the authors. A highly significant correlation between AST and TULIA scores points towards a good validity of the screening test; a validation with an external (i.e., an independent) assessment is not yet provided, but would be desirable. Furthermore, the high correlation between the scores of the 12 AST items and the partial scores when the same 12 items are tested within the framework of the TULIA indicates a good test–retest reliability of the AST. However, it should be critically noted that after item reduction only one item tapping the structural pathway was left in the AST.
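As a reminder of how the two figures reported above are defined, sensitivity and specificity can be computed from the counts of correctly and incorrectly classified cases. The sketch below is a minimal illustration; the function names and the counts are invented for the example and are not the data of the AST validation study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly impaired patients that the test classifies as impaired."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of unimpaired persons that the test classifies as unimpaired."""
    return true_neg / (true_neg + false_pos)


# Invented counts (assumption): 50 apraxic patients, 40 non-apraxic persons.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=38, false_pos=2)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Both values depend on the external criterion used to decide who is "truly" apraxic, which is why the lack of an independent reference assessment matters for interpreting them.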
Cologne apraxia screening (CAS)
The CAS is an apraxia screening that was developed to create a sensitive, reliable, and valid screening tool for clinical purposes [71]. The CAS requires patients to pantomime the use of objects (i.e., transitive gestures) as well as to imitate abstract and symbolic (intransitive) gestures. Pantomime of object use is tested by presenting the patient (black-and-white) photos of objects whose handling the patient should pantomime. Objects are always displayed in a way that suggests the usage of the left hand, i.e., the non-paretic hand for patients with left-hemisphere stroke. Using photographs reduces the required verbal instructions and assures a standardized test application. In contrast to many other assessments, Weiss and colleagues also use photographs in the imitation tasks thereby removing stimulus differences that inevitably occur when gestures are demonstrated by different examiners. The CAS assesses impairments of the structural as well as the semantic pathway and takes two out of three possible input modalities into account (objects and gestures). Weiss and colleagues purposely refrained from using verbal instructions (language) as an input modality as motor deficits for verbally instructed test items may result from the often co-morbid aphasia.
In comparison to many other apraxia assessments, an important advantage of the CAS is that an item reduction was performed: based on the performance of a sample of 30 neurological patients and 19 healthy control subjects, those test items were selected that best discriminated between the two groups. As a result of this item reduction, the CAS was confined to 20 items and can hence be administered within approximately 10 min. The inter-rater reliability is high. In addition, the construct validity, which was assessed using the test for imitating hand postures and finger configurations by Goldenberg [30] as external criterion, is also high. In particular, the correlation between the CAS scores and the scores of the hand imitation test, which is known to be sensitive in detecting apraxic deficits, was very good. Moreover, the CAS has a high sensitivity and specificity.
Assessments for the clinical diagnosis of upper limb apraxia
Apraxia test by De Renzi et al. [17]
Thirty years ago, a test for upper limb apraxia assessing both processing routes and providing cut-off values was published by De Renzi et al. [17]. This test consists solely of imitation tasks. The gestures that have to be imitated can each be classified along three dimensions: (1) the gesture requires either independent movements of the fingers or a movement of the whole hand, (2) the gesture is either a static posture or a motor sequence, (3) the gesture is either meaningful or meaningless. The combinations of these three dimensions result in eight categories, each of which is represented by three items. Gestures are presented to the patient up to three times, but fewer points are given when the gesture is not immediately imitated correctly. As the administration of the whole test requires approximately 15 min and as no test material is required, this test can be used at the bedside. De Renzi and colleagues applied the test to 100 patients without brain damage, 80 patients with right hemisphere damage, and 100 patients with left hemisphere damage. A cut-off value was determined based on the performance of the patients without brain damage. The main disadvantage of this test procedure is that no psychometric properties have been determined. However, information on reliability and validity as well as on specificity and sensitivity is of great importance for tests to be used for clinical diagnoses.
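The item structure described above follows directly from crossing the three binary dimensions. The short sketch below enumerates the resulting categories; the dimension labels (`effector`, `form`, `meaning`) are names chosen here for illustration and are not taken from De Renzi et al. [17].

```python
from itertools import product

# Three binary dimensions, as described in the text (labels are assumptions).
DIMENSIONS = {
    "effector": ("fingers", "whole hand"),
    "form": ("static posture", "motor sequence"),
    "meaning": ("meaningful", "meaningless"),
}
ITEMS_PER_CATEGORY = 3

# Every combination of one level per dimension is one gesture category.
categories = list(product(*DIMENSIONS.values()))
total_items = len(categories) * ITEMS_PER_CATEGORY

for category in categories:
    print(" / ".join(category))
print(len(categories), "categories,", total_items, "items")  # 8 categories, 24 items
```

Two levels on each of three dimensions give 2 × 2 × 2 = 8 categories, and three items per category give 24 gestures in total.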
Test of upper limb apraxia (TULIA)
Recently, Vanbellingen and colleagues [69] developed a test battery for the assessment of upper limb apraxia (TULIA) that comprises tasks requiring the production of abstract as well as symbolic gestures, thereby testing both the structural and the semantic route. Moreover, the test allows for testing both transitive and intransitive symbolic gestures. All gestures have to be produced after the examiner demonstrates them in a mirrored fashion (imitation) and after verbal request. Hence, two out of three possible input modalities (gestures, language) are tested by the TULIA. All gestures require the use of one hand only and can be performed by the patient with the non-paretic hand. In total, this test procedure contains 48 items that can be completed in roughly 20 min. Validation of the TULIA was carried out in a sample of 133 stroke patients (84 with left-hemisphere and 49 with right-hemisphere stroke) and 50 healthy control subjects. Based on this sample, a cut-off value was determined according to which apraxia is diagnosed if a patient scores two standard deviations below the mean of healthy control subjects. Both inter-rater and retest reliability were calculated. Most items showed a (very) good (κ 0.65–0.99), and only a few items (n = 6) a moderate (κ 0.35–0.50) inter-rater reliability. Likewise, the retest reliability (assessed by testing 20 patients three times within 24 h) was very high for nearly all subtests (Cronbach’s alpha > 0.83); only the subtest for imitation of meaningless gestures had a slightly lower Cronbach’s alpha (0.67). Vanbellingen and colleagues also provide information on the criterion and construct validity. The criterion validity describes the relationship between the results of the diagnostic tool and an empirical criterion.
The external criterion chosen by Vanbellingen and colleagues was the clinical observation that impairments of gesture production occur more frequently after left-hemisphere than after right-hemisphere stroke [30, 74]. As the TULIA clearly classified more left than right hemisphere damaged patients as apraxic (68 vs. 39%) and as in the majority of cases apraxia severity was more pronounced in patients with left-hemisphere stroke, good criterion validity of the TULIA was assumed. A test has good construct validity if its scores either correlate highly with the scores of an instrument measuring the same construct (convergent validity) or correlate weakly with the scores of a test measuring a different construct (discriminant validity). In order to assess the convergent validity of the TULIA, a subgroup of the patients (21 patients with left-hemisphere and 12 with right-hemisphere stroke) was additionally tested with the test for apraxia by De Renzi and colleagues [17]. A high correlation (r = 0.82) between the scores of the two test procedures points to a good convergent validity. In contrast, sensitivity and specificity were not assessed, as the authors argued that there is no suitable instrument that could be used as an external criterion. Altogether, the test procedure by Vanbellingen and colleagues constitutes a reliable and valid instrument for the assessment of upper limb apraxia.
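The cut-off rule described for the TULIA (a score two standard deviations below the mean of healthy control subjects) can be written out as a short calculation. The sketch below rests on several assumptions that the text does not specify: the control scores are invented, the sample standard deviation is used, and a score is flagged only when it lies strictly below the cut-off.

```python
from statistics import mean, stdev


def cutoff_two_sd(control_scores):
    """Cut-off = control mean minus two sample standard deviations (assumption)."""
    return mean(control_scores) - 2 * stdev(control_scores)


def below_cutoff(score, cutoff):
    """Flag a score as impaired when it lies strictly below the cut-off."""
    return score < cutoff


# Invented control scores, for illustration only.
controls = [230, 225, 235, 228, 232]
cutoff = cutoff_two_sd(controls)

print(f"cut-off = {cutoff:.1f}")
print(below_cutoff(200, cutoff))  # a clearly lower score is flagged
print(below_cutoff(229, cutoff))  # a score near the control mean is not
```

A rule of this kind presupposes that control scores are not all at ceiling, because a standard deviation close to zero would place the cut-off almost at the control mean.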
Apraxia test by De Renzi and colleagues [16]
Although the apraxia tests and screenings introduced above all fulfill our selection criteria, none of these includes an assessment of actual object use. However, as the clinical situation (or the scientific study) may demand a quantitative assessment of actual object use, we here describe a further test by De Renzi and colleagues [16] that contains a subtest for assessing how patients actually use objects, although this test does not fulfill the a priori selection criteria. In addition to the subtest of actual object use, this test contains a subtest for the imitation of intransitive meaningful gestures. For the assessment of actual object use, patients are consecutively given seven objects (hammer, toothbrush, pair of scissors, revolver, pencil eraser, lock and its key, and a candle together with a matchbox) and the patient is asked to actually use each of the seven objects (see also [22]). For the second subtest, the patient is required to imitate ten intransitive, meaningful gestures demonstrated by the examiner (e.g., waving goodbye). Based on the examination of 40 patients without brain damage and 205 patients with brain damage (45 with right-hemisphere lesions, 160 with left-hemisphere lesions) cut-off values for both subtests were determined. As the authors determined a separate cut-off value for the object use subtest, the test may allow the detection of (isolated) object use deficits. However, other psychometric properties were not assessed. Nonetheless, the object use subtest can be used in addition to one of the better validated diagnostic apraxia tests described in “Short screening tests for upper limb apraxia” and “Assessments for the clinical diagnosis of upper limb apraxia” (e.g., CAS, TULIA) as these tests do not assess actual object use.
Apraxia tests primarily applicable for scientific purposes
Apraxia test by Alexander and colleagues [1]
Alexander and colleagues [1] conceived a test for the assessment of apraxia in a study designed to examine the relationship of different motor impairments with lesion size and localization as well as with different forms of aphasia. The apraxia test developed by Alexander et al. [1] comprises four subtests testing different body parts (bucco-facial, axial, upper limb, and respiratory movements), which all include both meaningful and meaningless items, thus assessing deficits of both the semantic as well as the structural processing route. Based on the test performance of 23 healthy control subjects, the inter-rater reliability and cut-off values were determined, but no further psychometric properties (e.g., validity, specificity, and sensitivity) were reported. The apraxia test by Alexander and colleagues is exemplary for many tests for which no complete psychometric analyses have been conducted [30, 31, 57] as they were developed for scientific purposes.
Test battery by Bartolo and colleagues [4]
In the following, two comprehensive test batteries for the assessment of upper limb apraxia will be described [4, 50]. Both test batteries are based on a cognitive model of limb apraxia originally devised by Rothi and colleagues [55, 56] or a slight modification thereof by Cubelli and colleagues [14]. The aim of the comprehensive test battery by Bartolo et al. [4] is to assess as many aspects as possible of their apraxia model. To examine the semantic pathway (here called lexical route), the following tasks were proposed by the authors: production of intransitive, meaningful gestures, pantomime of object use, and actual use of single objects (note that no items for the assessment of complex object use were included). Furthermore, the semantic pathway is tested via different input modalities. That is, intransitive, meaningful gestures are executed either after verbal command or after visual presentation of pictures displaying different scenes that prompt a specific gesture. In addition, intransitive meaningful gestures have to be imitated by the patient after they have been demonstrated by the examiner. Pantomime of object use (transitive gestures) is also tested via different input modalities, namely after verbal command, after visual presentation of the object, or after recognizing the object via tactile exploration. Finally, transitive gestures are also tested by means of an imitation task. Impairments of the structural (here: non-lexical) pathway are assessed with tasks requiring the imitation of meaningless gestures. In addition, this test battery contains tasks in which the recognition and comprehension of symbolic gestures are assessed without the requirement of actually producing these gestures. This subtest is motivated by the fact that the cognitive model of apraxia by Cubelli et al. [14] allows for a dissociation between gesture production and gesture comprehension.
In fact, such a dissociation (intact imitation but impaired recognition of meaningful gestures) has previously been described in patients with left-hemisphere brain damage [54]. Overall, this extensive test battery by Bartolo and colleagues [4] contains 13 different kinds of tasks, each comprising at least 20 items. As a result of the large number of test items, the administration of this test battery takes about 2 h even in healthy subjects and requires an extensive amount of test material. Cut-off values indicating an apraxic impairment were determined based on the means and standard deviation of 60 healthy control subjects (mean −3, standard deviations −1). For subtests on which healthy subjects did not make any mistakes at all (mean equals the maximal score, standard deviation equals zero), the cut-off value was defined as the maximal subtest score minus one. However, these cut-off values have to be called into question, as the test battery has not yet been applied to an adequate patient sample. For the same reason, no details can be provided about specificity and sensitivity as well as reliability and validity of this apraxia test battery.
Florida apraxia battery-extended and revised Sydney (FABERS)
A second comprehensive test battery for the assessment of upper limb apraxia (FABERS) was published recently by Power and colleagues [50]. Their test battery was based on the cognitive model of apraxia by Rothi and colleagues [55, 56]. Like the test battery by Bartolo and colleagues, this apraxia test also contains tasks for the assessment of both the semantic and the structural pathway. Tasks requiring the production of transitive and intransitive but meaningful gestures are adopted to test the semantic pathway. Again, different input modalities are used to prompt the gesture production (language: verbal instruction for transitive (i.e., pantomime) and intransitive, meaningful gestures; objects: visual presentation of objects triggering transitive gestures; gestures: imitation of transitive and intransitive gestures). Similarly, Power and colleagues used tasks that require the imitation of meaningless gestures and motor sequences to detect possible impairments of the structural pathway. In contrast to the test battery by Bartolo and colleagues, the test battery by Power and colleagues does not contain any items assessing the actual use of objects.
In addition to the tasks assessing gesture production, this apraxia test battery also contains tasks to examine the comprehension of meaningful gestures. Administration of the whole test procedure is estimated to take about 45 min. Based on the data of a sample of 16 healthy elderly control subjects, the inter-rater reliability of the different subtests was shown to be high (≥89%). Moreover, a cut-off value indicating impairment was determined based on the performance of these 16 healthy elderly controls (values below the 10th percentile of the 16 subjects indicate an impairment). However, the test battery has not yet been applied to stroke patients and thus specificity and sensitivity of the test procedure are unknown.
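A percentile-based cut-off of the kind described above can be illustrated as follows. This is a sketch under stated assumptions: the 16 control scores are invented, and the interpolation method (`method="inclusive"` in Python's `statistics.quantiles`) is a choice made here, since the text does not say how the 10th percentile was computed.

```python
from statistics import quantiles


def tenth_percentile_cutoff(control_scores):
    """10th percentile of the control scores (first of nine decile cut points)."""
    return quantiles(control_scores, n=10, method="inclusive")[0]


# Invented scores for 16 control subjects, for illustration only.
controls = [18, 19, 20, 20, 17, 19, 20, 18,
            16, 20, 19, 18, 20, 19, 17, 20]

cutoff = tenth_percentile_cutoff(controls)
patient_score = 15  # hypothetical patient
print(f"cut-off = {cutoff}, impaired = {patient_score < cutoff}")
```

With only 16 control subjects, the 10th percentile is determined by the one or two lowest scores, so a cut-off of this kind is sensitive to individual outliers in the control sample.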