The effects of performance-based assessment criteria on student performance and self-assessment skills
- Cite this article as:
- Fastré, G.M.J., van der Klink, M.R. & van Merriënboer, J.J.G. Adv in Health Sci Educ (2010) 15: 517. doi:10.1007/s10459-009-9215-x
This study investigated the effect of performance-based versus competence-based assessment criteria on task performance and self-assessment skills among 39 novice secondary vocational education students in the domain of nursing and care. In a performance-based assessment group students are provided with a preset list of performance-based assessment criteria, describing what students should do, for the task at hand. The performance-based group is compared to a competence-based assessment group in which students receive a preset list of competence-based assessment criteria, describing what students should be able to do. The test phase revealed that the performance-based group outperformed the competence-based group on test task performance. In addition, higher performance of the performance-based group was reached with lower reported mental effort during training, indicating a higher instructional efficiency for novice students.
Keywords: Competence-based assessment criteria; Mental effort; Performance-based assessment criteria; Self-assessment skills; Task performance
In competence-based education, authentic learning tasks based on real-life problems are the driving force behind training, simultaneously encouraging the development of professional skills and more general competences like being self-directed. Competence-based education is a dominant trend in vocational education in many European countries (Wesselink et al. 2007). The aim is to prepare students for the workplace where people are expected to be broadly educated while stimulating lifelong learning (van Merriënboer et al. 2002, 2009). Because competences are context-bound and the aim of vocational education is preparing students for the workplace, students should always develop competences in the context of a profession (Biemans et al. 2004). When teachers want to judge the competence development of their students, student assessments performed in a real-life context can support their findings.
Describing and making clear and public what the learner is intended to achieve changes the nature of assessment from a tutor-led system with fuzzy objectives and undisclosed criteria, to a student-led system with greater emphasis on formative development and personal responsibility. (p. 45).
In the behavioural tradition of instruction and instructional design, assessment criteria were performance-based, meaning that they described the desired performance in terms of what the student has to do (e.g. Mager 1984). With the introduction of competence-based education, assessment criteria are often formulated as competences, in terms of what the student is able to do. However, no research so far has investigated the effects of this introduction of competence-based assessment criteria. The main goal of this study is to investigate the effects of competence-based versus performance-based assessment criteria on learning, test task performance and students’ self-assessment skills.
The difference between performance-based and competence-based assessment criteria should be seen as a continuum, where on the one end assessment criteria are formulated as competences, which are an integration of knowledge, skills and attitudes; and on the other end assessment criteria are formulated as performance indicators. Performance-based criteria can be linked directly to competence-based criteria and vice versa as they complement each other. When discussing the continuum, the two extremes and their underlying connection will be tackled. The discussion will be coupled to the level of experience students have as it can be assumed that students with different levels of experience will have different needs concerning assessment criteria (Kalyuga 2007). In this article the focus is on the needs of novice students.
First, with regard to what is assessed, when assessing with competence-based criteria, the competences underlying the performance are the focus of the assessment. What is assessed is the student’s ability to perform a certain task. However, competences as a whole are not directly observable (Grégoire 1997). Certain aspects of competences are observable, like particular skills the students demonstrate, but certain aspects are hidden, like their self-concept and personal characteristics that influence their performance (Spencer and Spencer 1993).
When assessing with performance-based criteria, the observable behaviours produced by the students are the heart of the assessment. The question is not if the student is able to perform the task, but if the student shows good performance (Grégoire 1997). In order to show this good performance, students probably also know how to perform and consequently master the underlying competences necessary for performing the task (Miller 1990). For example, in the case of stoma care, the student shows he can remove the stoma in a correct way. An underlying competence is supporting the patient according to protocols, regulations and the vision of the organisation but the performance criterion is removing the stoma in a correct way. This means there is a direct link between what students show (performance) and what students are able to do (competence). Every performance shown involves one or more competences the student has to possess to perform well, and every competence can be shown in several behaviours of the student.
Because it is important for novice students to obtain an idea of how well they are doing at an early stage, the directly observable character of the performance-based criteria may be expected to be more beneficial for assessing their task performance. Based on these performance-based criteria, the development of the students can be monitored from the beginning onward. In order to improve novice students’ self-assessment skills, it is easier to assess what they are actually doing, because this is more objective than assessing their ability to do so. Therefore, with regard to what is assessed, performance-based criteria are expected to be more beneficial for supporting novice students’ learning than competence-based criteria. In later stages, it is important for students to learn to see the link with the underlying abilities they are developing.
Second, with regard to the nature of the criteria, to uncover competence development, consistency of proof of competence level across different tasks is needed (Albanese et al. 2008; Grégoire 1997). It is therefore important to formulate competence-based assessment criteria in a way that they can be used across different tasks and thus are task-independent. For example, a nurse has to be able to conduct nursing technical skills. In one situation this means replacing a stoma bag while in another situation this means washing a patient.
To judge student performance on a certain task, performance-based assessment criteria should be formulated on task-level as for each task a different set of criteria is relevant. Performance-based criteria are thus task-dependent. As is shown by Fastré et al. (2009), for novice students it is important to know the relevant criteria in every task. For example, when a nurse has to conduct stoma care, some of the relevant criteria are to remove the old stoma bag and apply a new one.
It is likely that when students know exactly what to do, their motivation, learning and performance will increase significantly (see for example Ecclestone 2001). Moreover, Miller (2003) argues that having task-specific assessment criteria leads to a better quantitative differentiation of performance levels. This more detailed view of students’ performance argues for the use of performance-based assessment criteria. Following the results of Fastré et al. (2009), it can be concluded that the use of performance-based criteria is especially beneficial for novice students because of their task-specific character.
Third, the competence-based assessment model currently used in Europe starts from a fixed set of competences that are categorically divided (e.g. communication skills, nursing technical skills). No further decomposition of the competences is made. The formulation of the competence-based assessment criteria is therefore holistic (Grégoire 1997). This does not mean that when working with competence-based assessment criteria only a holistic judgment on the end result is given, but the criteria are more holistically formulated than the performance-based criteria.
Gulikers et al. (2008) discuss the notions of analytic versus holistic grading from the perspective of the level of experience of students. They argue that novice students need analytic criteria as guidelines in a step-by-step process leading to the desired behaviour. In future tasks, this helps to set appropriate learning goals (Eva and Regehr 2005). For more experienced students, analytic criteria may hamper their learning process because they have to be stimulated to keep their focus on a certain outcome level and they do not need the step-by-step approach any more (Scheffer et al. 2008). Following these ideas, for novice students it would be better to receive performance-based assessment criteria.
Finally, with regard to mental effort, when designing a study program, including assessment, it is important to strive for the optimal level of using students’ cognitive capacity (van Gog and Paas 2008). Cognitive load theory presupposes that people have a limited working memory capacity (Sweller et al. 1998; van Merriënboer and Sweller 2005). Because of this limited capacity, it is essential for learning to properly allocate the available cognitive resources (Kalyuga et al. 2003).
An important difference can be distinguished here between novice students and more experienced students. For novice students, it is important to provide sufficient guidance that compensates for the limited knowledge they have on the task at hand (e.g. stoma care) by providing them performance-based assessment criteria because this requires less cognitive capacity for the assessment and most of their working memory capacity can be devoted to the task of stoma care. For more experienced students, who already have some knowledge on the task at hand (e.g. stoma care), competence-based assessment criteria can provide them with an extra stimulus to think about the task in another way and thereby make the extra cognitive capacity beneficial for them. In addition, providing these students with performance-based assessment criteria would give them redundant information on the task which may hamper their learning. This is called the expertise reversal effect (Kalyuga 2007).
Summarising, it appears that for novice students, performance-based criteria have more advantages than competence-based criteria because: (1) They are directly observable, (2) they lead to a better quantitative differentiation of levels of performance, (3) they stimulate a step-by-step process leading to desired performance, and (4) they require less cognitive capacity for assessment leaving more capacity for learning the task at hand. The following section describes the hypotheses following this comparison.
The first hypothesis is that students who receive the performance-based criteria during learning will show superior test task performance compared to students who receive the competence-based criteria because they know better what is expected from their performance. The second hypothesis is that students who receive the performance-based criteria will experience a lower mental effort during assessment than students who receive the competence-based criteria. The third hypothesis is that students who receive the performance-based criteria will be better self-assessors than students who receive the competence-based criteria because they are better able to assess their performance.
Thirty-nine second-year students of a school for Secondary Vocational Education, attending a Nursing and Care program (Level 3 and 4 in the European Qualifications Framework, 2 males and 37 females) participated in this study as part of their regular training on the nursing task of stoma care. Their mean age was 18.07 years (SD = 1.05). Participants were randomly assigned to one of the two conditions: competence-based criteria (n = 20) and performance-based criteria (n = 19).
A lecture was developed that provided students with the theoretical background of stoma care. The two teachers who were responsible for this lecture set up the lecture together.
Video examples and video assessment
An electronic learning environment was developed including six video fragments (±3 min each) in which an expert nurse shows good stoma care behaviour. The fragments are consecutive parts of the whole task of stoma care: (1) introduction, (2) preparation, (3) removing the old stoma bag, (4) applying the new stoma bag, (5) finishing off care, (6) evaluation and reporting. Students individually watched the video fragments on a computer screen. They were not allowed to pause a fragment, and they could watch each video a maximum of three times. On average, students watched the video 1.14 times (SD = .29). No differences between conditions were found.
In order to encourage students to make the assessment criteria more concrete, students in both groups had to indicate the manner in which the nurse in the fragment showed good behaviour on the criteria by typing their answer in the text boxes.
Practical lesson, peer assessment, and self-assessment
A practical training session was developed in which students had to practice the task of stoma care in pairs or groups of three, with a fellow student being the patient. After students had performed the task, they had to score their peers’ task performance on the same list of criteria as in the assessment of the video examples. The students in the competence-based condition received the list with competence-based criteria (PA-C) and students in the performance-based condition received the list with performance-based criteria (PA-P). They had to indicate how well their peers mastered the criteria on a four-point scale: (1) behaviour not shown, (2) behaviour shown but insufficient, (3) behaviour shown and sufficient, (4) behaviour shown and good. In addition to this peer assessment, students had to self-assess their task performance using the identical list of competence-based criteria (SA-C) or performance-based criteria (SA-P), using the same four-point scale. While practising the task, students also received oral feedback on their task performance from the instructor in the room.
Examination and self-assessment
An examination was developed in which students individually had to perform the task of stoma care with a simulation patient. Afterwards they had to assess their own performance on that particular task by filling in a blank paper with the question: assess your own performance on this task and indicate what went well and what went wrong.
Reliability of the self-directed learning skills questionnaire
Self-directed learning skills questionnaire, scales with example items:
- Relevance of self-assessment: "I think the opinion of the teacher is more important than self-assessment"
- Ability to self-assess: "I can assess to what extent my performance fits the assessment criteria"
At the end of the lecture, a 15-item multiple choice test was taken to test the students’ knowledge on this subject.
Judgment scheme for video assessment
To measure the accuracy of the video assessment, judgment schemes specified the quality of the video assessments. The overall score for quality of video assessment was the sum of the z-scores of three aspects: (1) the number of words the students used (word count), because performance-based criteria are expected to stimulate students to elaborate more on their answers; (2) whether they gave concrete examples of the nurse’s behaviour (0 = no concrete behaviour, 1 = concrete behaviour); and (3) whether they gave a judgment on the behaviour of the nurse (0 = no judgment, 1 = judgment). The higher the sum of the z-scores, the better the score for quality of video assessment, as it is the combination of these factors that should be of high quality. The quality of the video assessments was judged by two raters, with a high interrater reliability of r = .82, p < .001.
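The composite score described above can be illustrated with a minimal sketch. It assumes that each aspect is standardised across all students (using the sample standard deviation) before the z-scores are summed; the function names and the data are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise a list of raw scores (sample standard deviation)."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # Every student has the same value, so the aspect carries no information.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def composite_quality(word_counts, concrete_flags, judgment_flags):
    """Sum the z-scores of the three aspects for each student."""
    columns = [
        z_scores(word_counts),
        z_scores(concrete_flags),
        z_scores(judgment_flags),
    ]
    return [sum(parts) for parts in zip(*columns)]


# Hypothetical data for four students (illustration only).
word_counts = [12, 30, 45, 20]   # number of words in the answer
concrete_flags = [0, 1, 1, 0]    # 1 = concrete behaviour described
judgment_flags = [0, 0, 1, 1]    # 1 = judgment given

scores = composite_quality(word_counts, concrete_flags, judgment_flags)
```

Because every aspect is standardised first, the word count (a large number) and the two binary codes contribute on a comparable scale, and the composite scores are centred on zero.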
Mental effort rating scale
After the assessment of each video fragment, students were required to fill in the rating scale of Paas (1992) that measured their mental effort as the ‘effort required to perform the assessment’, ranging from a very small amount of effort (1) to a very high amount of effort (7).
Peer assessment of task performance
The peer assessments during the practical lesson indicated the task performance of the students assessed by the peers, using the competence-based criteria in one group and performance-based criteria in the other group. Peer assessed task performance was the average score on all the assessment criteria.
Self-assessment of task performance
The self-assessments during the practical lesson indicated the task performance of the students by the students’ own opinion, using the competence-based criteria in one group and performance based criteria in the other group. Self-assessed task performance was the average score on all the assessment criteria.
Teacher assessment of test task performance
During the examination, the teachers observed and assessed the test task performance of the students, who took care of the stoma of a simulation patient, on the list of performance-based criteria. A second assessor co-assessed with each of the teachers to measure the reliability of the assessments. The correlation between the scores of the teacher and the second assessor, r = .77, p < .01, appeared to be acceptable.
Judgment scheme for self-assessment
The overall score for quality of the self-assessments during examination was the sum of the z-scores of four aspects: (1) the number of words the students used (word count), because performance-based criteria are expected to stimulate students to elaborate more when self-assessing; (2) the number of criteria the students came up with (count of criteria); (3) whether students had a critical attitude towards their own performance (0 = no critical attitude, 1 = critical attitude); and (4) whether they formulated points of improvement (0 = no points of improvement, 1 = points of improvement). The higher the sum of the z-scores, the better the score for quality of self-assessment, because it is the combination of these factors that should be of high quality. The quality of the self-assessments was judged by two raters, with an interrater reliability of r = .82, p < .001.
Reliability of the perception measures
Inventory of perceived study environment
Interesting course materials
The learning task is interesting
I know what is expected of me when performing the task
Intrinsic motivation inventory
Interest and pleasure in learning tasks
I enjoy working on the learning task
Interest and pleasure in reflection
I find it interesting to reflect
I would like to conduct more learning tasks because they are useful
Measure of agreement
For the peer assessments and the self-assessments during the practical lesson, the agreement of the scores between the self- and peer assessments was measured by computing the Pearson’s correlation.
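As an illustration of this agreement measure, the following minimal sketch computes Pearson's correlation between self-assessed and peer-assessed mean scores. The function name and the scores are hypothetical; the study's own data are not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical mean scores on the four-point scale (illustration only).
self_scores = [3.2, 2.8, 3.6, 3.0, 2.5]
peer_scores = [3.0, 2.9, 3.8, 3.1, 2.4]

agreement = pearson_r(self_scores, peer_scores)
```

A value close to 1 means that students who rated themselves higher were also rated higher by their peers. It does not show that the two sets of scores are equal in absolute level.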
High efficiency indicates that with a relatively low mental effort during training a relatively high task performance in the examination is accomplished, while a low efficiency indicates that with a relatively high mental effort during training a relatively low task performance is accomplished. For example, instructional efficiency is higher for an instructional condition in which participants attain a certain performance level with a minimum investment of mental effort than for an instructional condition in which participants attain the same level of performance with a maximum investment of mental effort.
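The text above does not reproduce the formula for instructional efficiency. The sketch below assumes the commonly used measure of Paas and van Merriënboer (1993), E = (z_performance − z_effort) / √2, with both variables standardised across all participants. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standardize(values):
    """Convert raw scores to z-scores across all participants."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - m) / s for v in values]


def instructional_efficiency(performance, effort):
    """Assumed formula: E = (z_performance - z_effort) / sqrt(2)."""
    z_perf = standardize(performance)
    z_eff = standardize(effort)
    return [(p - e) / sqrt(2) for p, e in zip(z_perf, z_eff)]


# Hypothetical data: test task performance and mental effort during training.
performance = [3.0, 3.5, 2.5, 3.8]
effort = [2, 3, 6, 1]

efficiency = instructional_efficiency(performance, effort)
```

Under this assumption, a positive value means that performance was high relative to the effort invested, and a negative value means the reverse. The last participant in the example (highest performance, lowest effort) receives the highest efficiency score.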
At the start of the lecture, the background questionnaire was administered. After students had filled in the questionnaire, the lecture was given and the multiple choice test was taken. This phase lasted for 90 min.
After the lecture, students were instructed to assess the video examples. While doing this, students were exposed to stoma care by watching video examples of an expert nurse showing the intended behaviour, which is the first step in the taxonomy of Steinaker and Bell (1979). Students were split up into the two experimental groups to work on the assessment of video examples. Students could work on the assessment of video examples for a maximum of 90 min. After the assessment of video examples, the practical lesson with peer and self-assessments took place for 90 min. In this lesson, students could participate in stoma care by practicing on a doll (second step).
One week after the practical lesson, students had to conduct the examination after which they had to assess their own performance. In this examination, they could identify with the stoma care because they were exposed to a simulation patient in performing the care (third step). Student performance was assessed by a teacher. At the end of the examination the evaluation questionnaire was filled in by the students. The examination including self-assessment lasted for 40 min. After the whole experiment, students were sufficiently prepared for further practice during internships which leads them to internalise the competence of stoma care (fourth step).
Means and standard deviations for dependent variables
Columns: Competence-based (n = 20); Performance-based (n = 19)
Rows: Score on prior knowledge test; Score quality of video assessment (Number of words, Concreteness of answer); Task performance scored by peer; Task performance scored by self; Test task performance scored by teacher; Score quality of self-assessment (Number of words, Number of criteria, Points of improvement)
On the background questionnaire, no significant difference between the conditions was found, indicating that students in the two conditions had comparable backgrounds.
On the knowledge test, no significant difference between the conditions was found, indicating that all students had the same level of knowledge at the end of the lecture. Thus, students had the same background and prior knowledge before they started to study the video examples.
On the overall score for quality of video assessment, a significant difference between the conditions was found, z = −1.964, p < .05. Students in the performance-based condition had an average rank of 18.21, while students in the competence-based condition had an average rank of 12.00. More specifically, on number of words no difference was found. In concreteness of answers, a significant difference was found, z = −1.716, p < .05. Students in the performance-based condition had an average rank of 18.40, while students in the competence-based condition had an average rank of 13.75. No significant difference in judgment was found. A further qualitative analysis of the data reveals that students in the competence-based condition often decoded the competence-based assessment criteria into the performance-based criteria as an answer but were not able to describe the concrete behaviour.
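The z values and average ranks reported here are consistent with a rank-based comparison of two independent groups, such as the Mann-Whitney U test. The test is not named in this section, so that is an assumption. The sketch below shows how average ranks and a normal-approximation z value could be obtained; it applies no correction for ties, and the function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney(group_a, group_b):
    """Mean ranks, U, and an uncorrected normal-approximation z value."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    rank_sum_b = sum(ranks[n1:])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u_b = rank_sum_b - n2 * (n2 + 1) / 2
    u = min(u_a, u_b)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction applied
    return {
        "mean_rank_a": rank_sum_a / n1,
        "mean_rank_b": rank_sum_b / n2,
        "U": u,
        "z": (u - mu) / sigma,
    }


# Hypothetical scores for two small groups (illustration only).
result = mann_whitney([1, 2, 3], [4, 5, 6])
```

Because the smaller of the two U values is used, z is never positive in this sketch, which matches the sign convention of the values reported in the text. The group with the higher average rank is the one with the higher scores.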
Mental effort during assessment of the video examples was computed as the average of the scores given after each of the six video fragments. On mental effort, a significant difference between conditions was found, z = −3.964, p < .001: students in the performance-based condition had an average rank of 12.61, while students in the competence-based condition had an average rank of 27.03.
On peer assessment and self-assessment of task performance in the practical lesson, no significant differences between conditions were found. Yet, a moderate agreement between peer and self-assessment was found, r = .65, p < .001, indicating that students’ self-assessment scores corresponded with the scores of their peers. For the performance-based condition r = .66, p < .01, and for the competence-based condition r = .63, p < .01.
On test task performance, a significant difference between conditions was found, z = −2.037, p < .05. Students in the performance-based condition had an average rank of 23.82, while students in the competence-based condition had an average rank of 16.38. On the overall score for quality of self-assessment, no significant difference between the conditions was found. Although not significant, the direction of the difference was in line with the expectations. On instructional efficiency, a significant difference between conditions was found, z = −3.962, p < .001: students in the performance-based condition had an average rank of 27.42, while students in the competence-based condition had an average rank of 12.95.
Means and standard deviations for evaluation questionnaire
Columns: Competence-based (n = 20); Performance-based (n = 19)
Rows: Interesting course material; Interest and pleasure; Interest and pleasure in reflection
No significant differences were found between conditions. Being in the performance-based or competence-based condition did not influence students’ perceptions of the learning task.
The goal of this study was to investigate the effects of competence-based versus performance-based assessment criteria on students’ test task performance and self-assessment skills. The first hypothesis, stating that students who receive the performance-based criteria will be better task performers than students who receive the competence-based criteria, is confirmed by the data. It seems that novice students who receive the performance-based criteria during training know better what is expected from their task performance and are better able to show desired performance than students who receive the competence-based criteria. A possible explanation is the finding that students who received the performance-based criteria had a higher quality of video assessments in the learning phase. They were especially better at being concrete about the desired behaviour, which may have led to better task performance in the test phase. This is in line with the ideas of Eva and Regehr (2005), who state that performance-based criteria make it easier to distinguish levels of performance, enabling a step-by-step process of performance improvement.
The second hypothesis, stating that students who receive the performance-based criteria experience a lower mental effort during assessment than students who receive the competence-based criteria is also confirmed by the data. It appears that by providing novice students with performance-based assessment criteria, they have to invest less mental effort to assess their task performance. This effect is positive when it leads to a better test task performance because this would mean that during training the reduced load of assessment permits more cognitive capacity for learning to perform the task of stoma care.
Indeed, the findings concerning the first and second hypotheses together allow the conclusion that the performance-based assessment criteria result in a higher instructional efficiency, since students in the performance-based condition experienced a lower cognitive load during the learning phase, followed by a higher performance on the test task (Paas and van Merriënboer 1993; van Gog and Paas 2008). Providing novice students with performance-based assessment criteria thus leads to more efficient learning.
The third hypothesis, stating that students who receive the performance-based criteria become better self-assessors than students who receive the competence-based criteria, is not confirmed by the results. This finding is, however, in line with the findings of Dunning et al. (2004), who also found that for novice students knowing the assessment criteria does not necessarily imply the ability to assess their own performance on those criteria. As self-assessment can be seen as a complex cognitive skill, one of the key words in developing this skill is sufficient practice (van Merriënboer and Kirschner 2007). It is likely that students need considerably more practice than provided in the current study to improve their self-assessment skills.
Finally, students did not differ in their perceptions of the learning environment. Receiving competence-based or performance-based criteria thus did not influence their appreciation of the learning task. The findings indicate that both groups were positive about the learning task as a whole and especially valued the provided video examples.
The results of this study show that for novice students performance-based assessment criteria do lead to a lower mental effort during learning and a higher test task performance, which is in line with our theoretical assumption that for novice learners it is better to use performance-based criteria than competence-based criteria. The question remains, however, what causes the observed effects. The relative importance of the separate dimensions of Fig. 1 was not investigated in this study and further research is required to determine the contribution of the various dimensions to the reported effects on mental effort during learning and test task performance. Is it because these criteria refer to directly observable behaviour? Or is it because the criteria are more task-dependent? Maybe the analytic character of the criteria is the driving force behind these effects? These insights could serve as a guideline for teachers in the development of performance-based assessment criteria and should be further examined.
Furthermore, the effects of providing students with performance-based assessment criteria should be examined with students in later years of the educational program to explore differences between novice and more experienced students. It is expected that students in later phases of their educational program have to learn to think on a higher level and thus work more efficiently with competence-based criteria.
A shortcoming of this study is the limited duration of the intervention. Because this intervention was restricted to only one learning task (i.e. stoma care), students did not get the opportunity to practice extensively on their skill development. This was most visible for the complex cognitive skill of self-assessment. According to van Merriënboer and Kirschner (2007), more training is needed to develop this kind of skill. Furthermore, only a small sample was used in the study. The question remains if the results are transferable to larger groups of students or students in other domains. Nevertheless, the fact that this intervention yielded some important results concerning mental effort expenditure during learning and test task performance is a sound basis for further research on this topic.
The findings yield the clear guideline that novice students should be provided with performance-based assessment criteria in order to improve their learning process and reach higher test task performance. For instructing young nurses at the beginning of their study, performance-based assessment criteria are a necessity to guide their learning process. It should be noted, however, that formulating such performance-based criteria is a demanding task. To ensure a sound implementation, training should be provided to teachers to increase their skills in formulating performance-based assessment criteria, based on a systematic process of drawing up a skills hierarchy with related criteria. When students progress in the study program, explicit attention should be paid to training students to interpret their own behaviours in terms of the underlying competences. In this way, students learn to see the link between performance and competence development. If this is not explicitly included in the program, students remain on a lower level of thinking.
To conclude, the introduction of competence-based education, primarily consisting of authentic learning tasks based on real-life problems, requires educators to address the issue of how to redesign their assessment programs. Our results show that stating that competence-based assessment criteria are the answer to this problem is a step too far. Whereas competences seem to be a good starting point to develop professional education, they do not always serve this purpose for assessment. At least for novice students, providing them with performance-based assessment criteria is more beneficial than providing them with competence-based criteria. This study shows that novice students need less mental effort to assess their task performance and show higher test task performance, that is, they learn more efficiently when provided with performance-based assessment criteria.
We would like to thank the participants in this study and the staff of ROC A12 for all their help in conducting this research. Participants were offered confidentiality. Ethical approval was not necessary as this study was part of the normal education program.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.