Introduction

Professionalism is becoming increasingly central in undergraduate and postgraduate training, and the associated research has led to a vast increase in the number of papers on the topic (van Mook, de Grave et al. 2009). Tools for assessing professionalism and professional behaviour have been developed to identify and counsel students and trainees demonstrating unacceptable professional behaviour and to remediate their performance (Papadakis et al. 2005, 2008). Since validated tools are scarce (Cruess et al. 2006), combining currently available instruments has become the norm (Schuwirth and van der Vleuten 2004; van Mook, Gorter et al. 2009). These instruments include self- and peer assessment and direct observation by faculty during regular educational sessions (Singer et al. 1996; Asch et al. 1998; Fowell and Bligh 1998; van der Vleuten and Schuwirth 2005; Cohen 2006). Self-assessment is defined as personal evaluation of one’s professional attributes and abilities against perceived norms (Eva et al. 2004; Eva and Regehr 2005; McKinstry 2007). So far, published studies on self-assessment of professionalism are scarce (Rees and Shepherd 2005). Given the poor validity of self-assessment in general (Eva and Regehr 2005), it seems ill-advised to use self-assessment in isolation, without triangulation from other sources. Peer assessment involves assessors with the same level of expertise and training and similar hierarchical institutional status. Medical students usually know which of their classmates they would trust to treat their family members, which illustrates the intrinsic potential of peer assessment (Dannefer et al. 2005). However, a recent analysis of instruments for peer assessment of physicians revealed that none met the required standards for instrument development (Evans et al. 2004). Studies addressing peer assessment of professional behaviour of medical students are beginning to appear (Freedman et al. 2000; Arnold et al. 2005; Dannefer et al. 2005; Shue et al. 2005; Lurie, Nofziger et al. 2006a, b). Observation and assessment by faculty using rating scales is another commonly used method of professional behaviour assessment (van Luijk et al. 2000; van Mook, Gorter et al. 2009; van Mook and van Luijk 2010). Prior studies have revealed that such teacher-led sessions are highly dependent on the teacher’s attitudes, motivation and instructional skills (van Mook et al. 2007). When teachers’ commitment declines, assessment of professional behaviour may become trivialised. Emphasis may then shift to attendance rather than participation, and to completion of tick boxes rather than informative feedback and students’ contribution and motivation (van Mook et al. 2007). In an attempt to further improve professional behaviour assessment, a triangulated, teacher-led discussion of self- and peer assessment of professional behaviour using a paper form is current practice at Maastricht medical school (van Mook and van Luijk 2010).

However, digital technologies have come to influence our ways of working and communicating, and have created technology-driven ways of teaching, learning and assessing (De Leng 2009). Adaptation to some of these changes can be useful (De Leng 2009), for example to reduce the time and expense involved in collecting self- and peer ratings and to facilitate anonymous information gathering and information analysis. Such web-based technology is currently used in the National Board of Medical Examiners’ (NBME) Assessment of Professional Behaviours (APB) program (Mazor et al. 2007, 2008; National Board of Medical Examiners 2010). The study presented in this paper investigated the potential advantages of a web-based instrument versus a ‘classic’, paper-based method to assess professional behaviour in tutorial groups in a problem-based curriculum. In a comparison of these two approaches we focused on:

  1. The quantity and quality of comments provided by students and the feedback provided by their tutor and peers, and on

  2. The feasibility, acceptability and perceived usefulness of the two approaches.

Methods and research tools

The study involved all medical students enrolled in the second, ten-week course in year 2 at the Faculty of Health, Medicine and Life Sciences, Maastricht University, the Netherlands. During the bachelor programme of the six-year problem-based medical curriculum, professional behaviour is assessed on various occasions in tutorial groups during all regular courses (van Mook and van Luijk 2010). Each tutorial group consists of ten students on average and a tutor/facilitator, and each meeting lasts 2 h. For the purpose of this study, the students were divided into two groups: those in tutorial groups with even numbers and those in groups with odd numbers. The first group used a web-based instrument to assess professional behaviour and the other group used the usual method with a paper assessment form. We will first describe the two assessment methods in some detail.

The ‘classic’ paper-based professional behaviour assessment form

The working group Consilium Abeundi of the Association of Universities in the Netherlands proposed a practical definition of professional behaviour (Project Team Consilium Abeundi van Luijk 2005; van Luijk et al. 2010; van Mook and van Luijk 2010). They framed professionalism as observable behaviours, reflecting the norms and values of the medical professional. Three categories of professional behaviour were distinguished: ‘Dealing with work and tasks’, ‘Dealing with others’, and ‘Dealing with self-functioning’ (van Luijk et al. 2000; Project Team Consilium Abeundi van Luijk 2005). These categories, together with the related clarifying descriptions, are the basis of the professional behaviour assessment form that has been in use at Maastricht medical school since 2002 (Fig. 1) (van Mook and van Luijk 2009, 2010). Students are familiarised with its use early in the curriculum. Professional behaviour is assessed at the start, halfway through and at the end of each regular course. For the halfway assessment each student prepares a self-reflective assessment and enters it in the form, using the clarifying descriptions as a reminder and starting point. In the subsequent plenary session of the tutorial group, chaired by the tutor, the professional behaviour of each student is assessed by the group. All group members (students and tutor) are required to contribute to the discussion. The tutor documents all comments and feedback on each student’s form. At the end of the course this process is repeated, followed by a final summative assessment, resulting in a pass or fail (van Mook and van Luijk 2010).

Fig. 1 The paper form used for professional behaviour evaluation and assessment at the Faculty of Health, Medicine and Life Sciences, Maastricht University, The Netherlands

The web-based instrument

The web-based instrument is based on an application that consists of a 360° feedback system specifically designed for higher education. Its development involved more than thirty pilot studies and evaluations by over 6,000 students. Prior to the current study, the tool was piloted at Maastricht in a group of first-year students who did not participate in the current study. Providing adequate practical information to students and tutors prior to using the application and rephrasing of items to achieve a more detailed focus on aspects of professional behaviour were considered prerequisites for the successful implementation of web-based assessment (unpublished data). The web-based instrument used for assessment of professional behaviour covered the same three categories (and clarifying descriptions) as the paper form (Project Team Consilium Abeundi van Luijk 2005). Ample information about background, confidentiality, timing and some practical matters was made available to students and staff electronically and in writing prior to and during the opening session of the course, as well as verbally to the tutors during the tutor instruction session. Halfway through and at the end of the course each student in the web-based assessment group received an internet link in an e-mail. Clicking the link gave access to the web-based assessment instrument. The students were asked to complete the questions themselves and then invite five peers and the tutor of their group to evaluate their professional behaviour and provide feedback. Selection of the peer students was standardised to the five students listed immediately below the student’s name on the centrally randomly generated list of the tutorial group members, resulting in a semi-anonymised feedback procedure. Ample space for narrative feedback relating to the three categories of professional behaviour was provided for each questionnaire item (Fig. 1). All items were also answered using a Likert scale (1 = almost never to 5 = almost always). The students received the results of the feedback process in the form of a printable report presenting the results of their self-assessment relative to the assessment by their peers as well as an overview of all the narrative comments. The web-based group used the printed reports and the paper-based group used the completed paper-based forms to discuss each student’s professional behaviour during the end-of-course assessment in the final tutorial group of the course.
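The standardised peer selection described above can be illustrated with a minimal sketch. It assumes the group list has already been randomised centrally and that the selection wraps around to the top of the list for students near the bottom; the wrap-around rule, the function name `assign_peer_raters` and the student labels are assumptions made for illustration, not details reported for the actual application.

```python
from typing import Dict, List


def assign_peer_raters(ordered_members: List[str], n_raters: int = 5) -> Dict[str, List[str]]:
    """Map each student to the peers listed immediately below them.

    `ordered_members` is assumed to be the already randomised group list
    with unique names. Wrapping around to the top of the list is an
    assumption of this sketch, not a documented rule.
    """
    size = len(ordered_members)
    # A student cannot rate themselves, so cap the number of raters.
    k = min(n_raters, max(size - 1, 0))
    assignments: Dict[str, List[str]] = {}
    for position, student in enumerate(ordered_members):
        assignments[student] = [
            ordered_members[(position + offset) % size]
            for offset in range(1, k + 1)
        ]
    return assignments


if __name__ == "__main__":
    group = [f"student_{i}" for i in range(10)]  # hypothetical tutorial group
    for student, raters in assign_peer_raters(group).items():
        print(student, "->", raters)
```

Under these assumptions every student in a group of ten is rated by five peers, and every student also rates exactly five peers.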

End-of-course questionnaire

At the end of the last tutorial group of the course, all students of the two groups were asked to complete a questionnaire addressing fourteen aspects of feasibility, acceptability and perceived usefulness of the two instruments. The tutors were invited to report their findings by e-mail. All data were recorded and analysed anonymously.

Analysis

All narrative comments were independently coded and analysed by two blinded researchers (WvM and SG). Units of comments consisting of one grammatical clause but covering different topics were considered to be different units of comments. The researchers used five generally accepted feedback rules to label the units of comments as incorrect or correct feedback (Table 1) (Pendleton and Schofield 1984; Branch and Paranjape 2002). Since feedback should preferably meet as many criteria as possible, an aggregated score across all five feedback categories was constructed (total scores of 0–3 were considered unsatisfactory and requiring improvement; total scores of 4 and 5 were considered satisfactory). The researchers discussed any discrepancies in their coding until agreement was reached. The number and nature of the comments from the interim and final evaluations were compared between the web-based (intervention) group and the paper-based (control) group using Pearson’s chi-square test. Independent-samples t tests were used to analyse the questionnaire scores. SPSS 16.0.1 was used for the statistical analyses (SPSS 2007). Effect sizes were computed where applicable as an indicator of the results’ practical (clinical) relevance (independent of sample size). For between-group comparisons of proportions \( p_{\text{paper}} \) and \( p_{\text{web}} \), effect sizes (ES) were calculated according to

$$ ES = 2\left| {\arcsin \sqrt {p_{\text{web}} } - \arcsin \sqrt {p_{\text{paper}} } } \right| $$

For between-group comparisons of mean scores \( m_{\text{paper}} \) and \( m_{\text{web}} \), effect sizes (ES) were calculated according to

$$ ES = \left| {m_{\text{web}} - m_{\text{paper}} } \right|/\sigma_{\text{paper}} $$

where \( \sigma_{\text{paper}} \) is the standard deviation in the paper-based group. ES ≅ 0.30 was considered a small effect of negligible practical importance, ES ≅ 0.50 a medium effect of moderate practical importance, and ES ≅ 0.80 a large effect of considerable practical importance (Cohen 1987; Hojat and Xu 2004).
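As an illustration of the two effect size formulas above, the following minimal Python sketch implements them directly. The function names and the example values are hypothetical and serve only to show the arithmetic; they are not data from this study, and the analyses themselves were run in SPSS.

```python
import math


def effect_size_proportions(p_web: float, p_paper: float) -> float:
    """ES = 2 * |arcsin(sqrt(p_web)) - arcsin(sqrt(p_paper))|."""
    for p in (p_web, p_paper):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    return 2.0 * abs(math.asin(math.sqrt(p_web)) - math.asin(math.sqrt(p_paper)))


def effect_size_means(m_web: float, m_paper: float, sd_paper: float) -> float:
    """ES = |m_web - m_paper| / sd_paper, standardised on the paper-based group."""
    if sd_paper <= 0:
        raise ValueError("standard deviation must be positive")
    return abs(m_web - m_paper) / sd_paper


if __name__ == "__main__":
    # Made-up illustrative inputs, not results from the study.
    print(effect_size_proportions(p_web=0.60, p_paper=0.45))
    print(effect_size_means(m_web=3.5, m_paper=4.0, sd_paper=1.0))
```

Because both formulas take the absolute value, the resulting ES indicates the magnitude of a difference but not its direction; the direction has to be read from the underlying proportions or means.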

Table 1 Commonly used feedback rules (Pendleton and Schofield 1984; Branch and Paranjape 2002)
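The aggregated score described in the Analysis section can be sketched as follows. The sketch assumes that each unit of comment carries five binary codes, one per feedback rule (True meaning the rule was met); the names `aggregated_score` and `is_satisfactory` are hypothetical, and only the cut-off (0–3 unsatisfactory, 4–5 satisfactory) is taken from the text.

```python
from typing import Sequence

N_RULES = 5            # five feedback rules (Table 1)
SATISFACTORY_MIN = 4   # totals of 4 and 5 count as satisfactory


def aggregated_score(rule_codes: Sequence[bool]) -> int:
    """Count how many of the five feedback rules a unit of comment meets."""
    if len(rule_codes) != N_RULES:
        raise ValueError(f"expected {N_RULES} rule codes, got {len(rule_codes)}")
    return sum(bool(code) for code in rule_codes)


def is_satisfactory(rule_codes: Sequence[bool]) -> bool:
    """True for total scores of 4 or 5, False for total scores of 0 to 3."""
    return aggregated_score(rule_codes) >= SATISFACTORY_MIN


if __name__ == "__main__":
    # Hypothetical coding of one unit of comment: four rules met, one not.
    example = [True, True, True, True, False]
    print(aggregated_score(example), is_satisfactory(example))
```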

Results

Of the 307 (198 females, 109 males) medical students enrolled in the second course of the second year, 150 were assigned to tutorial groups with even numbers (web-based group) and 157 to the tutorial groups with odd numbers (paper-based group). Since assessment of professional behaviour is mandatory, the participation rate was 100%. We will first present the results of the quantitative and qualitative analyses of the narrative feedback provided at the interim and final assessment of professional behaviour in the two groups. After that we present the results for the feasibility, acceptability and perceived usefulness of the two assessment instruments.

Analysis of the amount of feedback

The total numbers of comments per category of professional behaviour for the interim and the final assessments are presented for the web-based and the paper-based group (Table 2). In the web-based group each student sent a mean of 4.5 invitations to peers to provide feedback (standard deviation 3). The number of comments was significantly higher in this group than in the paper-based group for all three categories (dealing with work, others and oneself). However, at the final assessment the total number of comments relating to the category ‘Dealing with oneself’ had halved compared to the interim assessment, and this decrease was mainly attributable to a marked decrease in the comments provided by the web-based group.

Table 2 The total number of comments per category of professional behaviour in the web-based and the paper-based group at the interim and final assessment

Analysis of the quality of the feedback

The results for the quality of feedback at the interim and the final assessments are presented for the categories of professional behaviour (dealing with work, others and oneself) as correct and incorrect in relation to the aggregated score on the five feedback rules (Table 3). The inter-rater agreement during primary coding was 15834 out of 16165 codes (97.9%) for 3233 comments (Kappa = 0.87).
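For readers less familiar with agreement statistics, the sketch below shows how raw agreement and a chance-corrected kappa can be computed from two raters’ codes. It assumes that the reported kappa is Cohen’s kappa for two raters, which the text does not state explicitly; the function names and the toy codes are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Proportion of codes on which the two raters agree."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    observed = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy codes (1 = rule met, 0 = rule not met); not data from the study.
    codes_a = [1, 1, 0, 0]
    codes_b = [1, 0, 0, 0]
    print(percent_agreement(codes_a, codes_b), cohens_kappa(codes_a, codes_b))
```

The 16165 codes reported above are consistent with five rule codes for each of the 3233 comments, and 15834 agreements out of 16165 codes correspond to a raw agreement of roughly 0.98. Kappa is lower than raw agreement because it discounts the agreement that would be expected by chance.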

Table 3 Number of correct and incorrect scores in the web-based and paper-based groups regarding the aggregated scores on the five feedback criteria for the categories of professional behaviour at the interim and final assessment

When statistical significance was corrected for sample size, no difference in the aggregated scores on all five feedback categories was found. Several differences of potential practical significance pertaining to the five generally accepted feedback rules were, however, found, all favouring paper-based assessment. For example, the feedback in the web-based group pertaining to the categories ‘Dealing with others’ and ‘Dealing with oneself’ was more often unrelated to observed behaviour. However, in view of the statistical ‘multiple comparisons’ problem, the statistical significance of the between-group differences revealed by these individual analyses is questionable. Finally, the majority of comments relating to the category ‘Dealing with oneself’ consisted of descriptions of a student’s attendance, to the neglect of other aspects of personal functioning.

Feasibility, acceptability and perceived usefulness

The response rate for the questionnaire was 96% (143/157) in the paper-based group and 81% (121/150) in the web-based group. Table 4 shows the mean scores per questionnaire item for the two groups. The differences between the groups were significant for all items and in favour of the paper-based form. The paper-based group yielded fourteen student comments on the use of the paper form. Nine comments emphasised the usefulness of the form, four offered suggestions for minor adaptations and one stated that the form was time consuming. The tutors did not comment on this form. The students in the web-based group provided 179 comments (Table 5) and the tutors provided fifteen comments. The tutors emphasised the absence of interpersonal contact and the necessity of face-to-face contact during assessment of professional behaviour. They also thought that the web-based instrument was time consuming and they reported some technical problems (e.g. printing).

Table 4 Scores on the items of the questionnaire on the acceptability, feasibility and usefulness of web-based versus paper-based assessment of professional behaviour
Table 5 Remarks by students regarding web-based professional behaviour assessment

Discussion

Although other web-based approaches to assessment of professional behaviours have been studied (Mazor et al. 2007, 2008; Stark et al. 2008) and are currently in use (National Board of Medical Examiners 2010), very few studies specifically address the amount and quality of feedback resulting from such an approach. The assessment method that is currently used at Maastricht medical school requires each student to reflect on their professional behaviour and requires all members of a tutorial group (tutor and students) to provide feedback on the professional behaviour of each student, which is then recorded by the tutor on the assessment form. This process was deliberately mimicked in the web-based instrument, which elicited feedback from students and tutor on the same three categories and items relating to professional behaviour that are included in the paper form.

The study reveals that the number of comments was significantly higher in the web-based group than in the paper-based group. The quality of the feedback, however, did not parallel the quantitative increase. When considering the aggregated scores on the five feedback criteria, no difference in quality of feedback was found between the groups (Table 3). Nevertheless, the feedback provided by the web-based group showed poorer quality in relation to several feedback criteria (e.g. was unrelated to the observed behaviour; data not shown). However, as previously mentioned, the statistical significance revealed by these individual analyses is questionable.

Moreover, the survey results on acceptability, feasibility and usefulness of the instruments were strongly in favour of the paper form. It should be noted that this result might be partly due to technical difficulties that were experienced with the web-based instrument despite adequate technical preparation and extensive tutor and student instruction. However, even when these limitations are taken into account, the web-based instrument did not show an improvement in educational impact compared with the existing method of assessing professional behaviour.

Another striking finding, which is unrelated to the nature of the assessment instrument, was the emphasis on attendance in the feedback relating to the category ‘Dealing with oneself’ and the relative absence of feedback on other aspects of self-functioning. An earlier analysis of 4 years of experience at Maastricht with paper-based assessment of professional behaviour had yielded similar findings (van Mook and van Luijk 2010). This suggests that the context (small group sessions) in the earlier years of medical school may be less suited to stimulate self-reflection (van Mook and van Luijk 2010). Perhaps attendance and time management as measures of responsible behaviour should be evaluated separately from feedback on other aspects of professional behaviour, a suggestion that was also put forward during the plenary discussion at a recent symposium on professionalism (Centre for Excellence in Developing Professionalism 2010; van Mook and van Luijk 2010).

Comparison of the results of this study to results reported in the literature is difficult since few studies on web-based instruments to assess professional behaviour have been published. However, there are some published reports on the development and use of web-based assessment in general (Wheeler et al. 2003; Tabuenca et al. 2007). In one study, implementation of a web-based instrument resulted in a substantial reduction in administration and bureaucracy for course organisers and proved to be a valuable research tool, while students and teachers were overwhelmingly in favour of the new course structure (Wheeler et al. 2003). Another study described a successful multi-institutional validation of a web-based core competency assessment system in surgery (Tabuenca et al. 2007). However, the transferability of these more general studies to web-based self- and peer assessment of professional behaviour seems limited. Studies addressing the NBME’s APB program, however, report comparable promising results, with improved faculty comfort and self-assessed skill in giving feedback about professionalism as an example (Stark et al. 2008). It therefore seems advisable to conduct further studies to examine the effectiveness and optimal use of web-based assessment of professional behaviour.

Although the literature pertaining to web-based assessment is sparse, the peer assessment literature provides evidence of the importance of anonymity, or at least confidentiality, for the acceptance of peer assessment (Arnold et al. 2005; Shue et al. 2005). That is why we used a semi-anonymous feedback procedure in the web-based assessment in this study. Although reliability can be enhanced by increasing the number of raters (Ramsey et al. 1993; Dannefer et al. 2005), the desired number of raters may not be feasible or acceptable, for example due to time constraints. Consequently, in the current study we limited the number of peer raters to five randomly selected students (Ramsey et al. 1993; Dannefer et al. 2005). It seems reasonable for medical schools to base the selection of peer raters on practical and logistical considerations (Arnold et al. 1981; Arnold and Stern 2006; Lurie, Nofziger et al. 2006a, b), since bias due to rater selection has been shown not to affect peer assessment results (Lurie, Nofziger et al. 2006a, b). Although some anticipated problems could thus be adequately addressed, some limitations of the current study remain.

Study limitations

In the preparation phase we were confronted with limited availability of tools for web-based multisource feedback. Because re-designing an existing tool proved costly, a feature that was superfluous for the purpose of this study (the Likert scales) was left unchanged, and this may have influenced the results, for instance those relating to time investment. The content of the web-assisted and paper versions of the instrument was otherwise identical. Furthermore, it cannot be excluded that the results were negatively affected by the participants’ unfamiliarity with web-based assessment instruments, even though the implementation process was carefully prepared based on feedback from a pilot study. Ample time was spent on technical preparation and the participants received information and instruction on multiple occasions. In addition, the possibility for the tutor to omit redundant comments and to make comments more concrete before noting them, or the more limited space on the form, may have contributed to the lower number and/or the higher quality of the comments in the paper-based group. Finally, automated data extraction only enabled feedback analysis at the level of the whole year group, although analysis of data at individual (students, tutors) or tutorial group level would have been preferable.

Conclusions

The results revealed that a confidential web-based assessment instrument for professional behaviour yielded a significantly higher number of comments than the traditional paper-based assessment. The quality of the feedback obtained by the web-based instrument was comparable as measured by several generally accepted feedback criteria. However, judging by the questionnaire results, students strongly favoured the use of the traditional paper-based method. The interpersonal nature of professional behaviour prompted comments that professional behaviour was eminently suitable for ‘en-groupe’, face-to-face discussion and assessment. Although teachers and students are nowadays preferably ‘wired for learning’, it seems that, so far, professional behaviour assessment does not necessarily require the use of advanced assessment technologies, even though such ‘innovative’ electronic and/or web-based assessment methods do result in more feedback of comparable quality. Their exact position among the currently used, labour-intensive traditional assessment armamentarium needs to be the subject of further study.