Introduction

Problem-Based Learning (PBL) is an instructional method rooted in the active learning perspective that places the responsibility for learning on the learners. PBL is also a constructivist and collaborative learning environment, meaning that people learn by giving meaning to experiences and to interactions with others (Savery and Duffy 1996). As its name indicates, this instructional method starts with an authentic problem that is relevant to the future practice of the profession. Students meet twice a week and work together in small groups of approximately eight to 10 students. During the first meeting, students start by activating their prior knowledge about the given problem. Next, the problem is analyzed, possible explanations are generated and learning issues are formulated. After these first steps, students individually study these learning issues and search for relevant information. During the second meeting, students discuss and synthesize their findings, which should lead to a deeper understanding of the problem. In short, PBL consists of three phases: problem analysis in the tutorial group, individual self-study, and (re)structuring of the newly acquired knowledge in collaboration with the other group members (Barrows 1988; Schmidt 1993). This whole process is facilitated by a tutor, who monitors and, if necessary, improves the quality of the discussion in the group. The quality of the discussion within the tutorial group seems to play a crucial role in the success of PBL (Savery and Duffy 1996; Gijselaers and Schmidt 1989): tutorial groups that are perceived as well functioning by their own members achieve better (Gijselaers and Schmidt 1989). At the same time, the exact relationship between students’ contributions to the group discussion and their achievement remains unclear.

Collaborative learning

PBL can be seen as a type of collaborative learning environment in which students work together on a common task (Dolmans et al. 2005). According to this learning principle, learning increases when students share common goals and are dependent on and accountable to each other (Johnson et al. 2007; Van der Linden et al. 2000). The factors that affect the effectiveness of different types of collaborative learning environments have been studied extensively over the years. In a recent study, Van den Bossche et al. (2006) developed a model of team effectiveness. Participants in this study were enrolled in higher business education and worked together on an assignment for a period of 7 weeks. According to this model, team effectiveness is affected by both cognitive and social factors. The cognitive factors that contribute to effectiveness are referred to as team learning behavior and include construction, co-construction and constructive conflict. This implies that within a group different viewpoints are articulated and discussed, and a common understanding is negotiated (Van den Bossche et al. 2006). Teams in which these three types of cognitive behavior took place reported higher shared cognition [i.e., a “shared conception of the problem” (Van den Bossche et al. 2006, p. 492)] within the group. This, in turn, seemed to enhance perceived team effectiveness. Furthermore, the occurrence of these team learning behaviors depends on several social factors. Students have to be committed to the task, experience a sense of interdependence, feel safe to express their ideas and believe that their group can work effectively (Van den Bossche et al. 2006). If these conditions are met, team learning behavior is more likely to occur. Decuyper et al. (2010) developed an even more extensive model of team learning.
In this model, too, shared cognition, and therefore successful collaborative learning, arises from co-construction and constructive conflict within the group. However, in contrast to the model of Van den Bossche et al. (2006), this model holds that the occurrence of co-construction and constructive conflict is affected not only by (social) variables at the team level, but also by variables at the individual level. At the individual level, Decuyper et al. distinguish the characteristics and behavior of individual team members as important input variables for successful collaborative learning. Examples of such individual variables are motivation, appreciation of teamwork, construction, responsibility and participation.

Effectiveness of PBL

The effect of PBL on learning has been studied extensively (e.g., Albanese and Mitchell 1993; Colliver 2000; Dochy et al. 2003; Schmidt et al. 2009a), mostly in medical education. This research has focused less on the factors that influence the effectiveness of PBL and more on comparisons between PBL and conventional curricula. In medical education, PBL seems to facilitate the development of a flexible knowledge base and of problem-solving and self-directed learning skills (Hmelo-Silver 2004). Dochy et al. (2003) concluded that students enrolled in medical PBL curricula are better at applying knowledge. Recently, Schmidt et al. (2009b) also found that medical curricula that promote active learning through small-group instruction have higher graduation rates and shorter study durations than more conventional curricula that emphasize direct instruction. Another recent Dutch study, by Schmidt et al. (2009a), summarized studies in which a single well-established PBL medical curriculum was compared with several conventional medical curricula. This meta-analysis indicated that students graduating from this PBL curriculum have better interpersonal and practical medical skills than students from the more conventional, teacher-centered curricula. Medical knowledge and diagnostic reasoning were also slightly better in the PBL curriculum, although these effects were small. In addition, the drop-out rate was lower and the time to graduation was shorter in this curriculum.

Group functioning

Although PBL seems to have a beneficial effect on student learning, it has also become evident that high-quality group discussion is a precondition for the success of PBL (Cohen 1994; De Grave et al. 1996; Savery and Duffy 1996). Only within well-functioning PBL groups will cognitive conflicts (i.e., conflicting ideas within the group) arise and new knowledge be constructed and tested through negotiation, leading to true constructivist learning (Savery and Duffy 1996). Students perceive the quality of interactions in tutorial groups as an important predictor of a tutorial group’s productivity (Dolmans et al. 1998). This is in line with studies on collaborative learning environments other than PBL (e.g., Decuyper et al. 2010; Van den Bossche et al. 2006). Merely putting students together in a group with a common task does not automatically lead to successful learning. Within collaborative learning groups such as PBL tutorial groups, several factors can impede successful group functioning. In their model, Decuyper et al. (2010) list the factors most often mentioned in the literature. One of these is that group members tend to take less responsibility because they can rely on others to do the work. This so-called ‘free riding’ has also been observed in PBL tutorial groups, and students have indicated that they believe it has a negative effect on group functioning (Dolmans et al. 1998). The same is true for overly dominant students, who might keep other group members from giving their opinion. Another problem in tutorial groups is that students sometimes avoid constructive conflict (Moust et al. 2005). Students tend merely to state the undifferentiated main issues that resulted from their self-study, without explicitly discussing differences in viewpoint between students or sources. This closely resembles one of the impeding factors that Decuyper et al. (2010) called ‘group thinking’.

Causal models have proved to be a suitable tool for further elucidating the relationship between group functioning and achievement. In 1989, Gijselaers and Schmidt developed a causal model of learning in a PBL environment in which group functioning played a crucial role. In their model, group functioning was influenced by the degree to which a student’s prior knowledge linked up with the subject-matter, the quality of the problems and the tutor’s performance. Students in well-functioning groups spent more time on self-study and consequently performed better on a test. In addition, well-functioning tutorial groups had a positive influence on students’ intrinsic interest in the subject-matter. Van Berkel and Schmidt (2000) added another variable to this model, namely group attendance, which represented students’ commitment or willingness to engage in PBL. They found that attendance mediated the effect of group functioning on achievement: effective tutorial groups had higher attendance, which in turn led to higher achievement. In addition, the results indicated that the better the attendance in a tutorial group, the less self-study time is needed.

The weakness of these studies is that they treat group functioning as a single variable, leaving it unclear which specific student activities and group processes are conditional for effective group functioning and, consequently, for student achievement. In other words, these causal models do not differentiate between student activities and therefore cannot indicate which activities in the tutorial group are of central importance. Although the functioning of the tutorial group seems to steer learning, it remains unclear which student activities are crucial for good learning achievement (Dolmans and Schmidt 2006; Hak and Maguire 2000). Hak and Maguire (2000, p. 1) refer to this as “the black box of studies on PBL”.

Student activities

In order to unravel the relationship between students’ tutorial activities and achievement, we first have to look at what is known about students’ tutorial behavior and tutorial group processes. According to Slavin et al. (2003), there are three major perspectives on the effectiveness of learning in small groups. The first is the cognitive perspective, which holds that students learn through interaction with peers and through elaboration on the subject-matter (e.g., by explaining it in their own words or asking critical questions). Observational studies have demonstrated that students in tutorial groups display cognitive activities such as summarizing, asking each other critical questions and correcting misconceptions (Visschers-Pleijers et al. 2006; Yew and Schmidt 2009). These activities need to occur within a group in order to develop shared cognition (Van den Bossche et al. 2006). The second is the collaborative perspective, according to which the degree of cohesiveness within a group is thought to have a positive influence on achievement. Van der Linden et al. (2000) state that, for collaborative learning to be successful, students working together within groups should be mutually dependent, should share a common goal and should share responsibilities. Yew and Schmidt (2009) identified such collaborative processes in tutorial groups, for instance in the form of sharing information. The third and last perspective on the effectiveness of learning in small groups is the motivational perspective. According to this perspective, motivation to complete the learning task at hand prompts students to contribute actively to the group discussion (Slavin et al. 2003). Situational interest, a related type of motivation, is discussed by Hidi and Renninger (2006). They describe it as a willingness to engage that springs from the interaction between a person and a specific content or even a specific learning environment.
Hidi and Renninger (2006) suggest that academic achievement is positively affected by situational interest of students.

Self-study time

Since PBL intends learning to be self-directed (Dolmans et al. 2005), time spent on self-study also occupies an important position within the PBL process. Based on the model of Gijselaers and Schmidt (1989) discussed earlier, time spent on self-study is thought to play a mediating role between group functioning and student achievement. Van den Hurk et al. (1999) found that students in PBL tutorial groups believed that more useful learning issues were formulated when they participated in a well-functioning group. The same students also indicated that they used these learning issues as a starting point for self-study. The relation between self-study time and achievement has been investigated extensively. In their review, Frederick and Walberg (1980) found that time spent on learning is positively related to achievement, but they also argued that this relation is influenced by the quality of instruction and student ability. Van den Hurk et al. (1998) also suggest that there is a relation between time spent on self-study and achievement, but that it may not be as straightforward as one would expect. They investigated this relation in a medical PBL curriculum. The results indicated no significant relation between self-study time and achievement scores for first- and second-year students, and further research into the influence of qualitative factors is needed. For instance, there is no insight into the influence of group processes within the tutorial group on self-study time.

Peer evaluations

Research on student performance in PBL tutorials is often based on students’ self-evaluations, and the validity and reliability of self-evaluation are limited (e.g., Eva 2001). Yew and Schmidt (2009) also argued that data gathered from observations are preferable to self-evaluations, because observation stays as close to the learning activities as possible. In this study we tried to get an even closer look at students’ contributions by using peer observations. Peers might have a more accurate picture of how students perform during tutorial meetings, for two possible reasons (Eva 2001). First, students see their peers intensively (twice a week for a period of 8 weeks); second, each student is evaluated by multiple peers, so that one judgment about an individual student consists of multiple evaluations.

In a previous study (Kamp et al. 2011), the Maastricht Peer Activity Rating Scale (M-PARS) was developed, with which students can evaluate three main aspects of tutorial peer activity: a peer’s constructive, collaborative and motivational contributions to the tutorial group. This study also demonstrated that students are able to evaluate their peers on these three aspects in a reliable and valid way, and that only four evaluations per student are needed for a reliable judgment.
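Why a handful of peer ratings can be enough may be illustrated with the Spearman-Brown prophecy formula, which predicts the reliability of the mean of k parallel ratings from the reliability of a single rating. This is a minimal sketch for intuition only: the single-rater reliability of 0.40 and the target reliability of 0.70 are hypothetical values, not results from Kamp et al. (2011), and the sketch does not claim to reproduce the analysis used in that study.

```python
import math

def spearman_brown(r1: float, k: int) -> float:
    """Reliability of the mean of k parallel ratings, given single-rater reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)

def raters_needed(r1: float, target: float) -> int:
    """Smallest whole number of raters whose averaged rating reaches the target reliability."""
    k = target * (1 - r1) / (r1 * (1 - target))
    return math.ceil(k - 1e-9)  # small tolerance guards against floating-point overshoot

# Hypothetical single-rater reliability, chosen for illustration only.
r1 = 0.40
print(raters_needed(r1, 0.70))          # 4
print(round(spearman_brown(r1, 4), 3))  # 0.727
```

The formula assumes the raters are interchangeable (parallel); peers who systematically differ in leniency would call for a generalizability-theory analysis instead.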

The current study

Although it can be concluded from the previous sections that group functioning is affected by cognitive, collaborative and motivational activities, it is still unknown how these activities relate to student achievement. In this study we will, therefore, focus on specific student behavior that positively influences student achievement. Because self-study time has been proposed as a mediator between group functioning and achievement, it is also taken into account in this study. The central research question addressed in this study is: ‘Do students who perform better in tutorial groups (i.e., display more constructive, collaborative and motivational activities according to their peers) also spend more time on self-study and subsequently achieve better? In other words, does tutorial performance have any predictive value with regard to self-study time and subsequent achievement?’ Based on the causal model of Gijselaers and Schmidt (1989), it is hypothesized that students’ activities have an indirect effect on achievement on the unit test and the group assignment, and that this effect is mediated by the time students spend on self-study (Fig. 1).
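The hypothesized mediation can be made concrete with a small simulation. The sketch below is illustrative only: it generates hypothetical data in which an activity score x affects self-study time m (path a), which in turn affects a test score y (path b), and then estimates the indirect effect as the product a × b using ordinary least-squares regressions. The coefficients (a = 0.5, b = 0.4, no direct effect), the seed and the function names are assumptions chosen for the example; the study itself fitted a structural equation model in AMOS rather than separate regressions.

```python
import random

def slope(x, y):
    """OLS slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def two_slopes(x1, x2, y):
    """OLS coefficients of y on two predictors, solved from the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data-generating model: x -> m -> y, with no direct x -> y path.
rng = random.Random(2011)
n, a, b = 5000, 0.5, 0.4
x = [rng.gauss(0, 1) for _ in range(n)]     # peer-rated activity score (simulated)
m = [a * xi + rng.gauss(0, 1) for xi in x]  # self-study time, standardized (simulated)
y = [b * mi + rng.gauss(0, 1) for mi in m]  # test score, standardized (simulated)

a_hat = slope(x, m)                 # path a: activity -> self-study
b_hat, c_hat = two_slopes(m, x, y)  # path b (controlling for x) and direct path c'
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}, indirect a*b = {a_hat * b_hat:.2f}, direct c' = {c_hat:.2f}")
```

Under the hypothesis of Fig. 1, a non-zero a × b with a negligible direct path c′ would indicate full mediation; in practice the indirect effect is usually tested with a bootstrap confidence interval rather than by inspection.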

Fig. 1

Hypothesized model of the current study

Method

Participants

Participants were 650 first- and second-year students attending the medical curriculum (PBL) at Maastricht University during the academic year 2009–2010. These students were divided into 62 tutorial groups, each group consisting of approximately 10 students. Students met each other twice a week for a 2-h session over a period of six (first year students) to 10 (second year students) weeks. During these meetings students analyzed the problem that was presented to them and discussed and restructured the information they gathered during self-study collaboratively.

Instruments

Students’ activities were measured with the Maastricht Peer Activity Rating Scale (M-PARS) (Kamp et al. 2011). This 14-item rating scale consists of three subscales (see Appendix). The first subscale (5 items), constructive activities, measures activities that promote the development of shared cognition (e.g., This student corrected misconceptions about the subject matter). The second subscale (5 items), collaborative activities, measures activities that contribute to the social aspect of collaboration (e.g., This student promoted collaboration between group members). The last subscale (4 items), motivational activities, measures activities that give the impression that the student is motivated for the group work (e.g., This student demonstrated to be motivated). With this scale, students evaluate the performance of their peers in the tutorial group by responding to the items on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). Students also had to rate the overall performance of each peer in the tutorial group on a scale from 1 (very poor) to 10 (excellent). Students were asked to fill out the M-PARS during the last week of the unit. For each student, scores on the three types of activities were obtained by averaging all ratings that student received from peers on the items of each subscale, resulting in an average score (ranging from 1 to 5) per type of activity.
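The scoring rule described above (averaging, per student, all item ratings received from peers within each subscale) can be sketched as follows. The assignment of item positions to subscales is a hypothetical placeholder, since the actual item order is given only in the Appendix; the `min_raters=4` default mirrors the decision, reported under Results, to analyze only students rated by four or more peers. `score_students` is an illustrative name, not part of any published M-PARS tooling.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical item-to-subscale mapping (positions 0-13); the real item order is in the Appendix.
SUBSCALES = {
    "constructive": range(0, 5),    # 5 items
    "collaborative": range(5, 10),  # 5 items
    "motivational": range(10, 14),  # 4 items
}

def score_students(forms, min_raters=4):
    """Average peer ratings per student and subscale.

    forms: iterable of (rater_id, ratee_id, item_scores) with 14 Likert scores (1-5).
    Returns {ratee_id: {subscale: mean rating}}, keeping only students rated by
    at least `min_raters` peers.
    """
    received = defaultdict(list)
    for _rater, ratee, items in forms:
        if len(items) != 14 or not all(1 <= v <= 5 for v in items):
            raise ValueError("each form needs 14 item scores between 1 and 5")
        received[ratee].append(items)
    scores = {}
    for ratee, item_lists in received.items():
        if len(item_lists) < min_raters:
            continue  # too few peer ratings for a reliable judgment
        scores[ratee] = {
            name: float(mean(items[i] for items in item_lists for i in positions))
            for name, positions in SUBSCALES.items()
        }
    return scores

# Usage with made-up ratings: four peers rate s9, only one peer rates s7.
example = [(f"rater{r}", "s9", [4] * 5 + [3] * 5 + [5] * 4) for r in range(4)]
example.append(("rater0", "s7", [2] * 14))
print(score_students(example))
# {'s9': {'constructive': 4.0, 'collaborative': 3.0, 'motivational': 5.0}}
```

Because every form contains all items of a subscale, averaging over all item ratings gives the same result as first averaging per rater and then across raters; with missing items the two would differ.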

Student achievement was measured in two ways: with a group assignment and with a unit test at the end of the unit. For the group assignment, students had to collaboratively prepare and give a presentation to their peers on a subject related to the unit content. The unit test consisted of 75 (first-year students) or 125 (second-year students) true–false questions corresponding to the content of the unit. Both the test grade and the assignment grade were transformed to a scale ranging from 1 to 10. In the final unit grade, the test grade carried a higher weight (80%) than the group assignment grade (20%).
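The weighting of the final unit grade amounts to a simple weighted mean. A minimal sketch, assuming both grades are already on the 1 to 10 scale; the text does not describe any rounding rule, so none is applied, and the function name is hypothetical. Note that the analyses reported below use the test grade and the assignment grade as separate outcomes rather than this combined grade.

```python
def final_unit_grade(test_grade: float, assignment_grade: float) -> float:
    """Hypothetical helper: 80% unit test + 20% group assignment, both on a 1-10 scale."""
    for grade in (test_grade, assignment_grade):
        if not 1 <= grade <= 10:
            raise ValueError("grades must lie on the 1-10 scale")
    return 0.8 * test_grade + 0.2 * assignment_grade

print(round(final_unit_grade(7.0, 8.5), 2))  # 7.3
```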

Self-study time was measured at the end of the course, before the unit test. Students were asked to estimate the average number of hours per week they had spent on self-study during this unit. Moust (1993) had already demonstrated the validity of this method: he found a strong correlation between this post hoc self-estimate and a log method, in which students systematically recorded the time they spent on self-study.

Procedure

At the beginning of the course, students were informed about the procedure of this study. The M-PARS and the self-study time question were administered to the students during a tutorial meeting in the last week of the course. Students received an envelope with the self-study time question and approximately nine rating scales, one to be completed for each group member. Students were assured that the data would be processed confidentially and would not be reported back to their peers. The rating scales and the self-study time question were collected a couple of days later, during the next tutorial meeting. Twenty gift certificates were raffled among the participants who completed all items.

Since the aim of this study was to investigate the causal relations between student activities, self-study time and achievement, a structural equation modeling (SEM) analysis was undertaken using the AMOS statistical program (Version 17.0). SEM is a widely accepted statistical technique for testing a theoretical model. As long as it is used for confirmatory rather than exploratory purposes, and as long as the model is based on substantial theory, it can be used to test the hypothesized underlying structural and causal relationships (see Violato and Hecker 2007). However, as Violato and Hecker also indicate, SEM can only provide evidence, not proof, of causality, and it can never rule out alternative explanatory factors.

Model fit was assessed using the following fit indices: chi-square (CMIN), RMSEA and NFI. Good fit is indicated by CMIN/DF < 3 (CMIN divided by the degrees of freedom) with a non-significant p-value, RMSEA ≤ .05, and NFI > .95 (Garson 2009).

Results

A total of 375 students completed the rating scales, a response rate of 61%. A total of 538 students were evaluated by four or more of their peers in the tutorial group. Inspection of the histograms and of the skewness and kurtosis of the variables led to the removal of five outliers. After this, all variables were approximately normally distributed (data may be assumed to be normal if skewness and kurtosis are within the range of ±1.5; Schumacker and Lomax 2004, p. 69). Multivariate normality was checked by inspecting the Mahalanobis distance values. This led to the removal of six more outliers, resulting in a sample size of 527. Table 1 shows the descriptive statistics of the variables involved in this study. The achievement scores range from 3.8 to 9.5 (scale 1–10), with standard deviations of 0.72 and 0.89. Time spent on self-study varies between one and 36 h, with a standard deviation of 5.96. The mean scores on the three factors of the M-PARS range from 1.6 to 4.9, with standard deviations between 0.51 and 0.60. As can be seen in this table, skewness and kurtosis are within the acceptable range.
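The univariate screening rule cited above (skewness and kurtosis within ±1.5) can be sketched as follows. This is a minimal illustration, assuming the bias-adjusted sample skewness (G1) and excess kurtosis (G2) that common statistics packages report; the text does not state which estimators were used, and the multivariate Mahalanobis-distance check is not reproduced here. Function names are illustrative.

```python
from statistics import mean

def _central_moment(xs, k):
    """k-th central moment with divisor n."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Bias-adjusted sample skewness (G1)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    g1 = _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5
    return (n * (n - 1)) ** 0.5 / (n - 2) * g1

def excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (G2); about 0 for normal data."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    g2 = _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

def roughly_normal(xs, limit=1.5):
    """True if both skewness and excess kurtosis lie within +/- limit."""
    return abs(skewness(xs)) <= limit and abs(excess_kurtosis(xs)) <= limit

print(roughly_normal([1, 2, 3, 4, 5]))                  # True
print(roughly_normal([1, 1, 1, 1, 1, 1, 1, 1, 1, 20]))  # False
```

A single extreme value, as in the second call, is enough to push both statistics far outside the ±1.5 band, which is why removing a few outliers can restore approximate normality.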

Table 1 Descriptives of the study variables

Since the purpose of this study was to test the relations between the three types of activities, time spent on self-study and achievement, the model depicted in Fig. 1 was tested in AMOS, using maximum likelihood as the estimation method. The fit indices of the initial model (Fig. 1) indicated a poor fit with the data. The modification indices for the relationships between constructive activities and achievement on the unit test, between collaborative activities and achievement on the group assignment, and between achievement on the unit test and achievement on the group assignment were high, indicating that adding these three relationships would significantly improve model fit. Therefore, these three pathways were added. The pathway between self-study time and achievement on the group assignment was dropped because it did not contribute to a better fit of the model. This resulted in the model in Fig. 2, which shows a significant relation between a student’s constructive activities and his/her score on the unit test, between a student’s collaborative activities and his/her score on the group assignment, and between a student’s score on the group assignment and his/her score on the unit test.
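A quick consistency check on the modified model is to count its degrees of freedom: the number of distinct variances and covariances among the observed variables minus the number of free parameters. The sketch below rests on assumptions the text does not state explicitly: that the model in Fig. 2 retains the three paths from the activities to self-study time and the path from self-study time to the unit test, that the three activity scores are allowed to covary, and that no residual covariances are estimated. Under these assumptions the count yields 5 df, which matches the chi-square statistic reported for this model; the hypothesized model of Fig. 1, specified the same way, would have 7 df. Variable names are illustrative.

```python
import math

OBSERVED = ["constructive", "collaborative", "motivational",
            "self_study", "unit_test", "group_assignment"]
EXOGENOUS = ["constructive", "collaborative", "motivational"]

# Assumed structure of the hypothesized model (Fig. 1).
HYPOTHESIZED_PATHS = [
    ("constructive", "self_study"), ("collaborative", "self_study"),
    ("motivational", "self_study"),
    ("self_study", "unit_test"), ("self_study", "group_assignment"),
]

# Assumed structure of the final model (Fig. 2): three paths added, one dropped.
FINAL_PATHS = [
    ("constructive", "self_study"), ("collaborative", "self_study"),
    ("motivational", "self_study"),
    ("self_study", "unit_test"),
    ("constructive", "unit_test"),
    ("collaborative", "group_assignment"),
    ("group_assignment", "unit_test"),
]

def model_df(observed, exogenous, paths):
    """Degrees of freedom of a recursive path model with observed variables only."""
    p = len(observed)
    moments = p * (p + 1) // 2                 # distinct variances and covariances
    endogenous = {target for _, target in paths}
    free = (len(paths)                         # regression weights
            + len(exogenous)                   # variances of exogenous variables
            + math.comb(len(exogenous), 2)     # covariances among exogenous variables
            + len(endogenous))                 # residual variances of endogenous variables
    return moments - free

print(model_df(OBSERVED, EXOGENOUS, HYPOTHESIZED_PATHS))  # 7
print(model_df(OBSERVED, EXOGENOUS, FINAL_PATHS))         # 5
```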

Fig. 2

Path diagram with standardized regression weights and percentages of explained variance. * p < .05; ** p < .01; *** p < .001

Fit indices

The model depicted in Fig. 2 generated the following fit indices in AMOS: chi-square (5 df) = 5.167, p = 0.396; root mean square error of approximation (RMSEA) = 0.008; and normed fit index (NFI) = 0.997 (Table 2). These results suggest good model fit. To test the stability of the model, it was cross-validated by splitting the data set into two random subsets. The fit indices of both subsets are also shown in Table 2 and are similar to those of the total data set.
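The reported chi-square, p-value and RMSEA can be checked against one another. The sketch below recomputes the p-value of χ² = 5.167 with 5 df using the closed-form survival function for an odd number of degrees of freedom, and the RMSEA point estimate using N − 1 = 526 in the denominator (one common convention, assumed here); it then applies conventional cut-offs (CMIN/DF < 3, non-significant p, RMSEA ≤ .05). The NFI cannot be recomputed because the chi-square of the independence model is not reported.

```python
import math

def chi2_pvalue_df5(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 5 degrees of freedom."""
    # Closed form for odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3)

def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

chi2, df, n = 5.167, 5, 527
p = chi2_pvalue_df5(chi2)
fit_rmsea = rmsea(chi2, df, n)
good_fit = chi2 / df < 3 and p > 0.05 and fit_rmsea <= 0.05
print(f"CMIN/DF = {chi2 / df:.3f}, p = {p:.3f}, RMSEA = {fit_rmsea:.3f}, good fit: {good_fit}")
# CMIN/DF = 1.033, p = 0.396, RMSEA = 0.008, good fit: True
```

Both recomputed values agree with the AMOS output, which suggests that the reported sample size and degrees of freedom are internally consistent.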

Table 2 Fit indices for the total data set and for the two random sub-sets

Discussion

The purpose of this study was to investigate the relationship between the contributions students make to the tutorial group process, as observed by their peers, their self-study time and their achievement. To this end, the M-PARS was administered to a large group of first- and second-year students enrolled in a bachelor’s program in medicine. With this rating scale, students evaluated the constructive, collaborative and motivational contributions of their peers during the tutorial meetings. In a theoretically substantiated model, the relations between these three types of activities, average hours of self-study per week and achievement scores were studied. It was hypothesized that the three types of activities would have an indirect effect on students’ achievement scores, mediated by the time spent on self-study. The analyses provide evidence for relationships between students’ tutorial activities and their achievement. This implies that there is not only a relationship between social and cognitive factors and successful collaborative learning at the team level (e.g., Van den Bossche et al. 2006), but also a relationship at the individual level, between individual contributions and individual achievement. This is in line with the model of team learning by Decuyper et al. (2010), which already emphasized the importance of individual behavior in groups for successful collaborative learning. This study indicates that these individual-level inputs within groups may not only promote shared cognition but also affect a student’s individual achievement while learning within a group. More specifically, the results indicated a strong relationship between a student’s constructive contributions within the tutorial group and the achievement score on the unit test.
In other words, students who, according to their peers, ask critical questions, are able to correct misconceptions and can distinguish main issues from side issues achieve better on the unit test than peers who make fewer constructive contributions within the tutorial group. At the group level, Van den Bossche et al. (2006) already demonstrated a similar relationship: teams that display more learning behaviors (e.g., construction) experienced an increased sense of shared cognition and perceived themselves as more effective. In addition, the results indicated an effect of a student’s collaborative contributions within the group on his or her achievement score on the group assignment. Students who are perceived by their peers as more committed to the group and as promoting collaboration within the group achieve better on the group assignment than students who are perceived as less committed. This also confirms previous research showing that successful group learning is, in addition to constructive learning behavior, affected by social factors within the group (Johnson et al. 2007; Van den Bossche et al. 2006). There was also a small effect of achievement on the group assignment on achievement on the unit test. If one assumes that an achievement score represents the extent to which students have learned the subject-matter, it is plausible that a good score on the group assignment predicts a good score on the subsequent unit test. One could argue that this effect should have been larger, but it is also very plausible that the two assessments address different aspects: the unit test is likely to measure mainly factual knowledge, whereas the group assignment draws more on collaborative, social and presentation skills. What was surprising, however, was that time spent on self-study had no mediating effect.
Self-study time appeared not to be affected by the contributions students made in the tutorial group, and it showed no effect on students’ achievement scores. A possible explanation for the absence of this effect can be found in an article by Plant et al. (2005). They argue that learning achievement is influenced not so much by the amount of time a student spends on self-study as by the manner in which he or she spends this time. Students can spend a lot of time on individual study activities, but if they do not use this time effectively, the time spent will not reflect the extent to which they have learned the subject matter. Van den Hurk et al. (1998) also suggest that achievement is influenced not so much by the amount of self-study time as by the manner in which students spend this time and by their individual learning needs.

Although the results provide evidence for causal relationships between the variables, it is impossible to rule out underlying explanatory factors that may have influenced these relationships. One such factor might be that students who are active in the PBL tutorial also achieve better because they have a more engaged and motivated attitude towards their education and, as a consequence, invest more in it.

What are the implications of these findings for future research in the area of PBL? Since a relation was found between a student’s contributions in the tutorial group and his or her achievement scores, it is important to monitor the quality of the contributions each individual student makes during tutorial group meetings. To that end, it would be worthwhile to investigate whether the M-PARS could be used as a feedback tool for students’ tutorial behavior. Students could then use this tool to provide peer feedback and subsequently improve their tutorial group contributions. As a next step, it would be interesting to investigate whether this has a positive effect on their achievement. However, since we have argued that there might be a discrepancy between what is addressed in the tutorial group and what is measured with the test, it would be a good idea to look for other methods of measuring learning outcomes. Another interesting direction for future research is to repeat this study at the level of the whole tutorial group. Since the success of PBL depends on the quality of the collaborative process within the tutorial groups, and since the behavior of one group member is heavily influenced by the behavior of other group members, it would be of added value to find out whether groups whose members contribute well achieve better than groups whose members contribute poorly.

A first limitation of this study is that the relations between the variables in the model are not very strong (with the exception of the one between a student’s constructive contributions and the score on the unit test) and, consequently, that the model explains only a small percentage of the variance of the variables. There are two possible explanations for this result. First, the relation between the discussion in the tutorial group and the unit test or group assignment might not be as straightforward as expected. A test might not always completely agree with what was discussed. In addition, the unit tests in these two units mainly measure factual knowledge and might therefore be less suitable for detecting deep learning, which is the intended and expected effect of PBL. In other words, there might be a discrepancy between what is addressed in the tutorial group and what is measured with the test. Second, the population of medical students in the Netherlands is highly homogeneous. Most students have had the same preparatory training in secondary school and obtained an above-average mean grade on their final secondary school exams. In addition, the students are, on the whole, highly motivated. Because of these two characteristics, the variance among our participants might be small. It can therefore be seen as a positive result that, despite the small differences within the population, a well-fitting model was found.

Another limitation is that the score on the group assignment in this study is a group score: there was no individual grade for each group member; instead, all four group members received the same grade. Students who contributed poorly to this assignment could, therefore, have received a higher grade than they would have if their individual contributions had been graded.

A last limitation of this study is the high correlation between the three types of activities measured with the M-PARS. These correlations could indicate that we are dealing with only one type of student tutorial activity. However, a confirmatory factor analysis in a previous study (Kamp et al. 2011) showed that there are indeed three different factors corresponding to the three types of tutorial student activity. Furthermore, in the model one factor (constructive activities) predicts the score on the unit test and another factor (collaborative activities) predicts the score on the group assignment.