1 Introduction

Teamwork is very common in many modern organizations and employees are required to have good teamwork skills in order to be able to perform effectively. Consequently, educational programs in higher education emphasize the need for teamwork skills development (Curşeu et al. 2012). Collaborative learning is the most common educational practice used in higher education to help students develop their teamwork skills (Cohen 1994; Curşeu and Pluut 2013; Dillenbourg 1999). Next to the development of teamwork skills, collaborative learning is an effective instructional method that facilitates the transfer and acquisition of curricular knowledge through social interaction as it reduces the cognitive load on the individual learner (Kirschner et al. 2009).

The key to successful collaborative learning is the quality of social interaction among participants (Järvelä et al. 2008). In various higher education settings, students are organized in groups with varying degrees of interdependence and are asked to perform various collective tasks. These tasks range from case analysis to more elaborated research projects. Through group debates and discussions, students share, analyze, remember and evaluate curricular knowledge while at the same time they exercise and develop their teamwork skills. The collective performance of these student groups is part of the course grading scheme and the benefits of such collaborative group projects have been widely documented (Deeter-Schmelz et al. 2002; Kirschner et al. 2009). It is however important to also understand the factors that influence the quality of group debates and the performance of collaborative learning groups.

Next to the individual knowledge, skills and expertise, collective performance also depends on the quality of interpersonal interactions in collaborative learning groups (Curşeu and Pluut 2013; Kirschner et al. 2009). Systemic group performance models (Gladstein 1984; Hackman and Morris 1975) state that the influence of group composition on group performance is mediated by teamwork quality (including discussion and debate quality). We build on these theoretical (systemic) models to argue that in collaborative learning groups, the quality of group interactions mediates the influence of group composition on group performance.

In educational settings, gender is one of the most commonly used attributes to compose collaborative learning groups (Davies 2009) in order to reflect the class’ demographic characteristics (Webb et al. 1998). Using gender as a criterion for composing collaborative learning groups is a convenient strategy (as gender is a visible attribute that can guide social categorization), yet gender differences cover a wide range of factors that are highly relevant for group functioning. Due to their qualitatively different life experiences, men and women bring to the group a variety of perspectives that will ultimately foster the complexity of the collective understanding of the task (group cognitive complexity, Curşeu et al. 2007). Moreover, gender-related differences in engagement with educational tasks, as well as gender related differences in interpersonal relations (women tend to engage more with educational tasks and to have a stronger relational orientation, than men do) are other relevant factors for group dynamics and performance in student groups. Therefore, our first aim is to test the extent to which the quality of group discussions mediates the association between the proportion of women in groups and group performance.

Next to gender however, motivational factors also influence the way in which students engage with group projects (Järvelä et al. 2008; Kirschner et al. 2009). In order to better understand the relationship between motivation and collaborative learning effectiveness, research needs to pay attention to those motivational attributes that impact both on students’ involvement in educational tasks and activities as well as on the interpersonal interactions that unfold in collaborative learning groups (Järvelä et al. 2008).

Need for cognition (NFC) and core self-evaluations (CSE) are two motivational traits correlated with (cognitive) task involvement as well as with the interpersonal interaction in groups. NFC reflects students’ willingness to engage in cognitive endeavors (Cacioppo and Petty 1982; Cacioppo et al. 1996) and has been documented as an antecedent of teamwork quality (Curşeu and Pluut 2013) and extensive information search (Curşeu 2011) in student groups. CSE influences individual performance across a large variety of jobs and “refers to fundamental appraisals that people make of their own self-worth, competence, and capabilities” (Chang et al. 2012, p. 82). Because CSE also has a positive influence on teamwork processes (Haynie 2012; Tasa et al. 2011), it is a motivational factor that is likely to influence the effectiveness of collaborative learning. As NFC stimulates information search in small group settings and openness to diverse viewpoints (Curşeu 2011) and a positive self-appraisal is likely to stimulate engagement in interpersonal interactions, we set out to test the extent to which the quality of interpersonal interactions in collaborative learning groups mediates the association between the two motivational factors and group performance. To summarize, we answer the call for integrative research that investigates the interplay between motivational and social factors as they influence the effectiveness of collaborative learning (Kirschner et al. 2009) and we test the extent to which discussion quality in groups mediates the influence of gender diversity and group motivation on the performance of collaborative learning groups.

2 Hypotheses

The implications of gender diversity for group performance received considerable scholarly interest during the last decades especially due to increasing representation of women in management and their active participation in organizational workgroups (Pearsall et al. 2008). Arguments derived from the gender roles and gender differences literature emphasize that men and women bring different resources into the group and as such gender diversity is beneficial for group dynamics and effectiveness. In particular, women tend to be more socially sensitive than men (Hall 1978), they tend to have a stronger relational orientation due to their heightened communal traits (Abele 2003) and to be more emotionally intelligent than men (Mandell and Pherwani 2003). In collaborative learning settings, men tend to adopt a more confrontational and assertive communication style, while women tend to focus on relationship building and collaboration (Carr et al. 2004). Therefore, during group debates, women are expected to devote more attention than men do to the development and maintenance of harmonious interpersonal interactions. The proportion of women in groups is also positively associated with a positive affective climate within groups (Curşeu et al. 2015) that ultimately fosters the quality of interpersonal interactions in groups. Previous research on group emotions also supports this claim and shows that the percentage of women in groups fosters a positive emotional climate in the group through the emergence of collective emotional intelligence (Curşeu et al. 2015). We therefore expect that in collaborative learning groups, the proportion of women should foster the quality of interpersonal interactions that in turn results in higher group performance.

Literature to date also points towards gender differences in engagement in and satisfaction with educational activities. Women tend to plan and organize their learning activities better, to ask for more teacher support and to be more satisfied with educational activities than men (Gonzalez-Gomez et al. 2012). Moreover, research on the emergence of collective intelligence reports a positive association between the percentage of women in the group and the group’s performance in a variety of cognitive tasks (Woolley et al. 2010). In collaborative learning settings, female-only and balanced gender groups outperformed male-only and male-dominated groups in terms of academic achievements (Zhan et al. 2015). Women tend to value educational achievements more positively than men, as educational attainment has a direct association with life satisfaction for women, whereas for men the association is only indirect, mediated by occupational status (del Mar Salinas-Jimenez et al. 2013). These insights suggest that the proportion of women in a group is positively related to the quality of the group discussion, i.e. the group members’ evaluation of the level of effectiveness and satisfaction experienced during group discussions and discussion development (Burgoon et al. 2002; Lowry et al. 2006). In turn, this leads to higher group performance. Therefore, we expect that the proportion of women is positively related to group performance through discussion quality.

Hypothesis 1

Discussion quality mediates the positive relationship between the proportion of women in groups and group performance.

Simply forming student groups and asking them to perform collective tasks does not guarantee that students will engage with the educational task and will work together as a group (Kirschner et al. 2009). Motivational factors are key antecedents for the quality of group debates and ultimately for collective performance. In other words, students need to be motivated to engage both in task related activities as well as interpersonal interactions in order for the collaborative learning groups to be successful. Cognitive motivation reflects one’s inclination to get involved in and enjoy cognitive endeavors (Cacioppo et al. 1996). Need for cognition (NFC), the construct used to capture cognitive motivation (the drive to engage in information processing activities), was extensively explored as an individual difference that predicts a variety of individual and group level outcomes (Cacioppo and Petty 1982; Kearney et al. 2009).

Individuals scoring high on NFC engage in more thorough information searches (Curşeu 2011; Heidar et al. 2013), are more likely to recall information and arguments, and generate more alternative solutions to problems (Cacioppo et al. 1996), all processes conducive to high educational attainment. More recent research (Therriault et al. 2015) shows that NFC also influences the choice of leisure activities, with people scoring high in NFC preferring leisure activities high on cognitive load. Therefore, NFC reflects a general drive to seek and process information across various life domains and, as such, we expect that NFC also drives the involvement in group discussion.

Previous research supports this claim and shows that individuals with a high NFC are more likely to contribute actively and persuasively in team discussions (Petty et al. 2009), to recognize and recall information during group discussions and are less likely to engage in social loafing (Henningsen and Henningsen 2004). In collaborative learning groups, group NFC has a positive influence on teamwork quality (Curşeu and Pluut 2013). Because people scoring high on NFC are “mature social perceivers” (Levy 1999), they rely less on social stereotypes (Carter et al. 2006) and we can therefore expect that in groups in which the members on average have a high NFC the negative consequences of social categorization processes will be less prevalent. In such groups, people will be more open to and accepting of the points of view expressed during social interactions. We therefore posit that high group NFC is positively related to group performance through discussion quality.

Hypothesis 2

Discussion quality mediates the positive impact of group NFC on group performance.

Students’ motivation to engage in complex cognitive tasks and participate in group discussion is influenced by the general self-evaluation students have about themselves, or their core self-evaluations (CSE). CSE as a general appraisal of one’s competence and self-worth (Judge et al. 1998, 2003) or positive self-concept (Judge and Kammeyer-Mueller 2011) is associated with a wide range of work-related outcomes (Chang et al. 2012) as well as team dynamics and effectiveness (Haynie 2012; Tasa et al. 2011; Zhang and Peterson 2011). Individuals with positive core self-evaluations are known to be more motivated to perform and, in team settings, core self-evaluations are important drivers of task related behaviors. High team CSE increases task involvement and performance as well as the quality of interpersonal interactions (when agreeableness is high, team CSE is positively associated with the interpersonal behaviors) (Tasa et al. 2011). Moreover, CSE also positively affects team performance if the quality of interpersonal exchanges is high (Haynie 2012). Individuals with a positive self-concept perform better in the presence of others; therefore they are more sensitive to social facilitation processes (Sanna 1992). Moreover, the meta-analysis of Chang et al. (2012) shows that high levels of CSE are negatively related to counterproductive work behaviors. In other words, individuals with high levels of CSE are not only more likely to perform their task well but are also more likely to contribute to the psychosocial environment within their group. They are more likely to perform teamwork behaviors (Tasa et al. 2011) and as such contribute to a high quality of group discussions. Therefore, we build on previous research that explored the role of team CSE on team dynamics and outcomes (Haynie 2012; Tasa et al. 2011) and argue that the positive effect of CSE on group performance is mediated by the quality of group discussion.

Hypothesis 3

Discussion quality mediates the positive impact of group CSE on group performance.

3 Methods

3.1 Ethics statement

The data collection for this study started in the academic year 2010–2011. At the time, according to the Dutch ethical guidelines for research involving human participants, studies conducted on educational practices aimed at knowledge acquisition and using surveys that do not require any personal data with the potential to embarrass the participants were exempt from IRB approval. Our study was carried out as part of course related activities and no foreseeable risks beyond those present in regular educational activities were anticipated; therefore we did not ask for further approval from the local IRB. Students were informed that their questionnaire answers would be used for scientific research and were offered the possibility to opt out if they so wished.

3.2 Sample and procedure

Three hundred seventy-five first year students (244 women) enrolled in an introductory course at a Dutch University participated in the study. Data were collected across two academic years of the same course. The students were randomly assigned to 118 small groups (average group size 3.18) and were asked to work on an assignment together for 8 weeks. Each group was required to present their work in the form of a poster presentation in week 7 and write a report that was graded. This assignment was part of their regular curricular activities in the course. Students were asked to fill in several questionnaires covering demographic information as well as need for cognition and core self-evaluations (week 1) and group discussion quality (week 5). Group performance was evaluated at the end of the study unit (week 8).

3.3 Measures

3.3.1 Group performance

In order to measure group performance the written group report was graded by an evaluator on an interval scale from one to ten (the evaluation was based on the Dutch ten points grading system, 1 = very bad to 10 = outstanding). Students were asked to write a report in which they had to use a theory driven approach for comparing two to four organizations of their choice. The report evaluation was based on five criteria: clear description of the organizational characteristics (10%), clear description of the organizational structure (30%), clear description of the organizational environment (30%), clear comparison of the selected organizations based on their structure and environment (20%), and the form and style of the written report (10%).
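The text does not state how the five criteria were combined into the final grade. As an illustration only, the sketch below assumes a simple weighted average of per-criterion scores on the same 1 to 10 scale; the function name, dictionary keys, and example scores are hypothetical.

```python
# Weights taken from the five grading criteria listed above (they sum to 100%).
CRITERIA_WEIGHTS = {
    "organizational_characteristics": 0.10,
    "organizational_structure": 0.30,
    "organizational_environment": 0.30,
    "comparison": 0.20,
    "form_and_style": 0.10,
}


def report_grade(criterion_scores):
    """Weighted average of per-criterion scores (assumed combination rule)."""
    missing = set(CRITERIA_WEIGHTS) - set(criterion_scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(
        CRITERIA_WEIGHTS[name] * criterion_scores[name]
        for name in CRITERIA_WEIGHTS
    )


# Hypothetical scores for one group report, each on the 1-10 scale.
example_scores = {
    "organizational_characteristics": 8,
    "organizational_structure": 7,
    "organizational_environment": 6,
    "comparison": 7,
    "form_and_style": 9,
}
print(round(report_grade(example_scores), 2))
```

With these made-up scores the weighted grade comes out at about 7, which shows how strongly the two 30% criteria (structure and environment) dominate the result.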

3.3.2 Discussion quality

We used the Discussion Quality Scale from Davison (1999) to evaluate the quality of group discussions. Group members were asked to rate on a semantic differential scale with 10 intervals (1 to 10): (1) the meaningfulness of meetings (meaningful to meaningless); (2) the appropriateness of meetings (appropriate to inappropriate topics in relation to the group assignment); (3) the open nature of the meetings (open to closed); and (4) the level of imagination in meetings (imaginative to unimaginative). The Cronbach’s alpha for the scale was .777 and the within-group agreement index (James et al. 1993) ranged from .86 to 1.00 (M = .96, SD = .02), showing substantial within-group agreement to support the group level aggregation.
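The two reliability statistics mentioned above can be sketched as follows. This is a minimal illustration, not the authors' computation: it assumes the standard formula for Cronbach's alpha and the multi-item within-group agreement index with a uniform null distribution over the 10 response intervals (the text does not state which null distribution was used). The function names and the ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def rwg_j(rows, n_options):
    """Multi-item within-group agreement for the members of ONE group.

    Assumes a uniform (no agreement) null distribution over n_options
    response categories.
    """
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    mean_item_var = sum(item_vars) / k
    expected_var = (n_options ** 2 - 1) / 12  # variance of a discrete uniform
    ratio = mean_item_var / expected_var
    return k * (1 - ratio) / (k * (1 - ratio) + ratio)


# Hypothetical ratings: three group members, four discussion-quality items.
ratings = [
    [8, 9, 8, 7],
    [9, 9, 8, 8],
    [8, 8, 7, 7],
]
alpha = cronbach_alpha(ratings)
agreement = rwg_j(ratings, n_options=10)
print(round(alpha, 3), round(agreement, 3))
```

In practice alpha would be computed over all respondents, whereas the agreement index is computed separately for each group and then summarized (range, mean, SD) across groups.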

3.3.3 Proportion of women in the group

In line with the arguments presented in Williams and Mean (2004) we used the proportion of women in the group as an index for gender diversity. This measure is suitable to evaluate the effect of gender differences because it captures any nature of the effect and allows for an unbiased examination of the data (Williams and Mean 2004, p. 466). The proportion of women in the group was computed by dividing the number of women in each group by group size. Therefore, this measure varies between 0 and 1, the larger the score, the more dominant the women specific attributes in each group.
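The computation described above (number of women divided by group size) can be sketched in a few lines. The roster format and the "F"/"M" coding are assumptions made for the example.

```python
from collections import defaultdict


def proportion_women(members):
    """members: iterable of (group_id, gender) pairs, gender coded 'F' or 'M'."""
    counts = defaultdict(lambda: [0, 0])  # group_id -> [women, group size]
    for group_id, gender in members:
        counts[group_id][0] += gender == "F"
        counts[group_id][1] += 1
    return {g: women / size for g, (women, size) in counts.items()}


# Hypothetical roster with a mixed, an all-male and an all-female group.
roster = [
    ("g1", "F"), ("g1", "F"), ("g1", "M"),
    ("g2", "M"), ("g2", "M"), ("g2", "M"),
    ("g3", "F"), ("g3", "F"), ("g3", "F"), ("g3", "F"),
]
proportions = proportion_women(roster)
print(proportions)
```

The resulting index is 0 for all-male groups and 1 for all-female groups, with mixed groups in between.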

3.3.4 Need for cognition

Need for cognition was evaluated using the eighteen item scale of Cacioppo and Petty (1982) using a 5-point Likert scale (1 = totally disagree, 5 = totally agree). Examples of items are: “I would prefer complex to simple problems” or “The notion of thinking abstractly is appealing to me”. The Cronbach’s alpha for the scale was .816 and because previous research argued that the scale is multidimensional and only the dominant (trait) factor is indicative of cognitive motivation (Bors et al. 2006; Cacioppo et al. 1996; Sadowski 1993), we used the first (dominant) factor score as the indicator for NFC. Similar to previous studies (for a comprehensive overview see Cacioppo et al. 1996), our results indicate that the dominant (trait) factor score for the NFC scale accounts for around 26% of the score variance (all items loaded significantly on this first dominant factor), while the second factor only accounts for 7.3% of the variance and, similar to the results reported by Bors et al. (2006), all negatively phrased items loaded on this second factor. Individual factor scores were then averaged and aggregated at the group level to obtain the group cognitive motivation. As our study focused on the elevation of cognitive motivation within groups and not on averaging individual evaluations of group climate or other processes/emergent states, we did not deem it necessary to compute agreement indices before aggregation.

3.3.5 Core self-evaluations

For core self-evaluations we used the Dutch version (De Pater, Schinkel, and Nijstad 2007) of the scale developed by Judge et al. (2003). We used nine items from the original scale that measured self-esteem, locus of control and emotional stability. Examples of items include: “I am confident I get the success I deserve in life” and “Sometimes when I fail I feel worthless” (reversed). Because the scale is multidimensional we used the first dominant factor score as the indicator for CSE. In line with previous research (Bono and Judge 2003; Chang et al. 2012; De Pater et al. 2007; Erez and Judge 2001), the CSE trait factor accounted for the most variance in the scores (34.23%) and all nine items loaded significantly on this factor. Answers were recorded on a 5-point Likert scale (1 = totally disagree, 5 = totally agree) and the Cronbach’s alpha for the scale was .748. Similar to NFC, individual scores were aggregated at the group level by using the group mean as indicator of CSE elevation within groups. As our study focused on the elevation of general self-evaluation within groups and not on averaging individual evaluations of group climate or other processes/emergent states, we did not deem it necessary to compute agreement indices before aggregation.

We used group size as a control variable since earlier research has shown that group size influences group coordination and information exchange (Lowry et al. 2006).

4 Results

Table 1 presents the descriptive statistics and the correlations for the variables included in our analyses.

Table 1 Means, standard deviations, and correlations

In order to test our three hypotheses, we used a mediation procedure presented in Preacher and Hayes (2008). For each mediation test, we used as covariates all the other predictors hypothesized. The bootstrapping results reveal a significant indirect association between the proportion of women and group performance mediated by the quality of group discussion (Indirect effect size = .15; SE = .07, CIlow = .03; CIhigh = .35), and because the confidence interval does not include zero, we can conclude that the first hypothesis was supported. However, the direct effect of proportion of women on group performance is significant (B = .56, SE = .23, p = .02). Therefore we can conclude that the quality of discussions is a partial mediator of the relationship between proportion of women and group performance.

For the second hypothesis we used a similar analytical procedure as for the first hypothesis and the results show a significant indirect association between group mean NFC and group performance mediated by the quality of group discussion (Indirect effect size = .05; SE = .03, CIlow = .003; CIhigh = .13) supporting the second hypothesis. As the direct effect of group mean NFC on group performance is not significant after adding the mediator (B = − .14, SE = .12, p = .22) we can conclude that the quality of group discussion fully mediates the influence of group mean NFC on group performance.

Finally, the bootstrapping results support the third hypothesis and reveal a significant indirect association between CSE and group performance, mediated by the quality of group discussion (Indirect effect size = .05; SE = .03, CIlow = .004; CIhigh = .15). The group mean CSE has no significant direct effect on group performance (B = − .06, SE = .11, p = .56); therefore we can also conclude that the quality of group discussion fully mediates the influence of group mean CSE on group performance.

In order to check the robustness of our findings, we tested the mediation hypotheses without any of the covariates used in the first set of tests. The results of the simple mediation tests fully support the mediation hypotheses as reported above, as none of the confidence intervals for the indirect effects includes zero.
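To make the logic of a simple (covariate-free) mediation test concrete, the sketch below estimates the indirect effect as the product of two regression slopes and derives a percentile bootstrap confidence interval. It is an illustration under stated assumptions, not the procedure actually run: Preacher and Hayes (2008) also describe bias-corrected intervals, which are not implemented here, and the data are simulated. All function names and parameters are hypothetical.

```python
import random


def _cov(u, v):
    """Sample covariance of two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """a * b: a = slope of M on X, b = slope of Y on M controlling for X."""
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b


def bootstrap_indirect(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample without predictor variance
    estimates.sort()
    low = estimates[int(alpha / 2 * len(estimates))]
    high = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return indirect_effect(x, m, y), low, high


# Simulated data for 118 groups; the true indirect effect is 0.8 * 0.5 = 0.4.
sim = random.Random(42)
x = [sim.random() for _ in range(118)]
m = [0.8 * xi + sim.gauss(0, 0.2) for xi in x]
y = [0.5 * mi + 0.3 * xi + sim.gauss(0, 0.2) for xi, mi in zip(x, m)]
point, ci_low, ci_high = bootstrap_indirect(x, m, y)
print(point, ci_low, ci_high)
```

As in the analyses reported above, mediation is inferred when the bootstrap confidence interval for the indirect effect does not include zero.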

Finally, in order to account for potential covariances between the independent variables included in the separate mediation analyses, we ran a Structural Equation Model (SEM) using AMOS version 22. SEM is a versatile analytic technique as it allows for the simultaneous test of multiple causal paths and it offers absolute and incremental fit indices (Tomarken and Waller 2005). The results of the SEM analysis are summarized in Fig. 1.

Fig. 1 The results of the overall path model. Notes: Standardized path coefficients are presented in the final model (**p < .01, and *p < .05); NFC = main factor score for the Need for Cognition scale; CSE = main factor aggregated score for the CSE scale. Fit indices: Chi square = 5.18, df = 5, p = .39, CFI = .99, TLI = .97, RMSEA = .018

As illustrated by the absolute fit indices, the theoretical model does not differ significantly from the data, and the incremental fit indices show that the model cannot be substantially improved. We can therefore conclude that the SEM results fully support the results of the bootstrapping analyses.
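As a rough plausibility check on the reported fit, the RMSEA point estimate can be recomputed from the chi-square statistic. The sketch below assumes the common formula with N − 1 in the denominator and takes N to be the 118 groups; software packages differ in whether they use N or N − 1, so this is an assumption rather than a description of what AMOS did.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """RMSEA point estimate using the (n - 1) convention (assumed)."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values reported with Fig. 1; n = 118 groups is an assumption about the SEM sample.
value = rmsea(chi_square=5.18, df=5, n=118)
print(round(value, 3))
```

Under these assumptions the result rounds to .018, which agrees with the value reported with the path model.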

5 Discussion

Group discussion is the process through which students share insights, remember, evaluate and process knowledge during collaborative learning. In our study, we took a group level perspective on collaborative learning and we aimed to identify relevant antecedents for discussion quality in collaborative learning groups. Building on the insights from systemic models of group performance, we focused on testing an integrated model in which discussion quality mediates the impact of group diversity and group motivation on collaborative learning effectiveness. We argued that the proportion of women in groups, group level need for cognition and core self-evaluations are important predictors of discussion quality, which in turn predicts group performance in collaborative learning. All mediation claims were supported by the analyses and we show that discussion quality is an important mediator in the relationship between collaborative group design features and group performance.

The proportion of women in the group has both a mediated and a direct association with group performance. Due to their relational orientation, women stimulate harmonious interpersonal interactions in groups. Next to the mediated effect, the proportion of women in the group also had a direct positive association with group performance. This direct association shows that next to the relational dimension, captured by the quality of group discussion, another (probably cognitive) mechanism is at play. Our results are in line with previous research on collective intelligence, showing that the proportion of women in groups is a strong positive predictor of collective intelligence both in face to face as well as computer mediated groups (Woolley et al. 2015). Collective intelligence could therefore be the factor that explains the direct association between the proportion of women in the group and group performance in collaborative learning groups. Future research could further explore this claim and test the extent to which collective intelligence mediates the impact of proportion of women on group performance in collaborative learning.

The two motivational factors that we included in our research impact both on students’ engagement in the educational tasks as well as on the quality of interpersonal interactions in collaborative learning groups. The association between group level need for cognition (NFC) and group performance is mediated by the quality of group discussions. This result adds to the existing evidence showing that NFC is a relevant attribute for the design of collaborative learning groups (Curşeu and Pluut 2013). Groups with higher levels of NFC will have higher quality discussions that will result in higher performance. Core self-evaluations (CSE) are also important drivers of collective performance in collaborative learning as they influence task engagement and they stimulate teamwork behaviors. If students hold positive self-evaluations, they also generate better group discussions, increasing the depth of information processing in groups and ultimately group performance.

The positive effect of gender diversity also has some clear practical implications. Since composing groups in terms of gender is convenient, teachers could use this to stimulate the collaborative learning process. As the quality of group discussions positively predicts group performance, educators using collaborative learning groups should focus on facilitating group debates. Group trainings or the use of group norms are easy ways of creating a context that facilitates positive group interactions. However, as group composition in terms of need for cognition and core self-evaluations is not always open to manipulation, other factors that could influence the quality of interpersonal interactions in groups have to be explored as well.

Our study has a few limitations. First, it is not an experimental study; therefore causal claims are not warranted because we did not directly manipulate the independent variables included in the model. The gender diversity variable could be considered as a quasi-manipulation, as is gender in general in psychological research. However, no causal claims can be made concerning the other motivational factors included in the study. Second, some of the variables used in our model were collected from the same source; therefore our results are likely to be influenced by common method bias. In order to reduce common method bias, group performance was evaluated by an external rater, yet the independent and mediating variables were based on self-reports and, although the evaluations were separated in time, the existence of common method bias cannot be fully excluded. Finally, because the study was carried out in an introductory first year course, it was not possible to control for individual academic performance, which is likely to be correlated with the collaborative performance in student groups.

6 Conclusion

Our study shows that the quality of group discussions is an important antecedent of performance in collaborative learning groups. Further, we explored two types of group composition variables that drive the quality of group discussion: a social/demographic factor (gender diversity) and two motivational factors (need for cognition and core self-evaluations). The results of the study support the idea that an integrative research approach including different factors related to collaborative learning effectiveness (Kirschner et al. 2009) is highly relevant and needed to further extend research on collaborative learning groups.