Introduction

Collaborative Learning: The Strengths and Challenges

Collaborative learning (CL) refers to a small group of students performing collective learning activities (e.g., writing a report) with the aim of achieving specific goals (Cohen, 1994; Dillenbourg, 1999). Since CL has been linked with increases in academic achievement (Kollöffel et al., 2011; Le et al., 2018), collaborative and social skills (Ku et al., 2013), and higher-order thinking (Chen et al., 2018), it is used in universities around the world (Echeverria et al., 2022; Johnson & Johnson, 2009). In China, CL is popular not only because of the aforementioned benefits, but also due to the collectivist culture of the country—particularly the belief that people working together can achieve more (or better) than an individual toiling alone (Gong & Cheng, 2020; Phuong-Mai et al., 2005).

Students’ CL achievement is associated with their level of engagement (Dillenbourg, 1999; Webb, 2013; Xu et al., 2020); however, disengagement is a common problem in CL (Le et al., 2018; Sinha et al., 2015; Webb, 2013). The issue has become a focus of both academic concern and public interest (Jiang, 2011; Xu, 2018). Chen and Bennett (2012) found that students complained about low peer engagement in online discussions and that cohesive learning communities were rarely formed. The official Chinese Youth WeChat Channel (2022) published an article titled “What freaks university students out? A group homework will do”, suggesting that group assignments always irritate university students. Followers of the channel (mainly university students) commented that their peers’ low engagement was the most difficult problem with CL. Given the vast number of Chinese university students (around 21 million undergraduate and master’s students in 2020; National Bureau of Statistics of China, n.d.), this article argues that more attention needs to be paid to their engagement in such a widely used pedagogy as CL.

Student Engagement

Student engagement has been discussed for decades because of its strong association with learning outcomes (Astin, 1984; Kahu, 2013; Skinner & Pitzer, 2012; Trowler, 2010). The present research is rooted in the psychological approach to engagement, particularly Fredricks et al.’s (2004) tripartite definition, which comprises: (1) behavioural engagement (BE), which incorporates positive conduct (e.g., following the rules) and involvement in learning and tasks (e.g., effort and attention); (2) cognitive engagement (CE), which focuses on students’ inner psychological investment in learning and on being strategic; and (3) emotional engagement (EE), which refers to affective reactions to learning, including students’ academic feelings (e.g., enjoyment; Pekrun et al., 2011), interest, and sense of belonging in the learning community.

According to Skinner and Pitzer (2012), student engagement can be investigated at four nested levels or contexts: prosocial institutions, school/university, course, and learning activity. Although a few studies have examined engagement across different levels and explored the joint influence of parents, teachers, and peers (e.g., Skinner et al., 2022; Vollet et al., 2017), most research has focused on the first three levels (e.g., Coates, 2010; Kuh, 2003, 2009). One example of a university-level project is the China College Student Survey (CCSS). By 2018, it had surveyed over half a million Chinese college students (Huang et al., 2021), and based on these data, researchers identified three factors for student engagement: practical relevance, academic validity, and effective collaborations (Guo et al., 2022).

By comparison, relatively little research on student engagement has been conducted at the learning activity level, and even fewer studies have discussed how to measure engagement in CL. Prior engagement surveys, such as the School Engagement Survey (Fredricks et al., 2005), the Student Engagement Scale (Gunuc & Kuzu, 2015) and the Student Engagement Instrument (Appleton et al., 2006), focus on the school/university and course levels. Simply applying these macro-level instruments to micro-level CL settings would yield only a decontextualized understanding of engagement, violating the position that engagement is negotiated within particular contexts and influenced by environmental factors such as pedagogical practices (Sinha et al., 2015). Nevertheless, some published surveys suggest potential indicators of BE, CE and EE in activity settings, such as paying attention (Gunuc & Kuzu, 2015), adopting meta-cognitive strategies (Muukkonen et al., 2020) and achievement emotions (Pekrun et al., 2005). These surveys also provided a pool of items for the present study.

In summary, two gaps in CL and student engagement research have been identified: low engagement in Chinese university CL activities, and a lack of appropriate tools to assess and monitor student engagement in CL. To address these gaps, the present investigation adopted a mixed-methods approach comprising two sequential sub-studies. The quantitative study constructed and validated a new Mandarin measure of CL engagement. A subsequent qualitative study interviewed a subset of the quantitative participants to triangulate the quantitative results. In doing so, this investigation sought to make theoretical and practical contributions to the CL and engagement fields.

Quantitative Study

Methods

Participants

In this study, 672 students from 11 courses at six universities were invited to participate. The six universities were located in different regions of China (north, west, south and east); three were comprehensive universities and three were polytechnic universities. Of the invited students, 504 (75.0%) agreed to take the survey. However, 99 (19.6%) of those who consented were later removed because they did not complete the survey. The final sample of 405 participants included 331 (81.7%) undergraduate students (249 females and 82 males) and 74 (18.3%) master’s students (56 females and 18 males). Students were from the faculties of Arts (n = 66), Business (n = 8), Education (n = 205), Engineering (n = 27) and Science (n = 99). Two students indicated they were under 18 years old; all others reported being between 18 and 24 years old.

Procedures

Two steps were carried out to identify and recruit student participants in mainland China. The first step involved recruiting university lecturers who were using CL activities that met the criteria of the present study: formal CL assignments lasting several weeks. To this end, the first author posted an advertisement introducing the study’s aims on social media platforms (e.g., Weibo and WeChat). A total of 14 lecturers from 10 universities contacted the first author. After discussing the design and implementation of their CL activities, the first author determined that only 11 activities in 11 courses met the criteria.

Second, with the lecturers’ consent, the first author sent the research advertisement and an anonymous online survey link to students in the 11 courses within one month of the CL activity’s completion. Students had one week to complete the survey. Lecturers agreed that students’ decision to participate or not would not affect their grades, as stated in the participant information sheet included in the survey link. Approval for this study was granted by the University of Auckland Human Participants Ethics Committee (Reference Number: UAHPEC3300).

Instrument Development

The instrument was constructed from items in well-established measures of student engagement used in other contexts. Following Fredricks et al. (2004), the items (N = 46) were initially organized by the latent construct they were intended to assess, namely BE, CE and EE. See the supplementary files (Appendix A) for a complete list of items and their original sources.

Behavioural engagement. BE was assessed with 8 items from Fredricks et al. (2005) and Gunuc and Kuzu (2015). Participants used a five-point Likert-type scale (1 = never, 2 = rarely, 3 = sometimes, 4 = often, 5 = always) to indicate how often they engaged in a range of behaviours during CL (e.g., I actively interact with peers in my group).

Cognitive engagement. CE was rated on the same five-point Likert-type scale as BE and was evaluated with 18 items from Appleton et al. (2006), Fredricks et al. (2005), Gunuc and Kuzu (2015), Maroco et al. (2016), and Muukkonen et al. (2020). Participants were asked to recall how often they exerted psychological energy and applied meta-cognitive strategies to complete CL tasks (e.g., I try to do my part of groupwork in the best way).

Emotional engagement. EE was assessed with 16 items from Appleton et al. (2006), Blasco-Arcas et al. (2013), Fredricks et al. (2005), Gunuc and Kuzu (2015), and Pekrun et al. (2005). Items were presented on a nine-point semantic differential scale. Semantic differential scales present pairs of bipolar adjectives (e.g., sad–happy), and participants select the point along the nine-point continuum that best reflects their feelings; for data analysis, the nine points are scored from −4 to 4. Semantic differential scales have been used to understand people’s emotional responses toward objects, events or concepts (Badia et al., 2014). They yield results similar to, and as reliable as, those of Likert-type scales, while avoiding the textual repetition and redundancy of the latter (Schibeci, 1982). In the EE section, participants were asked to report emotional feelings such as enjoyment and boredom (e.g., When I think about our groupwork, I feel bored vs. interested).
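The scoring rule above is a fixed shift of the selected position. The minimal Python sketch below assumes, hypothetically, that the survey export codes the nine positions as 1 (left adjective) to 9 (right adjective), with the positive adjective on the right as in “bored vs. interested”; the item labels are illustrative, not the instrument’s actual item names.

```python
# Minimal sketch: map semantic-differential positions onto analysis scores.
# ASSUMPTION: the export codes positions 1 (left adjective) .. 9 (right adjective),
# with the positive adjective on the right (e.g., bored ... interested).
SCALE_POINTS = 9
MIDPOINT = (SCALE_POINTS + 1) // 2  # position 5 is the neutral centre


def score_semantic_differential(position: int) -> int:
    """Convert a raw position in 1..9 to a score in -4..4."""
    if not 1 <= position <= SCALE_POINTS:
        raise ValueError(f"position must be in 1..{SCALE_POINTS}, got {position}")
    return position - MIDPOINT


# Hypothetical item labels and raw positions for one respondent.
raw_positions = {"bored_vs_interested": 8, "sad_vs_happy": 5, "EE_hypothetical_3": 2}
scores = {item: score_semantic_differential(p) for item, p in raw_positions.items()}
print(scores)  # {'bored_vs_interested': 3, 'sad_vs_happy': 0, 'EE_hypothetical_3': -3}
```

If any pair were presented with the positive adjective on the left, it would need reverse-keying (negating the score) before averaging; the text does not indicate whether any items were laid out that way.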

Modifications were made to the original items to fit CL settings. For example, the appellation ‘classmate’ and the context ‘classroom/school’ were changed to ‘peers in group’ and ‘my group’, respectively. The first author and two other Ph.D. students fluent in both English and Mandarin completed the translation (from English to Chinese) and back-translation to ensure that the Chinese items were authentic and clear and expressed the same meaning as the original English ones. The online survey was administered and distributed via Qualtrics.

Data Analysis

Missing values and data analysis assumptions were inspected first. In the dataset of 405 cases, 380 (93.8%) participants responded to all survey items (n = 49, including three demographic items) and 25 (6.2%) participants missed one to five items (for ethical reasons, responses were not mandatory). Specifically, one participant missed five items, one missed four, two missed two, and the remaining 21 missed one each. Of the 19,845 (i.e., 405 × 49) potential data cells, 34 (0.17%) were missing, and the percentage of missing values per item ranged from 0.0 to 1.0%. The expectation–maximization (EM) algorithm was used to impute missing values (Dong & Peng, 2013; Watkins, 2018). The skewness and kurtosis of each variable were within the acceptable range for samples larger than 300 (i.e., |skewness| < 2, |kurtosis| < 7; Byrne, 2010), indicating univariate normality. However, Mardia’s (1970) multivariate estimates (p < 0.05) showed that the data were multivariate non-normal, which informed the choice of factor extraction method (discussed below). The data were randomly split into two independent samples: 200 cases for exploratory factor analysis (EFA) and 205 cases for confirmatory factor analysis (CFA). EFA was used to identify the underlying relationships between the items, and CFA to confirm the factor structure suggested by the EFA.
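The screening steps above (univariate cut-offs, Mardia’s test and the random split) can be sketched in Python as below. This is an illustrative analogue of what R’s MVN package reports, not the authors’ code; the seed and the simulated data are assumptions, and the statistics follow the textbook forms of Mardia’s (1970) skewness (chi-square) and kurtosis (normal z) tests.

```python
# Illustrative Python analogue of the screening steps (the study used R's MVN package).
import numpy as np
from scipy import stats


def univariate_flags(X, skew_cut=2.0, kurt_cut=7.0):
    """True for columns with |skewness| >= 2 or |excess kurtosis| >= 7."""
    skew = stats.skew(X, axis=0)
    kurt = stats.kurtosis(X, axis=0)  # Fisher definition: a normal variable gives 0
    return (np.abs(skew) >= skew_cut) | (np.abs(kurt) >= kurt_cut)


def mardia(X):
    """Mardia's (1970) multivariate skewness and kurtosis tests."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centred = X - X.mean(axis=0)
    S = centred.T @ centred / n                  # maximum-likelihood covariance
    D = centred @ np.linalg.inv(S) @ centred.T   # n x n Mahalanobis cross-products
    b1p = (D ** 3).sum() / n ** 2                # multivariate skewness
    b2p = (np.diag(D) ** 2).sum() / n            # multivariate kurtosis
    skew_df = p * (p + 1) * (p + 2) / 6
    skew_p = stats.chi2.sf(n * b1p / 6, skew_df)
    kurt_z = (b2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    kurt_p = 2 * stats.norm.sf(abs(kurt_z))
    return {"b1p": b1p, "skew_p": skew_p, "b2p": b2p, "kurt_z": kurt_z, "kurt_p": kurt_p}


def random_split(n_cases, n_first=200, seed=2022):
    """Randomly partition case indices into two disjoint samples (EFA / CFA)."""
    rng = np.random.default_rng(seed)  # ASSUMPTION: arbitrary seed; the study's is unknown
    idx = rng.permutation(n_cases)
    return idx[:n_first], idx[n_first:]


# Illustration on simulated (not study) data: 405 cases x 10 items.
rng = np.random.default_rng(0)
X = rng.normal(size=(405, 10))
flags = univariate_flags(X)
result = mardia(X)
efa_idx, cfa_idx = random_split(len(X))
print(flags.any(), len(efa_idx), len(cfa_idx), round(result["skew_p"], 3))
```

Because the multivariate test rejects normality more readily than the item-level checks, a dataset can pass every univariate cut-off yet still fail Mardia’s test, which is the pattern the text reports.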

EFA was conducted in six steps: (1) Bartlett’s test of sphericity and the Kaiser–Meyer–Olkin (KMO) test; (2) extracting latent common factors using the common factor model and iterated principal axis factoring, which makes no multivariate normality assumption (Cudeck, 2000); (3) determining the number of factors using a scree plot and parallel analysis; (4) applying oblimin rotation, since the latent factors (BE, CE and EE) are known to be correlated (Fredricks et al., 2004); (5) refining the item-factor structure; and (6) naming and interpreting the latent factors.
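Step (1) can be illustrated with a minimal NumPy/SciPy sketch of Bartlett’s sphericity test and the KMO measure of sampling adequacy. The study ran these in R’s psych package; the functions below are a hedged re-implementation of the standard formulas, and the simulated one-factor data are hypothetical.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # log|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure and per-item MSA values."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(scale, scale)  # partial correlations (sign irrelevant once squared)
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off_diag, R ** 2, 0.0)
    q2 = np.where(off_diag, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + q2.sum())
    per_item = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    return overall, per_item


# Simulated one-factor data (hypothetical): 200 cases, 8 items, loadings of 0.7.
rng = np.random.default_rng(1)
factor = rng.normal(size=(200, 1))
X = 0.7 * factor + np.sqrt(1 - 0.7 ** 2) * rng.normal(size=(200, 8))
chi2_stat, df, p_value = bartlett_sphericity(X)
kmo_overall, kmo_items = kmo(X)
print(round(chi2_stat, 1), int(df), p_value < 0.05, round(kmo_overall, 2))
```

A significant Bartlett statistic together with a KMO value near 0.9, as with the 0.92 reported in the Results, is the usual signal that the correlation matrix is suitable for factoring.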

For the CFA, based on the literature (Hu & Bentler, 1999; Kline, 2016), the following criteria were used for good (or acceptable) fit: ratio of the chi-square value to the degrees of freedom (χ2/df) < 2 (or 3); SRMR < 0.05 (or 0.08); RMSEA < 0.05 (or 0.08); CFI and TLI > 0.95 (or 0.90). Data cleaning, assumption checking and EFA were conducted in R® (MVN and psych packages), and CFA was conducted in AMOS® v. 27.
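The cut-offs above can be applied mechanically once the indices are available. AMOS reports these indices directly; the Python sketch below uses one common formulation of RMSEA, CFI and TLI from model and baseline (independence-model) chi-squares, and the chi-square values in the example are hypothetical, not the study’s results.

```python
import math

# (good cut-off, acceptable cut-off, direction) as listed in the text.
CRITERIA = {
    "chi2/df": (2.0, 3.0, "lower"),
    "SRMR": (0.05, 0.08, "lower"),
    "RMSEA": (0.05, 0.08, "lower"),
    "CFI": (0.95, 0.90, "higher"),
    "TLI": (0.95, 0.90, "higher"),
}


def rmsea(chi2, df, n):
    """Point estimate of RMSEA (one common form, using N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to the independence (baseline) model."""
    denom = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 if denom == 0 else 1.0 - max(chi2_m - df_m, 0.0) / denom


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b, ratio_m = chi2_b / df_b, chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


def rate_fit(indices):
    """Label each index as good, acceptable or poor using CRITERIA."""
    ratings = {}
    for name, value in indices.items():
        good, acceptable, direction = CRITERIA[name]
        if direction == "lower":
            is_good, is_acceptable = value < good, value < acceptable
        else:
            is_good, is_acceptable = value > good, value > acceptable
        ratings[name] = "good" if is_good else "acceptable" if is_acceptable else "poor"
    return ratings


# Hypothetical chi-square values, not the study's results.
chi2_m, df_m, chi2_b, df_b, n = 600.0, 400, 5000.0, 435, 205
indices = {
    "chi2/df": chi2_m / df_m,
    "RMSEA": rmsea(chi2_m, df_m, n),
    "CFI": cfi(chi2_m, df_m, chi2_b, df_b),
    "TLI": tli(chi2_m, df_m, chi2_b, df_b),
}
print(rate_fit(indices))  # {'chi2/df': 'good', 'RMSEA': 'good', 'CFI': 'good', 'TLI': 'good'}
```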

Results

Exploratory Factor Analysis

Bartlett’s test (χ2 = 6927.70, p < 0.05) and the KMO measure (0.92) showed that the variables were sufficiently intercorrelated, supporting the appropriateness of EFA. The scree plot (Fig. 1) indicated five factors, since there was no marked drop after the fifth point, whereas parallel analysis suggested four factors. Five items (BE_5, BE_7, CE_4, CE_8, and CE_10) were removed because their cross-loadings could distort the item-factor structure. However, the discrepancy between the two methods persisted after deletion. The factor loading matrices for the four- and five-factor solutions were therefore calculated to decide the number of factors. As detailed in Table 1, compared with the four-factor solution, the five-factor solution contained one fewer cross-loading item and had better fit indices (e.g., lower RMSR and higher TLI). Thus, the five-factor model was selected for further testing.
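The disagreement between the scree plot and parallel analysis is easier to interpret with the parallel-analysis rule written out. The sketch below is Horn’s parallel analysis on principal-component eigenvalues, which is one variant; R’s psych `fa.parallel` (used in the study) can also compare common-factor eigenvalues, so this is an approximation of that procedure, and the two-factor simulated data are hypothetical. The `observed` eigenvalues are the values a scree plot displays.

```python
import numpy as np


def parallel_analysis(X, n_iter=100, percentile=95, seed=0):
    """Horn's parallel analysis on principal-component eigenvalues.

    Retains components whose observed eigenvalue exceeds the chosen percentile of
    eigenvalues from random normal data of the same size.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        Z = rng.standard_normal((n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # count the leading eigenvalues that beat the random benchmark
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold


# Hypothetical two-factor data: 200 cases, 12 items, 6 items per factor, loadings 0.7.
rng = np.random.default_rng(3)
F = rng.normal(size=(200, 2))
pattern = np.zeros((2, 12))
pattern[0, :6] = 0.7
pattern[1, 6:] = 0.7
X = F @ pattern + np.sqrt(1 - 0.49) * rng.normal(size=(200, 12))
n_retain, observed, threshold = parallel_analysis(X)
print(n_retain, np.round(observed[:4], 2), np.round(threshold[:4], 2))
```

Parallel analysis tends to be more conservative than reading an elbow off the scree plot, because a factor must beat what random data of the same size would produce.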

Fig. 1 The scree plot of eigenvalues

Table 1 Five-factor and four-factor loading matrices estimated by the iterated principal axis (PA) method

The item-factor loadings in the five-factor model (Table 1) were used to decide whether each item should be retained. For a sample size of 200, it is recommended that items with a factor loading below 0.364 or with cross-loading issues (loading differences < 0.05) be removed (Stevens, 2002). Thus, EE_3 and CE_1 were deleted. In addition, the alpha values indicated that EE_11 should be deleted to increase the reliability of Factor 3. Conversely, CE_13 and CE_15 were retained because removing them would have decreased the alpha value of Factor 1. Therefore, 34 items remained in the survey after EFA.
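The retention rules above can be sketched as two small checks: a loading screen (largest absolute loading below 0.364, or a gap under 0.05 between the two largest absolute loadings, which this sketch treats as the cross-loading criterion) and Cronbach’s alpha if an item is deleted. The loadings and simulated responses are hypothetical, not the values in Table 1.

```python
import numpy as np


def flag_items(loadings, names, min_loading=0.364, min_gap=0.05):
    """Flag items whose largest |loading| < min_loading, or whose two largest
    |loadings| differ by less than min_gap (treated here as cross-loading)."""
    L = np.abs(np.asarray(loadings, dtype=float))
    flagged = {}
    for name, row in zip(names, L):
        top, second = np.sort(row)[::-1][:2]
        reasons = []
        if top < min_loading:
            reasons.append("weak loading")
        if top - second < min_gap:
            reasons.append("cross-loading")
        if reasons:
            flagged[name] = reasons
    return flagged


def cronbach_alpha(items):
    """Cronbach's alpha for a cases x items array."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_var_sum = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)


def alpha_if_deleted(items):
    """Alpha of the remaining items after dropping each item in turn."""
    X = np.asarray(items, dtype=float)
    return np.array([cronbach_alpha(np.delete(X, j, axis=1)) for j in range(X.shape[1])])


# Hypothetical loadings (3 items x 2 factors), not the values in Table 1.
print(flag_items([[0.70, 0.10], [0.30, 0.05], [0.45, 0.42]], ["A", "B", "C"]))
# {'B': ['weak loading'], 'C': ['cross-loading']}

# Simulated scale: four related items plus one unrelated (noise) item.
rng = np.random.default_rng(7)
common = rng.normal(size=(300, 1))
items = np.hstack([common + 0.5 * rng.normal(size=(300, 4)), rng.normal(size=(300, 1))])
print(np.round(alpha_if_deleted(items), 2))  # dropping the noise item gives the highest alpha
```

The same alpha-if-deleted logic explains why an item such as EE_11 is removed when dropping it raises its factor’s alpha, while items such as CE_13 and CE_15 are kept when dropping them would lower it.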

Factors 1 and 5 aligned with the hypothesized item-factor structures of CE and BE, respectively. Factors 2 to 4 contained different clusters of EE items, indicating three sub-factors of EE. Based on the items’ meanings, the definition of EE (Fredricks et al., 2004), and relevant theories (Academic Emotional Theory, Pekrun et al., 2011; expectancy-value theory, Wigfield & Eccles, 2000), Factors 2, 3 and 4 were named Sense of Belonging (SB), Positive Affect (PA) and Task Value (TV), respectively. In summary, the EFA results showed a second-order factor structure of student engagement: SB, TV and PA as the first-order factors of EE, and BE, CE, and EE as the second-order factors of overall student engagement.

Confirmatory Factor Analysis and Descriptive Analysis

The first measurement model (M1), with 34 items in five factors, was constructed in AMOS. M1 followed three rules: each item had a nonzero loading on its first-order latent factor and zero loadings on the other factors; the loading of the first item in each congeneric set was fixed to 1; and the items’ error terms were uncorrelated. As presented in Table 2, the indices indicated that M1 fit the data poorly. The modification indices (MIs) showed that three items (CE_5, CE_13, and EE_2) had high error covariances with one or more other items. CE_5 was therefore removed in M2; however, the upper limit of the 90% confidence interval of RMSEA still exceeded the acceptable value. CE_13 was then deleted, as the MIs of M2 suggested.
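In standard confirmatory factor analysis notation (an assumption: the paper itself gives no equations), the three rules for M1 correspond to constraints on the measurement model below, where x is the vector of 34 item scores, ξ the five first-order latent factors, Λ the loading matrix, Φ the freely estimated factor covariance matrix and Θ_δ the error covariance matrix.

```latex
\begin{aligned}
\mathbf{x} &= \boldsymbol{\Lambda}\,\boldsymbol{\xi} + \boldsymbol{\delta},
  \qquad \mathbf{x} \in \mathbb{R}^{34},\ \boldsymbol{\xi} \in \mathbb{R}^{5} \\
\boldsymbol{\Sigma}(\boldsymbol{\theta}) &= \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}_{\delta} \\
\lambda_{jk} &= 0 \quad \text{if item } j \text{ is not assigned to factor } k \\
\lambda_{j_k k} &= 1 \quad \text{for the first (marker) item } j_k \text{ of factor } k \\
\boldsymbol{\Theta}_{\delta} &= \operatorname{diag}(\theta_{1}, \ldots, \theta_{34})
\end{aligned}
```

A modification index approximates the drop in χ2 that would follow from freeing one fixed parameter, such as an off-diagonal element of Θ_δ; large indices on those elements are what signal the error covariances described for CE_5, CE_13 and EE_2.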

Table 2 Goodness-of-fit indices for measurement models of student engagement

The updated model M3 showed significant improvement over M2 according to a chi-square difference test (Δχ2 = 92.4, p < 0.01). Although all indices of M3 were within their acceptable ranges, the MIs still suggested removing EE_2 because it was highly correlated with EE_1. M4, with EE_2 removed, was significantly better than M3 (Δχ2 = 197.8, p < 0.01), and the MIs did not indicate any further modifications. Therefore, M4 was selected as the most appropriate measurement model; its standardized path estimates are presented in Fig. 2. A multiple-group CFA across genders showed M4 to be invariant at the configural (CFI = 0.93, TLI = 0.93, RMSEA = 0.05), metric (Δχ2(26) = 29.14, p > 0.05, ΔCFI = 0.001) and scalar levels (Δχ2(28) = 33.62, p > 0.05, ΔCFI = 0.001), indicating that the overrepresentation of females in the sample did not affect the instrument’s validity.
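The invariance decisions above rest on a chi-square difference test between nested models. The sketch below computes the p-value with SciPy and combines it with a ΔCFI check. The 0.01 ΔCFI cut-off is a commonly used heuristic assumed here; the text does not state it. The inputs are the metric and scalar values reported above. The Δdf values for the M2 to M3 and M3 to M4 comparisons are not reported, so those tests are not reproduced.

```python
from scipy.stats import chi2


def chi_square_difference_p(delta_chi2, delta_df):
    """Upper-tail p-value for a nested-model chi-square difference test."""
    return chi2.sf(delta_chi2, delta_df)


def invariance_supported(delta_chi2, delta_df, delta_cfi, alpha=0.05, max_delta_cfi=0.01):
    """An invariance step passes if the chi-square difference is non-significant
    and |delta CFI| is small.
    ASSUMPTION: the 0.01 CFI cut-off is a common heuristic, not stated in the text."""
    p = chi_square_difference_p(delta_chi2, delta_df)
    return (p > alpha and abs(delta_cfi) <= max_delta_cfi), p


# Metric and scalar steps, using the values reported in the text.
metric_ok, metric_p = invariance_supported(29.14, 26, 0.001)
scalar_ok, scalar_p = invariance_supported(33.62, 28, 0.001)
print(metric_ok, round(metric_p, 2), scalar_ok, round(scalar_p, 2))
```

This plain difference test assumes normal-theory maximum-likelihood chi-squares. Under a robust (scaled) estimator, a scaled difference test would be needed instead.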

Fig. 2 Multifactor student engagement model

Descriptive and correlation analyses of the five latent factors were conducted (see Table 3). BE and CE were moderately correlated with each other (r = 0.61**), while both were weakly associated with EE and its three sub-factors (0.16** ≤ r ≤ 0.29**). PA, TV and SB were moderately correlated with each other, and the high path estimate between SB and EE (see Fig. 2) indicated the importance of SB among the three EE sub-factors. Cronbach’s alpha values indicated high reliability for the subscales and for the whole scale (α = 0.93).

Table 3 Descriptive analysis and correlation of latent factors of student engagement

Qualitative Study

Methods

Procedures and Participants

Figure 3 depicts the process of recruiting participants for the qualitative study. First, participants from the quantitative study were divided into three groups based on their overall scores: (1) high engagement (overall score > M + 1SD; n = 67), (2) medium engagement (M − 1SD ≤ overall score ≤ M + 1SD; n = 278), and (3) low engagement (overall score < M − 1SD; n = 60). Second, the first author contacted 21 students who had indicated (at the end of the survey) interest in a follow-up interview; 9 declined the invitation, so 12 students (N = 12) were interviewed. The sample size was determined by data saturation, the point at which further data collection no longer added value or new insights (Braun & Clarke, 2013). Interviewees varied in engagement level and demographics (Table 4). The author intentionally over-sampled the low engagement group because feedback from these students will be used in another paper investigating factors that influence CL disengagement.
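The grouping rule above can be written in a few lines. In this Python sketch, scores at exactly M ± 1SD fall into the medium group, matching the inequalities in the text. It assumes the SD is the sample standard deviation (ddof=1), which the text does not specify. The scores are hypothetical, not study data.

```python
import numpy as np


def engagement_groups(scores):
    """Label scores as high (> M + 1SD), low (< M - 1SD) or medium (otherwise).

    ASSUMPTION: SD is the sample standard deviation (ddof=1); the text does not say.
    """
    s = np.asarray(scores, dtype=float)
    m, sd = s.mean(), s.std(ddof=1)
    labels = np.where(s > m + sd, "high", np.where(s < m - sd, "low", "medium"))
    return labels, m, sd


# Hypothetical overall scores, not study data.
labels, m, sd = engagement_groups(range(1, 11))
print(labels.tolist())
# ['low', 'low', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'high', 'high']
```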

Fig. 3 The process of recruiting participants for the qualitative study

Table 4 Demographics of participants in the qualitative study

Interviews

To understand participants’ experiences and engagement during CL tasks, the interviewer conducted semi-structured interviews after building rapport with the interviewees (see Appendix B for the guiding questions). All interviews were conducted online via WeChat and audio-recorded. Interviews lasted 40 min on average. The audio recordings were transcribed using the iflytek website and then edited by the first author. The transcripts were sent to participants so they could check their accuracy, trustworthiness, and authenticity.

Results

Overall, students’ perceptions of their engagement in CL tasks aligned with the quantitative results. The three students from the high engagement group, Diana, Faye and Penny, recognized that they were behaviourally and cognitively engaged in CL. Faye stated that “I did the literature review and worked with others on our presentation slides… I tried my best to do what I can do”. Diana also mentioned that she made a large contribution to the group work and felt satisfied with her engagement: “I would score myself 90 out of 100”. These three participants also expressed fondness for the task and their collaborators (i.e., emotional engagement). As Penny (from the Instruction design and implementation course) said:

I like my task. I love exploring students’ needs and designing something [e.g., courses] new for them…We need to restructure teaching content because the well-organized teaching content benefits students.

The medium engagement group (Mia, Xavier and Perl) were pleased with their contributions, yet acknowledged the gap between what they could have achieved and what they actually achieved. This self-evaluation was consistent with their survey results. Mia stated that “I have completed part of the group work, like searching resources and organizing key ideas … However, in my opinion, our group product and what I have done, were just okay, not very outstanding”. Likewise, Perl acknowledged that the outcomes did not meet her expectations. Xavier (from the Wine experiment course) admitted that he did not engage deeply in the CL task and explained why:

I know these experiments are essential for wine production. But these skills are necessary for the people who plan to be inspectors. Well, I don’t want to do that in the future. So I just ensured I can understand the experiments, rather than setting a high level for myself in this course.

The other six students acknowledged that they did not engage in the CL task, which was consistent with their low survey scores. For example, Leo admitted: “I can’t say I contributed knowledge or good ideas to my group. In fact, I almost disengaged”. Some students also explained that their group members lacked the collaborative competencies needed to interact deeply and work smoothly. As Helen stated, “My peers and I were so fresh as first-year students. Collaborative learning was very strange to us, so we didn’t collaborate very well”.

Discussion

CL is spreading widely around the world, and students’ achievement in CL activities depends on effective engagement. Low engagement has been identified as a common issue, yet it has received relatively little attention in China. This paper situated student engagement in the Chinese CL context and developed a Chinese engagement scale. The results are discussed below.

The Collaborative Learning Engagement Scale and Engagement Construct

The quantitative study constructed and validated a Mandarin CL engagement scale and explored the conceptual structure of engagement. Results from EFA and CFA showed that the scale was highly reliable and indicated a hierarchical structure of engagement, with BE, CE and EE as the second-order factors and PA, TV and SB as the first-order factors of EE. BE captured participants’ positive conduct and involvement in CL, including indicators of both individual effort (e.g., BE_4) and group commitment (e.g., BE_1), a distinction that previous engagement literature has not made explicit. CE concerned students’ cognitive effort, meta-cognitive strategies (e.g., shared regulation) and motivation to complete CL tasks. For EE, the present study took the novel step of using a semantic differential scale, which yielded reliable results.

The first-order factors of EE (PA, TV and SB) aligned closely with the core elements of Fredricks et al.’s (2004) definition; however, these three sub-factors have not been explicitly identified by prior engagement scales. For example, scales have incorporated Teacher-Student Relationships (Appleton et al., 2008), or Peer Relationships and Relationships with the Faculty Member (Gunuc & Kuzu, 2015), in the emotional dimension. The present investigation suggests that students’ perceptions of tasks (PA and TV) and teams (SB) are critical indicators. Such differences are reasonable because previous studies were conducted at the course or school level, whereas the present study focused on learning activities, where students’ feelings towards tasks and collaborators more directly influence the learning process and outcomes (Pekrun & Linnenbrink-Garcia, 2012).

The qualitative study interviewed 12 students whose scale scores indicated high, medium and low levels of engagement, and its findings triangulated the quantitative results. The high engagement interviewees were satisfied with their contribution and engagement; the medium group recognized their contribution yet acknowledged that they could have engaged more; and the low engagement group admitted that they had hardly contributed to or engaged in CL.

Furthermore, this paper found that BE and CE were moderately correlated with each other, mirroring previous findings at both the activity (Naibert & Barbera, 2022) and school (Virtanen et al., 2018) levels. These results suggest that participants tended to engage behaviourally and cognitively at the same time, and that students’ effort, attention and concentration (i.e., BE) were important in facilitating CE (Wilson et al., 2021; Wolters & Hussain, 2015). The present investigation also found that EE had low correlations with BE and CE. This finding is inconsistent with Pekrun’s control-value theory, which proposes that when students feel positively about a task, they will be willing to invest more time and effort in it (Pekrun et al., 2002; Pekrun & Linnenbrink-Garcia, 2012). However, Manwaring et al. (2017) also found low correlations between EE and BE/CE in a longitudinal study in higher education. Such results suggest that some adult learners depend less on their emotional feelings (e.g., interest in tasks) to engage behaviourally and cognitively, and may instead engage for external rewards. Nonetheless, other students still attributed their investment (or disengagement) to their interest in (or dislike of) the task and their collaborative experiences with peers.

Limitations and Future Research

The present research has several limitations. The study was cross-sectional and observational, so causal conclusions cannot be drawn. Additionally, most participants were female and from humanities majors, so the sample may not represent all university students in China. Moreover, the survey was designed to measure overall engagement levels post hoc, whereas student engagement is dynamic and can vary over the course of CL. Accordingly, future researchers are encouraged to conduct longitudinal research and adopt microgenetic methods to capture finer-grained learning data. Further studies could also explore which factors influence student engagement in CL, and how.

Implications and Significance of the Present Research

This research contributes to the fields of student engagement and CL both practically and theoretically. The validated scale can serve as an instrument in future research on CL, or on student engagement in other learning activities, and can be applied in other language contexts. Teachers could also use the full scale (or one of its subscales) to assess and monitor student engagement and intervene as needed. Moreover, the refined conceptual structure and clarified indicators can guide CL design and assessment. For example, since BE covers indicators at both the individual and group levels, teachers are advised to assess CL tasks at both levels as well. Considering PA, SB and TV, teachers could enhance students’ EE by (1) ensuring that tasks align with students’ intrinsic interests, (2) helping to build team cohesion and interdependence, and (3) clarifying the value and benefits of fully engaging in CL. Only when appropriate actions are taken can low engagement be addressed and the expected outcomes of CL be achieved.