Introduction

The objective of the present study is to investigate the effects of algorithm-based group formation by homogeneous and heterogeneous distributions of prior knowledge and extraversion on online group work outcomes, including satisfaction, performance, and group stability. For this purpose, we use a 2×2 research design to assess which type of group formation leads to the best results according to the preset criteria for matching group members.

Group work has long been an important didactic tool for promoting learning at various levels and has already proven its worth as such (Lin et al., 2016; Mujkanovic & Bollin, 2019). In the wake of the COVID-19 pandemic, the need for pedagogical methods for Computer-Supported Collaborative Learning (CSCL) has increased (Hodges et al., 2020). For online learning to be successful, it is important to include social elements (Gillen-O'Neel, 2021; Wildman et al., 2021). A key solution to creating good starting conditions for online learning for each student is to create groups (Gillies, 2004). Potentially disadvantaged students could also be identified and targeted through research on forming groups based on various criteria (Chahal & Rani, 2022; Hachey et al., 2022). However, we already know from previous research that group work does not always benefit every learner (Chang & Brickman, 2018).

A crucial aspect seems to be the way in which groups are formed, which has a major impact on their success or failure (Bellhäuser et al., 2018; Müller et al., 2022). Therefore, there is increasing interest in research on what criteria could be used for group formation. This research trend on group formation has already been shaped by the increasing prevalence of online-based learning with large numbers of users (e.g., Massive Open Online Courses [MOOCs]), which require algorithmic support for group formation.

Research on group formation can be found predominantly in the field of computer science (Borges et al., 2018; Liang et al., 2021; Maina et al., 2017; Odo et al., 2019). Still, a critical look at this research literature from a psychological point of view reveals that the measuring instruments often do not meet common psychometric requirements (Kirschner, 2017). Additionally, in many cases, the correlative research designs used do not allow causal conclusions. Experimental research on the outcomes of group formation among university students is still missing (Bell, 2007; Nijstad & de Dreu, 2002; Thanh & Gillies, 2010). As a research desideratum, we deduce that interdisciplinary approaches are necessary to meet the challenge of optimal group formation (Bellhäuser et al., 2018; Houlden & Veletsianos, 2022; Müller et al., 2022). The aim of this research is therefore to systematically evaluate how homogeneous and heterogeneous distributions of extraversion and prior knowledge, configured in a 2×2 research design, affect satisfaction, performance, and group stability.

Virtual learning in groups

The COVID-19 pandemic abruptly compelled many students to transition to remote learning, making online groupwork a pertinent and promising tool (Houlden & Veletsianos, 2022). Groups, viewed as complex adaptive systems (Ramos-Villagrasa et al., 2018), exhibit internal cohesion termed a "we-feeling" (Stürmer et al., 2013). A virtual group, defined as individuals geographically, organizationally, and/or temporally dispersed, collaborating on organizational tasks (Powell et al., 2004), surpasses constraints of time zones, distances, and organizational boundaries (Lipnack & Stamps, 1999). Previous research strongly advises against comparing online and face-to-face learning groups, particularly regarding group formation and outcomes (Atchley et al., 2013). The pandemic underscored the significance of online group work, necessitating further research (Williams & Castro, 2010) to delve into the social interactions influencing virtual groups (Hwang et al., 2013; Montoya-Weiss et al., 2001).

Challenges and opportunities in online-group-work-research

Despite extensive research on group work and attempts at improvement through formation (Borges et al., 2018), the literature lacks consensus on a standard model for group formation, which hinders consistent and beneficial comparisons (Clark et al., 2019; Magpili & Pazos, 2018). CSCL is a widely used strategy in online-supported university teaching, yielding performance advantages (Johnson et al., 2014) and enhancing emotional motivation (Cleveland-Innes & Campbell, 2012). It also has the potential to mitigate social isolation prevalent in digital learning contexts, positively influencing learner satisfaction (Liu et al., 2020; Mehall, 2021). The online environment's anonymity can aid in overcoming social anxiety, encouraging participation by silent or shy members (Kerr & Tindale, 2004).

However, socio-technical challenges may arise in online working groups (Montoya-Weiss et al., 2001). Virtual learning environments exhibit higher dropout rates compared to traditional settings (Diaz, 2002; Yang et al., 2013). Students in virtual spaces may feel lonely and isolated, demotivating them and increasing the likelihood of course abandonment (LaRose & Whitten, 2000). Issues such as low engagement from others when questions are posed may impede a sense of belonging, potentially leading to failed group work (Conole et al., 2008; Erez et al., 2013). Prolonged videoconferencing can result in "zoom fatigue" (Nesher Shoshan & Wehrt, 2021).

Online environments often neglect to support social processes (Krejins et al., 2002), crucial for collaborative task-solving (Lou et al., 2001). Incorporating cooperative learning elements increases interactivity, reduces feelings of isolation, and can counteract low participation and high dropout rates in online courses (Khalil & Ebner, 2014; Liu et al., 2020).

Group formation based on algorithmic assistance

Group formation significantly influences learning-group success (Bell, 2007; Halfhill et al., 2005). It involves assembling groups through criteria-based member selection, while group composition refers to processes post-formation (Tuckman, 1965). Relevant criteria for group composition include demographic aspects, personality traits, attitudes, and cognitive preconditions, with either homogeneous or heterogeneous distributions considered advantageous depending on the criterion (Bowers et al., 2000).

Productive interaction among learners often does not occur spontaneously, necessitating criteria-based group formation. However, such research is limited due to its association with challenging selection procedures, prompting a need for interdisciplinary research on group formation, including development, criteria, and algorithm evaluation (Dinca et al., 2021).

CSCL is prevalent across disciplines, and recent research has focused on group formation in computer science and interdisciplinary projects. The demand for online tools, including MOOCs, requires algorithmic support, but experimental studies are scarce, and common recommendations for group formation are lacking (Magpili & Pazos, 2018). Standardized reporting methods for studying online collaboration are also absent in the literature (Hachey et al., 2022; Pai et al., 2014).

Characteristics used for group formation

Group formation within collaborative learning environments is influenced by two subcategories of variables: surface and deep-level criteria. Surface-level variables, including demographics such as gender and age, are deemed less critical for group success compared to deep-level variables, which encompass personality factors, values, and attitudes (Bell, 2007; Harrison et al., 1998, 2002; LePine et al., 2011). While demographics provide insights into group composition, it is the deeper traits that significantly impact group performance.

The selection of attributes of individual group members as criteria for group formation can result in either homogeneous or heterogeneous constellations. Homogeneous groups, characterized by similarity among members, often foster comfort, productivity, and friendly behavior, leading to a preference for homogeneous fit (Ilgen et al., 2005; den Hartog et al., 2019; Muchinsky & Monahan, 1987). Conversely, heterogeneous groups, with diverse member attributes, contribute unique skills to the collective, enhancing overall performance (Bekele & Menzel, 2005; Cable & Edwards, 2004; Moore, 2011). While research on group formation varies, studies emphasize the significance of personality traits over demographics (Martin & Paredes, 2004).

This study examines group formation based on the combination of personality trait extraversion and prior knowledge skill level. Beyond optimal group distribution, within-group determinants of extraversion and prior knowledge are explored under homogeneous and heterogeneous fits. The study evaluates these determinants' importance through objective outcome measures like academic performance and subjective outcomes such as satisfaction, forming a theoretical framework for the criteria used to form groups.

In exploring group dynamics within collaborative learning environments, the integration of diverse demographic and personal information variables, including age, gender, average math grade and final school grade, is crucial for understanding group dynamics. Demographic variables offer insights into group interactions and outcomes, with age diversity linked to increased creativity and gender diversity associated with improved decision-making (Woolley et al., 2010). Additionally, academic indicators like math grade and final school grade influence individual contributions and interactions within groups (Cohen & Lotan, 1995).

Personality traits, particularly the Big Five traits (extraversion, conscientiousness, openness, neuroticism, and agreeableness), shape group formation and dynamics. Extraversion impacts communication and social interactions, while conscientiousness influences task-oriented behaviors and productivity (Barrick & Mount, 1991). Motivation, assessed through factors like expectations and interest, drives engagement in collaborative learning experiences (Deci & Ryan, 1985; Kosovich et al., 2015). Additionally, team orientation preferences influence group dynamics, communication, and effectiveness (Harvey et al., 2019).

In conclusion, the incorporation of demographic information, personality traits, motivation factors, and team orientation variables provides a comprehensive understanding of the factors influencing group dynamics and collaboration within collaborative learning environments. By considering these variables, researchers can optimize group interactions and enhance learning outcomes in collaborative settings.

Extraversion

Extraversion is considered relevant to the formation of a group (Humphrey et al., 2007). It is a personality trait strongly associated with effectiveness (Hogan et al., 1994) and leadership behavior (Driskell et al., 2006; Judge et al., 2002; Nonaka et al., 2016). People with a low level of extraversion seem to be reserved and less involved in social situations (Power & Pluess, 2015). Since extraverts are more likely to assert themselves in groups, these individuals often take on leadership roles when working with other people (McCabe & Fleeson, 2012; Taggar et al., 2006). Concerning personality traits as group formation criteria, the literature assumes a heterogeneous distribution of extraversion within a group. Extraversion goes together with leadership behavior and is differently pronounced among members (Kramer et al., 2011).

Prior knowledge

Because prior knowledge is defined ambiguously, we first emphasize its multidimensionality (Williams et al., 2008). Prior knowledge has a major influence on the outcome of groups (Horwitz, 2005). The extraordinary role of prior knowledge, especially its activation for learning, has been shown for the success of learning processes of young children (Saalbach & Schalk, 2011). Additionally, prior knowledge seems to be a decisive predictor of academic success (Riazy et al., 2021). Group members can share and increase prior knowledge within a group through their contributions. Superior prior knowledge in individual members can add great value for the whole group (Weinberger et al., 2007), and those synergy effects between the participants can significantly influence student achievement (Hailikari et al., 2008). We can assume approximately the same results for online groups (Engel et al., 2014, 2015).

To draw a valid picture of group formation using the criterion of prior knowledge, we must consider the individual's preconditions. Most likely, low-ability students are more motivated to learn in heterogeneous groups, average-ability students perform better in homogeneous groups, and high-ability students show equal results in both group types, as found when examining the effects of heterogeneous or homogeneous groups on the performance of pupils with high, average, or lower abilities (Saleh et al., 2005). Consequently, high, mid, and low-competence students would differ between heterogeneous and homogeneous groups in their learning outcomes (Donovan et al., 2018). Researchers found improved cooperative skills and performance in heterogeneously formed student groups based on intelligence and gender (Mehta & Kulshrestha, 2014). Although researchers assumed positive effects of these determinants in children and young adults of school-leaving age, the question arises whether these observations allow predictions for university students.

Aim of study

Interdisciplinary approaches are essential to address the challenge of optimal group formation. Since previous studies on online group formation techniques are scarce or limited to correlative settings, findings on the significance of extraversion and prior knowledge in online groups are lacking (Odo et al., 2019). Comparing results to understand the nature of group work collectively is complex (Magpili & Pazos, 2018). This area therefore needs more attention and research, as advocated by authors who called for further investigation into the link between personality in groups and group outcomes (LePine et al., 2011).

The assessment of group-work-outcomes should encompass various levels. It is not only performance that matters, but also the satisfaction of group members with their group dynamics and processes, as well as the group's duration over the course of the project. Course completion often serves as a measure of effectiveness (Hachey et al., 2013, 2022), highlighting the need for institutions to predict online students' persistence to address dropout rates. However, many studies focus on individual outcome measures exclusively, leading to a lack of standardization in the literature. Reporting diverse outcome measures is crucial for comparing study results. Thus, we examine the effect of extraversion and prior knowledge distribution on outcome variables such as satisfaction with group work, assignment performance, and group retention (referred to as "group stability").

This study investigates extraversion and prior knowledge as criteria for online group work in a university context. Building on previous findings, we hypothesize that heterogeneous group formation will be advantageous in a similar setting. Experimental studies exploring these criteria are currently unavailable. We have formulated our hypotheses accordingly:

H1

Individuals in groups, assigned heterogeneously in extraversion by algorithm, will be more satisfied with the group composition, produce better results in the assignment and spend more time on group work than those in homogeneously extraverted groups.

H2

Individuals in groups assigned heterogeneously in prior knowledge by algorithm will achieve better outcome measures (see above) compared to homogeneous grouping. Additionally, we formulated an open, explorative research question assuming an interaction effect of both above-described measures.

RQ1

There will be an interaction effect between extraversion and prior knowledge for the outcome measures (see above).

Materials and methods

Sample

We recruited participants from a four-week online preparatory math course at the university in September 2019. This online math course is mainly for beginners in science, technology, engineering, and mathematics (STEM) subjects. It focuses on repeating the mathematical basics from school to improve scholastic aptitude and reduce the heterogeneity of knowledge among the students. Students can work on all topics and tasks in the online math course voluntarily and in any preferred order; participating in the online math course does not result in a grade. We recruited participants (female = 172) and obtained their informed consent in writing. We did not determine exclusion criteria before participation. We included in our computation those students who decided to work in groups. After recruitment, we matched 375 participants to groups of three, leading to 125 groups. To maximize the number of formed groups, we chose a group size of three members. We asked participants to work on weekly assignments and fill in evaluations of the quality of their group work. We also conducted a test at the beginning and the end of the course and a final evaluation after the course.

Study design

The study uses a systematic, fully crossed experimental design with two factors (extraversion, prior knowledge), each manipulated at two levels (homogeneous, heterogeneous). We therefore have a between-subjects design with two factors (the personality trait extraversion and prior knowledge) with two levels each.

Instruments

After consenting to participate in the study, students were asked to fill out a demographic and psychological questionnaire at the beginning of the preparation course, which included questionnaires regarding their personality, prior subject knowledge, motivation for the course, and team orientation. Participants answered the questionnaires online, using a Likert scale from 1 ("does not apply") to 6 ("does completely apply").

Experimental variables

Extraversion The short version of the Big Five Inventory (BFI-K; Rammstedt & John, 2005) was used to assess the extraversion of the participants. The BFI-K was developed as a quick-response questionnaire that, with an average duration of processing of less than 2 min, can be considered extremely economical. It measures extraversion with 8 items, answered on a 6-point Likert-Scale, ranging from very inaccurate to very accurate. The validity between the BFI-K and the NEO-PI-R (Costa & McCrae, 1992) was established by Rammstedt and John (2005). Exemplary items for extraversion are "I am very enthusiastic" and "I am outgoing, sociable." Cronbach's alpha was α = 0.89.

Prior knowledge We measured prior subject knowledge with participants' self-estimation on every subdimension of mathematical content from school. We based matching concerning previous knowledge on the result of the entrance test, i.e., participants completed the entrance test before the end of the group formation. The entrance test focused on mathematical tasks students should be able to solve to succeed in the first mathematical lectures. The entrance test is adaptive, so that each participant works on a different set of tasks based on whether they solve tasks correctly or incorrectly (Konert et al., 2016). Participants can score x points out of y possible points, where x and y can differ for all participants. We then added the following two questions for the group formation and asked participants to report their "achieved score on the test" and the "maximum possible score on the test." We used the entrance test score to calculate the quotient x/y of the number of points achieved and the number of points achievable. We used this score for grouping as the value for previous knowledge of the participants.
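The quotient described above can be illustrated with a minimal Python sketch. The function name and the input checks are illustrative assumptions and are not part of the study's tooling; the two example participants are hypothetical.

```python
def prior_knowledge_score(achieved: float, achievable: float) -> float:
    """Return the quotient x / y of achieved and achievable test points."""
    if achievable <= 0:
        raise ValueError("achievable points must be positive")
    if not 0 <= achieved <= achievable:
        raise ValueError("achieved points must lie between 0 and achievable")
    return achieved / achievable


# Two hypothetical participants who saw different adaptive task sets,
# so their maximum possible scores (y) differ.
scores = [prior_knowledge_score(12, 20), prior_knowledge_score(9, 12)]
```

Because the test is adaptive, dividing by the individually achievable maximum puts all participants on a common 0 to 1 scale before matching.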

Control variables

Demographics and Personal Information We asked for participants' age, gender, average math grade and average final school grade, as well as confirmed consent to participate in the current study.

Personality The Big Five personality questionnaire (BFI-K; Rammstedt & John, 2005) demonstrated robust reliabilities in this setting (extraversion: 8 items, alpha = 0.89; conscientiousness: 9 items, alpha = 0.83; openness: 5 items, alpha = 0.70; neuroticism: 4 items, alpha = 0.79; agreeableness: 4 items, alpha = 0.64).

Motivation We measured motivation (EVC; Kosovich et al., 2015) within four subscales: expectations (4 items, e.g., "I know that I can learn the contents of the preliminary course," alpha = 0.86), use (5 items, e.g., "I understand how important the preliminary course is for my future," alpha = 0.78), cost (6 items, e.g., "The time required for the preliminary course seems great to me," alpha = 0.83), and interest (7 items, e.g., "I'm looking forward to the preliminary course," alpha = 0.80). Reliabilities of the motivation scales were high.

Team orientation We measured team orientation using three questions (e.g., "If I have a choice, I'd rather work in a team than alone," alpha = 0.86).

Honesty We recorded the honest answering of the questionnaires with the question, "I have concentrated on the questions and answered them honestly," with the possible answers: "Yes, completely concentrated and honest," "Yes, mainly concentrated, and honest," and "No, not concentrated and honest at all." Only the last option led to the exclusion of participants.

Dependent variables

The evaluation questionnaire mainly contained questions on satisfaction. It additionally asked about involvement and time spent (e.g., "How much time (in minutes) did you personally spend on individual preparation?", "How many personal meetings have you had with your group in the last week?"). A communication question asked: "How many members of your study group did you communicate with in any way this week?" Not all of these questions are included in the results section.

Satisfaction Satisfaction was evaluated with an online questionnaire that participants filled out as a precondition for turning in group assignments. Questions on satisfaction included, for example, "I am satisfied with the cooperation in my group" and "Our group has worked productively." We used the overall score of all answers regarding participant satisfaction as the outcome measure of satisfaction, ranging from 1 ("low satisfaction") to 6 ("high satisfaction").

Assignment Homework handed in was graded for quality of the proposed solution by different previously trained student tutors. Homework needed to be turned in three times during the course. Grade points ranged from 0 to 10.

Group stability In addition, we used the number of group homework assignments handed in during the preliminary course (out of a total of 3) as an objective measure of group stability, making the stability of group cooperation measurable over time.

Algorithm in use to perform the group formation

Moodle is an online e-learning platform used at the university where we conducted our study. To facilitate the chosen study design, we developed the plugin MoodlePeers, which implements the algorithm named GroupAL. The plugin is published as an Open Source Project and is available in several versions at Moodle.org.[1] For the two-factorial and two-stage experimental design, the algorithm has to meet the following objectives: extendable modelling and exchangeability of criteria, support for the formation of mixed homogeneous and heterogeneous groups across multiple criteria, and normed quality metrics for group formation and differences between the formed learning groups (Konert et al., 2016).

MoodlePeers has shown that non-linear optimization is a preferable method to semantic, ontology-based approaches for achieving these goals. Consequently, GroupAL is also based on this optimization and uses n-dimensional vectors to represent the criteria. To assign participants to groups, the algorithm relies on three metrics that build on each other: the PairPerformanceIndex (PPI), which shows the suitability of two participants to each other; the GroupPerformanceIndex (GPI), which measures how well all participants in a group match each other; and the CohortPerformanceIndex (CPI), which indicates the difference or similarity of all groups in relation to each other. Users can set each weighted criterion to be optimized either homogeneously (1) or heterogeneously (0). For this purpose, the PPI uses a weighted normalized distance function as the basis for matching in terms of fit (homogeneous) or complementarity (heterogeneous) of group members on criterion dimensions. The evaluation of the MoodlePeers tool showed better results than other non-linear optimized algorithms in terms of the quality of group formation both within groups and between groups in the resulting cohort (Konert et al., 2016). Consequently, it was possible to realize the planned experimental design in which the cohort of participants was divided into small groups within four segments of equal size (see Table 1).
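To make the layering of the three metrics more concrete, the following Python sketch shows one way a pair, group, and cohort score could build on each other. It is not the GroupAL implementation: the function names, the use of an absolute difference as the normalized distance, the averaging over pairs, and the cohort score based on the spread of group scores are all simplifying assumptions made for illustration. Criteria values are assumed to be scaled to the range 0 to 1.

```python
from itertools import combinations
from statistics import mean, pstdev


def pair_index(a, b, weights, homogeneous):
    """Hypothetical pair score in [0, 1] for two criteria vectors."""
    total = 0.0
    for x, y, w, homo in zip(a, b, weights, homogeneous):
        distance = abs(x - y)  # normalized distance on one criterion
        # Homogeneous criteria reward similarity, heterogeneous ones reward distance.
        total += w * ((1.0 - distance) if homo else distance)
    return total / sum(weights)


def group_index(members, weights, homogeneous):
    """Hypothetical group score: mean pair score over all member pairs."""
    pairs = list(combinations(members, 2))
    return mean([pair_index(a, b, weights, homogeneous) for a, b in pairs])


def cohort_index(groups, weights, homogeneous):
    """Hypothetical cohort score: 1 when all groups have equal group scores."""
    scores = [group_index(g, weights, homogeneous) for g in groups]
    return 1.0 - pstdev(scores)


# Criteria order: (extraversion, prior knowledge); extraversion is matched
# heterogeneously (False), prior knowledge homogeneously (True).
weights = (1.0, 1.0)
homogeneous = (False, True)
fitting_group = [(0.0, 0.5), (0.5, 0.5), (1.0, 0.5)]
poor_group = [(0.5, 0.0), (0.5, 0.5), (0.5, 1.0)]
```

In this toy setup, `fitting_group` spreads extraversion while keeping prior knowledge equal, so it scores higher than `poor_group`, which does the opposite.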

Table 1 Grouping scheme, algorithm applied to match in groups

We therefore divided participants into groups that were similar or dissimilar in the two traits of extraversion and prior knowledge. Matching participants with similar values on an experimental variable ensures homogeneity within the group with respect to that variable, whether prior knowledge or extraversion. Groups whose members are matched to differ strongly on these experimental variables are in turn maximally different in these characteristics, i.e., heterogeneous within their group. Here, the algorithm tries to generate the largest possible distance to the group mean, and thus a high standard deviation among the members of each such group across the entire population.

Data analysis

We grand-mean-centered the personality traits as well as the motivation subscales for better interpretation. We used block randomization to randomly assign each participant to one of the four conditions. As mentioned above, the algorithm randomly divides the whole sample into four equally large parts. It then makes sure that all four parts are comparable in their distribution of the relevant personality trait and of prior knowledge.
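The two preprocessing steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used by the algorithm: the function names are hypothetical, and balancing the conditions on a single sorted score is a simplification of making the four parts comparable on both experimental variables.

```python
import random
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def block_randomize(scores, conditions, seed=0):
    """Assign participants to conditions within blocks of similar scores.

    Participants are sorted by score, cut into blocks of len(conditions),
    and the conditions are shuffled within each block, so every condition
    receives participants from the whole score range.
    """
    rng = random.Random(seed)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    assignment = [None] * len(scores)
    for start in range(0, len(order), len(conditions)):
        block = order[start:start + len(conditions)]
        shuffled = list(conditions)
        rng.shuffle(shuffled)
        for participant, condition in zip(block, shuffled):
            assignment[participant] = condition
    return assignment


# Hypothetical scores for twelve participants and the four 2×2 cells.
example_scores = [i / 11 for i in range(12)]
cells = ["hom/hom", "hom/het", "het/hom", "het/het"]
centered = grand_mean_center(example_scores)
assigned = block_randomize(example_scores, cells)
```

Shuffling within sorted blocks keeps the four cells equal in size while preventing any cell from collecting only high or only low scorers.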

Data exclusion

The algorithm does not assign participants who have not filled in the questionnaires honestly to a matched group. Instead, it puts them together with people with missing data and forms random groups. We excluded from the analyses participants who forgot their participation codename or misspelled it in the posttest, since we could not match their data from pre- and posttest. Additionally, we screened the questionnaire data for traces of careless responding and eliminated obvious cases (Meade & Craig, 2012).

Explorative analysis

As part of our exploratory data analysis, multilevel models were created for each of the three outcome variables (satisfaction, assignment, group stability). In doing so, we included different variables in each model, similar to a regression procedure (Moerbeek et al., 2003), to show their proportional effect on the respective outcome variable. After prior construction of the null model, we stepwise selected gender, age, and conscientiousness, in addition to the experimental conditions of group formation by extraversion and prior knowledge. We decided to include conscientiousness in our models because this variable showed strong correlations with prior knowledge (Meyer et al., 2022). Results from these exploratory analyses can be used for hypothesis-building in future projects.

Results

We conducted our analyses in light of our hypotheses; that is, we looked at whether heterogeneous grouping in extraversion and heterogeneous grouping in prior knowledge led to positive outcomes regarding the group members' satisfaction, achieved points in assignments, and total number of assignments submitted (group stability). We also examined the interaction effects of the heterogeneous and homogeneous group formation. In addition, we explored how variance was distributed between the group and individual levels and whether the experimental conditions would lead to successful outcomes. We analyzed the data using SPSS 23.2 and R.

Descriptive analyses of the data structure

We start the presentation of results with a brief description of the underlying data structure, computed with SPSS. The description of the sample now includes the dropout analysis of the study. Most students who filled out the questionnaire at measurement time point 1, before group formation, participated only in the first measurement time point of group work. Due to this and the overall high dropout, we used the evaluation of the first measurement time point of group work only. We included the satisfaction with group work in the first evaluation (Satisfaction), the performance quality of the first homework (Assignment), and a measure of group stability as an outcome, consisting of the sum of submitted group homework across all time points (Group stability). Table 2 illustrates the data structure of the selected outcome variables.

Table 2 Descriptive measures of main dependent variables

Univariate analysis of variance with two factors

We are interested in answering the following question: Will either the heterogeneous or homogeneous grouping in the two manipulated variables of extraversion and prior knowledge affect the three outcome measures: satisfaction, performance, and group stability? We conducted ANOVAs to investigate differences in mean values and whether these differences were due to chance or were systematic and significant. We found no significant main effect on the dependent variable satisfaction for either the factor extraversion, F(1,72) = 0.24, p = 0.63, η2 = 0.01, or the factor prior knowledge, F(1,72) = 0.10, p = 0.75, η2 = 0.01. There was also no significant interaction effect: F(1,72) = 0.26, p = 0.61, η2 = 0.04. Additionally, effect sizes are negligibly small. The main effect of the criterion extraversion on the dependent variable first assignment is also not significant: F(1,235) = 1.80, p = 0.18, η2 = 0.08. Like the main effect of prior knowledge on the dependent variable assignment, F(1,235) = 1.61, p = 0.27, η2 = 0.07, the interaction effect is not significant: F(1,235) = 1.23, p = 0.27, η2 = 0.01. Neither the main effect of extraversion on the dependent variable group stability, F(1,235) = 0.16, p = 0.69, η2 = 0.01, nor the main effect of prior knowledge, F(1,235) = 0.07, p = 0.79, η2 = 0.00, was significant. However, the interaction effect on group stability is significant despite the small effect size: F(1,235) = 4.15, p = 0.04, η2 = 0.02.
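An effect size can be related to an F statistic and its degrees of freedom through the standard formula for partial eta squared. The Python sketch below assumes that the reported η2 values are partial eta squared, which the text does not state; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Interaction effect on group stability as reported above: F(1,235) = 4.15.
eta_interaction = partial_eta_squared(4.15, 1, 235)
```

For this interaction, the formula gives a value of about 0.017, which rounds to the reported 0.02.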

Data analyses: considering the group structure

As an explorative part of our analyses, we used the R package "lme4," version 1.1–18.1, to calculate multilevel models (MLM), taking into account the structure of the data, where individuals were nested in groups (Bates et al., 2020). Traditional multiple regression techniques treat the units of analysis as independent observations. One consequence of failing to recognize hierarchical structures is that standard errors of regression coefficients will be underestimated, leading to an overstatement of statistical significance. As in our study, the key research question in group formation research mostly concerns the extent of grouping in individual outcomes and the identification of 'outlying' groups. In evaluations of group performance, for example, interest centers on obtaining 'value-added' group effects on students' attainment. Such effects correspond to group-level residuals in a multilevel model, which adjusts for prior attainment (Hox et al., 2018; Van Landeghem et al., 2005).

We used a step-up modeling strategy to address the different problems and structures of the outcome variables. We first list the special features of the outcome variables and then present the respective model for each of them. We report group-level variance using the Intraclass Correlation Coefficient (ICC). This was done by first setting up the empty (null) model without any explanatory variables, which describes the partition between variance at the student level and at the group level. ICCs greater than 0.05 are nontrivial and must be considered (Hox, 2010). It is important to mention that the variances could be misleadingly high due to the small group size and the slight variation of outcomes at the individual level. The group level explained 54% of the variance in the variable assignment, 34% of the variance in the variable satisfaction, and 60% in group stability. The calculated linear equation models are shown in Tables 3, 4, and 5. We added the predictors age (age of the students), average grade (the self-stated average grade in math during school), and the personality traits extraversion, agreeableness, neuroticism, and conscientiousness in each model. The relative contribution of each explanatory variable is represented by the size of its coefficient. The best model fit can be identified by the lowest AIC or BIC. Asterisks mark the significant predictors.

Satisfaction. For the dependent variable satisfaction with the group (“Satisfaction”), we assumed a normal distribution and calculated linear equation models. Table 3 shows the results. Model fit was best in Models 1 and 4, which showed the lowest BIC and AIC values. In the models, experimental conditions and the personality traits extraversion and conscientiousness are shown to be more important predictors than demographics such as gender and age. No significant predictor for satisfaction could be revealed.
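The ICC of a null model and the information criteria used for model comparison follow directly from their definitions: ICC = group-level variance / (group-level variance + residual variance), AIC = 2k − 2 log L, and BIC = k ln(n) − 2 log L. The sketch below assumes that the variance components and log-likelihoods have already been estimated by the fitting software; the function names and all numeric inputs are hypothetical and are not results from this study.

```python
import math


def icc(var_group, var_residual):
    """Share of total variance located at the group level in a null model."""
    return var_group / (var_group + var_residual)


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; penalizes parameters more strongly as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical variance components from an empty two-level model.
example_icc = icc(var_group=0.34, var_residual=0.66)
print("ICC:", example_icc, "nontrivial:", example_icc > 0.05)

# Hypothetical comparison of two nested models fitted to the same 100 observations.
candidates = {"Model 1": (-100.0, 3), "Model 2": (-98.5, 6)}
for name, (loglik, k) in candidates.items():
    print(name, "AIC:", aic(loglik, k), "BIC:", bic(loglik, k, n_obs=100))
```

In this made-up comparison the additional parameters of the second model improve the log-likelihood too little to offset the penalty, so both criteria favor the simpler model.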

Table 3 Individual-level and group-level predictors of dependent variable satisfaction
Table 4 Individual-level and group-level predictors of dependent variable assignment
Table 5 Individual-level and group-level predictors of the dependent variable group stability

Assignment. We established and adapted a hierarchical linear model for non-normally distributed variables for the dependent variable assignment. Model fit improves from Model 1 to Model 4, with Model 4 having the best fit. Only conscientiousness is revealed as a significant predictor for assignment. Table 4 reports the results.

Group stability. Table 5 shows the individual-level and group-level predictors of the dependent variable overall submitted assignments (“Group stability”) as an indicator of group work endurance. Model fit constantly improved from Model 1 to Model 4. We decided to calculate a generalized linear mixed model (GLMM). In contrast to simple and multiple regression analysis, the dependent variable here is binary with only two values: 0 for "not delivered" and 1 for "delivered". This means that it is not the value of the dependent variable that is predicted here, but the probability that the dependent variable takes on the value 1. Furthermore, the conditions are less restrictive than in linear regression analysis. Still, any postulated causal relationship must be theoretically justified (Hox et al., 2017, 2018). Most of the independent variables have no influence on the probability that the dependent variable "group stability" takes the value 1, i.e., that the group remains stable. Only conscientiousness turns out to be a significant predictor of group stability.
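How a model for a binary outcome predicts a probability rather than a value can be sketched with the logistic link, the usual default for binary mixed models: the linear predictor (fixed effects plus a group-specific random intercept) is a log-odds value that is mapped to the interval (0, 1). The function names, the coefficients, and the group effect below are hypothetical and are not estimates from Table 5.

```python
import math


def inv_logit(eta):
    """Map a log-odds value to a probability, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    exp_eta = math.exp(eta)
    return exp_eta / (1.0 + exp_eta)


def p_delivered(intercept, slope, conscientiousness, group_effect=0.0):
    """Probability of the outcome 1 ("delivered") for one student.

    `group_effect` stands for the random intercept of the student's group.
    """
    return inv_logit(intercept + slope * conscientiousness + group_effect)


# Hypothetical coefficients on the log-odds scale.
for score in (-1.0, 0.0, 1.0):  # standardized conscientiousness
    print(score, round(p_delivered(intercept=0.2, slope=0.8, conscientiousness=score), 3))
```

A positive slope raises the predicted probability of delivery, but the change in probability per unit of the predictor is not constant; it depends on where on the curve a student or group is located.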

Discussion

In this study, we implemented group formation based on the personality trait extraversion and on prior knowledge, using the standard deviation of each criterion. The working hypotheses posited that superior results in subjective satisfaction, performance, group stability, and overall better group formation would be achieved through (1) a group formed heterogeneously in extraversion and (2) a group formed heterogeneously in prior knowledge. However, our study results did not substantiate these hypotheses, which are consequently rejected. Still, it is essential to underscore that the rejection of hypotheses constitutes a significant finding. Our analyses yielded no significant main effect of extraversion or prior knowledge on group outcomes. Yet we observed an interaction effect of extraversion and prior knowledge on group stability: groups with a heterogeneous distribution of extraversion and a homogeneous distribution of prior knowledge demonstrated the highest retention. Although this effect is small, it suggests that assuming a direct relationship between personality traits and group-work-outcomes may be unwarranted. These results align with prior studies advocating for research on the personality-work-behavior relationship to transcend linearity assumptions (Curşeu et al., 2019).

The literature suggests that different ability-types among students yield benefits from working in either heterogeneous or homogeneous groups (Saleh et al., 2005). Given the overall low value of prior knowledge in our sample, one might infer that heterogeneous groups are generally superior to homogeneous ones, due to the possibilities inherent in group formation. The variability in mean values between homogeneously grouped prior knowledge groups could be another contributing factor to the absence of significant results. A crucial finding is that, despite the initial online nature of group work, the group level could statistically account for most variances. This underscores the importance of group formation and development in understanding groups.

Exploratory findings reveal significance in the model for all performance-related outcomes in the predictor conscientiousness. Conscientious individuals, characterized as goal-oriented, structured, organized, and self-disciplined (Costa & McCrae, 1992), are associated with better performance (Hurtz & Donovan, 2000). Prior studies have consistently affirmed that conscientiousness exhibits the highest correlation with performance success among other personality traits (Busato et al., 2000; Di Fabio & Busoni, 2007; Furnham et al., 2003; Lounsbury et al., 2003) and displays the strongest correlation with academic success (Di Fabio & Busoni, 2007; Protsch & Dieckhoff, 2011). Consequently, behaviors associated with conscientious team members are likely to be beneficial for group performance, including the fulfillment of task roles, as evidenced in our study.

Bekele and Menzel (2005) suggest considering the achievement level and personality characteristics of group members in group formation, as opposed to self-selection or random group formation, and greater diversity of personalities within a group has been reported to positively affect overall performance (Roberge & van Dick, 2010). In contrast, other studies suggest that successful and unsuccessful teams are homogeneous with respect to various characteristics (Wax et al., 2017). Similar studies revealed significant effects of group formation based on standard deviation. However, results of a meta-analysis showed that this formation was unrelated to performance (Devine & Philips, 2001). It is possible that the significant effects found in these studies may be explained by the nature and difficulty of the task (Bowers et al., 2000). Additionally, literature on MOOCs has shown that participants are more likely to complete a course when they can obtain a certificate (Liu et al., 2020). Both the voluntary participation in the course and the lack of a relevant evaluation of the course could be reasons why students in our study did not complete the course.

Students are familiar with online tools and generally show a positive attitude toward learning with them. However, problems can occur when creating their own online learning environment (Lim & Newby, 2020). Here, group-working methods could be a promising tool. In online group work, we assumed that the group participants were unacquainted with each other before the group work and probably did not meet personally during the process. Through enhanced cohesion, group members built a stronger bond within the learning group. The resulting affiliation to the group—prompting a desire for continued group membership—could promote higher participation, crucial for positive development in virtual teams (Williams et al., 2006). Computer-based asynchronous programs cannot transmit gestures, non-verbal subtleties, or symbolic content (Montoya-Weiss et al., 2001). This limitation can make communication more challenging, as we are accustomed to communicating with these aids from our everyday life and may impact group-problem-solving efficiency within this study. Nevertheless, significant effects found in earlier studies could be due to the type and difficulty of the task used in the analyses (Bowers et al., 2000).

Given the multitude of individual and group-level variables affecting CSGBL-processes and the challenges in predefining independent static conditions, we propose a looser observation set-up (Strijbos et al., 2004). Students deem direct communication the most informative, whereas less informative, text-based online work negatively affects communication and social interaction (Okdie et al., 2011; Straub, 1997). Students are found to be less likely to engage in collaborative learning, interactions, and discussions in online settings compared to traditional classroom settings (Dumford & Miller, 2018), which might have hindered interaction in our study.

This type of online work may have taken place in our study, contradicting the assumption that social interaction will inevitably transpire once adequate technology is provided. Technology encourages communication by offering more appropriate means to complete the task, but it does not guarantee the required social exchange (Kreijns et al., 2003). Technology knowledge positively correlates with technology acceptance. Addressing students' technology proficiency and acceptance is an important step for designing online courses and group work (Nami & Vaezi, 2018). In future studies, prior knowledge should be replaced by technology-knowledge. For online-group-learning in university settings, we need to distinguish between two outcomes: the development of knowledge and the learning process through which individuals gain knowledge (van Merriënboer & Kirschner, 2001). Using group constellations complicates the measurability of both outcomes. Several key questions arise, such as whether predefining independent static learning or instruction conditions is feasible in a grouping, and whether we can control all relevant conditions that affect group interaction and individual knowledge gain.

It is noteworthy that, while educational techniques can stimulate group collaboration and foster communal learning, they may not be able to establish it altogether. We observed that groups might not have been actively working together during the execution of the study. Creating a sociable CSGBL-atmosphere could be a possible solution to this issue. The solution could include generating an environment that allows for interpersonal, social, non-task-related exchange and provides external bonding opportunities. It also includes increasing the number of task-related and non-task encounters, resulting in a more constant presence and awareness of the group members (Lin et al., 2010; Strijbos et al., 2004). For future projects using the Moodle-platform, we might use an Online-Course-Design-Checklist (OCDC) and integrate an analytics-framework for detecting students at risk of dropping out (Baldwin & Ching, 2019; Monllaó Olivé et al., 2020).

Strengths and limitations

The study's notable strength lies in its experimental design, facilitating the determination of causal relationships. It represents a well-designed field-study conducted in a real-world-environment and an authentic teaching-scenario that assumes the students' natural behavior. Consequently, the results boast higher external validity, enhancing their generalizability. Moreover, we can report a substantial initial sample size. The assessment of homework processing serves as an objective measure, free from dropout bias, as we also considered absentee records during data analysis. Evaluations, being a prerequisite for submitting homework, were processed more frequently by students compared to previous studies. Alternative frameworks, such as teaching students from home or extending enrollment periods, pose potential avenues for future exploration. An intriguing question emerges regarding whether a robust expression of conscientiousness and prior knowledge yields similar effects.

Regarding data analysis, a notable strength is the utilization of multi-level modeling, allowing the investigation of variance at both the individual and group levels. This approach contrasts with regular regression, which may encounter sampling problems and lack generalization. However, a limitation of the study is its reliance on a virtual groupwork setting, which was still uncommon at the time of the research. Given that the study predates the "corona pandemic," this virtual setting was unfamiliar to many prospective students. Additionally, voluntary participation in the course contributed to a high dropout rate, negatively impacting the entire group and disrupting the group-process. Despite the challenges posed by the unfamiliar situation, experimental studies in the online context have become indispensable today due to the pandemic, offering valuable guidance and design recommendations for institutions navigating the shift to online teaching.

The study's small group size and the simultaneous existence of a large number of groups may have led to a potentially artificially high Intraclass Correlation Coefficient (ICC) (Hox et al., 2017). However, this circumstance serves the purpose of group comparisons, aligning with the study's objectives and strengthening the power of the results. It also allowed for an increase in the number of groups, enhancing the sample size at the group level. This trade-off is a recurring challenge in university and educational research, where the number of available subjects is not limitless.

As previously mentioned, the overall low level of prior knowledge may have influenced the formation of experimental groups. On average, the algorithm had to form homogeneous groups from individuals with low prior knowledge to correctly generate heterogeneous groups, potentially explaining the absence of significant results and the presence of only interaction effects. Despite this limitation, it is reasonable to assume that prospective students opting for a prerequisite course to enhance their mathematical skills likely had lower prior knowledge, introducing a selection effect. Thus, this restriction is considered acceptable within the context of the study's setting.

In the context of groupwork occurring within a short timeframe and an unfamiliar online setting, it is plausible that the desired group dynamic did not have sufficient time to develop. While this limitation is inherent in studies on online group work, it remains an assumption that cannot be verified, but warrants consideration. Furthermore, the study's group-formation-aspect should be replicated over an extended period to reveal potential effects over time.

In addition to the limitations above, most research findings are derived exclusively from self-report questionnaire formats. This poses a notable overall limitation to (virtual) group-work-research, emphasizing the need for more objective dependent variables, such as the quality and quantity of group discussions in forums, log-files, and videography. The study attempted to address this limitation by incorporating both objective and subjective outcome measures.

Implications

Considering the current situation caused by COVID-19, studies are needed that explore online learning settings from a didactic perspective and examine how these settings may actively foster students' participation and continued engagement (Wildman et al., 2021). The rapidly advancing digitization of university education demands that students take the initiative and display conscientious self-organization to progress in their attainment of further knowledge. Group work has proven to be very beneficial in this regard. Even though most studies support the importance of personality traits, such as extraversion and conscientiousness, and cognitive aspects, such as prior knowledge, in education, the question of how to make use of them as criteria for group formation arises.

Taken together, the benefits offered by the group formation algorithm are highly relevant for universities, as it allows first-year students to form remote learning groups according to criteria relevant to them. Thus, a group formation tool has great potential to create social interaction and thereby a sense of belonging for students despite social distance. Such an algorithm can additionally be useful for various other settings. In addition to the benefits of the group-formation-tool for universities, it also has potential benefits for the didactics of schools, for business contexts, and for the leisure sector and thus for private-group-design. The question remains open as to how other characteristics, or more precisely other personality traits, play out in this context. Other settings, such as groupwork that does not take place online or hybrid formats, should also be investigated regarding other group formation characteristics or those used here. What has already been clearly found in this study is that the outcomes of group work can be explained at the group level, and that group formation is thus an important and, moreover, economical means of choice to enable the success of group work. However, more research is needed on the characteristics used in group formation, the settings in which they lead to success, and the outcome variables that they affect. Attention should be paid to personality traits, as group formation can lead to positive and negative outcomes depending on their structure, and we already have evidence that the traits studied here are quite relevant. Moreover, the algorithm used here can be successfully used in other follow-up-projects to study group formation.

For future work, we plan to repeat the described experiment under different conditions, both in face-to-face courses and in virtual environments that promote CSCL. Another upcoming work is to experimentally manipulate additional student characteristics, e.g., other personality traits such as conscientiousness, as well as a replication of the present work, where we would use previous student grades as a criterion for group formation, rather than prior knowledge queried selectively, to look at the outcome that collaborative learning has for previously low-performing students. Other characteristics that could be influential are factors such as prior technical knowledge among students and their motivation regarding specific learning activities.

Conclusion

Our study introduced an experimental approach to group formation with promising, thoroughly researched criteria. Not only did the study demonstrate that the proposed experimental research method and the applied algorithm successfully achieved the goal of obtaining homogeneous and heterogeneous groups, but it also revealed that the interaction of characteristics, specifically heterogeneous extraversion and homogeneous prior knowledge, positively influenced the development of activities within the collaborative learning context and served as an indicator of group stability. Furthermore, we explored the potentially crucial role of conscientiousness for online working groups. In this context, it is crucial to emphasize that the inclusion of specific student characteristics requires careful consideration, preferably guided by methodology-grounded psychological insights. This underscores the necessity of considering numerous variables of individual group members to optimize group fit for well-functioning groups.

Contrary to expectations, the study's hypotheses could not be supported. It appears that the high variance explanation at the group level is attributed to other group-level variables differing between groups, rather than the structuring of the experimental variables and group formation. Nevertheless, the impact of group-level factors compared to student-level factors is a noteworthy finding. This underscores the importance of investigating group formation criteria, because the results of group work can be influenced by the group formation processes. This is a significant outcome, highlighting that the mix of group member characteristics is more pivotal to the results than the characteristics of individual members alone.

Researchers and practitioners have diverse approaches to construct different grouping models, considering individual trait constellations and focusing on traits beyond those examined in this study. Our approach presents an opportunity for scientists to conduct future research on group formation, enriching the body of knowledge on online group formation. Such studies are crucial for educational institutions and other professional domains. Given factors such as spatially distributed and interdisciplinary group work, digitalization, increasing demands for mobility in the working world, and current considerations like social distancing and flexibilization, the workload is escalating while available time is diminishing, reflecting the transformative nature of the way we work.