
1 Introduction

Software development is extremely collaborative, especially in agile development [3, 10]. One of the important objectives of software engineering education is, therefore, to make students learn essential collaboration skills [20]. Teamwork in software engineering projects among higher education students is harder than anticipated [2]. Common challenges are a lack of dedication and involvement of one or more team members [17] coupled with communication challenges among the team members [11].

Designing courses that effectively teach agile software development and equip students with the skills to tackle real-world challenges remains a formidable task [8]. Team composition in large capstone courses presents instructors with a complex and time-consuming challenge, as they must balance a range of potentially conflicting factors, including the distribution of skills and availability among students, and the varying degrees of enthusiasm and ambition for the project. However, creating well-functioning agile development teams is crucial for fostering learning, encouraging student engagement, and achieving a successful project outcome [21]. It is essential to consider team composition, including gender diversity, in capstone software development projects to enhance learning and project outcomes. Different kinds of diversity can make teams underperform because the team members may not understand or talk to each other well, which can cause disagreement or dissatisfaction [14]. Recent research has found that when team members feel safe, they are more inclined to help each other and exchange their knowledge [15]. Lack of trust is one of the main barriers for autonomous agile teams [31].

Research on team composition in agile software engineering capstone courses has highlighted the challenges faced by instructors in creating well-balanced and effective teams [11, 27]. To address this, algorithmically supported systems like TEASE have been developed to assist instructors in team composition, resulting in better mean project priority and reduced time for team formation [11]. The importance of motivation, interpersonal relationships, and communication in team satisfaction has also been emphasized [11]. Furthermore, the use of Scrum in capstone courses has been found to positively impact students’ estimation and planning skills [28]. Finally, the effectiveness of the proposed team building criteria in enhancing team cohesion has been demonstrated [36].

In this paper, we describe our methodology for assembling development teams, which is based on a predetermined set of criteria derived from our extensive experience in conducting the course since 2019. Many developers have their first encounter with agile software development through courses involving teamwork during their education, where this teamwork experience sets the tone for developers’ view of agile collaboration. Due to the low percentage of women in the sector, female developers frequently find themselves in the minority within their teams. In teamwork, social research found that minority team members encounter gender stereotypes, affecting their task assignment and performance [1, 18, 40]. Agile software development teams where women are the majority are currently understudied, leaving a knowledge gap regarding how different gender compositions affect teamwork. Amidst the extensive body of literature on gender, agile student teamwork, and team composition separately, there is a lack of research exploring the effect of gender composition in student teams in software engineering courses. Motivated by this gap, this study aims to explore the influence of gender on the quality of teamwork by answering the following research questions:

RQ1: “How can diverse student teams be composed in an agile software engineering capstone course, considering students’ preferences?”

RQ2: “What gender differences do we find in an agile software engineering capstone course?”

RQ3: “How does gender composition affect students’ teamwork in agile student teams?”

Our case study was designed to explore gender dynamics in an agile software development capstone course. The course involved 40 teams, and data was collected through two surveys. Our study is based on Kanter’s theory [19] of gender diversity and the concept of teamwork quality (TWQ) as defined by Hoegl and Gemuenden [16], focusing on the quality of interactions within the team. Our measure of teamwork quality does not encompass the assessment of task process, task strategy, or the individual team members’ performance quality in carrying out task activities. Additionally, management-related activities like task planning, resource allocation, and management by objectives are not within the scope of this teamwork quality construct.

In the following sections, we will present the theoretical background used in this paper in Sect. 2 and the context and methodology of the case study in Sect. 3. Section 4 presents our results, structured by the research questions, and Sect. 5 discusses the results before concluding in Sect. 6.

2 Background

2.1 Diversity in Software Development Teams

Early social research stated that women prefer working in either gender-balanced or male-dominated environments over women-dominated environments [41]. However, recent studies in software engineering propose that male-dominated environments, and lack of female support, drive women away from the field [13, 29]. A recent estimate of the percentage of women among the world’s software developers is approximately 10% [7]. The underrepresentation of women in certain academic disciplines is accompanied by the risk of female students disengaging or dropping out, posing a significant challenge to educational institutions [13, 29].

Teamwork is one of the most critical soft skills in software engineering, often taught through project work in capstone courses where students collaborate in teams [4]. Curseu et al. [9] recommend forming diverse teams based on gender and nationality to enhance learning performance, as diverse groups tend to perform better. However, a study by Aeby et al. [1] involving third-year Bachelor and first-year Master students found significant gender differences in role preferences: females were more likely to engage in report writing, while males showed a stronger inclination towards technical tasks.

Løvold et al. [27] found that student-formed teams performed slightly better than instructor-formed teams, benefiting from already knowing each other and how they work together. However, mixing the two approaches, allowing students to submit their team preferences for the instructor to base team formation on, could give the students ownership and include social benefits from the self-assigned approach, while at the same time ensuring balanced teams regarding the project needs [33].

2.2 Teamwork Quality

As described in the Introduction, our use of the term Teamwork Quality (TWQ) is based on Hoegl and Gemuenden [16], and focuses solely on the quality of interactions within the team. The six subconstructs of communication, coordination, balance of member contribution, mutual support, effort, and cohesion cover performance-relevant measures of internal interaction in teams. Hoegl and Gemuenden used the TWQ instrument in studies of traditional software teams. In recent years, TWQ has been used in studies of agile teams [25, 26]. See Appendix B for the items used in the TWQ construct.

2.3 Kanter’s Tokenism Theory

In 1977, Kanter introduced the theory of group proportions, also known as tokenism theory [19]. This theory argues that the dynamics of teamwork are affected by the numerical proportion between the minority and the majority, meaning that women in a minority position become more exposed to stereotypes as the proportion of men in the team increases. The theory describes four types of gender composition: uniform, balanced, tilted, and skewed. Uniform groups are 100% homogeneous in gender, balanced groups have a ratio between 60:40 and 50:50, tilted groups have a ratio of 65:35, and skewed groups have a ratio of 85:15. Within skewed groups, the minority members are called “tokens” and the majority “dominants”. They are called tokens because they are often treated as representatives of their gender, with the expectations and stereotypes attached to that gender projected onto the individual.

The increased visibility can result in tokens experiencing performance pressure and constraints in social behavior [23]. This might lead token women to assimilate to the dominant men [22]. In tilted groups, women can form alliances and thereby influence the group’s culture, while in balanced groups the dynamics depend more on the individual characteristics of the group members.

3 Methodology

3.1 Context

We aimed to investigate team dynamics with a gender perspective through a single-case holistic study [35, 42]. The data was collected from January to June 2023 through two surveys distributed in an agile capstone course. The course is a compulsory fourth-semester course that provides 20 ECTS credits (equivalent to one-third of a full-time academic year’s workload). The course is mandatory for three study programs at the Department of Informatics at the University of Oslo. Regardless of study program, all students are expected to participate in all aspects of agile software development.

The students are divided into teams tasked with developing applications based on the Norwegian Meteorological Institute’s weather programming interface over a twelve-week period. Figure 1 illustrates the course’s timeline for 2023. In that year, there were 247 students enrolled in the course, and 34% of students were female. These students were organized into 40 teams.

Fig. 1. Timeline Showing Course Structure and Data Collection

3.2 Data Collection and Analysis

Pre-survey: The team assembly was based on a pre-survey (Appendix A), requiring students to sign up for the teamwork, either alone or with a pre-assembled team of up to 3 students. The survey was only filled out by one student per pre-assembled team, requiring prior agreement among team members regarding project ambition. The survey was handed out in February with a 10-day submission deadline. As gender was not collected in the survey, members’ gender was assigned by manually checking their registered gender in the student system.

Fig. 2. Our implementation of Kanter’s Tokenism Theory

Post-survey: In June 2023, a quantitative subjective survey was conducted to measure 1) the team members’ self-assessed teamwork quality, success, and performance, and 2) students’ self-assessed perception of gender biases and gender compositions’ influence on teamwork. The survey consisted of 82 questions, of which 61 measured teamwork quality. Seven Likert-scale questions asked about students’ self-assessed experience with gender bias within their teams during the project work, and one open-ended qualitative question invited students to reflect on their teams’ gender compositions, their effect on their project, and experiences with gender dynamics in teamwork in general. The full survey is available in Appendix B. The survey was answered by 201 respondents (a response rate of 84%).

The survey was handed out to the students in between team project presentations, providing them with an environment exclusively related to their work regarding location, time, and headspace. The students were obligated to present at a given time, and dedicated time was allocated for conducting the survey, hopefully motivating them to answer it more thoroughly.

Students were asked to select their gender from the following options: Woman, Man, Other, or Do not wish to declare. We are interested in the students’ self-defined gender and have therefore employed the terms Man and Woman throughout the research, as we believe them to be more inclusive than female and male, inviting students to identify regardless of biology [24]. Only one student chose Do not wish to declare and was removed from the analysis for reasons of data relevance. Figure 2 displays this case study’s implementation of Kanter’s theory.
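To make the mapping from headcounts to Kanter's categories concrete, the sketch below labels a team by the share of its minority gender. The cutoffs (0.20 and 0.40) and the function name `kanter_category` are illustrative assumptions chosen so that six-person teams are labeled the way this paper uses the terms (one minority member is skewed, two is tilted, three is balanced). They are not taken from Fig. 2.

```python
def kanter_category(women: int, men: int) -> str:
    """Label a team by gender proportions (assumed cutoffs, see lead-in)."""
    total = women + men
    if total == 0:
        raise ValueError("team must have at least one member")
    minority_share = min(women, men) / total
    if minority_share == 0:
        return "uniform"      # single-gender team
    if minority_share <= 0.20:
        return "skewed"       # e.g. 1 of 6, near Kanter's 85:15
    if minority_share < 0.40:
        return "tilted"       # e.g. 2 of 6, near Kanter's 65:35
    return "balanced"         # 60:40 up to 50:50


if __name__ == "__main__":
    # Label every possible six-person team.
    for women in range(7):
        print(women, 6 - women, kanter_category(women, 6 - women))
```

Because the function works on proportions rather than fixed headcounts, the same assumed cutoffs can be applied to teams of other sizes, although borderline cases would need a deliberate decision.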

When analyzing the survey, the data was prepared for three main analyses: a) teamwork quality aggregated by teams, b) individuals’ experience with teamwork, and c) qualitative teamwork experiences. Sets a) and b) were quantitative data, analyzed statistically in SPSS and Python. Set c) was analyzed using reflective thematic analysis [6]. Thematic analysis is a systematic process for identifying, organizing, and gaining insights into recurrent themes within a dataset. By pinpointing recurring topics in discussions or written content, shared themes and patterns are identified through this method.

4 Results

Of the initial 247 students who signed up for the course, 240 students qualified for the project work after passing all individual compulsory assignments, making 40 teams of 6 members each. Team assembly took the first author over 60 working hours. Teams were assembled manually according to the following variables, in order of priority: 1) requested peer students, 2) level of ambition, 3) abnormalities in working hours, 4) multidisciplinarity, and 5) gender. This section presents the results of the team composition regarding these variables.

The pre-survey received 134 responses, including 34 pre-assembled teams with three members and 35 teams with two members. One team was permitted to pre-assemble with six members due to challenges in working hours. Team assembly began by prioritizing a balanced composition regarding the number of members in the pre-assembled teams: ideally, a team would comprise either two pre-assembled teams of three peers, three of two peers, or six single students.

Pre-assembled teams usually consist of friends, and we were concerned this would lead to imbalanced team dynamics, especially when three strangers join three close friends. Despite these concerns, we managed to form 23 symmetric teams, where the composition mirrored our balanced ideal, and 17 asymmetric teams, where such balance could not be achieved.
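The symmetry criterion described above can be expressed as a small check: a team counts as symmetric when all of its sign-up groups have the same size. The sketch below is an illustration under that reading. The function name `is_symmetric` and the list-of-sizes representation are assumptions, and the manual assembly process itself was not automated.

```python
TEAM_SIZE = 6  # team size used in the course


def is_symmetric(signup_group_sizes: list[int]) -> bool:
    """True if every sign-up group in the team has the same size.

    signup_group_sizes lists the sizes of the groups that signed up
    together, e.g. [3, 3] or [3, 1, 1, 1].
    """
    if sum(signup_group_sizes) != TEAM_SIZE:
        raise ValueError("group sizes must add up to the team size")
    return len(set(signup_group_sizes)) == 1


if __name__ == "__main__":
    print(is_symmetric([3, 3]))        # two groups of three
    print(is_symmetric([3, 1, 1, 1]))  # three friends joined by three strangers
```

Under this reading, a single pre-assembled group of six also counts as symmetric. Whether such a team was counted among the 23 symmetric teams is not stated.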

4.1 Team Compositions and Ambition Levels Based on Pre-survey

In our survey, we inquired about the students’ desired time commitment to the course, referred to as their ambition level (detailed in Appendix A). None of the surveyed students reported having ambition level 1 (“Wish to work much less than expected (<15 h/week)”). When comparing the reported ambition level among genders, we found that women reported a higher average ambition level, with a mean score of 3.91 (SD = 0.757). Men, on the other hand, demonstrated a slightly lower mean ambition level of 3.68 (SD = 0.778). The standard deviation among women’s scores indicates slightly less variation in comparison to men’s scores. These results suggest that, within this survey’s population, women expressed a marginally higher ambition level than men.

We strove to compose teams whose members had approximately the same ambition level. Teams were distributed as follows: two teams at level 2, 12 at level 3, 20 at level 4, and six at the maximum ambition level, level 5 (“Wish to work much more than expected (>35 h/week)”). Table 1 illustrates the distribution of teams based on their ambition levels and gender composition, with each column representing a gender composition (W = women, M = men).

Table 1. Distribution of teams by ambition level and gender composition.

Gender distribution was the last consideration in the team assembly, only adjusted if feasible after accounting for ambition levels, work hours, and study program. Consequently, only two teams comprise token men and no team comprises women uniformly. Given the overrepresentation of men in the course, forming all-women teams solely for the research would disproportionately increase the number of uniform men teams. Table 1 gives an overview of the number of teams in each gender composition.

4.2 Agile Ways of Working and Gender

In the post-survey, 91% of respondents reported that they used stand-up meetings. With regard to satisfaction with stand-up meetings (rated on a scale from 1, very dissatisfied, to 5, very satisfied), only 4% said they were “very dissatisfied,” and 5% said they were “slightly dissatisfied.” 11.9% were neither satisfied nor dissatisfied. In other words, the majority appeared to appreciate the stand-up meetings: 35.3% said they were “slightly satisfied,” and 34.8% said they were “very satisfied” with the practice, while 9% did not use stand-up meetings. Women reported higher satisfaction with stand-up meetings than men (see Table 2). As can also be seen in the table, women reported spending more hours on the project work. The students also rated the degree to which they had been a Scrum Master/Team leader (on a scale from 1, Has not been, to 5, To a large extent), and women reported having been Scrum Masters to a greater extent. However, these differences are not statistically significant.

Regarding the use of collaboration and communication tools, 46% used Discord, 52% used Facebook Messenger, 40% used Microsoft Teams, and 40% used Slack. Almost 70% used Trello, which might be explained by the high usage of ScrumBan/Kanban. Among the other responses, the most commonly mentioned collaboration tools were GitHub, Figma, Google Drive, Confluence, Snapchat, ClickUp, Signal, Jira, Overleaf, and Notion.

Table 2. Stand-up meetings and Scrum Master analyzed by gender

4.3 Gender Differences in Primary Work Functions of Team Members

In the post-survey, students were asked what their primary work function in the team was, selecting either programming, testing, designing, documentation, architecture, or others. The survey received 201 responses, where 98 students reported programming as their main function, 45 reported designing, 30 reported documentation, 9 testing, 11 architecture, and 8 chose others. When we split these results by gender, women design and document more, while men program more. As illustrated in Fig. 3, 34% of women report designing as their primary work function, in comparison to only 16% of all men. This is despite equal amounts of women and men attending the design study program that year. Furthermore, while 36% of all women report programming as their main work function, 56% of all men report programming. Lastly, 19% of women report documentation, while 12% of men primarily document. In total, 89% of the women and 85% of the men either programmed, designed, or documented, the rest distributed between testing, architecture, and others.

Fig. 3. Primary work function in teamwork, distributed by gender

4.4 Impact of Gender Composition on Team Dynamics

Changes in Primary Work Functions: Women report designing to a greater extent than men in all gender compositions except teams with five women and one man. Teams with two women and four men are the composition in which women design the most: 50% of women in those teams report design as their primary work function, while only 11% of men in those teams report primarily designing. We found that women, when in the minority on tilted teams (Fig. 2), often experienced not being able to engage in programming as much as they would have liked. To the open-ended question, “Describe how you experienced the gender distribution in your team, and how you believe it affected the teamwork and the project”, one woman in such a tilted team responded:

“There was a tendency for gatekeeping at a technical level by the boys towards the girls. Initially, tasks related to documentation and design were predominantly delegated to the girls, an issue that was addressed and managed somewhat late in the development process.”

Furthermore, we found that women were expected to design to a greater extent than men. This is exemplified by a man in the majority of a tilted team, who shared his thoughts about his team’s gender composition:

“There were two women; I thought they would show significant effort regarding design and delegation, but I ended up doing their job."

Regarding programming, teams with four or five women have the highest percentage (44%) of women reporting programming as their primary function. In comparison, 22% of the token women in teams with five men reported programming, as did 25% of the minority women in teams with four men and 28% of the women in balanced teams. Additionally, the teams with four or five women exhibit the smallest difference in percentage between women and men primarily involved in programming. One token man in a team with five women suggested that women-dominated teams benefit from a safe exploratory environment:

“I believe it had the most positive impact on teamwork because I collaborate well with girls and learned a lot about programming since working with "typical" IT guys has been quite unfamiliar to me. I could ask silly questions while pair programming and maintain a friendly and chatty tone while programming."

In uniform men teams, 60% of the men program, while only 6% design. That this composition has the largest proportion of members primarily programming and the smallest proportion primarily designing suggests that men focus less on designing and more on programming.

Teamwork Quality: Table 3 shows each gender composition’s average teamwork quality scores. Teams with two women and four men have the lowest mean TWQ score (3.60), followed by balanced teams (3.76). Teams with five women and one man score the highest (4.37), and teams with four women and two men score the second highest (4.27). It is worth noting the low standard deviation among all compositions, indicating a high level of agreement among the respondents. However, the two lowest-scoring compositions also have the highest standard deviations, suggesting a higher level of disagreement regarding their teamwork quality.
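One plausible way to arrive at per-composition figures of this kind is to average respondents within each team first and then average the team means within each gender composition. The sketch below illustrates that two-step aggregation on hypothetical records. The data, the function name `twq_by_composition`, and the choice to compute the standard deviation over individual respondents are all assumptions, as the exact aggregation used for Table 3 is not described in this section.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (team id, women in team, respondent's mean TWQ, 1-5 scale)
responses = [
    ("T1", 2, 3.4), ("T1", 2, 3.9), ("T1", 2, 3.5),
    ("T2", 2, 3.8), ("T2", 2, 3.6),
    ("T3", 5, 4.5), ("T3", 5, 4.2),
    ("T4", 5, 4.4), ("T4", 5, 4.3),
]


def twq_by_composition(rows):
    """Return {women_in_team: (mean of team means, SD over respondents)}."""
    team_scores = defaultdict(list)   # team id -> respondent scores
    team_women = {}                   # team id -> number of women
    for team, women, score in rows:
        team_scores[team].append(score)
        team_women[team] = women

    team_means = defaultdict(list)    # composition -> team-level means
    respondents = defaultdict(list)   # composition -> all respondent scores
    for team, scores in team_scores.items():
        women = team_women[team]
        team_means[women].append(mean(scores))
        respondents[women].extend(scores)

    return {
        women: (
            mean(team_means[women]),
            stdev(respondents[women]) if len(respondents[women]) > 1 else 0.0,
        )
        for women in team_means
    }


if __name__ == "__main__":
    for women, (avg, sd) in sorted(twq_by_composition(responses).items()):
        print(f"{women} women: mean TWQ {avg:.2f}, SD {sd:.2f}")
```

Averaging within teams first keeps a team with many respondents from outweighing a team with few, which matters when response rates differ between teams.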

Gender Bias Experiences: Intrigued by the difference in TWQ scores, we performed a correlation analysis to examine the relationship between the number of women in a team and the outcomes of gender-bias post-survey questions. Responses from women and men within the same team were aggregated separately to capture each gender’s collective team perspective.
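The analysis described above can be sketched as a correlation between the number of women in each team and a per-team, per-gender average of a survey item. The example below assumes a Pearson correlation, since the type of coefficient is not specified here, and uses hypothetical values. `pearson_r` is an illustrative helper, not the SPSS or Python routine used in the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-team values: number of women in the team, and the women's
# mean agreement (1-5) with "I am dissatisfied with the gender balance".
women_in_team = [1, 2, 2, 3, 4, 5]
dissatisfaction = [4.0, 3.5, 3.0, 2.5, 2.0, 1.5]

if __name__ == "__main__":
    r = pearson_r(women_in_team, dissatisfaction)
    print(f"r = {r:.3f}")
```

A negative coefficient here would correspond to dissatisfaction falling as the number of women rises. Significance testing is left out of the sketch.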

For men, the only significant correlation was with the item, “I am dissatisfied with the gender balance", which yielded a significant negative correlation (\(r = -0.553, p < 0.01\)). This suggests that their dissatisfaction with gender balance decreases as the number of women in the team increases:

[Man in a team with one woman and five men]: “I believe a more balanced gender distribution would have been beneficial for group dynamics.”

[Man in a team with five women and one man]: “I experienced it as positive and found it to be enlightening to be in a group with only girls."

Table 3. TWQ outcomes by gender composition (TWQ factors: communication, coordination, mutual support, effort, cohesion, and balance of member contribution)

In contrast, women demonstrated a broader range of significant correlations. Aligning with the men, their dissatisfaction toward the teams’ gender balance significantly negatively correlated to the number of women in the team (\(r = -0.545, p < 0.01\)). Additionally, their feeling of being undervalued by team members was significantly negatively correlated (\(r = -0.393, p < 0.01\)), while the perceived taboo of discussing gender discrimination also showed a significant negative correlation (\(r = -0.401, p < 0.01\)). This suggests that as the number of women in a team increases, women tend to be more satisfied with the team’s gender balance, feel more valued, and are less likely to see gender discrimination topics as taboo. A woman in a team composed of two women and four men shared her experiences, highlighting potential bias in team interactions:

“Some in the group were very inclined to direct the coding questions exclusively to the male programmer, assuming he had the answers and seeking code reviews from him on GitHub. This could be a combination of me not speaking up loudly enough, possibly not appearing confident, and might be a subjective perception."

Furthermore, their concern about not contributing enough was inversely related to the number of women in the team (\(r = -0.241, p<0.05\)), and the TWQ mean score showed a positive correlation (\(r = 0.272, p<0.05\)). This indicates that women are less concerned about how their contributions are perceived, and the quality of teamwork improves with a higher representation of women in the team.

These findings indicate an asymmetry between how men and women report being affected by gender composition. The significant correlations for women across multiple aspects suggest that gender balance within teams could play a critical role in women’s team experience, whereas for men it appears to be a less critical determinant.

5 Discussion

We have described our strategy for composing agile software engineering student teams, and have examined gender differences in agile teamwork, and the impact of gender composition on student teamwork by forming teams with varied gender compositions.

According to a recent study [5], men are typically more inclined to lead than women, and women are more willing to lead teams with a female majority. Our findings indicate that women, on average, take on the role of Scrum Master to a greater extent than men. This is in line with Paasivaara’s [32] recent study on the Scrum Master role in student teams, which found that the Scrum Master role fit female students extremely well and suggested that marketing this role towards girls currently outside the computer science field could be a good idea.

We found that women design and document more than men in the project, while men program more than women. This is consistent with former research, suggesting differences in women’s and men’s primary work functions to be due to gender stereotypes [1]. However, in light of gender composition within the teams, as the proportion of women in the team increases, we observed a corresponding rise in women engaged in programming.

In general, we observed a positive impact on teamwork quality with an increasing number of women. Women-tilted (4 women/2 men) and women-skewed (5 women/1 man) teams exhibit the two highest TWQ scores, while men-tilted (2 women/4 men) and balanced teams (3 women/3 men) scored the lowest. Whereas Kanter’s theory would lead us to expect the teams with token women to have the lowest teamwork quality [19], our results showed these teams scoring higher than men-tilted teams, balanced teams, and uniform men teams. However, earlier laboratory experiments testing the effect of a group’s gender proportions on performance found that women’s performance declined as the number of male team members increased, while men’s performance was unaffected by gender composition [18, 40]. Our survey sample lacks significant representation of token women (response rate: 41%), so the men’s perception of the teamwork dominates these results. Furthermore, in support of the low teamwork quality scores in balanced teams, the recent study by Graßl et al. [12] found that student teams with greater gender diversity encounter challenges in motivation, productivity, respect, and collaboration.

As the number of women on the team increases, we found a decrease in women’s reported sense of being undervalued, a reduced taboo surrounding discussions of gender discrimination, and less fear of being perceived as inadequate contributors to team efforts. This gender difference aligns with earlier research, where in relationship to their team members, women felt less respected and appreciated compared to men [12].

The majority of the students implemented daily stand-up meetings and found it to be a valuable practice. Stand-up meetings are often seen as positive, as long as the frequency is adapted, and the meeting is not mainly about reporting progress [30, 39]. Recent research [15] has found that implementing social agile practices positively affects psychological safety in the teams. Also, most teams used two communication platforms (the most popular were Discord, Facebook Messenger, Microsoft Teams, and Slack). Slack has been identified as a significant tool that offers various advantages such as facilitating improved communication by reducing the barrier to seeking assistance and encouraging greater participation through the visibility of queries and topics under discussion [34, 38]. In student teams, Ross [34] found that Slack seems to improve the quality of group communications and perceived learning outcomes from group work.

5.1 Limitations

Constrained by the limited participation of women at 34%, only two teams were composed of a token man and five women. The small number of such teams makes it difficult to draw definitive conclusions regarding gender composition in women-dominated teams. However, we believe our results provide insight into an infrequently observed team composition within the context of agile software engineering student teams. Furthermore, our teamwork quality data is based on students’ individual self-assessments in the survey, which may not always accurately reflect their team’s actual performance.

6 Conclusion

In this study, we investigated gender dynamics within agile software development teams in an educational setting. Our findings reveal that gender composition significantly impacts team collaboration and individual roles within these teams. We conducted two surveys. First, we investigated the ambition level of 240 students (82 women and 158 men) before starting agile project work. When comparing the reported ambition level among genders, we found that women reported a higher average ambition level than men. Next, after the project work, we analyzed a post-survey with 201 respondents. We asked the students to assess teamwork quality, success, performance, self-assessed perception of gender biases, and gender compositions’ influence on teamwork. We found that women were more satisfied with stand-up meetings than men. When analyzing team averages with team composition, we found that teams with two women and four men have the lowest score on teamwork quality.

For future research, it would be interesting to observe an all-women team in this context and compare its teamwork experiences to those of an all-men team. Finally, the course is currently in session, with an increased share of women participating, at 39%. This facilitates further and deeper research on gender composition in agile student teams. We found, consistent with earlier research [32], that women show a preference for the role of Scrum Master. Given the rising utilization of agile coaches in the industry, and that leadership and project management skills are important traits to have in this role [37], it would be interesting to implement and investigate the role of agile coaches in software engineering capstone courses. Further research could explore whether the skills and inclinations that draw women to the Scrum Master role could similarly influence their participation as agile coaches, potentially enhancing team effectiveness and project outcomes in the IT sector.