1 Introduction

The practice of collaborative learning gathers students in groups to work and learn together, allowing them to develop and improve essential skills. The advantages of collaborative learning are well documented in the literature: students gain “content knowledge and problem-solving skills” and “autonomy by transferring some of the responsibility for teaching and learning” to them (Freeman, 1995, p. 289). Workgroups are expected to “make learning meaningful through active learning” (Daba et al., 2017, p. 861).

This learning method has become an increasingly common practice among universities. In academic settings, grades must be assigned to students according to their individual performance. Traditionally, however, all students in a workgroup receive the same mark for their contribution, which may not correspond to their actual individual performance (Babo et al., 2021; Babo et al., 2020a; Kolmos & Holgaard, 2007; Macdonald, 2003).

Therefore, self- and peer-assessment methods have been rising in popularity among academic institutions and companies. Self-assessment consists of a student evaluating themselves, whereas peer assessment has that same student evaluate their colleagues, namely “between peers in projects and presentations” (Babo et al., 2021, p. 70). These methods provide students with a unique opportunity by involving them in their own and their peers’ evaluation process. Since the lecturer might not share the students’ point of view, this process is considered as “being valid, reliable, fair and as contributing to a growth in competence” (Babo et al., 2021; Dochy et al., 1999; Tan & Keat, 2005).

Along with technological advances, self and peer assessments have increasingly been performed with software tools, as a more efficient way of conducting them. Concerning software tools, the importance of User eXperience (UX) and usability is evident. Usability is defined in ISO 9241-11:1998 as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use”, whereas UX considers “the whole experience” “when evaluating a software tool” (Babo et al., 2021, p. 70).

Usability testing is performed to assess the usability of tools, which is “determined by a combination of its features, functionality, visual appeal, and usefulness” (Corrao et al., 2010, p. 120). In essence, Costabile et al. (2005) state that a tool has excellent usability when users can manipulate it efficiently and perform their tasks appropriately. The literature also mentions that the target audience of a software tool has a great impact on its usability assessment; for instance, when senior individuals are the target users, one may expect them to have more difficulty using the tools (Babo et al., 2021; Corrao et al., 2010; Costabile et al., 2005).

With the increase of collaborative activities at Porto Accounting and Business School (ISCAP), the lecturers decided to implement self- and peer-assessments, which, according to the literature, are performed more efficiently with assessment software tools. Therefore, the development of WebAVALIA, a framework for self and peer assessment, began (Babo et al., 2020b).

WebAVALIA is a tool with two main user roles: the student and the lecturer. The students use the tool to perform their self and peer assessments, while the lecturer is responsible for assigning students to groups and evaluating the final work. The lecturer can create editions, each corresponding to an assignment or project with different groups. After creating an edition, the evaluator sets the parameters, namely the number of assessment moments and the number of elements per group, and, in a final phase, triggers the calculation of results based on the scores provided by the students. The students, in turn, through their individual voting scores, are responsible for the variation of individual marks. WebAVALIA was designed based on the Design Science Research (DSR) methodology (Babo et al., 2020b).
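To make this workflow concrete, the sketch below models the entities just described. It is purely illustrative: the class and field names (Edition, Group, n_moments, and so on) are our assumptions, not WebAVALIA’s actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    members: list[str]  # student identifiers
    # votes[moment][voter][votee] -> score given by voter to votee
    votes: dict[int, dict[str, dict[str, int]]] = field(default_factory=dict)

@dataclass
class Edition:
    name: str        # one assignment or project
    n_moments: int   # number of assessment moments, set by the lecturer
    group_size: int  # number of elements per group, set by the lecturer
    groups: list[Group] = field(default_factory=list)

# The lecturer creates an edition, sets its parameters, and assigns students.
edition = Edition(name="Course project", n_moments=2, group_size=4)
edition.groups.append(Group(members=["s1", "s2", "s3", "s4"]))
```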

All cycles of the DSR were already explained in Babo et al. (2020b) and Babo et al. (2021, 2020a), which present other existing assessment tools and the development of WebAVALIA, together with its formulation. Although iteration is the central aspect of the design cycle, WebAVALIA still needs to be evaluated to collect feedback about future improvements. The feedback garnered from users allows their opinions to be reflected in future versions.

This paper aims to present the results concerning students’ perception of their use of WebAVALIA by answering the following research question: How do students evaluate their WebAVALIA experience? The students’ opinions were collected through surveys distributed with the LimeSurvey application. This survey process has been in place since 2013.

The following section presents the literature review covering the concepts relevant to the study. Then, the methodology is explained, followed by the data analysis and results, and discussion sections. Finally, the conclusions and future work are provided.

2 Literature review

In order to discuss the design of a new assessment tool, some relevant concepts must be detailed. Therefore, this section will present a literature review on important topics concerning e-assessment of workgroups, namely collaborative learning, self and peer assessment, and technology-enhanced assessment.

2.1 Collaborative learning and assessment

Collaborative learning is the practice of grouping students to develop a piece of work in which they can interact and assist each other to achieve a final goal. Group activities can help students develop and improve essential skills such as critical thinking, reasoning, autonomy, and knowledge. Lately, collaboration themes in research, and skills improvement through workgroups, have grown in popularity, which suggests an equal interest in finding better methods to perform these activities (Babo et al., 2020b; Chen & Kuo, 2019; Eshuis et al., 2019; Hakkarainen et al., 2013; Pai et al., 2015).

According to Vicente et al. (2018), collaborative learning has become more prevalent in education institutions since, by working together, the students can learn and develop their “individual contribution, self-learning, peer-learning, accountability, and communication skills” (Wen, 2017, p. 127). These practices can also benefit the students with “achievement, motivation, and social skills”, among others (Chen & Kuo, 2019, p. 95; Daba et al., 2017; Wen, 2017).

There is also some discussion of possible disadvantages of group work and collaborative learning, namely learners’ lack of interest in working in groups and “the insufficient background knowledge of content and lack of skills related to time management, evaluating, searching, and organizing content from various sources” (Babo et al., 2020b, p. 160). Students have also expressed, in the literature, that the individual assessment of groups can be biased and unfair. This can happen because some members are less committed than others, which leads to different contribution rates but not always to correspondingly different assessments (Daba et al., 2017).

According to Clark et al. (2005), the task of “performing a fair and accurate assessment of individual student contributions to the work produced by a team as well as assessing teamwork itself presents numerous challenges”. Luaces et al. (2018) also affirm that the assessment of workgroups can be subjective and that, usually, lecturers attribute the same mark to the whole group. Each member has a personal performance rate, and by assigning the same mark to everyone, chances are that it will not mirror the actual performance and contribution of the individual. This practice is unfair, since most of the time the mark does not correspond to the actual contribution and performance of each individual, and thus there is no distinction between the members or their actual performance (Babo et al., 2021, 2020b; Clark et al., 2005; Hall & Buzwell, 2013; Luaces et al., 2018).

Therefore, some lecturers opt to use self- and peer-assessment practices. Self-assessment occurs when students evaluate their own performance and contribution. Peer assessment, on the other hand, is when students evaluate their peers. Self and peer assessment practices can benefit the students by developing their “collaboration, communication, conceptual understanding and problem-solving skills” (Alias et al., 2015; Borg & Edmett, 2019; Li, 2017; Reinholz, 2016, p. 312; Topping, 2009).

This form of assessment allows the students to judge themselves and their peers concerning their performance and work development. Consequently, it also gives the lecturer an idea of the students’ opinions and assists them in marking. The belief that the students should assess themselves and their peers is supported by Bushell (2006), who states that the team members are better positioned to determine each other’s performance and contribution (Bushell, 2006; Tan & Keat, 2005).

Self and peer assessment are used in different contexts and can be advantageous to their users. Nevertheless, these practices also increase the lecturer’s administrative tasks. To alleviate the workload, lecturers can adopt e-assessment methods, which carry some advantages. Among the advantages are “time saving, immediate feedback, better use of resources, assessment records saving and more convenience” (Sorensen, 2013, p. 173), “greater flexibility,” “impartiality,” and “reliability with machine marking” (Koneru, 2017, p. 130; Luxton-Reilly, 2009).

Additionally, due to the pandemic outbreak, most educational institutions adopted online strategies to contain its spread and mitigate the impact on students’ education. However, this new situation has changed the way the education system works. Considering that “social interaction is under control” and students’ instruction became “more individualized”, the implementation of “lesson-oriented group work, discussion and cooperative learning” is complex (Kufi et al., 2020, p. 12; WHO, 2020).

Despite students’ distance, one way to continue these practices is by using technology to support the learning process. Online learning can incorporate collaborative approaches using software or applications that allow the students to continue their workgroup assignments. Also, the recent development of technology can assist in making the assessment more efficient and faster (Kufi et al., 2020; Verawardina et al., 2020).

2.2 Technology-enhanced assessment

Technology-enhanced assessment (TEA) means using technology to administer assessment practices, which can entail tools, processes, or methodologies that are capable of managing e-assessment. E-assessment tools allow for an easy way to offer adaptive, innovative, and significant assessments. These technologies support assessment activities and feedback by saving answers and automatic marking (Abelló Gamazo et al., 2016; Gray & Roads, 2016; Hettiarachchi et al., 2014; Huertas & Mor, 2014).

The number of assessment tools used in the academic context has been increasing, as suggested by Nikou and Economides (2018), who expressed that assessment practices need to be redesigned to reflect the students’ knowledge. However, each tool has its own goals and purposes, such as performing self and/or peer assessments. These can also vary from having the students review their peers’ assignments to answering questionnaires about the work developed. E-assessment tools can support assessment management and provide different, quality experiences in the assignments’ reviews (Gray & Roads, 2016; Luxton-Reilly, 2009; Nikou & Economides, 2018).

The use of self and peer assessment tools can support the lecturers in achieving accurate perceptions of the students’ performance. Considering that each group member spends much time working with their peers, they have a unique viewpoint on each other’s performance and contribution. Consequently, they can provide fairer and truer assessments. However, it is essential to comprehend how efficient these tools are in assessment practices (JISC, 2007; Luxton-Reilly, 2009; Tan & Keat, 2005).

2.3 E-assessment software usability and parameters

Each e-assessment tool has its own goals and purposes, allowing for different assessment activities. At the same time, there are many options with different advantages. Thus, the users must be aware of those to select the appropriate one for their needs and context (Abelló Gamazo et al., 2016).

Therefore, usability can play a role in choosing among the aforementioned assessment tools. According to the literature, usability characteristics include learnability, satisfaction, “ease of learning”, flexibility, efficiency, and effectiveness (Đorđević, 2017, p. 522). Accordingly, the usability of a software tool can be assessed through repeated usability testing. This can be done with different methods, for example by comparing users’ perceptions when performing tasks on the tool (Đorđević, 2017; Gu et al., 2011; Haaksma et al., 2018; Hussain & Mkpojiogu, 2016; Llerena et al., 2019; Luxton-Reilly, 2009).

Previous research (Babo et al., 2021) attempted to address the lack of literature and knowledge about existing software tools that can perform self and/or peer assessment by comparing and evaluating seven different tools:

  • WebPA (Loddington et al., 2009; WebPA, n.d.);

  • Peergrade (Peergrade - Engaging Student Peer Review, n.d.);

  • iPeer (IPeer | Teaching with Technology, n.d.);

  • TeamMates (TEAMMATES - Online Peer Feedback/Evaluation System for Student Team Projects, n.d.);

  • Peermark (PeerMark™ - Guides.Turnitin.Com, n.d.);

  • InteDashboard (InteDashboard™ - Empowering TBL with Technology | Peer Evaluation, n.d.);

  • Workshop Module (Workshop Activity - MoodleDocs, n.d.).

This study considered several characteristics to analyze these tools, such as the quickness of performing the steps in the tool and a method to restrict the scores provided by the users, among others. Table 1 presents all parameters considered in that study (Babo et al., 2021; Babo et al., 2020b; Block, 1978; K. Sorensen & Jorgensen, 2019).

Table 1 Parameters considered to review assessment tools.

The Babo et al. (2021) study analyzed the aforementioned tools and concluded that the most important parameters to implement in self and peer assessment software tools are the quickness of performing tasks, the restriction of scales, and the weighting of scores. These characteristics keep the tasks from being time-consuming, reduce errors in score attribution, prevent biased assessments, provide more accurate data, and adjust the evaluations according to the students’ performance. The study therefore concluded that “there is the demand for a freeware software tool supporting self and peer assessment, allowing its use to all the academic community” (Babo et al., 2021, p. 79).

3 Methodology

This study is part of broader research based on the DSR methodology (Babo et al., 2020b). All cycles of the DSR were already explained in Babo et al. (2020b) and Babo et al. (2021, 2020a), which present other existing assessment tools and the development of WebAVALIA, together with its formulation. This paper focuses on the evaluation aspect of the Design Cycle of the DSR process.

This cycle, which comprises the “build-and-evaluate loop”, has iteration as its central aspect (Hevner et al., 2004, p. 78): the feedback from each evaluation helps identify weaknesses, and the resulting alternatives lead to further evaluation phases. The evaluation provides important feedback, enabling a better understanding of the users’ perceptions and opinions, which can be reflected in future versions.

In this section, data concerning the students’ evaluations of the use of WebAVALIA are presented. The data were collected with surveys built in the LimeSurvey application (LimeSurvey: The Online Survey Tool - Open Source Surveys, n.d.). These surveys were composed of a students’ characterization page, five-point Likert scale statements, and open-answer questions in which students were asked about the advantages and disadvantages of using the tool, as well as comments and suggestions. The Likert scale statements covered a range of points helpful for understanding the students’ opinions and how they felt when using WebAVALIA, gathering its advantages and disadvantages, main perceived characteristics, and usability. Some questions in the survey were not mandatory, so respondents could opt not to answer them, which may result in differences in the number of answers considered for each question.

The survey was answered anonymously by students who had used WebAVALIA, at the end of their course project. Data collection started in the academic year 2013/2014. For this study, survey responses from all subsequent academic years up to 2019/2020 were considered, except for 2015/2016, when surveys were not distributed. The analysis characterizes the students of each year by their age and by whether they had used any evaluation tool before. The students’ quantitative responses are then displayed, followed by the advantages and disadvantages they identified in using WebAVALIA. Finally, an overall analysis across the years is performed to verify the evolution of the software tool.

The number of evaluated students by course and by each academic year is indicated in Table 2. The courses are as follows: “Management Information Systems” (SIG); “Information Technologies” (TI); “Informatics 2” (INF2); “Information Systems 1” (SI1); “Information Systems 2” (SI2); “Introduction to Medicine” (IM); “Decision-Making Support Systems” (SAD); and “External Trade” (CE).

Table 2 Number of students per academic year and course who answered the WebAVALIA survey

As it is possible to observe, the number of students who answered the survey is considerable (414 respondents), especially in the last two years. The differences between sample sizes are mainly due to the variation in the number of students attending the courses over the years and the tool’s adoption by the lecturers of the courses.

Table 3 presents the size and average age of the set of evaluated students by academic year. The students were also asked if they had ever used an evaluation tool. This question was optional, but most respondents affirmed that they had never used an evaluation tool before, or had only used one previously in the same degree programme. This analysis is important to infer whether self/peer assessment is a common practice among workgroups. It is also important to note that, throughout the years, the tool was adopted by other lecturers, so the probability that students had already used the tool in earlier years of the degree is high; therefore, in the academic year 2019/2020, this question was eliminated, and it does not appear in Table 3.

Table 3 Students who answered the question: “Have you ever used an evaluation tool before?”

Table 4 presents the statements regarding WebAVALIA. From these statements, it is expected to understand how the students feel about the tool, for example, if they feel that some assessment aspects of the tool are fair or unfair.

Table 4 List of statements of the survey

The Q4/Q18 pair of statements have contradictory meanings, so consistent respondents are expected to give opposite answers. Adding such pairs of statements makes it possible to detect infeasible answers in the data analysis. For instance, when a student gives the same maximum value to both statements, he/she agrees with having the students who worked less receive a worse grade (Q4) but also agrees with the opposite (Q18), which does not make sense. In these cases, the students’ answers are removed.
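As a minimal illustration of this consistency check (assuming the responses sit in a pandas DataFrame whose columns are named after the statements; the data are invented), respondents who give the maximum value to both Q4 and Q18 can be dropped as follows:

```python
import pandas as pd

# Hypothetical survey extract: one row per respondent, one column per statement.
responses = pd.DataFrame({
    "Q4":  [5, 3, 5, 2],
    "Q18": [5, 2, 1, 4],
})

MAX_SCORE = 5  # five-point Likert scale

# A respondent who fully agrees with both Q4 and Q18 contradicts themselves,
# since the two statements assert opposite things; remove those rows.
contradictory = (responses["Q4"] == MAX_SCORE) & (responses["Q18"] == MAX_SCORE)
clean = responses[~contradictory]
```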

It is also important to note that the survey has changed over time. In the first two years (2013/2014 and 2014/2015), the Q6 statement was replaced by Q20 and the Q17 statement by Q21, while Q18 and Q19 were not included in those surveys. From 2016/2017 on, Q20 and Q21 of Table 4 ceased to exist and were not considered.

4 Data analysis and results

Four major categories, summarized in Table 5, are considered for the data analysis in the present study: “fairness”, “simplicity”, “assessment”, and “productivity”. An additional category, “others”, is also used. Although the statements are the same as in Babo et al. (2020b), where a categorization was already established, the authors have decided to redistribute the statements across the categories to better fit the present study’s purpose.

Table 5 Categorization of the statements of the survey

Only two statements have been moved from their original category, namely Q4 and Q11. Q4 was initially in the “fairness” category and has been moved to the “assessment” category. This statement is similar to Q18, since both concern students who contribute less to the project. The “assessment” and “fairness” categories are closely related; however, the “assessment” category is intended to hold statements whose positive results can be obtained by changing the mathematical model used in WebAVALIA, whereas “fairness” holds statements related to evaluating WebAVALIA as a useful tool for fairness purposes.

The second statement, Q11, was initially, in a first version of the survey, in the “productivity” category and has now been moved to the “assessment” category. It states that less or more frequent use of WebAVALIA during a workgroup project may influence the students’ productivity, in case feedback on the workgroup’s voting is provided to the students. With Q11, the students’ perception of the correlation between evaluation frequency and productivity can be captured. The survey results have allowed the redesign of the tool’s algorithms for the final mark calculation; since Q11 relates to the number of times the students have to vote, it influences their performance and provides input for future algorithms.

The “assessment” category is divided into three subcategories: the “general assessment” questions, which concern all the students, and the “less contribution” and “more contribution” questions, which focus specifically on the students’ final marks. In the third column, some statements are negated, meaning that the Likert scale, detailed in the next subsection, is inverted. For example, if the average of Q4 is 5.0, the opposite statement would receive the opposite answer, i.e., 1.0. This aids the visual comprehension of the general opinion of WebAVALIA.
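On a five-point scale, this inversion simply maps a score x to 6 − x; a one-line sketch of the transformation:

```python
def invert_likert(score: float, points: int = 5) -> float:
    """Invert a score on a `points`-point Likert scale (5 -> 1, 4 -> 2, ...)."""
    return (points + 1) - score

assert invert_likert(5.0) == 1.0  # the Q4 example given above
```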

Table 6 provides statistical context on the number of answers considered in each academic year. Throughout the years, the number of respondents varies, as does the number of respondents who answered the non-mandatory questions. Consequently, the number of considered students differs from the number of students who answered the survey.

Table 6 Number of the considered students in the statistical treatment by academic year

4.1 Quantitative analysis

Tables 7, 8, 9, 10, 11, 12 and 13 present the quantitative results for each statement and academic year by category. Answers are given on a five-point Likert scale as integers from 1 to 5. The average and the standard deviation of the considered students’ responses are displayed for each statement and year. Moreover, the percentage of responses above the average is shown in the last row for each statement; it is indicative of the asymmetry of the distribution.
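These three statistics can be reproduced as sketched below, assuming a tidy pandas layout with one row per respondent, statement, and year (the column names and data are illustrative):

```python
import pandas as pd

# Hypothetical tidy layout: one row per (respondent, statement, year) answer.
answers = pd.DataFrame({
    "year":      ["2019/2020"] * 6,
    "statement": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "score":     [4, 5, 3, 2, 4, 4],
})

def summarize(scores: pd.Series) -> pd.Series:
    mean = scores.mean()
    return pd.Series({
        "mean": mean,
        "std": scores.std(),
        # share of responses strictly above the mean: an asymmetry indicator
        "pct_above_mean": (scores > mean).mean() * 100,
    })

summary = answers.groupby(["statement", "year"])["score"].apply(summarize)
print(summary)
```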

Table 7 Answers to the set of statements about “Fairness, Honesty and Students’ Approval” by statement and academic year

Table 7 presents the answers to Q1, Q2, and Q19 across the academic years, and Fig. 1 displays their evolution over the years of study. It is possible to perceive an increase in the students’ agreement about the fairness and honesty of WebAVALIA, considering that, throughout the years, students have been giving higher scores to Q1 and Q2. The Q19 statement is the one that has varied the most over the years, meaning the students have strengthened their opinion that tools like WebAVALIA are indispensable for a fair assessment of workgroups.

Fig. 1 Evolution of the students’ answers to the set of statements about “Fairness, Honesty and Students’ Approval”

Table 8 and Fig. 2 present the results for Q6, Q9, Q13, Q15, and Q16 in each academic year. From these results, it is possible to perceive that the simplicity category generally follows the same evolution as the fairness category. The students were also asked about the intuitiveness and ease of use of the tool, which turned out to be the statement with the highest agreement. In 2014/2015 the students disagreed that WebAVALIA was an intuitive and easy-to-use tool, but two years later their opinion had changed. Likewise, over the years, the students have agreed with the development of a mobile app and that the simplicity of the tool allowed them to assess in an easier and more accurate way.

Table 8 Answers to the set of statements about “Simplicity of the Tool” by statement and academic year
Fig. 2 Evolution of the students’ answers to the set of statements about “Simplicity of the Tool”

When students were asked about the increase in “productivity” when using WebAVALIA, the answers were divided, as can be seen in Table 9 and Fig. 3. The students’ opinions regarding the “productivity” items remained almost the same over the years.

Table 9 Answers to the set of statements about “Productivity” from the students by statement and academic year
Fig. 3 Evolution of the students’ answers to the set of statements about “Productivity”

The final marks given by WebAVALIA were also a topic of evaluation. When the students were asked if they believed that the marks given by WebAVALIA were, in general, well distributed, their opinions were mostly neutral over the years, as can be seen in Table 10 and Fig. 4.

Table 10 Answers to the set of statements about “Assessment: general questions” from the students by statement and academic year
Fig. 4 Evolution of the students’ answers to the set of “Assessment: general questions”

Concerning the students’ beliefs on whether the marks of students who contributed less were well distributed, a slight increase can be perceived in both statements over the years (Table 11; Fig. 5). Students feel that the system penalizes too heavily the students who contributed less to the project, but they also believe that these students achieve a better grade than they deserve. This last opinion, however, has oscillated.

Table 11 Answers to the set of statements about “Assessment: less contribution” from the students by statement and academic year
Fig. 5 Evolution of the students’ answers to the set of “Assessment: less contribution” questions

Likewise, the students agree with the idea of giving the students with the best voting mark a higher mark than the one assigned to the project, as observed in Table 12 and Fig. 6.

Table 12 Answers to the set of statements about “Assessment: more contribution” from the students by statement and academic year
Fig. 6 Evolution of the students’ answers to the set of “Assessment: more contribution” statements

Table 13 compiles the students’ answers to the remaining statements of the survey, mostly about the course and workgroup assessment. It is possible to observe that, over the years, the students feel that a tool such as WebAVALIA should be adopted in all courses. When asked whether the course assessment should be done only through a project, and whether the project should be excluded from the assessment, the students’ opinions were mostly neutral.

Table 13 Answers to the remaining statements from the students by statement and academic year

Summing up, from the students’ point of view, WebAVALIA is a fair, honest, and, overall, simple tool. However, the accuracy and distribution of the marks remain areas for future improvement. A radar plot summarizing WebAVALIA’s characteristics is presented in Fig. 7 for the academic years 2013/2014 and 2019/2020.

Fig. 7 Characterization of WebAVALIA according to the students

The average of the last four years of answers to the statements displayed in Tables 7, 8, 9, 10, 11, 12 and 13, by category, is considered for the plot representation in Fig. 8. For the simplicity aspect of WebAVALIA, only the Q9 and Q13 answers were considered, since the other three statements inquire whether certain features would be useful in WebAVALIA. The existence of significant differences between pairs of academic years is verified using the Kruskal-Wallis test. The significant differences (at p < 0.05) are then identified per statement using the Mann-Whitney U test and listed in Table 14.
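A sketch of this two-step procedure with SciPy, under invented data: the omnibus Kruskal-Wallis test is run first on one statement’s answers across all years, and pairwise Mann-Whitney U tests are run only when it is significant.

```python
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

ALPHA = 0.05

def significant_pairs(scores_by_year: dict[str, list[int]]) -> list[tuple[str, str]]:
    """scores_by_year maps an academic year to that year's answers to one statement."""
    # Omnibus test: do the yearly distributions of this statement differ at all?
    _, p = kruskal(*scores_by_year.values())
    if p >= ALPHA:
        return []
    # Follow-up: which pairs of years differ significantly?
    pairs = []
    for (y1, s1), (y2, s2) in combinations(scores_by_year.items(), 2):
        _, p_pair = mannwhitneyu(s1, s2)
        if p_pair < ALPHA:
            pairs.append((y1, y2))
    return pairs

# Hypothetical answers to one statement in three academic years.
print(significant_pairs({
    "2017/2018": [3, 3, 4, 2, 3],
    "2018/2019": [4, 4, 3, 5, 4],
    "2019/2020": [5, 4, 5, 5, 4],
}))
```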

Table 14 Kruskal-Wallis test results along the academic years

From Table 14, it is possible to observe significant differences in a considerable number of statements of the survey, mainly between the academic years 2017/2018 and 2019/2020; for example, Q4 from the “Assessment” category is the statement whose responses changed most significantly along the years. In the first two years, there are no substantial changes in the responses to any statement. Also, the “Simplicity” category is the most stable in terms of response consistency over the years, while the “Assessment” statements have the most unstable answers, as shown in Fig. 8.

Fig. 8 Significant differences of the Kruskal-Wallis test by categories

Moreover, students were also asked how their expected individual and group marks compared to the marks they obtained at the end of the project (Table 15). This question is useful for analyzing individual cases and response trends in the survey.

Table 15 Students’ answers about their individual marks

Observing these results, it is possible to infer that almost all the students expected to obtain an equal or higher individual mark, and only a few of them actually obtained a higher mark than expected. The students had the same expectations for their group marks, except in the academic year 2016/2017.

4.2 Qualitative analysis

Students were also asked about the advantages and disadvantages of using WebAVALIA, and a content analysis of the open-answer questions was carried out. The advantages are listed in Table 16 and the disadvantages in Table 17.
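As an aside on the mechanics of such a content analysis (the codes below are invented for illustration), once each open answer has been manually coded, the frequencies reported in Tables 16 and 17 follow from a simple tally:

```python
from collections import Counter

# Each open answer is manually assigned one or more codes; counting the codes
# yields frequencies of the kind reported in Tables 16 and 17.
coded_answers = [
    ["fairness", "anonymity"],
    ["fairness"],
    ["quickness", "fairness"],
]
frequencies = Counter(code for answer in coded_answers for code in answer)
print(frequencies.most_common())  # [('fairness', 3), ('anonymity', 1), ('quickness', 1)]
```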

Table 16 List of main advantages of WebAVALIA in the students’ point of view
Table 17 List of main disadvantages of WebAVALIA from the students’ point of view

Table 16 lists the advantages, which are: fairness (61), measurement of performance (40), simplicity, comprehension and ease of use (33), anonymity (29), quickness (26), performance differentiation (25), no advantages (18), veracity/sincerity (14), motivation (11), privacy (9), efficiency (9), accessibility (9), a perspective other than the teacher’s (8), easier for the teacher (6), assessment relevance (4), ecological reasons (3), accuracy (3), constant evaluation (3), personal development (2), organization (2), and innovativeness (2).

From these answers, most students consider “fair” to be the main characteristic of the assessment tool. They also list characteristics such as “simple, comprehensive and easy to use”, and mention that the tool promotes “anonymity, quickness, performance differentiation, veracity, sincerity and motivation”. The assessment tool is an important way for teachers to know which students deserve the project grade.

Likewise, Table 17 lists the disadvantages given by the students, which are as follows: no disadvantages (34), incorrect evaluation due to human error leading to non-correspondence to reality (24), voting manipulation for collective purposes (21), lack of ability of WebAVALIA to evaluate all members equally – technical bug (19), unfair (17), lack of honesty or veracity (15), incorrect evaluation due to WebAVALIA (12), vote manipulation for individual purposes (10), possible loss of anonymity due to the exchange of opinions between members (8), lack of preparation for evaluation (5), lack of confidence (5), internet connection dependency (5), discrimination (4), voting too restricted (4), conflicts (4), not intuitive (4), unreasonable injury (3), purposeful injury (3), inefficiency (3), does not explain the used algorithm (2), disagreement with this kind of assessment in general (2), the teacher does not know what happens in self and peer assessment (2), anonymity (2), incorrect weight of this kind of assessment (2), does not allow validation of the evaluation (2), general bugs (2), information loss (1), the best element is not well distinguished (1), too little time (1), no option to redo voting (1), lack of cybersecurity due to initial passwords (1), not intuitive at the beginning (1), jeopardizes students’ final evaluation (1), out-of-class environment (1), constant evaluation (1), only quantitative evaluation (1), takes too much time (1), assessment is mandatory (1), and students have different working levels (1).

From Table 17, and compared to the list of advantages (Table 16), it is possible to see that fewer disadvantages were reported, although in a wider variety. The most frequently pointed out are the students’ inexperience in voting, vote manipulation, and the software tool’s incapacity to evaluate all the members equally. Moreover, some students mention the anonymity provided by the tool as a disadvantage, although it can also be considered an advantage. From another point of view, other students point out that anonymity is indirectly lost, since an exchange of opinions within the group may happen before the official voting in the software tool.

5 Discussion

The previous section presented the results of the students’ experience with WebAVALIA. These results allow us to understand the students’ opinions of the self and peer assessment tool over the 8-year study. In former studies of WebAVALIA, the tool was classified as fair, the characteristic most evident in the students’ opinions throughout the years. Over the years, the algorithm changed and criteria were added to obtain fairer results according to the group members’ opinions, which may contribute to better assessments with greater accuracy of students’ marks. ‘Word of mouth’ may also be a factor in the results, since the students’ perception of fairness can be reinforced by them talking with each other.

Figure 7 plots a visual summary of the students’ assessment of WebAVALIA regarding its fairness, simplicity, sense of improved productivity, and assessment. The overall opinion on “productivity” remained constant over the years, in contrast to the other categories, namely “fairness”, “simplicity”, and “assessment”. According to Fig. 7, in 2013/2014 the “productivity” category was the most appreciated by the students, whereas “fairness” is the category with which students most agreed in 2019/2020. “Fairness” and “simplicity” were the categories that showed a constant increase in agreement.

This study allowed us to gather a broader understanding of the students’ perceptions of the tool. An increase in the students’ agreement about the fairness and honesty of WebAVALIA can be perceived, given the rising scores for Q1 and Q2 throughout the years, while Q19, the statement that varied the most, shows that students have strengthened their opinion that tools like WebAVALIA are indispensable for a fair assessment of workgroups.

Regarding the simplicity of the tool, the students have agreed with Q6, meaning that they believe that if the tool were more complex, it would be more beneficial. However, they also agree that the simplicity of the tool allows for an easier and more accurate determination of each group member’s mark (Q9). The agreement with Q6 suggests that the students find the tool simple, yet do not reject the hypothesis of having more assessment parameters.

The students have also been asked about the intuitiveness and ease of use of the tool. Throughout the years, agreement with the Q13 statement, which relates to these qualities, has been increasing, and its high mean indicates that the students agreed with the good usability of the tool. It is also worth highlighting that Q13 is the statement with the highest mean (4.23 in 2019/2020).

The opinion on productivity did not change much throughout the years, remaining favourable (above 2.8). The students do not feel that the assessment should be done with a different frequency from the current one (Q3). Furthermore, they are favourable to having access to midterm scores to increase productivity (Q8). This may indicate that the students are content with how the assessment through WebAVALIA is performed and perceive the current procedure as correct.

Regarding the assessment category, over the years the students have come to recognise that the current frequency of use of the tool, or its increase, is ideal to provide more accurate evaluations (Q11). Accordingly, they consider that the use of the tool allows a concrete and correct way to assign a score to the work developed by each element of the workgroup (Q14). Also, the students’ agreement with Q4 and Q18 has been decreasing over the years, meaning that they feel that the member who contributes less achieves a mark corresponding to their efforts (Q4) and, at the same time, that these students are not excessively penalized (Q18). This may mean that, in their opinion, the assessment is well performed and fair.

However, regarding the assessment of the students who contribute most to the work development (Q5 and Q12), there is no clear opinion. On the one hand, there is a will to attribute to the student who contributes most a higher score than the one assigned to the project; on the other hand, the students feel that this student should have a mark equal to the one assigned to the project. These undefined opinions provide a good point for discussion, and it would be interesting to understand which grade the students consider best for the student who contributes most to the project.

This is more evident in the results expressed in Table 15. In the first year of the study, most of the students expected a higher individual or group mark than the one they received. However, the share of students who felt they obtained a lower mark than expected decreased throughout the years and, in turn, expectations started to coincide with the actual marks. It is also essential to mention that the tool differentiates the students’ marks within a workgroup; therefore, these results can reflect the perceived fairness of the tool.

The increased agreement with Q17 means that the students understand the usefulness of the tool, appreciate it, and feel that it could be of use in other courses in order to achieve fair assessments of workgroups. There was also good support from the students for the development of a mobile app (Q15).

The students’ feedback on the advantages and disadvantages of WebAVALIA has been presented. The quantitative results confirm that the students perceive the tool as fair and straightforward, and that it correctly measures their performance. Although the students reported fewer disadvantages, it is important to consider them to better understand the tool and find ways to improve it.

The comment most mentioned by the students was “no disadvantages”, which can be regarded more as an advantage than a disadvantage, since it may mean students generally like the tool. The first concrete disadvantage pointed out by the students was human error. This disadvantage can have several explanations, namely the students’ inexperience in performing self and peer evaluations, or the students understating their collaboration with the group or misjudging their input to the final work, which may result in improper voting and thus jeopardize the group members’ marks. If the members of a group suspect that voting manipulation is happening, they can report it to the lecturer. However, the evaluators usually assume that the students act conscientiously in their assessment, as reported in the literature (Freeman, 1995).

Voting manipulation is another disadvantage listed in Table 17, also reported in the literature (Loddington et al., 2009). Manipulation for individual purposes is distinct from manipulation for collective purposes. In the first case, a student can spoil the assessment system by purposefully giving themselves the maximum score and the others in the group the minimum score possible, benefiting only themselves. Although not thoroughly investigated in this study, one can hypothesize some reasons for it: inadequate group formation at the beginning of the work development; students placed in groups whose other members may not accept them well; or the rearrangement of groups in the middle of the collaborative work due to a student dropout.

A primary limitation of WebAVALIA is the impossibility of giving the same assessment grade to all the students in a group when the resulting value of 20/N is not an integer, N being the number of members in the group. For instance, in a group of three members, 20/3 is not an integer, so the members cannot all receive the same score.
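The constraint can be made explicit with a quick check (a sketch of the arithmetic only, not WebAVALIA’s code): on a 0-20 scale, an equal split is possible only for group sizes that divide 20 evenly.

```python
# Group sizes (1..10) for which all members can receive the same integer
# share of 20; any other size, e.g. 3, makes a perfectly equal split impossible.
equal_split_sizes = [n for n in range(1, 11) if 20 % n == 0]
print(equal_split_sizes)  # [1, 2, 4, 5, 10]
```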

From the results, it is also possible to perceive that anonymous voting is both an advantage and a disadvantage, although most students prefer to have anonymity. The students who regard this feature as a disadvantage may fear being penalized by a workgroup member they cannot identify.

6 Conclusions

Collaborative learning has been gaining popularity in academic settings. These practices gather students in groups to work and learn together towards a common goal, allowing them to develop important skills. The advantages of collaborative learning environments are well known; however, the resulting grades attributed to the students do not always correspond to their real individual performance. Usually, all students in a workgroup receive the same mark for their contribution to the work developed, whereas they should be distinguished by their performance, mainly in cases where the efforts are unequal.

The lack of recognition of the individual contribution to the workgroup can develop a negative perception of collaborative work among students, who often feel exploited due to biased evaluation results. However, if the group members are distinguished according to each one’s performance, it may give each member a sense of fairness and reward, leading to openness and willingness towards collaborative work.

From this arose the need for a technological tool that offers a way to easily perform self and peer assessments and return fair and unbiased individual results. WebAVALIA was developed to assess individuals working in groups and to guarantee that they are assessed according to their performance and contribution to the work developed. To that end, WebAVALIA uses the workgroup members’ opinions about their own and their peers’ performance and contribution to reach individual results within the project grade. By allowing a distinction between individuals, WebAVALIA can give the students the feeling that the assessment was fair and that their efforts have been adequately rewarded. It can also assist the lecturers in understanding the students’ level of agreement on their own and their peers’ contributions, providing an environment where a distinction between group members is possible and results in a fair assessment.

This study intended to understand whether students working in groups felt that the individual assessment was fair and their efforts adequately rewarded with the use of the self and peer e-assessment tool WebAVALIA, as well as whether the outcomes of these assessments show the real performance and contribution of each individual. To that end, surveys were distributed to the students who have used the tool since 2013. The results presented in this study reflect the overall qualitative and quantitative opinion of the students over eight years of the tool’s usage, allowing the research question to be answered: How do students evaluate their WebAVALIA experience?

The main goal of WebAVALIA is to provide accurate individual assessments that distinguish the workgroup members and correctly recognize those who contributed most to the work developed. Accordingly, the tool needs a concrete and correct way to assign a score to each element of the workgroup, a correct set of criteria to be defined by the evaluator, and an algorithm to process the votes and output a fair result.

The main conclusions are that the students acknowledge that the tool differentiates the marks in a workgroup, stating that the individual mark obtained was equal to what they expected. Regarding the frequency and the way the assessment is performed with WebAVALIA, the students agree that the current frequency, or even its increase, is ideal to correctly provide individual marks and that the assessment has been well performed. The students therefore perceive the tool as fair and straightforward, and consider that it correctly measures their performance. The algorithm attributes to the student who contributed most the same grade as the one assigned to the project by the lecturer. However, the students are divided on the mark that this student should receive: either a mark equal to or higher than the one assigned to the group project.

The perception of the fairness and assessment accuracy of WebAVALIA has changed through the years, and in 2020 the users considered it a fair and simple tool with a positive influence on the groups’ productivity. The results have shown a clear variation in opinions over the years, resulting from the constant improvement of the tool’s algorithm and usability based on continuous observation and student feedback. There was a slight increase in the opinion about the simplicity of WebAVALIA in the last two years of the study, which, together with the increase in the advantages pointed out, led to a favourable opinion about the tool.

Advantages such as quickness and anonymity in the voting process, fairness and performance differentiation in the marks, and the tool’s ease of use have been identified by the students. One of the disadvantages expressed was human error, caused either by the students’ inexperience with self and peer assessments or by improper voting due to not having a real understanding of their contribution. This problem can be overcome by familiarizing the students with the use of WebAVALIA before the assessment moments.

Another disadvantage pointed out was voting manipulation, which depends only on each student’s ethics and may not be controllable by external parties. When students were asked for disadvantages of the tool, most of them stated that the tool has “no disadvantages”, meaning that, in the students’ opinion, the tool performs according to its purposes and is appreciated.

In sum, the students find the tool simple, intuitive, and with good usability. They understand its usefulness and consider it indispensable for a fair assessment of workgroups. With the self and peer e-assessment tool, the students felt that it is possible to achieve fair and unbiased assessments of workgroups. In turn, it helped them gain the feeling that the assessment was fair and that their efforts had been adequately rewarded when working in groups.

7 Future work

In the future, the improvement of WebAVALIA is expected to continue. It would be important to understand how the tool generalizes by studying how other universities, technical training students, and other cultures or countries experience it. Likewise, it would be relevant to implement WebAVALIA in different contexts, such as sports organizations or business environments, to assist in the awareness of how each member perceives their team and the exchange of work within a group. This awareness can support better coordination between team members and a better assignment of tasks, improving teams’ performance, which may enhance results. Using WebAVALIA in these teams can help team members and managers understand the importance of each member’s contribution to overall results.

Additionally, an experiment to identify good practices when using WebAVALIA would be to compare the students’ satisfaction with the tool when some groups have access to intermediate results and others do not. By comparing the results of both groups, it would be possible to understand whether there are any differences in the groups’ performance. Furthermore, as discussed, the students are divided on whether the student who contributed most to the project development should receive a grade equal to or higher than the one assigned to the project; it would therefore be interesting to find out which grade they consider optimal for those students. Likewise, it would be important to find out which parameters the students think should be added for a more beneficial evaluation. These perceptions could be gathered through a focus group interview with students who have used WebAVALIA in their assessment.

WebAVALIA’s development is a continuous process; therefore, it is important to continue to use the tool and gather and analyze the users’ feedback. Only with its repeated use and testing will it be possible to improve its usability, efficiency, and user satisfaction. In the future, it will be essential to explore the mathematical formulations and the respective supporting algorithms, discuss borderline cases and other efficient forms of mathematical formulation, and improve the results given by WebAVALIA. Future research includes applying artificial intelligence and machine learning to the survey results to improve the algorithms and provide marks that meet student opinions.