The participants were university students from two first-semester classes in a Computing course at a private university in the state of Rio Grande do Sul, Brazil. In total, 48 students aged between 17 and 34 years (M = 21, SD = 3.74), 38 boys and 10 girls, were invited to participate at the beginning of the semester. All students who agreed to join the experiment received a consent form to sign. Only the data of students who completed the personality trait questionnaire (43 students) and agreed to participate by returning the consent form (41 students) were considered at the end of the experiment, totaling 40 students (7 girls and 33 boys). At the beginning of the experiment, the teachers explained to the students that their participation was voluntary, that they could quit at any moment, and that quitting would not change their final grade in the class.
We measured changes in engagement through the number of logins, badges, and points, and through the number of views of the gamification elements. Grades in the course exams served to evaluate learning. Programming behavior was measured by the accuracy of the solutions that students submitted for programming exercises. Accuracy is the total number of correct solutions divided by the total number of solutions submitted. It represents the care a student takes before submitting a solution and is the opposite of trial-and-error behavior, in which the student repeatedly submits different solutions until one succeeds, without seriously reflecting on them, only to get the system's feedback.
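As a minimal sketch of the accuracy measure defined above, the ratio can be computed as follows. The function name and the treatment of a student with no submissions (accuracy of 0.0) are assumptions made for illustration, not details taken from the study.

```python
def accuracy(correct_submissions: int, total_submissions: int) -> float:
    """Share of a student's submissions that were correct (0.0 to 1.0)."""
    if total_submissions < 0 or not 0 <= correct_submissions <= total_submissions:
        raise ValueError("counts must satisfy 0 <= correct <= total")
    if total_submissions == 0:
        # Assumption: a student who never submitted gets accuracy 0.0.
        return 0.0
    return correct_submissions / total_submissions


# A careful student: 8 correct solutions out of 10 submissions.
careful = accuracy(8, 10)
# A trial-and-error student: the same 8 correct solutions, but 40 submissions.
trial_and_error = accuracy(8, 40)
```

Both hypothetical students solve the same number of exercises, yet the second has a much lower accuracy because of the many unsuccessful attempts.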
Personality questionnaire - IGFP-5
To determine students’ personalities, we used the IGFP-5. The IGFP-5 is a self-report measure composed of 44 items and designed to evaluate personality dimensions based on the Big Five Personality Factors model (Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism) (de Andrade 2008). It was validated for Brazil with a sample of 5,089 respondents from the five Brazilian regions, of whom 66.9% were female and 79.0% had higher education. According to Andrade (2008), individuals with high scores in Openness are generally outspoken, imaginative, witty, original, and artistic. Conscientious individuals are generally cautious, trustworthy, organized, and responsible. Extroverted individuals tend to be active, enthusiastic, sociable, and eloquent or talkative. People with high scores in Agreeableness are pleasant, lovely, cooperative, and affectionate. Neurotic individuals are usually nervous, highly sensitive, tense, and concerned.
BlueJ and Feeper
Feeper is a web-based system designed to assist students and teachers in programming classes. In this environment, the teacher provides programming exercises, which students solve and which are automatically corrected by an Online Judge integrated into the platform. For a given input, the judge compares the output of the learner’s program with the output of an ideal solution provided by the teacher. It also uses rules previously registered by the teacher to give students feedback based on the output of their code. A significant advantage of this type of environment is that it reduces the teacher’s burden: because the exercises are corrected automatically, the teacher can concentrate on students who are struggling with the tasks.
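The output-matching idea behind an online judge can be sketched as below. This is an illustration only, not Feeper's actual implementation: the whitespace normalization, the regex-based form of the teacher's feedback rules, and all names (`normalize`, `judge`, `feedback`, `rules`) are assumptions.

```python
import re


def normalize(output: str) -> list[str]:
    """Strip trailing whitespace on each line and drop trailing blank lines."""
    lines = [line.rstrip() for line in output.splitlines()]
    while lines and lines[-1] == "":
        lines.pop()
    return lines


def judge(student_output: str, reference_output: str) -> bool:
    """Accept the solution when both outputs match after normalization."""
    return normalize(student_output) == normalize(reference_output)


def feedback(student_output: str, rules: list[tuple[str, str]]) -> list[str]:
    """Return the message of every (pattern, message) rule that matches the output."""
    return [message for pattern, message in rules if re.search(pattern, student_output)]


# Hypothetical teacher-registered rule: a regex pattern and the hint to show.
rules = [(r"Exception", "Your program crashed; check how it handles the input.")]

accepted = judge("Hello\n42  \n\n", "Hello\n42")   # differs only in trailing whitespace
rejected = judge("41", "42")
hints = feedback("Exception in thread \"main\"", rules)
```

Normalizing whitespace before comparing is a common design choice in online judges, since beginners often print an extra space or newline that does not reflect a logic error.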
BlueJ is a free Java Development Environment designed for beginners to learn the basics of programming (Bluej 2019).
In our work, the teacher recommended that students write their code in BlueJ, which has a more straightforward interface for beginners. After solving the task in BlueJ, students submitted the final solution to Feeper to receive the correction and error feedback. Only Feeper was gamified in this study, and students used it to check their progress.
System logs and grades
The information extracted from Feeper's usage logs consisted of the number of logins, correct and wrong exercises, badges and points obtained, and challenges completed. We also analyzed the number of times users viewed the ranking, badges, and points elements. The number of badge views is different from the number of badges obtained: the number of views indicates how interested the student was in this element. A student can earn many badges because she completed the activities successfully out of interest in the topic, even if she is not interested in earning badges. The same is true for points and ranking.
During the semester, students took three exams as part of the formal evaluation process of the class. Grade A was given in the middle of the semester and comprised problems on the topics covered up to that point. Grade B was the final exam, given at the end of the semester. Students who did not achieve the minimum score could improve their grade with Grade C, which was given two weeks after Grade B. In this work, Grade A served to check students’ programming performance before gamification was switched on in the experimental group. The participants completed the IGFP-5 personality questionnaire and were randomly distributed into two groups, control and gamified. At the end of the semester, they took the final exam.
Gamification in Feeper
The gamification elements implemented in Feeper for this study are points, badges, and ranking, described below. The only difference between the gamified and non-gamified versions of Feeper is that participants using the non-gamified version cannot see the gamification elements, although the system still records points and badges for them internally. Recording these scores for both groups allows us to examine whether being able to see the gamification elements engages students.
Points appear to participants in two different parts of the system. While completing a programming task, students can see how many points they will earn if they solve it successfully. Each incorrect submission decreases the score by five points (students can lose at most 70 points per task). Students can also view their score history, with the exercises solved and the points previously earned. Students were warned that the scores obtained in the exercises would not affect their final grade in the course.
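The penalty rule above can be sketched as follows. The five-point penalty and the 70-point cap come from the description; the base value of a task (`base_points`) and the floor at zero points are assumptions made for illustration.

```python
PENALTY_PER_WRONG_SUBMISSION = 5   # points lost per incorrect submission
MAX_PENALTY_PER_TASK = 70          # a task never costs more than this


def task_score(base_points: int, wrong_submissions: int) -> int:
    """Points earned for a solved task after penalties for wrong submissions."""
    if wrong_submissions < 0:
        raise ValueError("wrong_submissions must be non-negative")
    penalty = min(wrong_submissions * PENALTY_PER_WRONG_SUBMISSION,
                  MAX_PENALTY_PER_TASK)
    # Assumption: the score is never allowed to drop below zero.
    return max(base_points - penalty, 0)


# Hypothetical task worth 100 points.
after_three_errors = task_score(100, 3)    # penalty of 15 points
after_twenty_errors = task_score(100, 20)  # penalty capped at 70 points
```

Under these rules the cap is reached at the 14th incorrect submission, after which further attempts cost nothing extra.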
Nine distinct badges could be earned by reaching specific objectives, each at three levels (gold, silver, bronze), totaling 27 badges. Badges were granted to students who reached a specific number of logins, correct assignments, submitted assignments, or submitted assignments with no errors, who showed daily activity, who completed challenges, or who reached the top of the class and of the platform.
The ranking is based on the sum of all points a student has earned across all solved assignments. Two distinct rankings are available. The class ranking shows the participants with the best scores in the class; its goal is to promote local objectives for students. The general ranking compares the scores of all students who have used Feeper.
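A minimal sketch of how the two rankings could be derived from the same point records is shown below. The record layout, the student and class identifiers, and the alphabetical tie-breaking are assumptions for illustration; the source does not describe how Feeper stores points or resolves ties.

```python
from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical record layout: (student, class_id, points earned for one task).
Record = tuple[str, str, int]


def ranking(records: Iterable[Record],
            class_id: Optional[str] = None) -> list[tuple[str, int]]:
    """Sum points per student and sort from highest to lowest total.

    With class_id set, only that class is ranked (class ranking);
    with class_id=None, every student is ranked (general ranking).
    """
    totals: dict[str, int] = defaultdict(int)
    for student, student_class, points in records:
        if class_id is None or student_class == class_id:
            totals[student] += points
    # Assumption: ties are broken alphabetically by student name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


records = [
    ("ana", "class_1", 80),
    ("bruno", "class_1", 95),
    ("ana", "class_1", 60),
    ("carla", "class_2", 150),
]
class_ranking = ranking(records, class_id="class_1")
general_ranking = ranking(records)
```

Here `class_ranking` places ana (140 points) ahead of bruno (95), while `general_ranking` also includes carla from the other class at the top.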
This study followed an experimental design with two groups, control (21 participants) and experimental (19 participants), to which students were randomly assigned with the only restriction that both groups initially have the same number of participants. Table 1 shows the number of participants for each personality trait in each group (gamified and non-gamified).
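Random assignment under an equal-size restriction can be sketched as a shuffle followed by a split in half. This illustrates the general technique and is not the authors' actual procedure; the function name, the seed, and the participant identifiers are hypothetical.

```python
import random
from typing import Optional, Sequence


def assign_groups(participants: Sequence[str],
                  seed: Optional[int] = None) -> tuple[list[str], list[str]]:
    """Randomly split participants into two groups of equal initial size.

    Assumption: with an odd number of participants, the second group
    receives the extra person.
    """
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 48 invited students, identified here by placeholder names.
participants = [f"student_{i:02d}" for i in range(1, 49)]
control, experimental = assign_groups(participants, seed=2018)
```

With 48 invited students, this yields two groups of 24. The final group sizes can still differ, as in the study, once students who did not return the questionnaire or the consent form are excluded.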
Students in the control group used the original non-gamified version of Feeper, while students in the experimental group used a gamified version of Feeper with points, badges, and ranking. All students started with the non-gamified version, and only in the second half of the semester (after the first exam, Grade A) did students in the experimental group begin to use the gamified version.
This type of design allows us to examine the effects of gamification on students with different personality traits under two kinds of control: each participant compared with himself or herself (a within-subject design, comparing the performance and engagement of students in the experimental group before and after Grade A) and the control group compared with the experimental group after Grade A. At the end of the semester, students completed the final exam (Grade B), which covered all the content of the course. Figure 1 illustrates the phases of the experiment.
Some students reported noticing that their version of Feeper was different from the one used by a nearby colleague (they were able to notice the presence of points and ranking). When this occurred, teachers said only that some new features were being tested in Feeper and were available to only some participants.
The experiment was conducted in the second half of 2018 and lasted four months. The participants had class once a week, and each class lasted two hours and 38 minutes. Students used Feeper in all classes except the first class, the three classes in which the teacher gave the exams (Grade A, Grade B, and Grade C), and one topic review class, totaling 15 classes spent solving programming tasks using BlueJ and Feeper.
In the first week of class, the teacher presented the organization of the course and some introductory notions of computer organization. In the second week, the teacher introduced the Feeper and BlueJ environments (“Materials” section). Introductory tasks were given so that students could get used to both learning environments.
From the third week onward, students worked on four exercises in each class: a worked example, two activities that were part of the final grade, and an optional exercise. The teacher began each class by solving a worked example step by step to teach students how to solve a programming task involving the concepts covered in that class.
Students then solved two other programming tasks with the same difficulty level as the worked example and using the same programming concepts. The students’ grade was composed of their performance on these two tasks (20%) and their score on the exams (80%). In addition to being part of the grade, these tasks served to identify students’ difficulties. The optional task was an extra activity of greater difficulty that offered the possibility of additional credit. Its goal was to challenge the students and, because it was optional, to gauge their engagement.