Introduction

In recent years, gamification—the use of game-based elements such as game mechanics, aesthetics, and game thinking in non-game contexts to engage people, motivate action, enhance learning, and solve problems—has become increasingly popular (Apostol et al. 2013; Deterding et al. 2011). Admittedly, introducing games into teaching is not a new idea: digital games have been used for learning in formal environments since the 1960s (Ifenthaler et al. 2012; Moncada and Moncada 2014). However, the term gamification was coined only a few years ago and has been gaining popularity ever since (Dicheva et al. 2015; Sung and Hwang 2013). The benefits of gamification (or, in broader terms, game-based learning, e.g., Ifenthaler et al. 2012) in educational contexts are widely described in the literature. Among them are increased student intrinsic motivation and self-efficacy (Banfield and Wilkerson 2014; Seaborn and Fels 2015), enhanced motivation and improvement of the learning process (Dicheva et al. 2015; Sadler et al. 2013), as well as harnessing the positive aspects of competition (Burguillo 2010; Conklin 2006).

Games can reinforce knowledge and bridge gaps in learning by creating dynamic, fun, and exciting learning environments (Royse and Newton 2007). They are a powerful teaching strategy that challenges and motivates students to become more responsible for their own learning (Akl et al. 2013). However, this requires the game to be well designed and clearly structured within a framework that provides effective outcomes (Allery 2004). The review presented in Dicheva et al. (2015) suggests that early adopters of gamification are mostly Computer Science/IT educators. This is in line with the rising popularity of computer games, which have become prominent in the last decade. Many articles can now be found in which the use of computer games in teaching is introduced and evaluated (Eskelinen 2001; Ko 2002; Rieber and Noah 2008). Nevertheless, not all of them are suitable for school settings. Zagal et al. (2006) point out that some of these games are highly opaque, have complex rules, and do not involve players collaborating during play; therefore, they do not foster students' peer learning. Through peer collaboration, students build on each other's knowledge to develop new attitudes, cognitive skills, and psychomotor skills (Adams 2006; Damon and Phelps 1989). The same authors suggest that board games could serve this purpose owing to the transparency of their core mechanics. Moreover, board games provide teachers with an opportunity to guide or direct children towards specific educational goals by extending their learning during and after playing the game (Durden and Dangel 2008; Wasik 2008). Teachers can also facilitate communication amongst children, helping them build understanding of the game, discuss concepts, and provide feedback to one another (Griffin 2004).

Board games are also successfully used in early childhood education (Ramani and Siegler 2008; Shanklin and Ehlen 2007) as a pedagogical tool that reinforces a positive environment for learning (Dienes 1963). Games also appear to build positive attitudes (Bragg 2003) and self-esteem, and to enhance motivation (Ernest 1986). They have been found effective in promoting mathematical learning (Bright et al. 1983), mathematical discussion (Ernest 1986; Oldfield 1991), social interaction (Bragg 2006), and risk-taking abilities (Sullivan 1993). Some types of board games have also been used in medical education and found to be useful methods for conveying information and promoting active learning (Neame and Powis 1981; Richardson and Birge 1995; Saunders and Wallis 1981; Steinman and Blastos 2002). In the present study, a board game—a competitive game between groups of students in a classroom—was used as an assessment tool in order to examine whether it could increase high school students' achievements and retention of knowledge in physics.

To assess student achievement in general, and as a result of the board game specifically, two main formats of assessment are distinguished and widely discussed in the literature, namely formative and summative assessment (Harlen and James 1997; Wiliam and Black 1996). In general, formative assessment is carried out throughout a unit (course, project) and its purpose is to provide feedback to students about the learning process. Summative assessment is given at the end of a unit (course, project) and is used to summarize students' achievements, usually in the form of grades (Harlen and James 1997; Looney 2011; McTighe and O'Connor 2005). Even though summative assessment can be performed in many ways, some authors point to the lack of post-examination feedback for students as a weakness (Leight et al. 2012; Talanquer et al. 2015). In our study, the board game was used essentially as a tool for summative assessment, although it also included some elements of formative evaluation. Such a combination was dubbed formative summative assessment by Wininger (2005). It entails reviewing exams with students so that they receive feedback about their comprehension of the material. One example of this approach is collaborative testing, which gives students an opportunity to work in groups during or at the end of an exam (Guest and Murphy 2000; Lusk and Conklin 2003). Research has shown that collaborative testing has many benefits. These are described in detail by Duane and Satre (2014), Gilley and Clarkston (2014), and Kapitanoff (2009), and are grounded in the literature on the positive impact of group testing (Millis and Cottell 1998; Michaelson et al. 2002; Hodges 2004) and peer learning (Slusser and Erickson 2006; Meseke et al. 2008; Ligeikis-Clayton 1996), both of which are components of collaborative learning. The most important benefits of collaborative learning include increased student achievement (Bloom 2009; Haberyan and Barnett 2010), reduced test anxiety (Zimbardo et al. 2003), improved critical thinking ability (Shindler 2003), and better collaboration skills (Lusk and Conklin 2003; Sandahl 2010).

The assessment in the form of a game employed in the current research is based on the authors' previous experiences and research (Dziob et al. 2018). It evaluates not only content knowledge itself, as in typical tests, but combines several aspects, as schematically shown in Fig. 1. It assesses the relationship between content knowledge and everyday life, as well as the socio-historical context. Moreover, it provides an opportunity to assess the research skills required to conduct experiments. The board game format enables the development of social and entrepreneurial skills in the form of a challenge-yourself competition, which allows students to surpass individual limitations (Doolittle 1997).

Fig. 1

Assessment strategy components. In addition to content knowledge, all other depicted elements were involved in the assessment process. Own work

This study reports on the efficacy of assessing students' knowledge by means of a group board game and measures its effects on students' learning outcomes. The research questions are as follows:

  1) What is the effect of the board game assessment on student learning outcomes when compared with students' prior results in physics?

  2) What is the effect of the board game assessment on student learning outcomes when compared with a traditional teaching approach?

Methodology

Participants

The research was conducted on a group of 131 students in total from two high schools in Poland. Students were divided into experimental groups (n = 37 and n = 36 in schools 1 and 2, respectively) and control groups (n = 31 and n = 26). In each school, both groups were taught by the same teacher and followed the same curriculum. Just before the experiment, the students had completed a 25-h unit on vibrations and waves (school 1) or on optics (school 2). After finishing the unit, the experimental groups took part in the assessment in the form of a group board game (described below, hereinafter the intervention) and, 1 week later, in a traditional test. Students from the control groups took only the traditional test, the same as the experimental groups, without the intervention. In each group, the ratio of males to females was similar (about 3:2).

Intervention

This section contains a detailed description of the intervention: the game that students from the experimental groups played once at the end of the unit, together with the evaluation process. The description includes the procedure and examples of the questions used in the assessment.

Intervention Organization

The game lasted approximately 2.5 lesson hours (approx. 110 min). At the beginning, students were randomly divided into groups of 4 to 5 people and asked to take seats around the game board table. Each group began with questions concerning physics phenomena, and the students moved their tokens (one per group) forward by the number of right answers or correctly named concepts. At the end of the game (when the allocated time ended), students were asked to fill in a self- and peer-assessment questionnaire. At each stage of the game, after the students had made their attempt, the scientifically accepted answer to each question was provided, together with a proper explanation, by the students or, if needed (when the students did not succeed), by the teacher. This approach thus allowed the teacher to immediately rectify and clarify students' misconceptions.

Game Board—Organization

The game board consisted of a circular path, and the participants moved their group token along this path. The path was made up of a random assortment of fields from five categories, or activities, to land on: physics phenomena charades, famous people charades, short answer questions, multiple-choice questions, and simple experiments. The questions required the students to perform different types of activities and allowed them to obtain different numbers of points. Because the number of points obtained at each stage equaled the number of fields the token was moved, the scoring system coincided with the movement system, as in a typical board game. Additionally, there were two special lines on the board. Whenever any group reached one of them, all groups received special algebra tasks or complex experimental tasks to solve. Figure 2 presents the game board with the different types of fields indicated on it.

Fig. 2

The design of the board game

Physics Phenomena Charades

Upon reaching this field, one representative of a given group received six cards with the names of various physics phenomena related to waves and vibrations (school 1) or optics (school 2; see Fig. 3). The representative's aim was to describe each concept, without using the given words, so that the rest of the team could guess the name. The time for this task was limited to 1 min (measured by a small hourglass). At the end of the round, the token was moved forward by the number of fields equal to the number of correctly guessed charades.

Fig. 3

Examples of physics phenomena charades

Famous People Charades

These questions were similar to the previous ones, but they related to important people connected with the concepts of waves, vibrations, and acoustics (physicists, musicians, etc.) or optics (Fig. 4). The scoring system was identical to the one employed in the physics phenomena charades.

Fig. 4

Examples of famous people charades from games on waves and vibrations and optics

Short Answer Questions

The short answer questions differed in their level of difficulty, but usually they required only a true/false answer (see Fig. 5). The questions were asked by the teacher, and the time of each group's round was 1 min. Within that time, all members of the currently active group could answer as many questions in a row as they managed, without passing their turn to another group. If the provided answer was wrong, the next group took over and had an opportunity to answer other questions. At the end of each round, the groups moved their token forward by the number of correctly answered questions divided by 2 and rounded up (e.g., five correct answers moved the token forward by three fields).

Fig. 5

Examples of short answer questions

Multiple-Choice Questions

Upon reaching a field of this category, a group received multiple-choice questions related to scientific reasoning (Fig. 6). Students had to identify the correct answer and provide comprehensive argumentation for their choice. By providing the right answer together with a correct explanation, the group could move forward by 2 fields on the board; otherwise, no move was allowed.

Fig. 6

Examples of multiple-choice questions

Simple Experiments

Upon reaching a field of this category, the students had to conduct simple experiments in order to demonstrate relevant phenomena (Fig. 7). The equipment necessary for each experiment, along with some extra materials, was available to the students. An important part of the task was deciding which objects were essential. The other groups taking part in the game were allowed to ask the currently active team detailed questions about the conducted experiment and to request additional explanations. Having carried out the experiment and addressed the questions properly, the group was allowed to move forward by 2 fields.

Fig. 7

Examples of simple experiment tasks

Algebra Tasks

When one of the groups reached the first special line on the game board, all competing groups simultaneously received three algebra tasks after the end of the round and had 10 min to solve them. For this task, each group could receive a maximum of 4 points and move its token forward by up to 4 fields. Incorrect or incomplete solutions, as assessed by the teacher, reduced the number of points.

Experimental Task

When one of the groups reached the second special line on the board, all competing groups simultaneously received, at the end of the round, one experimental task that had been neither discussed nor solved during any previous class. The students had to come up with an experimental design to examine the effect of damping on the motion of a harmonic oscillator (school 1) or the effect of the surrounding medium's refractive index on the focal length of a glass lens (school 2). The groups received special worksheets prepared in accordance with an inquiry-based methodology. Students had to formulate a proper hypothesis, describe the plan of the experiment, draw the experimental setup, write down their observations, analyze the results, and draw conclusions. This part took up to 20 min. For this task, students could receive a maximum of 10 points.
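For convenience, the movement rules scattered over the preceding subsections can be gathered into a single overview. The sketch below (in Python) is only an illustrative summary under our own naming; the category identifiers, the function signature, and the assumption that experimental-task points translate one-to-one into fields are ours, not part of the published rules.

```python
import math

def fields_to_move(category, **result):
    """Token movement per field category, summarizing the rules described above."""
    if category in ("phenomena_charades", "famous_people_charades"):
        return result["correct_cards"]                     # one field per correctly guessed card (max 6)
    if category == "short_answer":
        return math.ceil(result["correct_answers"] / 2)    # half the correct answers, rounded up
    if category in ("multiple_choice", "simple_experiment"):
        return 2 if result["correct"] else 0               # right answer plus correct explanation required
    if category == "algebra_tasks":                        # first special line, all groups, 10 min
        return result["points"]                            # 0-4 points, one field per point
    if category == "experimental_task":                    # second special line, all groups, 20 min
        return result["points"]                            # up to 10 points (field conversion assumed)
    raise ValueError(f"unknown category: {category}")

# Example: four charades guessed, then five short answers correct
print(fields_to_move("phenomena_charades", correct_cards=4))   # -> 4
print(fields_to_move("short_answer", correct_answers=5))       # -> 3
```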

Instruments and Data Collection

Former Achievements

Before the intervention, students from each group were tested individually in four tests throughout the school year: on kinematics, energy, gravitation, and rigid-body rotational motion (school 1), and on electrostatics, current, magnetic field, and induction (school 2). The tests comprised mixed problems: content knowledge and scientific reasoning tasks, multiple-choice, open-response, and algebra problems. The tests were the same for the experimental and control groups. The average of each student's percentage results on the four tests was used to measure his/her achievement prior to the game, henceforth called average former achievements and denoted FA.

Assessment Questionnaires

When the game ended, each student was asked to individually fill in two questionnaires, a self-assessment and a peer-assessment, in order to evaluate themselves and the other players from the same group in various respects. Each questionnaire was composed of eight questions on a 6-point Likert scale. Half of the questions focused on the students' communication skills, while the rest focused on their subject matter contribution. The self-assessment questionnaire is presented in Table 1. The peer-assessment questions were designed in a similar way.

Table 1 Student self-assessment questionnaire

Evaluation Process

The questionnaire-based assessment results were included in the final score according to the authors' own approach, presented below and described in detail in Dziob et al. (2018).

  1. The mean score was calculated based on the "subject matter contribution" and, separately, the "communication skills" points in the self-assessment results (S).

  2. The mean score was calculated based on the "subject matter contribution" and, separately, the "communication skills" points ascribed to the student by the other members of the group (the peer-assessment, P).

  3. Finally, the "subject matter contribution" and "communication skills" scores were obtained separately as follows:

     (a) if |S − P| ≤ 1 (a consistent evaluation), P was taken as the final score;

     (b) otherwise (an inconsistent evaluation), P − 0.5 was taken as the final score.

The percentage score for each team was calculated by dividing the number of points (i.e., the number of fields) accumulated by the group by the maximum number of points available. The final overall score given to each student consisted of three parts:

  1. the group's common percentage result from the board game, with a weight of 0.5,

  2. the questionnaire-based assessment percentage result for the "subject matter contribution", with a weight of 0.3,

  3. the questionnaire-based assessment percentage result for the "communication skills", with a weight of 0.2.

The final score for each student after the game, calculated according to the algorithm above and expressed as a percentage, is henceforth referred to as the game score (GS).
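To make the scoring algorithm concrete, the sketch below shows one possible implementation in Python. It is an illustration of the procedure described above, not the authors' actual code; the function and variable names are ours, and the conversion of the 6-point Likert scores to percentages (division by the scale maximum) is an assumption.

```python
def questionnaire_score(self_score, peer_score):
    """Combine self-assessment (S) and peer-assessment (P) for one scale,
    either "subject matter contribution" or "communication skills"."""
    if abs(self_score - peer_score) <= 1:   # consistent evaluation
        return peer_score
    return peer_score - 0.5                 # inconsistent evaluation

def game_score(group_percent, subject_percent, communication_percent):
    """Final game score (GS) as a weighted sum of the three components."""
    return (0.5 * group_percent
            + 0.3 * subject_percent
            + 0.2 * communication_percent)

# Hypothetical example: Likert scores on the 6-point scale, converted to percentages
subject = questionnaire_score(self_score=5.0, peer_score=4.5) / 6 * 100        # -> 75.0
communication = questionnaire_score(self_score=4.0, peer_score=5.5) / 6 * 100  # -> ~83.3
gs = game_score(group_percent=70.0, subject_percent=subject,
                communication_percent=communication)
print(f"GS = {gs:.1f}%")   # -> GS = 74.2%
```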

Post-test

An unannounced post-test was conducted in the experimental groups 1 week after the game. The same test was given to students from the control groups, just after finishing the unit. It was prepared in a traditional written form. There was neither a review of the relevant content knowledge during regular classes nor a post-game discussion of the game problems and results before this test. The post-test (PT) score is expressed in percentage terms.

Students’ Opinions Questionnaire

Students from the experimental groups received an anonymous short evaluation questionnaire a week after the game (just after the post-test). It consisted of six questions of the form "How did the knowledge assessment method influence your…", each answered on a linear scale ranging from −5 to +5, with the numbers indicating the most negative (−5), through no (0), to the most positive (+5) impact. The evaluated aspects covered pre-test preparation, engagement in team work, answer difficulty, test anxiety, final acquisition of knowledge, and motivation for future learning. There was also space for students to present their opinions on the game. The exact questions are presented in the results section together with the students' answers.

Data Analysis

Basic Statistical Analysis

Below, a statistical analysis of the data is carried out, first for the experimental groups and then in comparison with the control groups. In Table 2, we present basic descriptive statistics and empirical distributions (in the form of histograms, with normality tested by the Shapiro-Wilk test) for each set of results, i.e., FA, GS, and PT, for both experimental groups. Superscripts 1 and 2 indicate the schools.

Table 2 Basic statistics of the results obtained by students from experimental groups

All examined variables are normally distributed. Student's t test showed that the differences between the means of these variables are statistically significant (in each case, p < 0.05), which allowed the students' results in the different assessments to be compared. On average, the students from the experimental groups scored 47%/59% (school 1/school 2) on the former tests, 70%/80% in the game, and 58%/68% on the post-test. The increase of almost 23 percentage points (pp.) between FA and GS in both experimental groups may emerge as a result of student cooperation during the game. The PT results are lower than the GS; however, they are still statistically significantly higher than the FA results (p < 0.05), which may suggest a positive impact of game-based assessment on students' achievements. It should be noted, however, that at each stage the results of students from the first school are lower than those from the second. This is consistent with the authors' observations about the educational standards in each school. Therefore, in what follows, both groups are analyzed independently, in comparison with the corresponding control groups from the same schools.
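The within-group comparisons reported above can be reproduced with any standard statistical package. The sketch below, in Python with SciPy, uses hypothetical score arrays and paired t tests (the same students sat each assessment); it illustrates the type of tests described in the text rather than the authors' actual analysis scripts.

```python
import numpy as np
from scipy import stats

# Hypothetical per-student percentage scores for one experimental group
fa = np.array([45, 52, 38, 61, 47, 55, 49, 43])   # former achievements (FA)
gs = np.array([68, 74, 60, 82, 71, 77, 69, 66])   # game score (GS)
pt = np.array([55, 63, 50, 72, 59, 66, 57, 54])   # post-test (PT)

# Normality check (Shapiro-Wilk) for each variable
for name, data in (("FA", fa), ("GS", gs), ("PT", pt)):
    w, p = stats.shapiro(data)
    print(f"{name}: W = {w:.3f}, p = {p:.3f}")

# Paired t tests between the assessment stages
for a, b, label in ((fa, gs, "FA vs GS"), (fa, pt, "FA vs PT"), (gs, pt, "GS vs PT")):
    t, p = stats.ttest_rel(a, b)
    print(f"{label}: t = {t:.2f}, p = {p:.4f}")
```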

In both schools, the control groups were formed from students who studied with the same teacher and completed the same courses. Basic descriptive statistics of the former achievements (FA) and post-test results (PT) for the control and experimental groups are presented in Tables 3 and 4. In each school, the average former achievements of the experimental and control groups are similar. As shown by the t test (all data are normally distributed), there are no statistically significant differences (p > 0.05) between the FA of the experimental and control groups within either school.

Table 3 Basic statistics of the students from the first school
Table 4 Basic statistics of the students from the second school

The former achievements and post-test results within the control groups were tested in the same way. The results indicate (p > 0.05) that there is no statistically significant difference between the former achievements (FAC) and post-test results (PTC) in the control groups. This implies that the post-test can be considered a reliable tool, neither harder nor easier than the former tests, and allows the PT results of the experimental and control groups to be compared. In school 1, the difference between the mean results is close to 8 pp. (p < 0.0001), and in school 2, it is slightly above 10 pp. (p = 0.0003). This clearly shows that the experimental groups obtained statistically significantly higher PT results than their colleagues from the control groups. In other words, students from the experimental groups gained significantly more knowledge than their peers in the control groups.
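The between-group comparison follows the same pattern but with an independent-samples (Student) t test, since the experimental and control groups contain different students. A minimal sketch, again with hypothetical data:

```python
import numpy as np
from scipy import stats

# Hypothetical post-test percentages for the experimental and control groups of one school
pt_experimental = np.array([55, 63, 50, 72, 59, 66, 57, 54])
pt_control = np.array([48, 51, 44, 60, 49, 55, 47, 45])

# Classic Student t test (equal variances assumed), as reported in the text
t, p = stats.ttest_ind(pt_experimental, pt_control, equal_var=True)
print(f"PT experimental vs control: t = {t:.2f}, p = {p:.4f}")
```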

Students’ Opinions

Students’ opinions about the board game were collected just after the post-test, but before providing them with the information about their final marks. Participants filled in a questionnaire (on −5 to +5 scale, where 0 means no impact) and expressed their comments anonymously in an open, descriptive form. Mean results for the questionnaire questions in both schools are provided in Table 5.

Table 5 Mean results for each question in the questionnaire for both experimental groups

Because the answers to each question were normally distributed (tested by the Shapiro-Wilk test), the null hypothesis H0 that the mean equals zero was tested using Student's t test. The test showed (p < 0.05) that for each question the students' answers differ significantly (are either higher or lower) from the value "0", which means no impact. In other words, for each question, students reported a significant influence of the board game on the aspect in question. Students in both schools judged the assessment in the form of a group board game beneficial for their preparation, and pre-test preparation was rated positively by students from each school. This means that students would spend more time preparing for the game than for a traditional test. Both experimental groups agreed that the level of engagement of their team-mates was high and that answering questions was easier than in traditional, individually taken tests. This corresponds with the students' opinions that this new form of assessment prompts them to give answers even if they feel uncertain about their correctness. Anxiety during the test was rated at −3.1 and −2.3 in the two experimental groups, respectively, which means that this form of assessment reduces the anxiety normally associated with traditional exams. Students also indicated, both in the questionnaires and in their open opinions, that the board game improved their final level of knowledge. They also felt motivated by the game to continue learning.

A few examples of the opinions are presented below:

  • Student A:

This is a good option to test for people who are weaker in calculation. Not everyone is able to solve a complex task, but anyone can learn theory.

  • Student E:

This form of the test was very good, because you could learn also during the test. It teaches cooperation in the way you could have fun.

  • Student K:

I think that we learned and invented more during this game than during a written test. It was a very good possibility for integration.

  • Student O:

Each group should get all kinds of questions. Then it would be more fair. Questions should be more focused on physics, without connections to history.

The vast majority of students’ opinions were positive and enthusiastic. A few of them used the feedback to provide helpful and insightful comments for improving the assessment. In the discussion section, we relate them to the findings commonly presented in the literature on collaborative testing and gamification.

Discussion

The main purpose of this research was to investigate the influence of assessing students' achievements in the form of a group board game, in comparison with their former achievements and with traditional tests. The first important finding is a statistically significant increase in students' achievements in the game compared with their former achievements. This result is consistent with research on the positive impact of collaborative testing, which shows that students' results in collaboratively taken exams are higher than in individual ones (Bloom 2009; Haberyan and Barnett 2010; Kapitanoff 2009; Lusk and Conklin 2003). Some authors question, however, the ability of collaborative testing to improve content retention (Leight et al. 2012; Woody et al. 2008), pointing out that only performance during the collaborative exam itself is higher. Our second result addresses this problem. We found that students from the experimental groups obtained higher results in the post-test taken 1 week after the game than the control groups did. In other words, the students assessed by the game performed well not only in the game itself but also in a knowledge test taken afterwards. This finding is in line with other research showing improvement in students' achievement after collaborative exams in the longer run (Cortright et al. 2003; Jensen et al. 2002; Simpkin 2005). The results also show that the assessment method is effective independently of the level of students' performance.

The students’ opinions were encouraging and supported findings in the literature. Board games can be perceived as a form of activity in which group work skills are exploited and play an essential role in accomplishing tasks (Dallmer 2004; Kapitanoff 2009; Lusk and Conklin 2003; Sandahl 2010; Seaborn and Fels 2015; Shindler 2003). Some researchers (Dicheva et al. 2015; Sadler et al. 2013) suggested that gamification could improve the learning process, which can be inferred from the increase in students’ results in post-test. By playing the game, the students learn to listen to everybody else’s answers, provide fellow players with their know-how, and respond to ideas proposed in discussions. According to Hanus and Fox (2015) and Jolliffe (2007), the above can stimulate knowledge assimilation. In the students’ opinions expressed in the questionnaire and open-descriptive form, the board game assessment has a positive impact on their motivation and social interactions, which also corresponds to the literature findings (Banfield and Wilkerson 2014; Bragg 2006; Seaborn and Fels 2015). Furthermore, the assessment in the form of a game induces far less test anxiety by giving students a sense of being supported by the other team members (Banfield and Wilkerson 2014; Kapitanoff 2009; Lusk and Conklin 2003; Sandahl 2010; Zimbardo et al. 2003). Similar results can be found in other research, in which results from student’s attitude surveys confirm that collaborative testers have more positive attitudes towards the testing process in general compared to students who take assessments individually (Bovee and Gran 2005; Giuliodori et al. 2008; Meseke et al. 2009). Finally, an active involvement in the self- and peer-assessment process may increase the students’ self-assurance and adequate self-esteem (Hendrix 1996), thereby enhancing retention of knowledge (Sawtelle et al. 2012).

Comments on Organization of the Game

Preparing the board game involves a few important aspects, which we describe here to give the reader a sense of how to adapt the idea to her/his own purposes. The most important is to decide on the topic, which should lend itself to assessing knowledge in a non-standard form. A board game has to have clear rules, provide a sufficient rationale for collaboration, challenge the participants, and offer different types of activities and experiences. This is connected with the next important step, which is to precisely define the goals of the event with respect to the prepared tasks. The chosen activities should allow assessing not only content knowledge but also all other aspects (e.g., Science as a Human Endeavor and Science Inquiry Skills) chosen by the teacher. Breedlove et al. (2004) reported that the effects of collaborative testing were directly related to the level of cognitive processing required by the test questions. The activities, rules, and scoring system should be modified and matched to the groups. In particular, in our research, one student claimed that it was possible to guess the proper word in the charades without physics knowledge. This can be remedied by additional rules, by modifying the charades questions, or by using other types of activities. Because the effectiveness of collaborative testing may depend on the teaching strategies students experienced earlier and may improve over time as students become more familiar with the collaborative process (Castor 2004), modifying the game over time seems a natural consequence.

Further Issues

The study examined only the short-term effect on students' knowledge retention. One can suppose that, because the initial level of the forgetting curve was higher in the experimental groups than in the control groups, the experimental groups should also obtain better results after a few months. This assumption, however, has to be tested in future work. The method could also be implemented and verified in subjects other than physics, as well as across a wider spectrum of school levels. Even though literature findings about collaborative testing and board games in many science subjects are very enthusiastic, only a few of them focus on the distinction between high- and low-performing students (Giuliodori et al. 2008). This question should also be examined in future work.

Another issue is the claim that the teacher could influence the results, e.g., by focusing on the experimental groups or neglecting the control groups. In our research, the control groups' post-test results were similar to their results in all the earlier tests, averaged as their former achievements. This approach, unlike a typical pre-test, allows us to address this concern. A comparison between a pre- and post-test provides clear information about students' gain in the examined topic, but it can easily be influenced by the teacher, which would show up only as lower achievements in the control group. In our approach, we assumed that an uninfluenced teaching style would manifest itself as unchanged students' results in the post-test, which is what was observed for the control groups. However, implementing the method under different circumstances could also provide worthwhile information.

Future research could also examine collaborative testing as a more effective standard assessment strategy across a curriculum (Meseke 2010). Because the game always has to be a challenge for students, some modifications should be introduced to the types of questions and rules, or the board game should be used interchangeably with another collaborative method.

Concluding Remarks

This paper studied the influence of a board game as an assessment method on high school students' achievement. Students from the experimental groups performed better in the game than in the former tests. Simultaneously, their achievement in a traditional test taken 1 week later was significantly higher than that of students from the control groups. This implies that assessing students' achievement in the form of a game may improve their performance and short-term achievements.

The improvement of students’ achievements may result in combining collaborative testing with gamification. Apart from quantitative results, the students’ enthusiastic opinions are also indicative of the social benefits of the approach, such as the development of group work skills, supporting weaker students through collaboration with others, and, in addition to these, integration of the class. It appears that game-based assessment enhances students’ retention of knowledge and provides opportunities for improvement for each student, regardless of their former performance. Moreover, it helps to improve students’ attitudes towards their learning and add valuable collaborative learning experience to enhance the school curriculum.

The approach can easily be modified and adapted as a testing method in fields other than physics, especially the natural sciences, in which assessing experimental skills and the socio-historical context is also of interest.