This study explored the utilisation of an asynchronous online learning activity which employed elements of gamification and AR technology. Higher education chemistry students apply the concepts of VSEPR theory to solve problems. Descriptive statistics concerning the variables of conceptual understanding, cognitive load, and attitude to chemistry in response to completing our activity are summarised in Table 1. In addition, analysis of students’ spatial ability, plus their perceptions to the AR technology, and our VSEPR intervention, are also provided.
Analysis of VSEPR Conceptual Knowledge Data
Table 1 shows the relative group-dependent means and standard deviations of the VSEPR instrument test scores obtained before, and after, our activity intervention. Across both groups, 77 students completed the instrument at both the pre- and post-test stages. Following data collection, the Shapiro-Wilk test was used to check for the existence of normality. Although other methods for normality testing exist, Shapiro-Wilk has more power to detect the nonnormality on smaller sample sizes (Gupta et al., 2019). Data was found to be normally distributed for the pre- and post-test stages. In addition, Bartlett’s test was conducted, which verified that the assumption of equal variances was true.
Intergroup comparisons between the two experimental groups showed no significant differences in the pre-test mean scores obtained, t(77) = 0.962, p = 0.339. In addition, no significant differences were observed in the post-test mean scores achieved by the two groups, t(77) = 1.906, p = 0.06. However, it is noteworthy that significant intragroup improvements in performance between pre-test and post-test stages were observed for both the control group, t(40) = 6.809, p < 0.01, and the AR group, t(37) = 4.300, p < 0.01. This supports the premise that, for both experimental groups, chemistry instruction using a synchronous session coupled with an asynchronous game-based learning activity can enhance chemistry understanding. For this cohort of students, the introduction of AR technologies did not result in a significant improvement in performance on the VSEPR instrument when compared to the control group.
To further investigate the post-test scores achieved by the control and AR groups, an ANCOVA was performed on each of the three sections of the VSEPR test instrument. The experimental group was used as the between-subject factor, with pre-test scores as the covariate. No significant differences were found in student performance on test items pertaining to bond angles, F(1,76) = 0.004, p = 0.951, and species identification, F(1,76) = 0.110, p = 0.741. Yet, significant differences were observed for questions regarding molecular geometry, F(1,76) = 5.508, p = 0.027.
Normalised change (c) calculations were conducted as a measure of the learning gain of students between the pre- and post-test stages. The higher the normalised change, the greater the learning gain. Marx and Cummings (2007) suggest calculating normalised change as shown in Fig. 3. For this study, the ranges defined by Hake (1998) for normalised gain are adopted: low (c < 0.3), medium (0.3 ≤ c ≤ 0.7), and high (0.7 ≤ c). The c values calculated were 0.38 for the control group, and 0.26 for the AR group. To account for the variance in individual scores, the effect size was also calculated. The suggested values for effect size were employed: small (0.2), medium (0.5), and large (0.8) (Wassertheil & Cohen, 1970). The effect size was calculated as 0.36.
To better understand item and scale characteristics of the VSEPR test instrument, the concepts and analytical procedures of Classical Test Theory (CTT) were applied to help determine the overall instrument reliability. CTT is regularly utilised by researchers as the first step in establishing reliability. Its association with basic statistical comparisons means that researchers who have had any exposure to measurement theory are likely to have encountered CTT (DeVellis, 2006). However, CTT has some noticeable shortcomings, such as correlations being computed on the sample which may differ between cohorts. CTT-based methods do not involve the rigorous scrutiny of item characteristics that other methods, such as Item Response Theory, employ.
Figure 4 shows the calculated properties of difficulty and discrimination for items on the VSEPR test instrument. In the context of educational testing, a difficult item is one that more respondents answer incorrectly. The difficulty values calculated range from 0 to 1, where a higher value indicates an easier item (Kline, 2005). The most effective items have mid-ranges of difficulty (Ding & Beichner, 2009). Yet, in practice, a difficulty of 0.5 on every test item for every cohort is not feasible. As a result, difficulty values within a range of 0.3–0.9 are acceptable (Doran, 1980). Items more strongly correlated with other items, and thus the true score, are fundamentally better items. Such items are said to have greater discrimination (DeVellis, 2006). The extreme group method was used to calculate discrimination with groups partitioned by the top and bottom 27% (Preacher, 2015).
Seven items (2, 4, 8, 3, 1, 5, and 6) fell within the generally accepted range for item difficulty and discrimination. Item 7 falls on the upper limit for item difficulty (implying that it is tending towards being too easy). Items pertaining to species identification demonstrated difficulty values lower than the generally accepted range (indicating harder items) and show low discrimination values. For the remaining items, the discrimination measure ranged from 0.24 to 0.76. Apart from species identification items, discrimination values constitute reasonable evidence that each item’s score is positively related to the overall proficiency represented by total performance on the instrument.
Cronbach’s alpha for total score estimates the internal consistency of the instrument. The alpha value for the VSEPR knowledge test was 0.62, which indicates adequate reliability for an assessment used for low-stake purposes (Cortina, 1993). One item had an alpha-if-deleted value greater than the overall test value: item 9. To see if removing item 9 would improve the reliability, Cronbach’s alpha was recalculated, excluding this item. Cronbach’s alpha for this reduced set of items was 0.66. This suggests that item 9 may not cohere with the rest of the items on the instrument and may need to be reviewed for modification or replacement. However, for the purposes of this study, the original validated instrument (Merchant et al., 2013) was utilised.
The analysis of long-answer responses surfaced misconceptions in students’ understanding of VSEPR theory. Following our measurement rubric (Appendix 2), the total score that students could obtain per passage of the online asynchronous activity is equal to 2n, where n is the number of inscriptions per passage. The correct responses (%) achieved by both groups are shown in Fig. 5. Common misconceptions and mistakes identified are summarised as follows:
A double bond contributes to more than a single bonding group around the central atom (passage 1, inscription 6).
The “octa” prefix in an octahedral geometry refers to the number of electron groups that can accommodate the central atom (passage 3, inscription 7), rather than the defined vertices of an octahedron.
Students displayed difficulty describing the shapes of ions (passage 4, inscription 9). Many students could classify ClF3 as T-shaped, but would incorrectly identify MnCl52− as trigonal bipyramidal, thus not accounting for all valence electrons.
Following the pre-training activity, more than 75% of students, from both experimental groups, were able to correctly identify that the Berry pseudorotation occurs in species adopting a trigonal bipyramidal geometry (passage 2, inscription 5). Further, more than 75% of students correctly stated that the Berry pseudorotation does not occur in tetrahedral molecules (passage 2, inscription 6). It was observed that the students in the AR group achieved higher scores on their submitted long-answer responses from the activity than the control group. However, this was not reflected in the post-test performance, where no significant intergroup differences were observed.
Analysis of Cognitive Load Scale Responses
A total of 34 students completed the CLS instrument. However, responses from 2 participants were incomplete and subsequently excluded from further analysis. To reveal if significant group differences for each type of cognitive load measured were present, an independent sample t-test was applied to each of the subscales. No significant differences were detected for ICL, t(30) = 1.703, p = 0.099, or ECL, t(30) = 0.144, p = 0.887.
This demonstrates that students perceived that they needed to invest similar levels of cognitive effort to understand VSEPR topic content (ICL), but also to comprehend representations of the molecular shapes (ECL), regardless of whether this was done using the augmented reality tool or perspective drawings. For ICL, this finding is expected, and in line with the meta study by Ibáñez and Delgado-Kloos (2018). The ICL, which describes the complexity of a learning topic itself, should not be influenced by any kind of learning support such as the integration of AR technology.
Our hypothesis was that the introduction of AR would result in a reduction of ECL as students would exert lower levels of cognitive effort to comprehend the representations. Although we did not see this result throughout this study, qualitative data collected from students may offer an insight into why this was the case. Participant interviews suggest that some of the gamification mechanics embedded into the activity required significantly higher effort to overcome relative to the chemistry concepts within the problems. Turan et al. (2016) report that gamification elements occupy the working memory capacities of students, therefore demanding more mental effort. This may have contributed to the ECL of the students, offsetting the cognitive advantages provided by the augmented reality technology.
Furthermore, no significant group differences were observed for GCL, t(30) = 0.667, p = 0.510. This is reflected in the non-significant difference in mean scores obtained by students on the VSEPR test instrument, in line with the suggestion that GCL is indicative of information retention (Leppink et al., 2013). No significant between-group effect was observed for ANCOVA results when comparing ICL, with pre-test VSEPR test scores as a covariate, F(1,29) = 2.721, p = 0.103. Lastly, no significant between-group effect was observed for VSEPR post-test scores obtained with GCL as a covariate, F(1,29) = 1.799, p = 0.190.
Analysis of Spatial Ability Scores
Conducting tests for assumptions of normality and homogeneity of variances showed a non-normal distribution for spatial ability data collected during both the pre- and post-test stages. As a result, the Levene’s test was used to verify homogeneity of variances (rather than Bartlett’s test) as this is more appropriate for non-normal distributions. A total of 45 students completed both the pre- and post-test spatial assessment. The reliability was measured using Cronbach’s alpha, with values of 0.795 and 0.774 obtained for the pre- and post-test stages, respectively. Intergroup comparisons were conducted using the Mann-Whitney U test. For pre-test scores achieved, no significant differences were observed between the two groups (U = 250.5, p = 0.955). In addition, no significant differences were observed when comparing gender (U = 223, p = 0.494). However, it should be noted that when comparing gender on all pre-test scores, n = 111, a significant difference for gender was observed (U = 1108, p = 0.043), with males performing better than females. This result is consistent with meta-analysis conducted regarding the correlation of spatial ability and educational performance (Roach et al., 2020).
No significant differences were observed when comparing the post-test scores achieved by the two groups (U = 170.5, p = 0.06). However, significant intragroup improvements for spatial score were observed for the control group (Z = 3.368, p < 0.01) and the AR group (Z = 3.887, p < 0.01) over the course of the study. Spearman’s correlation revealed a ‘moderate’ correlation (rs = 0.416) between students’ mental rotation ability and VSEPR conceptual knowledge scores obtained, which was found to be statistically significant (p < 0.01). A one-way ANCOVA showed no significant differences between-group effects for VSEPR test performance on bond angle determination (F = 0.799, p = 0.457), recognising geometries (F = 2.898, p = 0.068), and species identification (F = 1.912, p = 0.162) using spatial ability as a covariate.
To understand if students with lower spatial ability, who utilised AR, demonstrated greater gains in performance, a Spearman’s correlation was conducted between the pre-test spatial scores obtained by students and their calculated normalised change (Fig. 6). This was preferred over other common approaches such as the ‘median split’ to avoid the problems associated with categorising continuous variables (Irwin & McClelland, 2003). No significant relationship was present for this study, (rs = 0.097, p = 0.251), and therefore, further investigation of spatial ability as a predictor of performance gain, through techniques such as regression analysis, was not possible.
Analysis of Students’ Attitudes
We are interested in exploring relationships between attitude and achievement in chemistry. The questionnaire utilised contains two sub-scales: Emotional Satisfaction (ES), corresponding to the affective domain, and Intellectual Accessibility (IA), representing the cognitive domain. The mean scores reported for each sub-scale by each group are presented in Table 1. In Table 2, we present each item of the ASCI (V2), as reported by both groups, alongside the asymptotic significance, calculated during intergroup comparison (Mann-Whitney U test).
The scale reliability was established by calculating the Cronbach’s alpha. For the control group, the alpha values calculated were 0.735 for IA and 0.767 for ES. In addition, alpha values of 0.775 for IA and 0.735 for ES were calculated for the AR group. This indicates, for both groups, that a very good level of internal consistency is present. Interestingly, higher alpha-if-deleted values were calculated with regards to item 8 for both groups. A likely reason for this occurrence is the variance in meanings attributed to these adjectives, which may have resulted in students assigning different meanings to this item. Kahveci (2015) outlines the difficulty in translating item 8 to the Turkish language. It may well be that this item is not consistently interpreted by our students.
Intergroup comparisons show no significant differences for any items from either scale, except for item 2 (Complicated–Simple). When calculating the effect size (Cohen’s d) for item 2, a ‘large’ effect size of 1.542 is obtained (d > 0.8) (Wassertheil & Cohen, 1970). We hypothesised that AR technology would simplify the visualisation of representations, however, that is not reflected in the ASCI responses collected from both participant groups. Performing a one-way ANCOVA on the VSEPR post-test scores using IA as a covariate shows no statistically significant results. We believe that this difference stems from discussions raised during the qualitative analysis, where students discussed the potential difficulty encountered in overcoming the gamification elements, which confounds with the potential benefits of the AR technology.
Analysis of Interview Responses
We recruited 15 students, across both experimental groups, to participate in individual semi-structured interviews. The interview schedule covered three topic areas: usability of the ChemFord application (including experience and interaction, and perceived usefulness), the students’ experience of our intervention (including perceived learning effectiveness, satisfaction, performance achievement and reflective thinking), and the cognitive benefits of integrating augmented technologies (including comprehension of topic content, problem solving, and perceived mental effort).
Qualitative analysis of the participant interviews was completed through latent thematic analysis using the approach of Braun and Clarke (2006). Data was recorded, and transcribed verbatim, prior to being subjected to analysis for commonly occurring themes. The initial broad themes were constructed based on frequency and similarity of responses. Redundancy was eliminated and closely related major themes were merged. In this paper we focus on three predominant themes found in student discussions: supporting the learning experience, augmented reality as an asset, and challenges of integration. We report the use of negotiated agreement as the reliability measure for this data set to minimise subjectivity in the coding process and reduce errors. Differences were discussed and where there was a consistent disagreement a common approach was agreed.
Supporting the Learning Experience
Throughout our discussions, students commented that the examination of molecules within the augmented environment not only reinforced their three-dimensional understanding of the VSEPR concepts, but also helped them appreciate the three-dimensional nature of chemistry, “I always forget that molecules are, you know, three-dimensional... [this] is a constant reminder that we have to think that these are three dimensional molecules.” (Interviewee K).
Students perceived that the integration of augmented reality, as an additional mode of learning, into the teaching process improved their understanding of VSEPR, “With the app, I understand it better than if I was just using paper.” (Interviewee L). Similarly, regarding multiple contexts for learning, “I think ChemFord definitely allows you to see it better... it actually took me quite a long time to grasp what the 2D drawings were actually trying to show.” (Interviewee M). In addition, students commented on their feelings of engagement with the teaching content when AR technology is utilised.
“With the AR, if more of the lecturers did it, I would definitely like it a bit more. It breaks up the teaching content and makes it more interesting.” (Interviewee N).
The ability to manipulate objects within the augmented experience (moving/rotating/scaling) was considered an important affordance of the application, “If you had a molecule that was slightly different so maybe, a mirror molecule to a different molecule, you can always compare by twisting and turning, making it bigger and smaller... And it helps me understand the difference between different molecules in different forms.” (Interviewee D).
Our VSEPR asynchronous online activity was also positively perceived. All interviewed participants expressed a desire to repeat this style of activity in future modules throughout their degree programme. “I would like to see more of these. I’ve just really enjoyed having to challenge myself in a different way.” (Interviewee J). Students frequently stated that the worked example in the ‘Adventurer’s Logbook’ assisted them in correctly identifying the geometry of molecular species within the activity. The majority of students suggested that this recurrence should be once or twice a semester (typically a 12-week period at UEA).
Participants enjoyed the challenge presented by the gamification mechanics embedded into the activity, “It was a really nice change to just questions and bringing that sort of logic and having to think deeper” (Interviewee O). Similarly, “It’s not just the chemistry but also the analytical thinking, thinking about the statements.” (Interviewee B). Students additionally commented that our intervention “made me feel a bit more confident on VSEPR.” (Interviewee D), and that the activity “does help you implement the knowledge that you’ve learned.” (Interviewee G).
Our online VSEPR activity was primarily designed as a group activity. With the transition to online learning in response to Covid-19, we wanted to ensure social interactivity between students, and that this activity was an opportunity for students to collaborate, “So, we did that [the activity] together in person, and having me turn something around orient it [a molecule] to show him what I was thinking. That was when I found it most helpful”. (Interviewee F). Yet, a minority of students commented that they “like doing it [the activity] independently.” (Interviewee L). The design of the activity allows students to utilise skills both working in a team and solving problems independently.
Augmented Reality as an Asset
A positive opinion ran throughout most participants discussions regarding the AR technology. This positivity was found both in comments regarding the affordances that AR provides, and in remarks about the alternative resources that students purport to use. For visualisation of molecular structures, students commonly mentioned the use of Molymod molecular models (Mollymod.com, 2021). Students stated several benefits of the AR tool over physical models. Two discussion points were convenience and availability.
“I think the AR can work better. I would have to go out and get the Molymods, whereas I can download the app and have it in 30 seconds. That was preferable.” (Interviewee A).
Convenience was frequently attributed to two predominate discussion points: the ability to generate augmented experiences on their personal mobile devices from a large library of structures and that these structures could be created instantaneously without the additional effort of building the molecular structure. An attributed distinction of the Molymod physical models was the ability to modify the molecular structure to “take molecules apart and build whatever you like. That’s quite useful.” (Interviewee B). This is an affordance not currently provided by the ChemFord AR application.
Students described the user interface of the application as ‘intuitive’, “It’s actually very easy. Very easy to use.” (Interviewee C). This theme was also found throughout a previous thematic analysis of utilising ChemFord for visualising topics of stereochemistry (Elford et al., 2021). As well as these descriptions, further reports of student interaction with the tool suggest minimal frustration experienced by users, an important factor in design of a tool that will be adopted by students.
“I think it’s intuitive. I mean, it always picked up markers quickly. And, you know, just tapping around at the screen, you really quickly figure out how to do stuff on it.” (Interviewee D).
When discussing mental visualisation of structures, in relation to the topic of VSEPR, the role of AR in assisting the visualisation process was of great benefit to students, in comparison to perspective drawings. “If I can see the molecule, that’s a lot better for me. It helps me visualise.” (Interviewee E). Similarly, here: “The app is good for seeing things visually. I don’t really know why I wouldn’t use the app.” (Interviewee F).
Challenges of Integration
A number of challenges regarding the integration of the ChemFord AR application, and our VSEPR intervention, ran through participants’ discussion. Three major themes evolved from student interviews: (i) exposure of the ChemFord application, (ii) the format of the activity, and (iii) the technological limitations.
Although student comments on ChemFord suggest that it was positively perceived as an educational tool, challenges were expressed regarding integration of the application into the teaching and learning process. Outside of a synchronous learning environment, students explicitly stated reasons why they may not adopt AR technology. Primarily, easy access to the image target library was seen as an obstacle for students.
“If I had the markers to hand, it may have prompted me to look at the shapes. Not having them to hand, I just forgot about it.” (Interviewee G).
“I didn't use the AR, just because I didn’t have the markers to hand.” (Interviewee H).
Additional accounts from interviewees describe further reasons attributing to the lower student uptake of this AR technology outside of formal synchronous activities.
“To be honest, once it had been mentioned in lectures you kind of forgot that it was there. So, I just use Google...” (Interviewee I).
“I think I would have been more used to using it, but because not all of the lecturers use it. It's kind of like, I haven't been shown it that much.” (Interviewee D).
Participants also expressed the desire to be able to toggle the requirement to scan image targets to generate the augmented experience. As an alternative, students suggested the capability to spawn objects through import from a search function. As such, we have added this as a feature to the application.
“You’re going to need a code, if you want to use it, because if they don’t have anything... I think like maybe add a search bar or something with all the molecules.” (Interviewee E).
“I would like to be able to keep that molecule. So, like, if you scan it could like add it to a database on the app. And you could get back that molecule, get it back up, and without having to scan the QR code.” (Interviewee F).
Recurrent themes of our VSEPR activity were principally coded to (i) difficulty, (ii) gamified elements, and (iii) affective response. Difficulty captured students’ reports of the effort required to correctly apply the VSEPR subject content to evaluate problems. The difficulty of the activity was perceived by the majority of students to be surmountable with a minority commenting that they would have been more satisfied with a harder challenge. “I don’t think it was too bad in terms of difficulty. I thought it was at the right level.” (Interviewee H).
“I just wish it was a little bit harder. It was really interesting and cool, and I’d love to do more things like that.” (Interviewee I).
A minority of students also raised comments regarding device-dependent limitations of their personal mobile device when adopting AR technology, for example, students with devices that do not meet the minimum target API requirements for AR. An important step for integration of this paradigm will be to ensure accessibility for all students whilst keeping up with the rapid pace of technological developments.
Some limitations of this study must be acknowledged. Firstly, a major limitation is the relatively small sample size that the data analysis was based upon. The sample size was the result of modest enrolment compounded by participant dropout between the pre- and post-test stages. Secondly, following the adoption of online learning in response to the Covid-19 pandemic, our VSEPR activity was structured as an asynchronous study activity. Consequently, we did not have the opportunity to observe students’ interactions with the AR technology when participants were completing our asynchronous VSEPR intervention. Lastly, we must acknowledge the possibly of self-selection bias from participants. Students who volunteer for interviews may be different from the rest of the population regarding their communication ability or reasoning level.