Introduction

Critical thinking can be described as the “purposeful, self-regulatory judgement which results in interpretation, analysis, evaluation, and inference, as well as explanations of the considerations on which that judgement is based” (Abrami et al., 2015, p. 275). This high-level skill enables individuals to think logically, make appropriate decisions, and solve problems effectively (Peter, 2012). Critical thinking has been associated with academic achievement, enhanced employability, higher financial status, and better real-life decisions (Butler et al., 2017; Facione & Facione, 2001; Hart Research Associates, 2015). It has also been identified as an important educational goal for higher education (HE), preparing students for the demands of the 21st-century workplace (Hatcher, 2011; Joynes et al., 2019), and is often included in learning outcomes and assessment standards across disciplines (Forbes, 2018).

However, despite the emphasis that HE curricula place on critical thinking, students often struggle to demonstrate critical-thinking skills (Harrington et al., 2006; Kreth et al., 2010). On the educators’ side, formal training in how to teach critical thinking is rarely provided (Broadbear, 2003; Scriven & Paul, 2007), and there is no clear consensus on how critical thinking should be taught (Abrami et al., 2015). Some researchers have suggested that critical thinking builds on metacognitive skills, such as differentiating inductive from deductive reasoning, interpreting the validity of arguments, and analysing relevant evidence (Solon, 2007). Because metacognitive skills are domain-general, these researchers argue that critical thinking should be taught across disciplines (Solon, 2007). By contrast, other researchers have argued that critical thinking is context-specific (e.g., Baker, 2001). These researchers, who challenge the usefulness of standalone and generic critical-thinking courses, advocate that critical thinking should be taught within the domains in which it is used, through content-focused approaches such as Infusion courses (Baker, 2001; Brunt, 2005; McPeck, 1981). The debate between domain-general and domain-specific critical-thinking pedagogy is longstanding; nevertheless, mastering critical thinking should imply that students can apply their critical-thinking skills and dispositions regardless of context (Solon, 2007).

Beyond the debate over pedagogical approaches, critical thinking education is also challenged by the limited contact time for critical discussion and evaluation of the learning content in conventional teacher-led instructional approaches (Mandernach, 2006; Peter, 2012). These challenges apply not only to traditional face-to-face teaching but also to the online pedagogy of critical thinking. Furthermore, the rapid shift of the HE sector to online teaching during the recent COVID-19 pandemic (WHO, 2020) presented educators with additional challenges related to teaching critical thinking. Online learning relies on students feeling comfortable using and participating in live discussion boards, online debates, and focus groups, which may pose a barrier to student access and engagement in activities relevant to the application of critical-thinking skills, especially when students are not familiar with the online learning environment (MacKnight, 2000). There is also a scarcity of studies on instructional strategies to promote critical thinking in online environments (Guiller et al., 2008; Richardson & Ice, 2010).

In this study, we examined the effectiveness of a technology-enhanced learning intervention for critical thinking administered online to HE students during the second round of COVID-19 restrictions in the UK (early 2021). The intervention combined video-based learning with precision teaching, a behaviourally grounded teaching approach aimed at building fluency in learnt skills. In one of the learning conditions, precision teaching was additionally combined with context-based training to better support the application of learnt knowledge.

Video-based learning

In the HE sector, which increasingly relies on e-learning, video-based learning has become a popular, student-centred, and inclusive approach that supports ubiquitous learning. Video-based learning enables students to learn outside the physical classroom and at their own pace (Syed et al., 2020). It also enables educators to enrich mainstream teaching provision with supplementary material, implement diverse pedagogical strategies (e.g., flipped classroom, blended learning; Yousef et al., 2014), and meet students’ individual learning needs and preferences (Carmichael et al., 2018). There is ample evidence that video-based learning can enhance students’ engagement (Stockwell et al., 2015), academic performance (Salina et al., 2012), and motivation (Hill & Nelson, 2011). There is also evidence that these benefits are maximised when videos of a shorter duration are used (Guo et al., 2014).

Bite-sized or micro-videos are designed to chunk information into manageable and digestible pieces, making the learning content more accessible and improving students’ engagement with it (Koh et al., 2018). It has been suggested that bite-sized video learning sessions facilitate active learning (Brame, 2016), as students can rewind and review parts of the videos more easily when videos are available in smaller chunks (Carmichael et al., 2018). High-speed internet and the improved functionality of mobile devices have also helped to integrate bite-sized learning into everyday routines and to support autonomy in learning (Khong & Kabilan, 2020). However, research on the educational uses of videos has mostly focused on subject-relevant knowledge and practical skills rather than on higher-level skills such as critical thinking (Carmichael et al., 2018). The current study addressed this limitation in the literature by exploring the effectiveness of bite-sized videos for critical-thinking skill development alongside another instructional approach that has been shown to be effective—precision teaching.

Precision teaching (PT)

PT refers to a framework for the systematic self-monitoring of learning (Lindsley, 1997) and the effectiveness of instructional approaches (Kubina & Yurich, 2012). PT can also be used to collect students' learning data and tailor instructional methods to the individual student’s performance (Sundhu & Kittles, 2016). PT often obtains evidence of learning by measuring fluency, the combination of accuracy and speed in performing a targeted skill (Kubina & Morrison, 2000, p. 89), which is a prerequisite for more advanced skills (Kubina & Morrison, 2000). Within the PT framework, fluency is associated with other learning outcomes, including retention—maintaining good performance after an interval without training, endurance—carrying out a task fluently for long durations, stability—not being affected by distractions, and application—combining basic skills to perform a more complex task (abbreviated as RESA, Binder, 1996; Kubina & Yurich, 2012; see also Karpicke & Roediger, 2008 for alternative accounts on the positive effects of testing on memory retrieval and retention).

A commonly used fluency-training approach within the PT framework is frequency building (Kubina & Yurich, 2012). Frequency building uses timed repetition of tasks coupled with performance feedback provided immediately after timed trials (Lokke et al., 2008). This practice is thought to support the acquisition of the targeted skills in a time-efficient manner (Kubina & Yurich, 2012).

Research has shown that frequency-building techniques can support the acquisition of academic skills, such as reading, handwriting, and numeracy (e.g. Chiesa & Robertson, 2000; Hughes et al., 2007). There is less extensive evidence on whether and how frequency-building approaches can support the learning of models of complex thinking (Commons et al., 2015), improve fluency in complex concepts, such as logical fallacies (Fox & Ghezzi, 2003), and strengthen domain-general cognitive skills (Cuzzocrea et al., 2011). These findings have led to calls for research exploring the extent to which frequency-building approaches can be applied to enhance complex, multifaceted skills, such as critical thinking.

One important challenge for frequency-building approaches is that building up fluency in basic skills does not necessarily lead to the automatic transfer of knowledge in applied settings (Kubina & Yurich, 2012). Furthermore, the ability to apply critical thinking skills learnt in real-world or subject-specific contexts does not often come intuitively (Paul & Elder, 2009). One way to address these challenges is to use frequency building synergistically with instructional approaches that promote the transfer and the application of critical thinking skills across domains. For example, embedding critical thinking training into content-focused courses or instructions (Braun, 2004; Gray, 1993; Ikuenobe, 2001) can facilitate the transfer of critical thinking skills by teaching students 'how to think' rather than 'what to think' (Clement, 1979). Similarly, Halpern (1998) proposed a model for the trans-contextual learning of critical thinking skills, which scaffolds the learner's ability to apply skills in real-world contexts.

Current study

In this study, we evaluated the effectiveness of an online learning intervention that aimed to enhance the critical-thinking skills of university students. The intervention focused on the skill of students to identify a type of reasoning error referred to as informal logical fallacies (Carey, 2000). This skill is thought of as a hallmark component of critical thinking (Carey, 2000; Ramasamy, 2011).

The intervention adopted a bite-sized video-learning approach and used frequency building within a precision-teaching framework. We compared the learning performance of three experimental groups: a PT intervention group, a PT+ intervention group, and a self-directed learning control group. The two intervention groups (PT and PT+) received frequency-building practice aimed at increasing the rate of fallacy identification, with the addition of problem-based training in the PT+ group. The control group was exposed to the same instructional materials as the intervention groups but was asked to navigate through them in a self-paced way.

We examined students' learning of the taught critical-thinking skills, as well as their ability to transfer the taught knowledge and skills to novel settings. More specifically, we measured student performance on the testing material in which they received instruction, as well as their performance on unseen examples and on domain-general assessments of broader fallacy-identification skills.

Furthermore, we carried out follow-up assessments one week after the intervention. These follow-up tests were included in the research design to specifically address the potential benefits of frequency-building training for knowledge retention, a key learning outcome associated with precision teaching (RESA; Binder, 1996; Kubina & Yurich, 2012; see also Karpicke & Roediger, 2008).

With all these measures, we aimed to address the following research questions:

  • RQ1: What are the educational benefits of frequency-building practice on students’ learning of taught critical thinking materials?

  • RQ2: What are the educational benefits of frequency-building practice on students’ abilities to apply critical-thinking skills in novel contexts?

  • RQ3: How does frequency building affect students’ knowledge retention following the intervention?

  • RQ4: Does the combination of frequency building with problem-based training provide further benefits for students’ learning of taught critical thinking materials (RQ1), generalisation to novel contexts (RQ2), or knowledge retention (RQ3)?

Instructional framework for teaching critical thinking skills

Traditionally, critical-thinking training follows either the domain-general or the domain-specific approach (Tiruneh et al., 2018). However, here, and in line with other researchers (e.g., Koslowski, 1996; McNeill & Krajcik, 2009; Tiruneh et al., 2018), we take the view that domain-general and domain-specific expertise do not develop in isolation. Rather, both domain-general and context-specific knowledge are important for the effective acquisition of critical-thinking skills (McNeill & Krajcik, 2009). Thus, our instructional framework combines domain-general and domain-specific approaches. Specifically, the introduction to fallacy identification within the bite-sized videos and the frequency-building practice drew on elements of the domain-general approach, as learners could apply the critical-thinking skills they learned across different domains. Problem-based training, by contrast, drew on elements of the domain-specific approach, as learners could see how the skills are applied within subject-specific domains.

The domain-general approach is based on the assumption that the identification of informal logical fallacies shares commonalities across disciplines, and that proficiency in this skill can transcend the domain in which training took place. For example, consider a hypothetical Argument 1, “there is no proof that the parapsychology experiments were fraudulent, so I’m sure they weren’t”, and another hypothetical Argument 2, “because scientists cannot prove that global warming will occur, it probably won’t”. Although the two arguments differ in context (the first concerns psychological science, the second natural science), both are fallacious and both use the lack of evidence as proof of correctness (i.e., the appeal to ignorance fallacy). In this study, the scaffolding of generic fallacy-identification skills within the bite-sized videos helped students develop the skill to identify arguments that are “psychologically persuasive but logically incorrect” (Copi & Burgess-Jackson, 1996, p. 97). The exposure to structural features of fallacies and the use of real-world examples within frequency-building practice prompted students to apply generatively what they had learned. This strategy aligns with Engle et al.’s (2003) suggestion of intercontextuality as a means of bridging the gap between learning and transfer practices.

In addition, and following the domain-specific view, we also assume that critical-thinking skills may require explicit instruction within subject-specific domains before they can be performed competently. This notion is similar to the Infusion approach, which emphasises how a critical-thinking skill can be applied within a subject-specific context (Abrami et al., 2008). In this study, the context-based scaffolding (i.e., problem-based training) within the PT+ group prompted students to apply critical-thinking skills in a context-specific situation. By comparing critical-thinking abilities between students in the PT and PT+ groups, we therefore investigated whether Infusion is necessary to promote the development of critical-thinking skills across domains (RQ4).

Method

Participants

A total of 57 adults (39 females, 17 males, 1 preferring not to say) with a mean age of 24.14 years (SD = 5.62; range 18–47 years old) took part in this study. Participants were recruited through the University’s Research Participation System and departmental social media platforms. All participants were university students, with 37 registered as undergraduate students and 20 as postgraduate students. The study was approved by the Research Ethics Committee of the Department of Psychology.

Material

The intervention focused on four informal logical fallacies: 'appeal to ignorance', 'bandwagon', 'false cause', and 'hasty generalisation'. These four logical fallacies corresponded to common reasoning errors and were selected after consultation with a subject matter expert (a senior lecturer of a university-level course involving critical thinking) and reviews of relevant textbooks (e.g., Gray, 1991; Schick & Vaughn, 2020). The four logical fallacies share a similar form, consisting of a premise followed by a conclusion (Fox & Ghezzi, 2003; see Table 1).

Table 1 Informal logical fallacies

Instructional material

Learning videos

Two ‘bite-sized’ learning videos, lasting 2:46 and 2:54 min, were created using the video animation software Powtoon (https://www.powtoon.com). Powtoon has been highlighted as user-friendly software for supporting digital-based learning, as it is equipped with various functions that can help to improve teachers’ creativity, boost learning motivation, and support the learning needs of students with different abilities (Muhammad Basri et al., 2021; Resmol & Leasa, 2022; Zamora et al., 2021).

The first video (Episode 1: Arguments and Fallacies) presented learners with the standard form of an argument and introduced the four fallacies. The second video (Episode 2: Examples of Fallacies) gave examples of each of the four fallacies and explained why the arguments involved were fallacious or problematic.

Learning tasks

Two learning tasks (one for each episode) consisting of 20 multiple-choice items were developed to facilitate knowledge acquisition after the presentation of the learning videos. Items for these tasks were based on material from critical thinking textbooks (Gray, 1991; Schick & Vaughn, 2020) and were also reviewed by the subject-matter expert. Each item presented participants with a short paragraph that illustrated an example or a definition of a fallacy, followed by a forced-choice question asking participants to identify the relevant fallacy. Participants received programmed feedback (“Correct!” or “Incorrect!”) on the screen after each answer selection.

Problem-based tasks (used in the PT + intervention group only)

Three problem-based tasks were developed to support learning in the PT+ intervention group, administered following each learning episode. The problem-based tasks consisted of open-ended questions, which required participants to analyse, evaluate, and explain flaws in reasoning within a psychological debate or dispute. Each task first presented a debate situation by showing a newsletter article or a short paragraph that summarised research findings related to the main claim in dispute, alongside some context about the debate. For example, participants were presented with a paragraph entitled “Does social media do more harm than good?”, which referred to a recent survey finding that feelings of loneliness among young workers increased with the amount of time they reported spending on social media. Participants were then invited to identify fallacies in the arguments presented by three panel members, who advocated for the disadvantages of social media (open-ended question, “Review the reasoning of each of the panel members A, B, and C and explain what might be problematic with their reasoning if considered to be faulty”). For example, a panel member would suggest that social media is doing more harm than good because too much social media use will cause someone to feel lonely (‘false cause’), and because his friend George, who uses social media for more than 16 h a day, has recently been diagnosed with depression (‘hasty generalisation’). Participants were asked to review each argument and explain whether a fallacy was involved.

Subsequently, participants were asked to indicate which of the three arguments presented by the panel members they would be least likely to support (forced-choice question, “Indicate which one you believe to be the reasoning that you would be least likely to support”). Finally, participants were asked to provide a suggestion for the best course of action or the best counter-argument to resolve the debate (open-ended question, “If you are asked to give an opinion in this debate, what would be your next course of action”). Programmed feedback was provided for each task following participants’ responses to the questions involved. For example, the panel member above argued that there was a cause-and-effect relationship on the basis of a correlation, and drew conclusions about the impact of social media on all individuals on the basis of evidence concerning only certain people; hence, the fallacies of false cause and hasty generalisation were committed.

Testing material

Pre- and post-episode tests based on the learning material

The questions included in the learning tasks of the two episodes were also used in the episode-specific tests of critical thinking. These were administered twice, at the beginning and the end of the episode. The pre- and post-episode tests were administered as time-based assessments (to consider both accuracy and speed in identifying the fallacies). Participants were instructed to answer the questions as accurately and as fast as they could within a minute. No feedback was given in the pre- and post-episode tests.

Pre- and post-intervention assessments on unseen questions

An additional 50 multiple-choice questions were used to assess participants’ ability to recognise fallacies in unseen questions. These were selected from the same bank of questions used for the development of the learning tasks and the pre- and post-episode tests. Twenty-five items were presented as a pre-intervention assessment and the remaining 25 as a post-intervention assessment.

Broader abilities in fallacy identification: informal reasoning fallacies identification task (IRFIT; Neuman, 2003).

To assess the students' broader abilities in fallacy identification, we used a test based on the Informal Reasoning Fallacies Identification Task (Neuman & Weizman, 2003; Weinstock et al., 2004). In this study, four informal reasoning tasks, each consisting of two items adapted from Neuman's (2003) study, were administered to participants. Each reasoning task corresponded to one of the four fallacies and consisted of an argumentative scenario followed by four questions. The scenario was structured as follows. The first sentence introduced two debaters, described as either psychology students or philosophers. The second sentence presented the context and the main claim under debate, stated in the form of a question. The third and fourth sentences presented the arguments of the two debaters, a so-called “protagonist” and an “antagonist”. Finally, the specific reasoning of one of the debaters in support of their position, which involved a fallacy, was presented.

Participants were asked to identify potential flaws in reasoning and identify fallacies. In particular, they responded to the following four questions:

  1. A yes/no fallacy identification question, which examined whether participants conceived an argument as fallacious or problematic (e.g. “Do you think there is a problem in the argument that the antagonist presented in Line 5?”).

  2. An open-ended fallacy explanation question, which assessed participants’ skill to articulate what they perceived to be faulty with the reasoning of an argument (e.g. “If you think that there is a problem in the argument presented by the antagonist, what is the problem?”).

  3. An open-ended response question, which assessed participants’ skill to debate and present a counter-argument (e.g. “What is the best answer the protagonist can use in response to the antagonist’s argument?”).

  4. A forced-choice fallacy classification question, which assessed whether participants perceived the argument to be a quarrel, a formal debate, or a critical discussion (e.g. “In your opinion, what is the main reason for the debate between the two arguers?”). Participants responded to this question by selecting one of three answer choices: (a) they do not like each other and, therefore, each person is attacking the other’s claim (quarrel); (b) each of them wants to impress his colleagues and win the debate (formal debate); and (c) they have different opinions on this matter and are trying to convince each other (critical discussion).

Design

The design of the study is shown in Fig. 1. Participants were randomly allocated to three groups: (A) a ‘precision teaching (PT)’ intervention group, (B) a ‘precision teaching plus problem-based training (PT+)’ intervention group, and (C) a ‘self-directed learning’ control group. The three groups were exposed to the same instructional material and testing stimuli; however, these were administered in different ways to implement the different learning conditions. In particular, the PT group received frequency-building learning tasks, which aimed at increasing the rate of fallacy identification. The PT+ group completed the same frequency-building learning tasks with the addition of problem-based training, intended to further support the application of critical thinking. Finally, the control group completed the learning tasks in a self-directed way.
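As a minimal sketch of this allocation step (the study itself was delivered via Qualtrics, so the code below is purely illustrative; the function name, participant identifiers, and seed are our own assumptions):

import random

def allocate_to_groups(participant_ids, groups=("PT", "PT+", "Control"), seed=2021):
    """Shuffle participants and deal them round-robin across the three conditions,
    keeping group sizes as balanced as possible (19/19/19 for 57 participants)."""
    rng = random.Random(seed)  # fixed seed is hypothetical, for reproducibility only
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: groups[i % len(groups)] for i, pid in enumerate(ids)}

allocation = allocate_to_groups(range(1, 58))  # 57 participants, as in this study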

Fig. 1 Flowchart of the study

Procedure

Participants completed the study in three sessions administered online via the Qualtrics platform (Qualtrics, Provo, UT). In the first online session, participants completed the pre-intervention assessment and Episode 1, Arguments and Fallacies. In the second online session, participants completed Episode 2, Examples of Fallacies, and the post-intervention assessment. In the last online session, administered a week after the completion of Session 2, participants repeated the post-episode assessments for both Episode 1 and Episode 2 as retention assessments.

Each episode started with a time-based pre-episode assessment on fallacy identification. The assessment was followed by a learning video of approximately three minutes, in which the definitions (Episode 1) or examples (Episode 2) of the fallacies were explained. Participants were asked to watch the video until the end, and the button to proceed to the next part only appeared at the bottom of the page towards the end of the video presentation. Participants then completed two blocks of 20 multiple-choice questions, which were administered to the three groups as learning tasks in different forms. The learning tasks allowed participants to familiarise themselves with and consolidate the knowledge learnt from the video content. Finally, participants completed the post-episode assessment within a 1-min timeframe.

The three groups were differentiated in the types of learning tasks they completed within the two learning episodes, as detailed in the following section.

Learning tasks in the PT intervention group

Learning tasks in the PT intervention group were guided by a high response-rate requirement implemented in iterations of timed intervals and feedback. Participants were informed that they were going to practise identifying the fallacies within a 1-min timeframe, with the remaining time displayed in the top left corner of the screen. They selected the best answer out of the four choices as fast as they could and received programmed feedback after each response (“Correct!” or “Incorrect!”). After the 1-min interval, participants were shown the number of accurate responses they had provided. Participants then proceeded to an error-correction procedure, which focused on the questions they had answered incorrectly. During the error-correction procedure, participants were instructed to answer these questions again, without any time limit, and were shown the correct answer if they gave an incorrect response for a second time. After the error-correction procedure, participants answered the 20 multiple-choice questions again, following the same procedure as the first timed interval. The error-correction procedure and the learning cycle were repeated twice before progressing to the post-episode test (see Fig. 2).
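As a rough, console-based sketch of the logic of this cycle (not the Qualtrics implementation used in the study; the item format and function names below are assumptions for illustration only):

import random
import time

def timed_block(items, time_limit=60):
    """One 1-minute frequency-building interval: answer items with immediate
    right/wrong feedback; return the items answered incorrectly."""
    start = time.time()
    correct, errors = 0, []
    for item in random.sample(items, len(items)):
        if time.time() - start >= time_limit:
            break
        answer = input(f"{item['question']} {item['choices']}: ")
        if answer.strip().lower() == item['answer'].lower():
            print("Correct!")
            correct += 1
        else:
            print("Incorrect!")
            errors.append(item)
    print(f"Accurate responses in this interval: {correct}")
    return errors

def error_correction(errors):
    """Untimed review of missed items; reveal the answer after a second error."""
    for item in errors:
        answer = input(f"(Review) {item['question']} {item['choices']}: ")
        if answer.strip().lower() != item['answer'].lower():
            print(f"The correct answer is: {item['answer']}")

def frequency_building_episode(items, cycles=2):
    """Timed interval followed by error correction, repeated before the post-episode test."""
    for _ in range(cycles):
        error_correction(timed_block(items))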

Fig. 2 Screenshots of the learning tasks interface for the PT intervention groups: a instruction page; b video presentation page; c block presentation page; d learning tasks page; e error correction procedure page

Learning tasks in the PT+ intervention group

Participants who were assigned to the PT+ intervention group completed the same learning tasks as the PT group. Additionally, participants in this group completed the corresponding problem-based task following each episode.

Learning tasks in the control group

In this group, learning tasks were completed in a self-directed way, without a high response-rate requirement. Participants were instructed to answer all 20 questions accurately and as fast as they could (but not within timed intervals) and were given feedback on the number of correct responses they achieved. This cycle was repeated twice before progressing to the post-episode test. Hence, the main difference between the intervention groups (PT and PT+) and the control group was that participants in the control group did not complete the learning tasks in 1-min timed intervals; rather, they were asked to complete the tasks in full at their own pace. The learning tasks and the number of blocks in each episode remained the same as in the intervention groups.

Scoring

IRFIT

Content analysis was conducted on participants’ answers to the tasks by two researchers. Using the scoring procedures from Neuman’s (2003) study, 10% of the data was marked by both scorers, and Cohen’s Kappa showed strong agreement between the two scorers (κ = .814; McHugh, 2012). The yes/no fallacy identification question (e.g. “Do you think there is a problem in the argument that the antagonist presented in Line 5?”) was scored as 1 for a ‘yes’ answer and 0 for a ‘no’ answer. Both the open-ended fallacy explanation questions (e.g. “If you think that there is a problem in the argument presented by the antagonist, what is the problem?”) and the open-ended response questions (e.g. “What is the best answer the protagonist can use in response to the antagonist’s argument?”) were marked as 1 when participants identified and/or explained the informal reasoning fallacy involved in the situation. Participants scored 0.5 when they captured the key elements of why the arguments were fallacious but did not provide a complete explanation. Participants scored 0 when they did not answer the question, did not identify the problem in the situation, or did not take the fallacy involved into account in their explanation.
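Agreement of this kind can be computed with standard tools; the snippet below is a minimal illustration using scikit-learn with made-up ratings on the 0 / 0.5 / 1 rubric, not the study's actual data.

from sklearn.metrics import cohen_kappa_score

# Hypothetical double-marked subset (the study double-marked 10% of responses).
# Rubric scores 0, 0.5, and 1 are treated as categorical labels.
scorer_a = ["1", "1", "0.5", "0", "1", "0.5", "0", "1", "1", "0"]
scorer_b = ["1", "1", "0.5", "0", "1", "0",   "0", "1", "1", "0"]

kappa = cohen_kappa_score(scorer_a, scorer_b)
print(f"Cohen's kappa = {kappa:.3f}")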

Data analysis

Quantitative data collected from the pre- and post-episode tests and the pre- and post-intervention assessments were analysed to examine the effects of time (within-participants factor) and differences between groups (between-participants factor). When preliminary data checks suggested that the assumptions of normality and homogeneity of variance were met, data were analysed with a 3 (Group: PT vs. PT+ vs. control) × 2 (Time: pre- vs. post-episode/intervention) mixed-design ANOVA. When these assumptions were violated, Wilcoxon Signed Rank non-parametric tests (within-participants) were used to compare differences in a given measure across two time points, and Kruskal-Wallis non-parametric tests (between-participants) were used to examine differences between groups in the changes in that measure. If the data were normal but the assumption of homogeneity of variance was violated, changes in a measure over time were examined with t-tests, and between-group differences in change over time were examined with a Welch one-way ANOVA.
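The test-selection rules above can be summarised as in the sketch below, which uses SciPy on hypothetical per-group pre/post score lists. It is an illustrative outline rather than the exact analysis pipeline: the full 3 × 2 mixed-design ANOVA and the Welch ANOVA would be run with dedicated routines (e.g., pingouin.mixed_anova and pingouin.welch_anova), and the function names here are our own.

from scipy import stats

ALPHA = 0.05

def within_participant_test(pre, post):
    """Paired t-test when the difference scores pass Shapiro-Wilk; otherwise Wilcoxon."""
    diffs = [b - a for a, b in zip(pre, post)]
    _, p_normal = stats.shapiro(diffs)
    if p_normal > ALPHA:
        return "paired t-test", stats.ttest_rel(post, pre)
    return "Wilcoxon signed-rank", stats.wilcoxon(post, pre)

def between_group_test(change_scores_by_group):
    """Kruskal-Wallis when normality is violated; Welch ANOVA when only the
    homogeneity-of-variance assumption is violated; otherwise one-way ANOVA."""
    groups = list(change_scores_by_group.values())
    if any(stats.shapiro(g)[1] <= ALPHA for g in groups):
        return "Kruskal-Wallis", stats.kruskal(*groups)
    _, p_levene = stats.levene(*groups)
    if p_levene <= ALPHA:
        # SciPy has no Welch ANOVA; pingouin.welch_anova could be used here.
        return "Welch one-way ANOVA", None
    return "one-way ANOVA", stats.f_oneway(*groups)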

In a complementary analysis, we compared changes between participants with relatively low and relatively high performance.

In all analyses, effect sizes were reported using the relevant standardised measures (t-tests: Cohen’s d; Wilcoxon Signed Rank and Kruskal-Wallis tests: r; Welch one-way ANOVA: ω2; mixed ANOVA: ηp2). For Cohen’s d and r, values of ± .20, ± .50, and ± .80 were taken to suggest small, medium, and large effect sizes, respectively; for ω2 and ηp2, the thresholds were .01 (small), .06 (medium), and .13 (large) (Cohen, 1988). Effect sizes d greater than .40 were considered educationally relevant (Hattie, 2009).
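For concreteness, the sketch below computes a paired-design Cohen's d under one common convention (dz: the mean difference divided by the standard deviation of the differences) and maps it to the benchmark labels above; the paper does not state which d convention was applied, so this is an assumption for illustration only.

import statistics

def cohens_d_paired(pre, post):
    """Cohen's d for a pre/post comparison (dz convention: mean diff / SD of diffs)."""
    diffs = [b - a for a, b in zip(pre, post)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

def effect_size_label(d):
    """Benchmark labels for d (Cohen, 1988); .40 is the educational-relevance cut-off."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"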

Results

Pre- and post-episode tests on the learning tasks

Figure 3 presents the mean scores of the pre- and post-episode tests for Episode 1 and 2 for the three groups. Shapiro–Wilk tests indicated that the assumption of normality was not met (p < 0.05 for Episode 1 pre- and post-episode tests, and Episode 2 pre-episode test), hence, Wilcoxon Signed Rank tests were conducted to examine the changes in performance within each episode. The results showed that participants, on average, scored significantly higher in the post-episode (Episode 1: Mdn = 10.00; Episode 2: Mdn = 10.00) compared to the pre-episode tests (Episode 1: Mdn = 5.00; Episode 2: Mdn = 5.00) on the learning tasks, for both Episode 1 (Z = 6.31, p < .001, r = .84) and Episode 2 (Z = 5.78, p < .001, r = .77).

Fig. 3 Mean scores of the pre- and post-episode tests. Scores were calculated from participants’ accurate responses to 20 questions within a minute. Error bars represent standard errors of the means

Given that the data were not normally distributed, we compared the improvements of the three groups using Kruskal-Wallis tests for Episode 1 (PT: Mdn difference = 5.00; PT+: Mdn difference = 6.00; Control: Mdn difference = 5.00) and Episode 2 (PT: Mdn difference = 6.00; PT+: Mdn difference = 5.00; Control: Mdn difference = 4.00). These tests suggested that the improvements of the three groups were comparable in both Episode 1 [H (2) = .17, p = .920, r = .02] and Episode 2 [H (2) = 1.02, p = .601, r = .13].

Pre- and post-intervention assessments on unseen questions

Figure 4 shows the mean accuracy scores in the pre- and post-intervention assessments for the three groups. Shapiro–Wilk tests indicated that all data were statistically normal (all ps > .05). However, a preliminary Levene’s test suggested that the assumption of homogeneity of variance was not met for the post-test measures (p = .017).

Fig. 4 Mean accuracy scores of the pre- and post-intervention assessments. Scores were calculated out of 25 questions. Error bars represent standard errors of the means

Paired-sample t-tests were thus conducted to compare performance between the pre- and post-intervention assessments in the three groups. These tests suggested significant improvements in all three groups [PT: t (18) = 10.33, p < .001, d = 2.37; PT+: t (18) = 7.68, p < .001, d = 1.76; Control: t (18) = − 4.12, p = .001, d = .95].

To compare participants' improvements between groups, a Welch one-way ANOVA with corrected degrees of freedom was used. The results showed a trend towards a difference between the three groups' average improvements, which, however, did not reach statistical significance, F (2, 34.63) = 2.61, p = .088, ω2 = .05.

To gain further insight into this non-significant trend of between-group differences, in a complementary analysis we divided participants into lower- and higher-scorer categories based on their pre-test scores. Participants who scored at or below the 50th percentile were categorised as lower scorers (n = 33), and those who scored above the 50th percentile were categorised as higher scorers (n = 24). Figure 5 shows the mean accuracy scores of low- and high-scoring participants at pre- and post-test. Shapiro–Wilk tests indicated that the assumption of normality was not met for the pre- and post-test scores (all ps < .05). Hence, Wilcoxon Signed Rank non-parametric tests were conducted to compare participants' scores between the pre- and post-intervention assessments. The results showed that both low- and high-scoring participants achieved significantly higher mean scores at post-intervention compared to pre-intervention (Low-scoring: Z = 4.79, p < .001, r = .83; High-scoring: Z = 3.68, p < .001, r = .75).

Fig. 5 Mean accuracy scores for low- and high-scoring participants at the pre- and post-intervention assessments. Scores were calculated out of 25 questions. Error bars represent standard errors of the means

With regard to differences in the improvement of low- and high-scoring participants, a Kruskal-Wallis test suggested a significant difference, H (1) = 4.48, p = .034, r = .59, with larger improvements for low-scoring (Mdn difference = 6.00) than for high-scoring participants (Mdn difference = 4.50).

Pre- and post-intervention assessment on broader critical thinking skills (IRFITs)

Figure 6 shows the average scores of the three groups in the IRFIT, the assessment of how well participants applied their critical-thinking skills in a broader context of fallacy identification. These data were analysed with parametric statistics; in particular, a 3 × 2 mixed-design ANOVA was conducted, with Group as a between-subjects factor and Time as a within-subjects factor. The analysis showed a significant main effect of Group, F (2, 54) = 6.09, p = .004, ηp2 = .184 (‘large’ effect), which was further explored with post-hoc comparisons. These indicated that the PT+ intervention group (M = 12.20) scored significantly higher than the control group (M = 9.07) (p = .007), whereas the differences between the PT group (M = 11.29) and the control group (p = .216) and between the PT and PT+ groups (p = .127) were not significant. There was also a significant main effect of Time, F (1, 54) = 9.82, p = .003, ηp2 = .154, whereby the post-intervention score (M = 11.35) was higher than the pre-intervention score (M = 10.35), as well as a significant interaction between the two factors, F (2, 54) = 4.14, p = .021, ηp2 = .133 (see Fig. 6), reflecting a significant improvement for the PT (p = .001) and PT+ (p = .046) intervention groups but not for the control group.

Fig. 6 Mean performance scores on the IRFITs at pre- and post-intervention. Scores were calculated out of four IRFITs at each time point. Error bars represent standard errors of the means

Knowledge retention

Figure 7 shows the average scores of the three groups in the post-episode assessments and the retention tests for Episode 1 and Episode 2. Shapiro–Wilk tests indicated that all data were statistically normal. Levene’s tests also showed that the assumption of homogeneity of variance was met. Thus, the data were analysed with a 3 × 2 mixed-design ANOVA with Group as a between-subjects factor and Time (post-episode assessment vs. retention test) as a within-subjects factor. For Episode 1, the results showed no significant main effect of Group, F (2, 48) = .22, p = .803, ηp2 = .007, no significant main effect of Time, F (1, 48) = 3.00, p = .090, ηp2 = .011, and no interaction, F (2, 48) = 1.35, p = .269, ηp2 = .009. Similarly, for Episode 2, there was no significant main effect of Group, F (2, 47) = .71, p = .497, ηp2 = .023, no significant main effect of Time, F (1, 47) = 3.26, p = .077, ηp2 = .015, and no interaction, F (2, 47) = .40, p = .676, ηp2 = .004.

Fig. 7 Mean scores for all three groups at the post-episode assessments and the retention tests of Episode 1 and Episode 2. Scores were calculated from participants’ accurate responses to 20 questions within a minute. Error bars represent standard errors of the means

Discussion

In this study, we implemented and evaluated an online learning design aimed at improving university students' critical-thinking skills, based on a video-based learning approach that used frequency building under precision teaching. We also combined the frequency-building approach with structured problem-based training to further foster the transfer of the taught skills. We compared the learning performance of the three experimental groups, examining students' performance on the taught materials, on unseen examples, and on more general fallacy-identification problems, as well as in follow-up tests.

With regard to whether PT could improve students’ learning of the taught material (RQ1), our results from the post-episode tests demonstrated that all groups showed significant and comparable improvements in their skill to identify the taught examples of fallacies. Thus, all three learning conditions, PT-based or not, worked equally well in supporting video-based teaching of fallacy identification and yielded comparable outcomes, in line with findings from an earlier study by Fox and Ghezzi (2003). Furthermore, considering that the broader PT literature tends to focus on simpler, lower-level skills, our findings are important because they suggest that the use of precision teaching can be extended to complex, higher-level skills such as critical thinking (Cuzzocrea et al., 2011).

With regard to the application of the taught knowledge to unseen examples (RQ2), the analyses of the post-intervention assessments suggested that, again, all learning conditions yielded comparable improvements. Interestingly, these improvements were greater for students who scored at or below the 50th percentile. Although this result could be partially attributed to a ceiling effect, it demonstrates the usefulness of technology-enhanced learning designs, in particular the use of bite-sized videos and frequency-building practice, in supporting the transfer of fallacy-identification skills for all students, and especially for those who present difficulties in critical thinking.

Turning to the transfer of the taught skills to the domain-general IRFIT task (RQ2), our results showed that, importantly, only the two PT groups showed reliable improvements in performance post-intervention. Thus, frequency building under the precision-teaching framework can foster the application of skills in novel contexts, in line with Kubina and Yurich (2012), who suggested that frequency building can lead to desirable outcomes of knowledge generalisation. In this study, the two PT groups completed practice designed to build fluency in fallacy-identification skills, and their ability to generalise these skills beyond the taught materials suggests that participants had achieved a certain level of fluency. Significant gains in post-intervention performance on a standardised critical-thinking test also reflect the benefits of frequency-building training and support the notion that skill generalisation is an outcome of fluency-focused training (Binder, 1996).

Furthermore, in the knowledge retention tests (RQ3), there were no significant differences between the post-episode assessment scores and the retention test scores, implying that students, regardless of group, showed no significant loss of fluency even after a week without practice. Earlier research suggested that frequency-building practice can support the retention of skills over longer periods of time (Binder, 1996). One would therefore expect the two PT groups to show better skill retention after an interval of no-practice days. However, the difference between the intervention and the control groups was not significant in our study. To understand this inconsistency between our findings and earlier research, further investigation into how frequency-building procedures affect long-term retention is warranted, possibly by extending the retention tests beyond the one-week interval.

With regard to whether problem-based training can support further benefits in students’ acquisition, generalisation, and retention of critical-thinking skills (RQ4), improvements in the domain-general task were comparable in the PT and PT+ groups, suggesting that problem-based training is not necessary for promoting the transfer of taught skills. This finding contrasts with previous literature positing that rigorous practice is required before students can internalise the concepts learnt and demonstrate critical-thinking skills intuitively in their daily lives (Paul & Elder, 2009).

In sum, the current study provides a foundation for understanding how video technologies and frequency-building practice can be combined into an effective supplementary teaching tool to promote critical thinking in online settings. The integration of the two approaches is suitable for supporting students of various abilities. Our instructional framework draws on elements of Papert’s constructionism, in which effective learning occurs by building upon individual students’ prior knowledge through active engagement (Papert, 1980). In this study, presenting the learning information in a “bite-sized” video format helps to maximise students’ engagement with the content and offers students the flexibility to pause, rewind, and revisit any part of the video whenever necessary (Salina et al., 2012). The inclusion of the online frequency-building intervention also improves the quality of the session, transforming it from a passive video-watching event into an active learning opportunity that helps students monitor their own learning, a process necessary for knowledge construction (Gaudin & Chaliès, 2015).

This online learning approach addresses challenges in critical-thinking instructional design related to promoting active learning during students’ independent study time (Mandernach, 2006). Our study shows that this type of practice, which focuses on building fluency of skills, is flexible enough to be used in teaching complex concepts such as critical thinking and can lead to desirable learning outcomes, specifically the application of skills in novel settings. Moreover, our study demonstrated that the online frequency-building intervention accommodates individual students, offering them the opportunity to revisit their individual mistakes after each practice trial. A technology-enhanced model of frequency-building practice like this also allows the systematic presentation of stimuli and effective tracking of student engagement (Beverley et al., 2009). Our approach to teaching critical-thinking skills is versatile and applicable to the current Higher Education landscape, which the COVID-19 restrictions have transformed (Pokhrel & Chhetri, 2021).

Limitations and directions for further research

Our study is not without limitations. First, in terms of scope, our intervention focused on fallacy identification. However, critical thinking is a multifaceted construct, and future studies should include a more diverse range of processes related to interpretation, analysis, evaluation, and inference, such as argument analysis, evaluation of the credibility of claims or sources, and identification of scientific versus pseudo-scientific procedures (McPeck, 1981).

Furthermore, in terms of research methodology, although participants in the three groups were exposed to similar instructional materials and procedures, the time of exposure to the learning tasks was not controlled. A more nuanced investigation of learning under precision teaching would need to explicitly examine the duration of exposure to, and usage of, the learning materials. This is important because it has been argued that frequency-building procedures can reduce the time needed to master a targeted skill (Lokke et al., 2008). Notably, in the current study, even a short precision-teaching intervention yielded a significant, albeit small, improvement in fallacy-identification performance in novel problem-solving contexts.

An additional limitation lies in the use of random group allocation in our experimental design, rather than controlling for the participants’ demographics across experimental groups. In this study, participants were randomly allocated to three groups that were exposed to the same instructional stimuli but differed in the way that the learning tasks were performed. Random allocation has been widely used in educational research to evaluate the effectiveness of interventions and to ensure that any group differences are due to chance (Forsetlund et al., 2007). Nevertheless, we acknowledge that there might be individual variations in participants’ educational level, enrolled course, and motivation to learn that we did not account for in this study. One could draw more robust conclusions by assessing how the impact of this intervention depended on these demographics.

Finally, in this study, we did not include instructors in the learning videos. Instead, we used animated videos created with the Powtoon platform. This decision was partly influenced by the timing of the research: COVID-19 lockdown restrictions were in place, and all physical engagements were halted during that period, limiting our ability to carry out video recordings with an instructor. While various studies have highlighted the benefits of Powtoon-based videos for student engagement and motivation (Muhammad Basri et al., 2021; Zamora et al., 2021), contrasting evidence suggests that some students find learning videos featuring a presenter more engaging (Guo et al., 2014; Pi et al., 2017). Future studies could examine the impact of the presence of instructors on students’ engagement and critical-thinking skill training. An interesting possibility is to consider peers as presenters, as evidence suggests that perceived similarity between a peer presenter and the learner can create a favourable learning environment that benefits learning (Bulte et al., 2007; Lockspeiser et al., 2008).

Conclusion

The current study demonstrated the potential of an online intervention combining video-based learning and PT to improve the critical-thinking skills of university students. After a brief intervention, consisting of only two learning episodes, students showed improvements in fallacy-identification performance that transferred to novel problem-solving contexts. These results are important in an era of over-specialisation, in which critical thinking is identified as one of the most desired yet most challenging educational outcomes for Higher Education. Given the increased use and acceptance of technology-enhanced approaches following the recent transformation of the Higher Education landscape under the COVID-19 restrictions, the current results provide a new perspective on combining video learning and PT practice in an online learning environment. They suggest that such technological innovations for critical-thinking education can be effective and can be readily adapted to support active learning outside the classroom.