Introduction

The Next Generation Science Standards promote the use of problem-based learning (PBL), which requires that students construct knowledge to generate solutions to ill-structured, authentic problems (Achieve 2013). Central to student success in such approaches is scaffolding—dynamic support that helps students meaningfully participate in and gain skill at tasks that are beyond their unassisted capabilities (Belland 2014; Hmelo-Silver et al. 2007). When originally defined, instructional scaffolding was delivered in a one-to-one manner by a teacher (Wood et al. 1976). Researchers have since considered how computer-based tools can serve as scaffolding to overcome the limitations of one-to-one scaffolding, such as high student-to-teacher ratios (Hawkins and Pea 1987). Computer-based scaffolding has been utilized in PBL in Science, Technology, Engineering, and Mathematics (STEM) education, and many studies have demonstrated its effect on students' conceptual knowledge and higher-order skills. However, it is difficult to generalize from the results of individual studies without systematic synthesis methods (e.g., meta-analysis), because individual studies involve different populations and contexts. For that reason, several meta-analyses have addressed the effectiveness of computer-based scaffolding (Azevedo and Bernard 1995; Belland et al. 2017; Ma et al. 2014), but none focused specifically on scaffolding in the context of PBL.

This study aims to determine and generalize the effectiveness of computer-based scaffolding used in PBL, in terms of several characteristics of the scaffolding and its contexts, through meta-analysis. In addition, a Bayesian approach was used to address issues of traditional meta-analysis, such as biased results stemming from publication bias and low statistical power when few studies are available for inclusion.

Computer-Based Scaffolding in Problem-Based Learning

PBL is a learner-centered instructional approach that aims to improve students' content knowledge and problem-solving skills through engagement with authentic, ill-structured problems, which have no single correct answer (Hmelo-Silver 2004; Kolodner et al. 2003; Thistlethwaite et al. 2012). Students acquire new knowledge by identifying gaps between their current level of knowledge and the level of knowledge it would take to address the given problem (Barrows 1996). To accomplish PBL tasks, students need diverse skills, including advanced problem-solving skills, critical thinking, and collaborative learning skills (Gallagher et al. 1995). However, students who are new to PBL can struggle due to differing levels of background knowledge, learning skills, and motivation. Scaffolding has been utilized to make the tasks in PBL more manageable and accessible (Hmelo-Silver et al. 2007) and to help students develop deep content knowledge and higher-order thinking skills (Belland et al. 2011). Computer-based scaffolding in particular has had positive impacts on students' cognitive learning outcomes. For example, students can be invited to consider the complexity that is integral to the target skill and spared the burden of addressing complexity that is not (Reiser 2004) through computer-based hints (Leemkuil and de Jong 2012; Li 2001; Schrader and Bastiaens 2012), visualization (Cuevas et al. 2002; Kumar 2005; Linn et al. 2006), question prompts (Hmelo-Silver and Day 1999; Kramarski and Gutman 2006), and concept mapping (Puntambekar et al. 2003). In addition, computer-based scaffolding can improve students' interest in and motivation toward their learning (Clarebout and Elen 2006).

Meta-analyses Related to Computer-Based Scaffolding

Computer-based scaffolding has been found to positively impact many variables across many studies. However, generalizing such results is difficult because learning environments, populations, and experimental conditions vary across studies. Therefore, some scholars have tried to combine and synthesize information from multiple individual studies through meta-analysis. Belland et al. (2017) conducted a traditional meta-analysis to determine the influence of computer-based scaffolding in the context of problem-centered instruction for STEM education. The overall effect size of scaffolding was g = 0.46 (Belland et al. 2017), indicating that computer-based scaffolding can help students learn effectively in problem-centered instruction. However, that meta-analysis did not break out the effects of scaffolding according to scaffolding and student characteristics within the context of problem-based learning. In addition, in the case of conventional instruction, as opposed to problem-centered instructional models, meta-analysis indicated that computer-based scaffolding, including intelligent tutoring systems, positively impacted students' learning (g = 0.66) regardless of instructor effects, study types, and region (Kulik and Fletcher 2016). Other meta-analyses on the effectiveness of intelligent tutoring systems showed a wide range of effect sizes: g = 0.41 among college students (Steenbergen-Hu and Cooper 2014), g = 0.09 for K-12 students' mathematical learning (Steenbergen-Hu and Cooper 2013), d = 0.76 when intelligent tutoring systems were compared with human tutors (VanLehn 2011), and d = 1.00 for scaffolding within an early model of intelligent tutoring systems (Anderson et al. 1995). However, no meta-analyses have investigated the effectiveness of computer-based scaffolding in the context of problem-based learning.

Issues of Traditional Meta-analysis

Meta-analyses offer much more in the way of systematicity than traditional research reviews, but some scholars (Biondi-Zoccai et al. 2012; Greco et al. 2013; Koricheva et al. 2013) noted two potential pitfalls that can occur during meta-analysis. One issue relates to small-study effects, which can result in publication bias. In meta-analyses, the effect size is estimated based on observations reported in previous studies. These observations must be standardized, and standardization can introduce error when the studies involved are small. Small studies tend to report larger effect sizes than large studies, and effect sizes from studies with small sample sizes (i.e., n < 10) can be biased (Hedges 1986). For this reason, some scholars concluded that if both small-study and large-study effects are included in one data set, researchers should analyze only the large-study effects (Biondi-Zoccai et al. 2012; Greco et al. 2013; Hedges 1986; Koricheva et al. 2013). However, in educational research, large sample sizes are rare.

Another issue is that in traditional meta-analysis, there is no method to include a level of a variable if the number of studies coded at that level is too small. There has been debate on the minimum number of studies that should be included in a meta-analysis (Guolo and Varin 2015). In theory, one can conduct a meta-analysis with just two studies (Valentine et al. 2010), but in this case, statistical power is greatly reduced. In one study that investigated how statistical power in meta-analysis varies with the number of included studies, power for a meta-analysis of 10 studies did not exceed 0.2 (random-effects model with large heterogeneity), 0.4 (random-effects model with small heterogeneity), or 0.5 (fixed-effect model) (Valentine et al. 2010). However, in educational research, especially PBL, it can be difficult to find a large number of studies on a given topic that all meet inclusion criteria, due to differing educational populations and levels, learning environments, and outcomes, which in turn makes it difficult to arrive at reliable and valid results (Ahn et al. 2012).

Considering the abovementioned issues, it is worth considering an alternative methodology to address publication bias resulting from small-study effects and a small number of included studies. This study uses a Bayesian approach to meta-analysis to determine the effect size of computer-based scaffolding in the context of problem-based learning. Bayesian meta-analysis is explained in the next section.

Bayesian Meta-analysis

To address these limitations of traditional meta-analysis, one can use a Bayesian approach, which assumes that all parameters come from a superpopulation with its own parameters (Hartung et al. 2011; Higgins et al. 2009). A superpopulation is an infinite population of abstractions that has the unique characteristics of the population, with the finite population itself considered a sample from the superpopulation (Royall 1970). That is, in sample-based inference, the expected value is obtained from all possible samples of the given finite population. In contrast, the expected value in superpopulation model-based inference is derived from all possible samples of an infinite population, meaning that the number of units constituting the population is infinite. Under this superpopulation assumption, the Bayesian approach relies on (a) generating a prior distribution (p(θ)) using data from previously collected studies that are not themselves included in the Bayesian meta-analysis, (b) estimating the likelihood of the observed data given the parameters (p(data|θ)), and (c) generating a posterior distribution (p(θ|data)), calculated via Bayes' theorem. In short, this approach can provide a more accurate estimate of the treatment effect by adding another component of variability—the prior distribution (Schmid and Mengersen 2013). A prior distribution articulates researchers' beliefs, or the results from previous studies, about parameters prior to the collection of new data (Raudenbush and Bryk 2002). Prior distributions play a role in summarizing the evidence and determining evidential uncertainty (Spiegelhalter et al. 2004). In the Bayesian model, μ (the weighted mean effect size), τ (the between-study standard deviation), and β (study-level covariates) are important factors in specifying prior distributions (Findley 2011; Sutton and Abrams 2001).
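To make these pieces concrete, the standard hierarchical formulation of Bayesian random-effects meta-analysis can be sketched as follows. This is a generic textbook form, not the exact WinBUGS specification used later in this paper; $g_i$ and $s_i^2$ denote the observed effect size and sampling variance of study $i$:

```latex
% Sketch: hierarchical Bayesian random-effects meta-analysis
\begin{align}
g_i \mid \theta_i &\sim N(\theta_i,\, s_i^2) && \text{(within-study sampling model)}\\
\theta_i \mid \mu, \tau^2 &\sim N(\mu,\, \tau^2) && \text{(between-study variation)}\\
p(\mu, \tau^2 \mid \text{data}) &\propto p(\text{data} \mid \mu, \tau^2)\, p(\mu, \tau^2) && \text{(Bayes' theorem)}
\end{align}
```

When study-level covariates β (moderators) are modeled, the second line generalizes to $\theta_i \sim N(\mu + x_i\beta,\, \tau^2)$.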

Typically, prior distributions are divided into two types (i.e., informative and non-informative) according to whether prior information about the topic (i.e., μ, β, and τ) exists. When prior information is not available, the Bayesian approach commonly uses a non-informative prior distribution. The posterior distribution can differ according to how the between-study variance (τ²) in the prior distribution is set up (see Table 1). It is therefore important to consider all possible values of τ², from minimum to maximum, when specifying a non-informative prior distribution.

Table 1 Reference prior distributions for τ² (Spiegelhalter et al. 2004)

As seen in Table 1, the purpose of a reference prior distribution is to maximize the divergence contributed by the between-study variance (i.e., τ²), which maximizes the effect of newly added data on the posterior distribution (Sun and Ye 1995). After assuming all possible values of τ² across studies using the above reference prior distributions, one can identify the most suitable prior distribution model using the deviance information criterion (Spiegelhalter et al. 2002).
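For illustration, reference priors of the kind referred to here commonly take forms like the following. This is a sketch of forms widely used in the literature, not a reproduction of Table 1; $s_0$ denotes a typical within-study standard error (e.g., derived from the harmonic mean of the sampling variances):

```latex
% Sketch: common non-informative priors for between-study heterogeneity
\begin{align}
\tau &\sim \text{Uniform}(0, A), \quad A \text{ large} && \text{(uniform)}\\
p(\tau) &= \frac{s_0}{(s_0 + \tau)^2} && \text{(DuMouchel)}\\
1/\tau^2 &\sim \text{Gamma}(\epsilon, \epsilon), \quad \epsilon \text{ small, e.g., } 0.001 && \text{(inverse gamma on } \tau^2)
\end{align}
```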

Research Questions

The purpose of this study is to determine the effect of computer-based scaffolding in the context of PBL by addressing the following research questions.

  1. How does computer-based scaffolding affect students' cognitive learning outcomes in the context of problem-based learning for STEM education?

  2. How does the effectiveness of computer-based scaffolding vary according to scaffolding intervention?

  3. How does the effectiveness of computer-based scaffolding vary according to scaffolding customization and its methods?

  4. How does the effectiveness of computer-based scaffolding vary according to the higher-order skill it is intended to enhance?

  5. How does the effectiveness of computer-based scaffolding vary according to scaffolding strategies?

  6. How does the effectiveness of computer-based scaffolding vary according to discipline?

Method

Search Process

The initial databases used to search for articles were Education Source, PsycInfo, Digital Dissertations, CiteSeer, ERIC, PubMed, Academic Search Premier, IEEE, and Google Scholar. These databases were recommended by a librarian and by experts in fields related to this study. However, some articles were duplicated across databases, and no articles satisfying our inclusion criteria were found in PubMed, Academic Search Premier, or IEEE. Various combinations of the following search terms were used in the databases listed above: “scaffold, scaffolds, computer-based scaffolding/supports,” “problem-based learning,” “cognitive tutor,” “intelligent tutoring systems,” “Science, Technology, Engineering, Mathematics,” and subcategories of higher-order thinking skills. The search terms were determined by researchers' consensus and advisory board members' advice.

Inclusion Criteria

The following inclusion criteria were used: studies needed to (a) be published between January 1, 1990, and December 31, 2015; (b) present sufficient information to conduct Bayesian meta-analysis (statistical results revealing the difference between treatment and control groups, number of participants, study design); (c) be conducted in the context of problem-based learning within STEM education; (d) clearly reveal which types of scaffolding were used; and (e) address higher-order thinking skills as the intended outcome of the scaffold itself. We found a total of 21 studies with 47 outcomes (see Appendix). The numbers of studies and outcomes differ because some studies had multiple outcomes at different levels of the moderators used in this meta-analysis (i.e., scaffolding intervention, higher-order thinking skills, scaffolding customization and its method, scaffolding strategy, and discipline).

Moderators for Meta-analysis

Scaffolding Intervention

Conceptual scaffolding provides expert hints, concept mapping and/or tools to engage in concept mapping, and visualizations depicting concepts to help students identify what to consider when solving the problem (Hannafin et al. 1999). For example, scaffolding in Su (2007) was designed to focus attention on key content and to help students stay organized in a way that met project requirements. Strategic scaffolding helps students identify, find, and evaluate information for problem-solving, and guides them toward a suitable approach to the problem (Hannafin et al. 1999). An example can be seen in the scaffolding in Rosen and Tager (2014), which enabled students to construct a well-integrated structural representation (e.g., about the benefits of organic milk). Conceptual scaffolding can be distinguished from strategic scaffolding in that conceptual scaffolding helps students consider tasks from different angles through the reorganization and connection of evidence, whereas strategic scaffolding tells students how to use the evidence for problem-solving (Saye and Brush 2002). Metacognitive scaffolding allows students to reflect on their learning process and encourages them to consider possible problem solutions (Hannafin et al. 1999). For example, the reflection sheet in Su and Klein (2010) encouraged students to summarize what they had learned, reflect upon it, and then debrief that information. Motivational scaffolding aims to enhance students' interest, confidence, and collaboration (Jonassen 1999a; Tuckman and Schouwenburg 2004).

Higher-Order Thinking Skills

Scaffolding is often designed to enhance higher-order thinking skills (Aleven and Koedinger 2002; Azevedo 2005; Quintana et al. 2005). The definition of higher-order thinking and its subcategories differs among scholars. Higher-order thinking can be defined as “challenge and expanded use of the mind” (Newmann 1991, p. 325), and students can enhance higher-order thinking skills through active participation in such activities as making hypotheses, gathering evidence, and generating arguments (Lewis and Smith 1993). According to Bloom's taxonomy (Bloom 1956), higher-order thinking is the stage beyond understanding and declarative knowledge; therefore, analyzing, synthesizing, and evaluating can be classified as higher-order skills. Analysis is the ability to identify the components of information and ideas and to establish the relations between elements (Lord and Baviskar 2007). For example, scaffolding in Bird Watching Learning (Chen et al. 2003) provided pictures, questions, and other information that learners could use to identify bird species. Synthesis refers to recognizing the patterns of components and creatively forming a new whole; through this ability learners can formulate a hypothesis or propose alternatives (Anderson et al. 2001). The mapping software in Toth et al. (2002) helped students formulate scientific statements using hypotheses and data, with a summary of information found in web-based materials. Defined as the ability to judge the value of material based on definite criteria, evaluation allows learners to judge the value of data and experimental results and to justify conclusions (Krathwohl 2002). For example, several scaffolding strategies (question prompts, expert advice, and suggestions) in Simons and Klein (2007) supported learners in rating the reliability and believability of each evidence item and the extent to which they believed the statements they generated. On this basis, critical thinking and logical thinking can be combined under “Analysis,” creative thinking and reflective thinking under “Synthesis,” and problem-solving skills and decision-making under “Evaluation” (Bloom 1956; Hershkowitz et al. 2001).

Therefore, in this study, higher-order thinking skills are defined as those cognitive skills that allow students to function at the analysis, synthesis, and evaluation levels of Bloom's taxonomy, and intended outcomes are categorized accordingly as analysis, synthesis, or evaluation.

Scaffolding Customization and Its Methods

When the timing and degree of scaffolding are controlled effectively, students can reach the final learning goal using their own learning strategies and processes (Collins et al. 1989). In this sense, scaffolding customization is defined as the change of scaffolding frequency and nature based on a dynamic assessment of students' current abilities (Belland 2014). There are three kinds of scaffolding customization (i.e., fading, adding, and fading/adding). Fading means that scaffolds are introduced and then pulled away. As an example of fading, the Web-based Inquiry Science Environment (Raes et al. 2012) faded scaffolding according to students' learning progress (e.g., full scaffolding functionality in the beginning step, but no scaffolding in advanced steps). Adding, on the other hand, is defined as increasing the frequency of scaffolds, reducing the interval between scaffolds, or adding new scaffolding elements as the intervention goes on. In Chang et al. (2001), students who continued to struggle could request greater quantities and intensities of scaffolding through a hint button. The third type of customization is fading/adding, defined as increasing or pulling away scaffolds depending on students' current learning status and their requests. Scaffolding that neither increased nor decreased in nature or frequency was categorized as none. According to the meta-analysis reported by Belland et al. (2017), there was no scaffolding customization in 65% of the included studies. Similarly, Lin et al. (2012) noted that few studies (9.3%) adopted a fading function in a review of 43 scaffolding-related articles. This means that while many scholars have maintained that fading is an important element of scaffolding (Collins et al. 1989; Dillenbourg 2002; Puntambekar and Hübscher 2005; Wood et al. 1976), scaffolding customization has largely been overlooked in scaffolding design.

There are three ways to determine scaffolding fading, adding, and fading/adding: performance-adaptation, self-selection, and fixed time interval. Performance-adaptation means that the frequency and nature of scaffolding change in response to students' current learning performance and status. Self-selection, by contrast, means that students themselves decide to request fading, adding, or both. In intelligent tutoring systems that can monitor students' abilities, scaffolding fading is often performance-adapted while adding supports is self-selected. In addition, scaffolding customization can be fixed, defined as adding or fading after a predetermined number of events or a fixed time interval has passed (Clark et al. 2012). Among the scaffolding customization methods (i.e., performance-adapted, self-selected, and fixed), performance-adapted customization was the most frequent (Belland et al. 2017), but few studies have investigated which customization method has the highest effect on students' learning performance in the context of problem-based learning.

Scaffolding Strategies

Scaffolding strategies include feedback, question prompts, hints, and expert modeling (Belland 2014; Van de Pol et al. 2010). Feedback is the provision of information to students regarding their performance (Belland 2014). In Siler et al. (2010), a computer tutor that covered experimental design evaluated students' designs and provided feedback about their selection of the variables of interest. Question prompts help students draw inferences from their evidence and encourage elaborative learning (Ge and Land 2003). For example, students read question prompts that directed their attention to important problem elements and encouraged them to conduct certain tasks (Ge et al. 2010). Hints are clues or suggestions that help students move forward (Melero et al. 2011). For example, when students tried to change paragraph text, computer systems showed word definitions and provided audio support for reading those words. Expert modeling presents how experts perform a given task (Pedersen and Liu 2002). In Simons and Klein (2007), when students struggled with balloon design, expert advice was provided to help them distinguish between valuable and useless information. In addition, several types of strategies can be used within one study to satisfy students' different needs according to context (Dennen 2004; Gallimore and Tharp 1990). For example, in Butz et al. (2006), students received expert modeling, question prompts, feedback, and hints to solve a real-life problem in their introductory circuits class.

Discipline

In this paper, “STEM” refers to two things: (a) the individual disciplines of Science, Technology, Engineering, and Mathematics in which scaffolding was utilized and (b) integrated STEM curricula. Integrated STEM education began with the aim of enhancing performance in science and mathematics education as well as cultivating engineers, scientists, and technicians (Kuenzi 2008; Sanders 2009). Application of integrated STEM education increased students' motivation and interest in science learning and contributed to positive attitudes toward STEM-related areas (Bybee 2010). For example, two meta-analyses indicated that integrative approaches to STEM disciplines showed stronger effects (d > 0.8) on students' performance than separate STEM disciplines (Becker and Park 2011; Lam et al. 2008). However, few studies have investigated the effects of computer-based scaffolding, which has been commonly utilized in each STEM field, in the context of problem-based learning for integrated STEM education. Therefore, it is worthwhile to compare scaffolding effects between integrated STEM education and the individual STEM fields. In this regard, integrated STEM education and each STEM discipline (i.e., Science, Technology, Engineering, and Mathematics) are included as the discipline moderator.

Table 2 shows the moderators and subcategories of each moderator in this meta-analysis.

Table 2 Moderators in Bayesian meta-analysis

Prior Distribution in This Study

In Bayesian analysis, the estimation of the posterior distribution can be substantially affected by how one sets up the prior distribution. There are typically three methods for determining the prior distribution. One method is to follow experts' opinions about parameter information related to a certain topic. Experts' opinions reflect the results of existing studies, and it is possible for their opinion to represent current trends regarding the effects of a certain treatment. Unfortunately, there are few summarized expert opinions regarding the effects of computer-based scaffolding in PBL, and those that do exist can be seen as highly subjective. Accordingly, this study excluded expert opinion as a possible basis for the prior distribution. As the second method, one can use the results of a previous meta-analysis as a prior distribution. There are two representative meta-analyses related to computer-based scaffolding, including intelligent tutoring systems (ITS)—Belland et al. (2017) and Kulik and Fletcher (2016). In the case of Kulik and Fletcher's meta-analysis, their research interest was how the effects of computer-based scaffolding, including ITS, vary across learning environment contexts such as sample size, study duration, and evaluation type. This means that their results did not emphasize the characteristics of ITS, making it difficult to use them as the prior distribution for this study, which focuses on the characteristics of scaffolding. Recently, a National Science Foundation-funded project (Belland et al. 2017) synthesized quantitative research on computer-based scaffolding in STEM education. The moderators in that project overlap with many of the moderators in this study. However, the big difference between that traditional meta-analysis (TMA) and this paper is the learning context. This paper focuses only on problem-based learning, whereas the contexts in the TMA included several problem-centered instructional models (e.g., inquiry-based learning, design-based learning, project-based learning). Such problem-centered instructional models incorporate many different teacher roles, learning goals and processes, student learning strategies, and scaffolding usage patterns (Savery 2006). This makes it difficult to use the results of the TMA as an informative prior distribution in this paper, which addresses only problem-based learning.

The last method is to use a non-informative prior distribution. If one does not have enough prior information about the parameter θ, one can aim for a prior that has little influence on the inferences used to generate the posterior distribution of the parameter. In other words, a non-informative prior distribution contains only minimal information about the parameters. For example, if the parameter is assumed to range from 0 to 1, one can set μ (i.e., the mean of the parameter) to 0 and the variance of μ to 1. But if the parameter occurs on an infinite interval, the between-study variance should be large enough (τ² → ∞) to have little influence on the posterior. These techniques make the prior distribution non-informative. In this paper, several non-informative prior distribution models (i.e., uniform, DuMouchel, and inverse gamma), which weight the between-study variance τ² differently, were used to identify the model that best fits the given data.

Data Coding

All study features were coded using theory-driven constructs regarding scaffolding characteristics, roles, and the contexts of its use, and these codes were validated by experts in the fields of scaffolding, problem-based learning, and STEM education. All effect sizes were calculated as Hedges' g, an effect size measure that applies a small-sample correction and uses pooled, weighted standard deviations. In addition, considering the variance between individual studies across a wide range of educational populations, subject areas, and scaffolding interventions, this study used a random-effects model, which assumes that the true effect size may vary from study to study. The quantitative results available from the individual studies were F statistics, t statistics, mean differences, and chi-square values. From these quantitative results, all effect sizes of computer-based scaffolding corresponding to the moderators were calculated using the metan package of STATA 14. Two graduate students who have extensive knowledge of scaffolding and problem-based learning, as well as coding experience for meta-analysis, participated in the coding work. The primary coder selected the candidate studies based on the inclusion criteria and generated initial codes for the moderators (i.e., scaffolding type, scaffolding strategy, scaffolding customization, scaffolding customization methods, higher-order thinking skills, and discipline). The second coder coded the data independently, and the two coders' codes were then compared. Where the coders' codes were inconsistent, consensus codes were determined through discussion. Inter-rater reliability was calculated using Krippendorff's alpha once the initial coding was finished. Krippendorff's alpha measures the level of coders' agreement on the values of variables (i.e., nominal, ordinal, and ratio) in the coding rubric by computing the weighted percent agreement and weighted percent chance agreement (Krippendorff 2004). Krippendorff (2004) recommended 0.667 as the minimum acceptable alpha value to avoid drawing wrong conclusions from unreliable data. Krippendorff's alpha values across all moderators (α ≥ 0.8) were above this minimum standard, indicating strong agreement between the two coders (see Fig. 1).
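As an illustration of the effect size computation described above, the following Python sketch computes Hedges' g and its sampling variance from summary statistics using the standard small-sample correction. This is illustrative only; the actual analysis used the metan package of STATA 14, and the numbers below are hypothetical:

```python
import math

def hedges_g(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Hedges' g for a treatment/control comparison.

    m_*: group means; sd_*: group standard deviations; n_*: group sizes.
    """
    # Pooled standard deviation weighted by group sample sizes
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    d = (m_t - m_c) / sd_pooled          # Cohen's d
    df = n_t + n_c - 2
    j = 1 - 3 / (4 * df - 1)             # small-sample correction factor
    g = j * d
    # Sampling variance of g (standard large-sample approximation)
    var_g = j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return g, var_g

# Hypothetical example: treatment group outperforms control
g, var_g = hedges_g(m_t=78.2, m_c=72.5, sd_t=10.1, sd_c=11.3, n_t=25, n_c=24)
print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```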

Fig. 1 Krippendorff’s alpha for inter-rater reliability (dotted line indicates minimum acceptable reliability (Krippendorff 2004))
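Similarly, the inter-rater reliability statistic reported in Fig. 1 can be illustrated with a minimal sketch for the simplest case arising here: two coders and nominal codes with no missing data. The codes below are hypothetical; the study's actual computation followed Krippendorff (2004):

```python
from collections import Counter

def krippendorff_alpha_nominal(coder1, coder2):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    n_units = len(coder1)
    values = Counter()    # how often each code value appears overall
    coincide = Counter()  # coincidence counts: each unit adds both ordered pairs
    for a, b in zip(coder1, coder2):
        values[a] += 1
        values[b] += 1
        coincide[(a, b)] += 1
        coincide[(b, a)] += 1
    n_total = 2 * n_units
    # Observed disagreement: off-diagonal mass of the coincidence matrix
    d_o = sum(cnt for (c, k), cnt in coincide.items() if c != k) / n_total
    # Expected disagreement under chance pairing of all coded values
    d_e = sum(values[c] * values[k]
              for c in values for k in values if c != k) / (n_total * (n_total - 1))
    return 1 - d_o / d_e

# Hypothetical codes for one moderator assigned by two coders
c1 = ["conceptual", "strategic", "metacognitive", "conceptual", "strategic", "conceptual"]
c2 = ["conceptual", "strategic", "metacognitive", "strategic", "strategic", "conceptual"]
print(f"alpha = {krippendorff_alpha_nominal(c1, c2):.3f}")
```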

Data Analysis

STATA 14 and WinBUGS 1.4.3 were used for data analysis. WinBUGS 1.4.3 provided Bayesian estimation, including prior distribution options, via Markov chain Monte Carlo (MCMC), and STATA 14 imported the results from WinBUGS and generated graphical representations.

Markov chain Monte Carlo simulations were used to sample from the probability distributions of the Bayesian model (Dellaportas et al. 2002). Markov chains replace unstable, fluctuating initial values of random variables with more accurate values through repeated steps in which the next state (i.e., value of a variable) is influenced only by the current one, not by preceding ones (Neal 2000). In this process, 22,000 MCMC iterations were generated to estimate the posterior distribution, and the first 2000 iterations were discarded to eliminate the randomly assigned initial values. After analysis, the deviance information criterion (DIC) was used to assess model fit (Spiegelhalter et al. 2002). The model with the lowest DIC is expected to best reproduce the observed data (Spiegelhalter et al. 2004). The DuMouchel prior for “scaffolding customization methods” and uniform prior distributions for all remaining moderators had the smallest DIC values (see Table 3). Uniform and DuMouchel priors assume different between-study variances (i.e., τ²), and results can differ according to which prior distribution was used, even though the underlying dataset is the same. After MCMC generated the posterior distribution of each moderator, model validity was investigated through four types of graphs—trace plots, autocorrelation plots, histogram plots, and density plots.

Table 3 DIC values of prior distributions
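To make the estimation procedure concrete, the following Python sketch implements a Gibbs sampler for the random-effects model with a vague inverse-gamma prior on τ², mirroring the iteration counts reported above (22,000 iterations, 2000 burn-in) and computing DIC. This is a simplified stand-in for the WinBUGS models actually used, and the data values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed effect sizes g_i and sampling variances s2_i
g = np.array([0.52, 0.31, 0.74, 0.10, 0.45, 0.62, 0.28])
s2 = np.array([0.04, 0.09, 0.12, 0.05, 0.08, 0.15, 0.06])
k = len(g)

n_iter, burn_in = 22_000, 2_000   # iteration counts mirroring the paper
a0 = b0 = 0.001                   # vague inverse-gamma hyperparameters

mu, tau2 = 0.0, 0.1               # arbitrary starting values
mu_draws, dev_draws, theta_sum = [], [], np.zeros(k)

for it in range(n_iter):
    # 1) study-level true effects theta_i | mu, tau2, data
    prec = 1.0 / s2 + 1.0 / tau2
    theta = rng.normal((g / s2 + mu / tau2) / prec, np.sqrt(1.0 / prec))
    # 2) overall mean mu | theta, tau2 (flat prior)
    mu = rng.normal(theta.mean(), np.sqrt(tau2 / k))
    # 3) between-study variance tau2 | theta, mu (inverse-gamma conditional)
    tau2 = 1.0 / rng.gamma(a0 + k / 2, 1.0 / (b0 + ((theta - mu) ** 2).sum() / 2))
    if it >= burn_in:  # discard burn-in draws
        mu_draws.append(mu)
        dev_draws.append(np.sum(np.log(2 * np.pi * s2) + (g - theta) ** 2 / s2))
        theta_sum += theta

mu_draws = np.array(mu_draws)
theta_bar = theta_sum / (n_iter - burn_in)
d_bar = np.mean(dev_draws)  # posterior mean deviance
d_hat = np.sum(np.log(2 * np.pi * s2) + (g - theta_bar) ** 2 / s2)
dic = 2 * d_bar - d_hat     # DIC = D_bar + pD, with pD = D_bar - D_hat

print(f"posterior mean mu = {mu_draws.mean():.3f}")
print(f"95% CrI = ({np.percentile(mu_draws, 2.5):.3f}, "
      f"{np.percentile(mu_draws, 97.5):.3f})")
print(f"DIC = {dic:.1f}")
```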

Observed Data Characteristics

The number of observed data points across subcategories within moderators is unbalanced (see Table 4). No included study involved motivational scaffolding; thus, motivational scaffolding could not be included in this paper. Moreover, around 10.6% of the outcomes included in this paper had small sample sizes (n < 10), raising the possibility of small-study effects. Smaller studies often show larger effect sizes than larger ones, leading to overestimation of treatment effects (Schwarzer et al. 2015).

Table 4 Number of outcomes according to subcategories

To investigate empirically whether there were small-study effects, I conducted Egger's regression to test the null hypothesis that there are no small-study effects (see Table 5). The null hypothesis was rejected (p < 0.05), indicating that small-study effects are present.

Table 5 Egger’s regression test for small-study effects
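For reference, Egger's test regresses the standardized effect (g/SE) on precision (1/SE); an intercept significantly different from 0 signals small-study effects. Below is a minimal Python sketch with hypothetical data (the actual test was run on the 47 included outcomes):

```python
import numpy as np
from scipy import stats

# Hypothetical effect sizes and standard errors
g = np.array([0.52, 0.31, 0.74, 0.10, 0.45, 0.62, 0.28, 0.91])
se = np.array([0.20, 0.30, 0.35, 0.22, 0.28, 0.39, 0.25, 0.45])

z = g / se                 # standardized effects
precision = 1.0 / se
X = np.column_stack([np.ones_like(precision), precision])

# OLS fit of z = b0 + b1 * precision; b0 != 0 signals small-study effects
beta, res_ss, *_ = np.linalg.lstsq(X, z, rcond=None)
k = len(g)
sigma2 = res_ss[0] / (k - 2)               # residual variance
cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of coefficients
t0 = beta[0] / np.sqrt(cov[0, 0])          # t statistic for the intercept
p = 2 * stats.t.sf(abs(t0), df=k - 2)
print(f"intercept = {beta[0]:.3f}, t = {t0:.2f}, p = {p:.3f}")
```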

More than 80% of recently published meta-analyses may contain biased results caused by small-study effects (Kicinski 2013). This means there would be a high probability of biased results if a traditional meta-analysis approach were employed with these data. However, a Bayesian approach can address potential small-study effects by shrinking overweighted effect sizes through interval estimation and the appropriate use of priors (Kay et al. 2016; Kicinski 2013; Mengersen et al. 2016).

Interpretation of Bayesian Meta-analysis

Bayesian inference is based on the posterior probability, which is in turn based on the likelihood of the observed data, rather than on frequentist point or interval estimation (i.e., the confidence interval (CI)); the standard error approaches 0 given the large number of samples generated through MCMC simulations (Robins and Wasserman 2000). The Bayesian 95% credible interval (CrI) resembles the frequentist 95% CI in some ways, but the two differ fundamentally in principle and interpretation. A 95% confidence interval is a range that would include the true effect size in 95% of all possible samples from the same population; in the frequentist approach, the population parameters are fixed and the samples are random (Edwards 1992). The Bayesian approach, by contrast, regards parameters as random and samples as fixed (Ellison 2004). Therefore, the Bayesian 95% CrI indicates a range containing 95% of the probability of the posterior distribution of the parameters, which is generated by combining the predetermined prior distribution with the observed data. For example, a wide CI reflects a large standard error caused by limited knowledge of effects or small samples, whereas a CrI is a range of true treatment effects at the level of the population. Consequently, most Bayesians are reluctant to use frequentist hypothesis testing with p values, because p values are not derived from the posterior distribution (Babu 2012; Bayarri and Berger 2004; Kaweski and Nickeson 1997). However, many scholars interpret the results of Bayesian analysis from a frequentist perspective, which leads to misunderstanding of results (Gelman et al. 2014).
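The interpretive difference is easy to see in practice: a 95% CrI is read directly from the posterior sample as a percentile interval, and the posterior also supports direct probability statements. Below is a minimal Python sketch with simulated draws (the numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for 20,000 posterior draws of the mean effect from MCMC
posterior_mu = rng.normal(loc=0.385, scale=0.20, size=20_000)

# 95% credible interval: the central 95% of posterior probability mass
lo, hi = np.percentile(posterior_mu, [2.5, 97.5])
print(f"95% CrI = ({lo:.3f}, {hi:.3f})")

# The posterior also allows direct probability statements, e.g. P(mu > 0 | data)
print(f"P(mu > 0 | data) = {(posterior_mu > 0).mean():.3f}")
```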

Results

Overall Effect Size of Computer-Based Scaffolding

The overall effect of computer-based scaffolding, compared to groups that did not receive any scaffolding, was g = 0.385 (95% CrI = 0.022 to 0.802) based on 20,000 simulated iterations. The range of true effect sizes of computer-based scaffolding is modeled at the population level as 0.022 to 0.802 with 95% probability; this means that students who used computer-based scaffolding within the context of PBL showed better learning performance, at the level of a normally distributed population, than those who did not (θ = 0.385, σ = 0.254, τ²(0, 1000)) (see Fig. 2).

Fig. 2 Effect size of scaffolding

To verify the model underlying the above results, graphical summaries were generated (see Fig. 3). The trace was vertical and dense without fluctuation, indicating that the mean parameter was well mixed. In addition, the autocorrelation plots showed autocorrelation values approaching 0 as the lag increased, indicating that the values generated at each lag were independent. Furthermore, the histogram and density plots provided evidence of a normal distribution of the generated samples.

Fig. 3 The graphic summaries for overall effects of computer-based scaffolding

Subgroup Analysis

The non-informative prior distributions used in this study consider all possible values of τ² (between-study heterogeneity) across the studies, which justified subgroup analysis to identify potential moderator variables. This study has six moderators (i.e., scaffolding intervention, scaffolding customization, scaffolding customization methods, scaffolding strategies, higher-order thinking, and discipline).

Scaffolding Intervention

Students who received computer-based scaffolding showed better cognitive outcomes than those who did not (see Fig. 4). The effect size was largest for metacognitive scaffolding (g = 0.384, 95% CrI = 0.17 to 0.785), followed by strategic scaffolding (g = 0.345, 95% CrI = 0.14 to 0.692) and conceptual scaffolding (g = 0.126, 95% CrI = 0.003 to 0.260).

Fig. 4 Effect size of subcategories within scaffolding intervention

The generated samples were well mixed, as shown in Fig. 5. In the trace plots, all chains were stable and showed no sharp changes. Moreover, the autocorrelation values were close to 0, and the histogram and density plots indicated a normal distribution of the simulated data.

Fig. 5 Graphic summaries for effects of scaffolding intervention

Scaffolding Customization

Scaffolding customization means that scaffolding can be added, faded, or both added and faded according to students' current learning status and abilities. Scaffolding can be gradually faded as students gain knowledge and skill, or increased when students need more help; fading and adding can also occur together. The results indicate that when students received customized scaffolding, the effects of scaffolding were noticeably higher than with non-customized scaffolding or no scaffolding (see Fig. 6). Among scaffolding customization types, fading/adding showed the highest effect size (g = 0.590, 95% CrI = 0.155 to 0.985), compared to fading (g = 0.429, 95% CrI = 0.015 to 0.865) and adding (g = 0.443, 95% CrI = 0.012 to 0.957). Each scaffolding customization type showed a wide range of true effect sizes (credible interval) at the population level, but it is clear that scaffolding customization produces better effects on students' learning performance than no customization at the population level.

Fig. 6 Effect size of subcategories within scaffolding customization

There were no issues with convergence, standard errors, or normality of the samples generated by MCMC, and the results for scaffolding customization represented the parameters, as shown in Fig. 7.

Fig. 7 The graphic summaries for effects of scaffolding customization

Scaffolding Customization Methods

Scaffolding customization (i.e., fading, adding, fading/adding) proceeds according to performance-adaptation, self-selection, or a fixed time interval. Scaffolding can be customized based on test scores or formative assessment (performance-adapted). Alternatively, students themselves can request the fading or adding of scaffolding based on their own judgment (self-selected). Unlike performance-adaptation and self-selection, which happen during students' learning, fixed-interval customization is determined by the scaffolding designer regardless of students' performance and decisions.

When scaffolding was customized based on self-selection, the effect size was higher (g = 0.519, 95% CrI = 0.167 to 0.989) than with the other scaffolding customization methods: fixed time interval (g = 0.376, 95% CrI = 0.018 to 0.713) and performance-adaptation (g = 0.434, 95% CrI = 0.013 to 0.863), as shown in Fig. 8.

Fig. 8 Effect size of subcategories within scaffolding customization methods

None denotes both no scaffolding customization and no scaffolding customization method. Nevertheless, the effect sizes of the two “None” categories, from scaffolding customization (g = 0.162) and scaffolding customization methods (g = 0.122), differed slightly, although the data for the two moderators were exactly the same. One possible reason is the use of different prior distributions, which assume different between-study variances (τ²). The uniform prior distribution used in the scaffolding customization model weighted τ² more heavily than the DuMouchel prior distribution used in the scaffolding customization methods model. Therefore, None in the scaffolding customization category had a wider CrI than the one in the scaffolding customization methods category. The graphical summaries illustrate the well-mixed nature of the MCMC chains (see Fig. 9).

Fig. 9 Graphic summaries for effects of scaffolding customization methods

Scaffolding Strategies

Scaffolding can assume several forms, such as hints, feedback, question prompts, expert modeling, and combinations of multiple forms. The results show that the effect sizes for expert modeling (g = 0.523, 95% CrI = 0.030 to 0.979) and feedback (g = 0.474, 95% CrI = 0.026 to 0.968) approached a medium level. Other forms, namely hints (g = 0.375, 95% CrI = 0.013 to 0.742) and multi-form scaffolding (g = 0.340, 95% CrI = 0.012 to 0.698), also had relatively strong effects on students' cognitive outcomes compared to students who did not receive scaffolding (see Fig. 10). However, in the case of question prompts, the effect size indicated little difference from the control group (g = 0.078, 95% CrI = 0.003 to 0.156).

Fig. 10 Effect size of subcategories within scaffolding strategies

There were no problems with the stability of the MCMC chains or the normality of the generated samples, but the autocorrelation did not drop rapidly toward 0 as the lag increased (see Fig. 11). This can cause inaccurate prediction of parameters due to correlation between the values generated at each lag. This issue could be caused by inadequacy of the observed samples for the predetermined prior distribution or by a poor choice of prior distribution. However, the autocorrelation of these categories clearly decreased toward 0 as the lags increased, which means that the values simulated by MCMC accurately predict each parameter of the categories (Geyer 2010).

Fig. 11 The graphic summaries for effects of scaffolding strategies

Higher-Order Thinking

Higher-order thinking is one of the important intended outcomes of scaffolding. When scaffolding was intended to improve students' analysis ability (i.e., identifying the components of information and ideas and establishing the relations between elements), the effect size was highest at the population level (g = 0.537, 95% CrI = 0.038 to 0.981). However, higher-order thinking skills related to synthesis (g = 0.156, 95% CrI = 0.004 to 0.329) and evaluation (g = 0.147, 95% CrI = 0.003 to 0.288), which require higher-level application of knowledge than analysis, were not improved to a large extent by computer-based scaffolding (see Fig. 12).

Fig. 12 Effect size of subcategories within higher-order skills

The graphical summaries provide evidence that the well-mixed samples generated by MCMC represent the parameters of higher-order skills (see Fig. 13).

Fig. 13 The graphic summaries for effects of computer-based scaffolding on higher-order skills

Discipline

Scaffolding had higher effect sizes when used in Engineering (g = 0.528, 95% CrI = 0.025 to 0.983), Technology (g = 0.379, 95% CrI = 0.011 to 0.782), and Mathematics education (g = 0.425, 95% CrI = 0.024 to 0.883) than when used in Science (g = 0.146, 95% CrI = 0.003 to 0.295), as shown in Fig. 14. In the case of integrated STEM education, which integrates the content and processes of the disciplines, the effect size was relatively low (g = 0.201, 95% CrI = 0.005 to 0.428).

Fig. 14 Effect size of subcategories within discipline

Figure 15 shows that the generated samples were well mixed by MCMC and that the results of the Bayesian meta-analysis (BMA) represent the parameters.

Fig. 15 The graphic summaries for effects of computer-based scaffolding on disciplines

Discussion

Overall Effects of Computer-Based Scaffolding

The overall effect size of computer-based scaffolding used in the context of PBL obtained from Bayesian meta-analysis was g = 0.385, which Cohen (1988) would label a small-to-moderate effect size. This result means that computer-based scaffolding utilized in the context of problem-based learning for STEM education is an effective means to improve students' higher-order skills and learning outcomes. The overall effect size estimate in this meta-analysis parallels the results from meta-analyses on the effects of computer-presented feedback (g = 0.35; Azevedo and Bernard 1995) and on the effectiveness of computer-based supports (g = 0.37; Hattie 2008), despite the different contexts of scaffolding usage. However, the effect size of computer-based scaffolding in this study (g = 0.385) was lower than that obtained in a traditional meta-analysis that covered scaffolding used in a variety of problem-centered instructional models (g = 0.46; Belland et al. 2017). One reason the results of the two meta-analyses may have differed is that the contexts in which scaffolding was used differed. Another possible reason is that BMA could resolve the overweighting of effect sizes from smaller studies that were included in the traditional meta-analysis (Kay et al. 2016; Kicinski 2013; Mengersen et al. 2016).

Implications for Research

Various scaffolding strategies played a role in improving learning performance in problem-based learning in STEM education. This corresponds with the positive results for hints (Raphael et al. 2008), expert modeling (Pedersen and Liu 2002), and feedback (Rouinfar et al. 2014) in previous empirical studies. However, the results indicated that providing multiple scaffolding strategies in one scaffolding system is counter-productive. This aligns with research showing that multiple forms of scaffolding provided regardless of students' current needs and learning status can be less effective (Aleven and Koedinger 2002; Azevedo and Hadwin 2005; Baylor and Kim 2005). One interesting result in terms of scaffolding strategies is that question prompts had the smallest effect size (g = 0.078) among the scaffolding strategies (e.g., expert modeling, hints, and feedback). One of the main advantages posited for question prompts is improving students' metacognition (Chen and Bradshaw 2007; Davis 2003; Scardamalia et al. 1989) by helping students draw inferences from their evidence and encouraging elaborative learning. However, much research has demonstrated that the effectiveness of question prompts can vary according to individuals' abilities (e.g., prior knowledge, cognition and metacognition, and problem-solving skills) (Lee and Chen 2009), and that different question prompts (i.e., procedural, elaborative, and reflective prompts) should be provided to different students (Ge et al. 2005). One possible reason the effect of question prompts on metacognition was smallest is that most studies using question prompts in this meta-analysis did not account for students' varying levels of prior knowledge and problem-solving skills.

When scaffolding was intended to help students identify components of information and ideas (i.e., the analysis level of higher-order thinking), its effect size was much higher than when it supported recognizing the patterns of components (synthesis level) or judging the value of data to justify conclusions (evaluation level). This shows that current technology and designs often struggle to support students' more advanced and complicated learning processes in PBL, such as “synthesis,” “application,” and “reflection” (Wu 2010). This supports many scholars' claims that the effectiveness of computer-based scaffolding can be maximized when it is integrated with one-to-one scaffolding by teachers (Hoffman et al. 2003; Pata et al. 2006). With current technology, then, computer-based scaffolding can be provided effectively at the level of analysis in PBL.

The effect of scaffolding was highest (g = 0.528) when it was used in the context of Engineering. Mathematics and Technology education have been regarded as prerequisite subjects for engineering courses in K-12 education (Douglas et al. 2004). Therefore, the strong effects of scaffolding across Engineering, Mathematics, and Technology are understandable given the shared curricula and similar problem-solving processes. However, in the case of problem-based learning for science education, BMA showed a relatively lower effect size (g = 0.146) than for the other disciplines (i.e., Technology, Engineering, and Mathematics). In science education, many scholars have engaged in debates on proper instructional design, the characteristics of scaffolding, and curriculum, and this has led to inconsistency in scaffolding design among science education researchers (Lin et al. 2012). For example, some scholars have argued that science education in PBL requires students' advanced scientific inquiry and higher-order thinking skills (Hoidn and Kärkkäinen 2014), but Zohar and Barzilai (2013) claimed that students' identification of what should be considered for problem-solving through understanding of content knowledge should take precedence over enhancing higher-order thinking skills.

Another interesting finding was the low effect size of computer-based scaffolding in integrated STEM education. Contrary to the results of several meta-analyses on the effectiveness of integrated STEM education (Becker and Park 2011; Lam et al. 2008), the current review indicated that computer-based scaffolding was not very effective at improving students’ learning performance in integrated STEM education.

Ill-structured problems in PBL cannot be solved by mere application of existing knowledge, because solving this type of problem requires students' advanced thinking processes and strategies rather than the recall of information (Beyer 1997). In this sense, Hmelo-Silver and Ferrari (1997) argued that higher-order thinking skills such as analysis, synthesis, and evaluation of knowledge are essential abilities for accomplishing PBL tasks. Higher-order thinking skills allow students to (a) understand the given problematic situation and identify the gaps in their knowledge needed to solve the problem and (b) apply knowledge to find the solution to the problem (Hmelo-Silver and Lin 2000). Students can enhance higher-order thinking skills through active participation in such activities as making hypotheses, gathering evidence, and generating arguments (Lewis and Smith 1993). Many studies have indicated that computer-based scaffolding can enhance students' higher-order thinking skills (Rosen and Tager 2014; Zydney 2005, 2008). Nevertheless, the results of BMA demonstrated that computer-based scaffolding did not excel at supporting every level of higher-order thinking (i.e., analysis, synthesis, and evaluation of knowledge). When scaffolding was intended to help students identify the components of information and ideas (i.e., the analysis level of higher-order thinking), its effect size (g = 0.537) was much higher than when it supported recognizing the patterns of components (synthesis level) (g = 0.156) or judging the value of data to justify conclusions (evaluation level) (g = 0.147). In other words, computer-based scaffolding was very effective in helping students understand problematic situations and compare the information required to find solutions (i.e., the analysis level). However, computer-based scaffolding is often ineffective at supporting students as they (a) build new knowledge by reorganizing existing information (i.e., the synthesis level) and (b) validate their claims based on this new knowledge (i.e., the evaluation level). This corresponds with many researchers' claims that students struggle to improve synthesis and evaluation skills in spite of computer-based supports (Hu 2006; Jonassen 1999b). This result has an important implication: scaffolding design for improving students' higher-order thinking should be differentiated according to which level (i.e., analysis, synthesis, or evaluation) of higher-order thinking is addressed in PBL.

Effects of Scaffolding Characteristics in Different Contexts

The computer-based scaffolding interventions included in this paper all came from the context of problem-based learning (PBL). PBL requires students to hypothesize solutions to ill-structured problems, to verify their claims with reasonable evidence, and to reflect on their learning performance. Whether students can successfully navigate each of these PBL requirements depends in large part on their higher-order thinking skills. In this way, students' abilities to reflect on their own thinking (metacognition) are regarded as essential to successful PBL learning. Moreover, students' own learning strategies are important in making the complicated PBL procedure more accessible and understandable by enhancing self-directed learning and lifelong learning skills (Hmelo-Silver 2004; Schoenfeld 1985). Several studies have demonstrated the importance of metacognition and learning strategies for improving students' learning outcomes in problem-based learning (Downing et al. 2009; Sungur and Tekkaya 2006).

Implications for Scaffolding Design

The results of BMA showed that metacognitive (g = 0.384) and strategic scaffolding (g = 0.345) led to relatively large effects on students' learning outcomes in the context of problem-based learning. On the other hand, conceptual scaffolding did not show large overall effects (g = 0.126) on learning compared to the control groups. This finding contradicts the results of previous meta-analyses, which reported better or similar effectiveness of conceptual scaffolding compared to metacognitive and strategic scaffolding in the context of problem-centered instructional approaches (Belland et al. 2015, 2017). This might be due to the fundamental nature of PBL, with a loosely specified end goal presented to students at the beginning of the unit and less inherent support built into the model than other models like design-based learning and project-based learning (Belland et al. 2017). PBL requires a higher level of cognition and retention of knowledge from students (Downing et al. 2009); as such, metacognitive and strategic scaffolding might have a larger impact on students' problem-solving in PBL than conceptual scaffolding.

What this means for scaffolding designers is that, rather than focusing on conceptual scaffolding, which is by far the most common scaffolding type used in the context of PBL (N = 31 outcomes in this meta-analysis), a better approach would be to focus on creating metacognitive and strategic scaffolds. This would entail a shift from using scaffolding to inform students of important things to consider when addressing the problem, to bootstrapping a known effective strategy for addressing the problem and helping students question their own understanding (Hannafin et al. 1999). It is interesting, though, that compared to conceptual scaffolding, the scaffolding intervention types that came out on top in effect give students either more (in the case of strategic scaffolding) or less (in the case of metacognitive scaffolding) structure for the process by which they can address the problem. That is, for a given problem x, conceptual scaffolding says “here are some important things that should be considered when addressing problem x,” strategic scaffolding says “here is a great strategy to address problem x and here is how you do it,” and metacognitive scaffolding says “here is how you can assess the extent to which you addressed problem x well.” It is possible that conceptual scaffolding simply does not give enough insight into concrete things that students can do to move their investigation forward in the context of PBL. Further research, including design studies and multiple-treatment studies, is needed to fully determine if this is the case.

Furthermore, our findings indicate that, in terms of scaffolding form, expert modeling and feedback have the strongest effects. To our knowledge, no previously published comprehensive meta-analysis has focused on scaffolding forms, although one meta-analysis focusing on the scaffolding niche of dynamic assessment found great promise for that tool among students with special needs (Swanson and Lussier 2001). Expert modeling has often been associated with conceptual scaffolding. For example, in Alien Rescue (Liu and Bera 2005), an expert states the things he would consider when deciding which planet would be the most appropriate new home for a given alien species. But expert modeling could equally well provide an effective framework for strategic or metacognitive scaffolding. For example, after students encounter a problem, an expert could say something like “you know, I saw a problem like this in the past, and here is what I did to address it.... So this is what you need to do to carry out this strategy.”

Feedback presents an interesting conundrum for scaffolding in PBL, in that students could take a virtually infinite number of paths to an acceptable solution to a PBL problem. Setting up a system that provides useful feedback in the scaffolding sense is therefore difficult. One would likely need artificial intelligence technologies that perform automated content analysis of students’ articulations of their thinking.
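
As a deliberately simplified illustration of what the front end of such a system might look like, the sketch below uses hypothetical keyword rules as a stand-in for the machine-learning content analysis a production system would require; the marker list and feedback messages are invented for this example.

```python
# Hypothetical sketch: rule-based content analysis of a student's
# articulated rationale, standing in for a trained NLP classifier.
EVIDENCE_MARKERS = ("because", "data", "evidence", "source", "measured")

def feedback_for(rationale: str) -> str:
    text = rationale.lower()
    if not any(marker in text for marker in EVIDENCE_MARKERS):
        # A strategic/metacognitive nudge rather than the answer itself
        return ("What evidence supports this claim? Try citing a "
                "data source before moving on.")
    return "Your claim cites evidence; consider possible counterevidence next."

print(feedback_for("Mars is best because its measured temperature range fits."))
```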

Within PBL, theoretical arguments are often made that fading is necessary (Collins et al. 1989; Dillenbourg 2002; Pea 2004; Puntambekar and Hübscher 2005). In this meta-analysis, only 10% of included outcomes were associated with fading, 6% with adding, and 15% with fading/adding, which is consistent with prior research (Belland et al. 2017; Lin et al. 2012). A traditional meta-analysis approach would have made it difficult to compare the effects of fading, adding, and no customization on so few outcomes. However, according to BMA, which overcomes this issue, cognitive effects of scaffolding were highest when fading and adding were combined (g = 0.590). This is a striking finding, in that it goes against the grain of most theoretical arguments in PBL. Most often, when fading is combined with adding in scaffolding, the design is driven by Adaptive Control of Thought—Rational (ACT-R) theory, according to which human knowledge is gained through integrated production from the declarative and procedural memory modules (Anderson et al. 1997). That is, successful learning performance depends on the amount of each student’s existing knowledge and on the ability to retrieve and apply relevant knowledge from memory to the given tasks. ACT-R has not often been used to inform the design of learning supports for PBL, but it may be worth considering in the future design of scaffolding and other supports for PBL. ACT-R also provides justification for self-selected adding (hints); self-selection was found to lead to the strongest cognitive effects in this meta-analysis.
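
To illustrate how a Bayesian analysis can still estimate a sparsely represented subgroup such as fading/adding, the sketch below fits a random-effects model to three simulated outcomes by grid approximation; the effect sizes, variances, and priors are illustrative choices, not the data or model specification used in this meta-analysis.

```python
import numpy as np
from scipy.stats import norm

# Three simulated outcomes for a small subgroup (illustrative values only)
y = np.array([0.70, 0.45, 0.60])   # effect sizes (g)
v = np.array([0.06, 0.09, 0.07])   # their sampling variances

# Grid approximation of the random-effects posterior p(mu, tau | y),
# where y_i ~ N(mu, v_i + tau^2).
mu_grid = np.linspace(-0.5, 1.5, 401)
tau_grid = np.linspace(0.0, 1.0, 201)
M, T = np.meshgrid(mu_grid, tau_grid, indexing="ij")

logpost = norm.logpdf(M, 0.0, 1.0)    # prior: mu ~ N(0, 1)
logpost += norm.logpdf(T, 0.0, 0.5)   # prior: tau ~ half-normal(0.5)
                                      # (normal restricted to tau >= 0)
for yi, vi in zip(y, v):
    logpost += norm.logpdf(yi, M, np.sqrt(vi + T**2))  # marginal likelihood

post = np.exp(logpost - logpost.max())
post /= post.sum()
mu_mean = (post * M).sum()            # posterior mean of the subgroup effect
print(round(mu_mean, 3))
```

Even with only three outcomes, the posterior remains a full, usable distribution rather than an unstable point estimate, which is the property the BMA exploits here.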

Limitations and Suggestions for Future Research

Heterogeneity of Computer-Based Scaffolding Across Studies Regarding Its Level and Type

In this study, the various computer-based scaffolding strategies from individual studies were categorized into several forms (i.e., hints, feedback, question prompts, expert modeling, and combinations as multi-forms). However, any time one groups instructional strategies into a constrained set of categories, there can be substantial heterogeneity among the strategies within any one category. In addition, depending on the theoretical foundations (e.g., knowledge integration, activity theory, and ACT-R) used to design and develop computer-based scaffolding, scaffolds within the same category can have different learning goals and interventions. This can cause heterogeneity between studies in the same category, leading to a risk of bias (Higgins et al. 2011). To overcome this limitation, a more nuanced study with a more specific categorization of scaffolding forms may be beneficial.

Reliability of Results

The results of BMA are estimated based on probabilistic inference. However, a probability is a prediction; it cannot guarantee what actually happens in the real world. The BMA results here are reasonable with a high level of probability, but exceptions can occur, so one should not place blind faith in them. To minimize this limitation, many researchers are working to develop new Bayesian analysis techniques. At the same time, the frequentist method might theoretically yield more accurate meta-analytic results when the observed data come from large samples and satisfy all statistical assumptions, introducing no bias into the results. However, few studies have directly compared the results of BMA and TMA, so it is difficult to say which approach yields more reliable results. A follow-up study is therefore required to verify the effects of BMA as an alternative to TMA.
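
To make the proposed comparison concrete, the sketch below contrasts a frequentist fixed-effect estimate with its Bayesian counterpart under a simple normal-normal model. The effect sizes and variances are simulated for illustration, not drawn from this meta-analysis; with a near-flat prior the two estimates coincide almost exactly, which is part of why the two approaches are hard to adjudicate empirically.

```python
import numpy as np

# Simulated study-level effect sizes (g) and sampling variances
g = np.array([0.45, 0.30, 0.55, 0.20, 0.40])
v = np.array([0.04, 0.02, 0.06, 0.03, 0.05])

# --- Frequentist (TMA) fixed-effect model: inverse-variance weighting ---
w = 1.0 / v
mu_tma = np.sum(w * g) / np.sum(w)
se_tma = np.sqrt(1.0 / np.sum(w))

# --- Bayesian fixed-effect model, conjugate prior mu ~ N(m0, s0^2) ---
m0, s0 = 0.0, 10.0                      # near-flat prior
post_prec = 1.0 / s0**2 + np.sum(w)     # posterior precision
mu_bma = (m0 / s0**2 + np.sum(w * g)) / post_prec
sd_bma = np.sqrt(1.0 / post_prec)

print(f"TMA: {mu_tma:.3f} (SE {se_tma:.3f})")
print(f"BMA: {mu_bma:.3f} (posterior SD {sd_bma:.3f})")  # nearly identical
```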

The Selection of Prior Distribution

The estimation of the posterior distribution is strongly influenced by how one sets up the prior distribution (Findley 2011; Lewis and Nair 2015). For example, if one specifies a prior distribution with a particular mean and variance, the posterior distribution, which combines that prior with the likelihood of the observed data, will be pulled toward those predetermined parameters. The results of BMA can therefore differ according to how researchers set up the prior distribution. To address this issue in the medical field, where BMA has been commonly used, the use of informative rather than non-informative prior distributions has gradually increased, thanks to treatment data accumulated from several meta-analyses with the same or similar topics and moderators (Klauenberg et al. 2015; Turner et al. 2015). Such accumulated data can serve as the standard values of the prior distribution for future BMAs, making an informative prior distribution possible. An informative prior predicts the posterior distribution more accurately and efficiently than a non-informative prior, which merely assumes a vague (e.g., normal) distribution for the parameter without incorporating prior information about the treatment. In educational research, however, there are few meta-analyses utilizing informative prior distributions, due to a lack of consensus among scholars and a scarcity of prior meta-analyses on particular research topics or educational tools. Hopefully, the results of this study can serve as an informative prior distribution for the effectiveness of computer-based scaffolding in PBL, allowing subsequent BMAs on this topic to collaboratively build a more robust prior model.
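
This sensitivity is easy to demonstrate. Reusing the conjugate normal-normal model from the previous sketch with only two simulated effect sizes, the posterior mean shifts substantially when a hypothetical informative prior (here centered at 0.45 purely for illustration) replaces a vague one.

```python
import numpy as np

# Sparse data: two simulated effect sizes, as in a small literature
g = np.array([0.15, 0.25])
v = np.array([0.05, 0.08])
w = 1.0 / v

def posterior(m0, s0):
    """Posterior mean/SD of mu under mu ~ N(m0, s0^2), normal likelihood."""
    prec = 1.0 / s0**2 + np.sum(w)
    mean = (m0 / s0**2 + np.sum(w * g)) / prec
    return mean, np.sqrt(1.0 / prec)

# Non-informative prior: the posterior tracks the data alone (~0.19)
print(posterior(m0=0.0, s0=100.0))
# Hypothetical informative prior, e.g., centered on an earlier
# meta-analytic estimate: the posterior is pulled upward (~0.39)
print(posterior(m0=0.45, s0=0.10))
```

With abundant data the likelihood dominates and the choice of prior matters little; it is precisely in sparse literatures like this one that the prior must be chosen and reported with care.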

Expansion of Research Scope

This study included only studies of the influence of computer-based scaffolding on cognitive outcomes in the context of problem-based learning in STEM education. However, other forms of scaffolding, including one-to-one scaffolding (e.g., teacher scaffolding) and peer scaffolding, are commonly utilized. Therefore, future Bayesian meta-analyses addressing a broader range of scaffolding (see Table 6) are also needed. Such an endeavor would clearly be challenging, as quantitative studies in which (a) teacher scaffolding, for example, is compared to a control condition and (b) effect sizes can be calculated are not plentiful. One of the problems is that teacher scaffolding is so dynamic that any given student in a classroom could receive any number of minutes of teacher scaffolding within a 50-min period, while some students may receive none at all (Van de Pol et al. 2010; Wood et al. 1976). The same rationale applies to peer scaffolding, though it is less likely that a given student would receive no peer scaffolding within a class period. Nonetheless, Bayesian meta-analyses along these lines have the potential to be very valuable.

Table 6 List of independent and dependent variables for future Bayesian meta-analysis

Moreover, this study could not include motivational scaffolding as a subcategory of scaffolding intervention due to a lack of studies of motivational scaffolding in the context of problem-based learning that satisfied our inclusion criteria. Much research has reported the importance and effects of students’ motivation on their learning outcomes (Perkins and Salomon 2012), and future research should investigate how and to what extent motivational scaffolding can affect students’ learning outcomes in problem-based learning.

Conclusion

This Bayesian meta-analysis demonstrated the effectiveness of computer-based scaffolding on students’ learning performance and higher-order thinking skills in the context of problem-based learning in STEM education. Many researchers are currently interested in designing and developing computer-based scaffolding. However, various types of scaffolding are often utilized without considering the characteristics of the given learning context. For example, the goal of problem-based learning is not to improve students’ content knowledge per se, but to enhance students’ advanced problem-solving skills and thinking strategies through the application of existing knowledge (Savery 2015). Nevertheless, teacher-based and computer-based supports aimed at improving students’ content knowledge remain the most widely used type of scaffolding in PBL, even though their effectiveness in that context appears negligible. This indicates that instructional designers should focus more on metacognitive and strategic scaffolding interventions that help students improve their problem-solving skills using their own learning strategies in PBL.

The most interesting result in this paper concerns the effects of combining adding and fading within scaffolding customization. Adding scaffolding has often been disregarded in the context of PBL. But in PBL, which requires diverse abilities such as information-searching strategies, problem-solving skills, creative thinking, and collaborative learning skills, adding scaffolding should be considered alongside fading as an effective strategy for promoting strong learning outcomes. In addition, when scaffolding customization is carried out by students themselves, its effects may be strongest, because self-selected scaffolding can improve students’ self-directed learning and motivation. According to the results, question prompts are the most prominent strategy of computer-based scaffolding, but their effect was not as strong as many scholars believed (Choi et al. 2008; Ge and Land 2003). The original goal of scaffolding is worth revisiting in light of this issue: the main purpose of scaffolding is to help students engage in learning and successfully accomplish tasks that are beyond their current unassisted abilities (Wood et al. 1976). However, if question prompts in PBL are provided in a complicated and difficult manner, they can become a hindrance to learning instead of a support. Therefore, this paper suggests that simpler, more directive supports such as hints and expert modeling may be more suitable for PBL.