Introduction

Agency, which is closely related to self-regulated learning (Zimmerman 2008), refers to the capacity of students to make choices during learning. Self-regulation includes monitoring one’s own behavior and its effects, judging it against personal standards, and reacting to those judgments (Bandura 1991). To self-regulate, a student uses personal agency to make choices about future actions. Although there have been attempts to investigate how best to leverage student agency, it is not clear from the literature in which circumstances agency may or may not be beneficial for learning. For instance, advanced students are often good self-regulated learners (Schunk and Zimmerman 2007; Zimmerman 2008), whereas novices are generally poor at regulating their learning and hence benefit from instructional choices being made for them (Zimmerman 2000). Mitrovic (2001) and Mitrovic and Martin (2002) also demonstrated that advanced students were better at evaluating their knowledge, while novice students were worse at selecting problems to work on.

Several studies have investigated the effect of agency on learning, with conflicting findings. Some found that increased student agency is associated with higher motivation and involvement, and results in better learning outcomes (Snow et al. 2015; Rowe et al. 2011). Tabbers and de Koeijer (2010) demonstrated that giving students control over the pace and order of instruction in an animated multimedia presentation led to higher learning outcomes. Similarly, letting students customize game components has also been shown to be beneficial for learning (Cordova and Lepper 1996; Snow et al. 2015).

On the other hand, Sawyer et al. (2017) focused on variations in agency within the game Crystal Island. Students in the high-agency condition could control how they obtained knowledge by interacting with the environment and game characters, while students in the low-agency condition had to follow a prescribed order of actions. The low-agency students achieved significantly higher learning gains than their peers in the high-agency condition. Nguyen et al. (2018) compared learners in two versions (low agency vs. high agency) of a mathematics educational game. In the low-agency condition, learners were guided to play the games in a prescribed sequence, while their peers in the high-agency version could choose which games to play and in what order. Unlike the study conducted by Sawyer et al. (2017), they found no significant difference in learning between the low- and high-agency conditions.

Although there have been many studies on the benefits of learning from worked examples and learning with Intelligent Tutoring Systems (ITSs), most of them represent settings with limited student control. In studies with worked examples, the examples are most often presented in a fixed order. ITSs, on the other hand, typically select the best problems for students to attempt, while students typically have control over asking for help. Agency is rarely studied in those kinds of experiments, which motivated us to conduct the experiment reported in this paper.

In a previous study (Chen et al. 2017), we added an adaptive strategy to SQL-Tutor, an ITS that teaches database querying, to select learning activities presented to the student as preparation for problem solving. The strategy selected either a Worked Example (WE), an Erroneous Example (ErrEx) or a problem to be solved (PS), based on the student’s performance, or skipped the preparation task completely if the student had shown high performance on previous problems. We used PS, WEs and ErrExs because these types of learning activities have been shown to be effective across a broad range of domains (Kalyuga et al. 2001; van Gog 2011; McLaren and Isotani 2011; Chen et al. 2016a; Durkin and Rittle-Johnson 2012; Stark et al. 2011). The two low-agency conditions in that study were 1) the adaptive condition and 2) the fixed-order condition, in which students learned from a fixed sequence of worked example/problem-solving pairs and erroneous example/problem-solving pairs. The results showed that the adaptive condition was more efficient: students who received learning activities adaptively achieved the same learning outcomes as their peers in the fixed-order condition, but with fewer learning activities.

However, researchers also warn about the negative consequences of too much adaptive support, which can be detrimental to students because it frees them from thinking (Hübscher and Puntambekar 2001). The capability to select learning activities is important for learning; a learner should be able to reflect on what is important to them and what they ought to consider learning next (Mitrovic and Martin 2003).

Consequently, in the study reported in this paper, we investigated the effects of learning under variations of agency within SQL-Tutor. In the High-Agency version, students freely selected preparation tasks (WE, ErrEx, PS or none) before solving problems. In the Low-Agency version, the adaptive strategy selected preparation tasks for students based on their performance. Previous research shows that worked examples are more beneficial for novices (Atkinson et al. 2000; McLaren et al. 2008; Sweller et al. 1998). For advanced students, worked examples may become less effective or lose their effectiveness altogether (Kalyuga et al. 2001; Kalyuga et al. 1998), because the support they provide is redundant. Erroneous examples have so far been shown to be particularly beneficial to students who have amassed a reasonable degree of domain knowledge (Große and Renkl 2007; Tsovaltzi et al. 2012). Therefore, for high prior knowledge students, our adaptive strategy either skips the preparation task altogether (when their performance on previous problems is high), or provides an erroneous example or a problem to solve. Although past research has demonstrated that erroneous examples are more beneficial for students with high prior knowledge, it seems that even students with low prior knowledge can benefit from erroneous examples (e.g., Durkin and Rittle-Johnson (2012), Chen et al. (2016b), Stark et al. (2011)). Therefore, for low prior knowledge students, the adaptive strategy presents either worked examples or erroneous examples, based on their performance on the previous problem. We attempted to answer two research questions:

  • Research Question 1: Do the Low- and High-Agency conditions differ on learning outcomes? Given the results of the Sawyer et al. (2017) study, we expected that the Low-Agency condition would lead to better learning outcomes compared to the High-Agency condition (H1).

  • Research Question 2: Are learning outcomes different for students with low or high prior knowledge? Given past research showing that high prior knowledge (HPK) students are good at self-regulating and self-assessing (Zimmerman 2008; Mitrovic 2001), while low prior knowledge (LPK) students commonly benefit from instructional choices being made for them (Zimmerman 2000), we hypothesized that High Agency would be more beneficial for HPK students (H2a), and that the effect of Low Agency would be more pronounced for LPK students (H2b).

The paper is organized as follows. In the next section, we review studies on the effect of agency on learning in computer-based environments. The following section presents a brief overview of learning from worked examples and erroneous examples, which provides the background for our study. Next, we present the experimental design, followed by the results and, finally, a discussion of our findings.

Agency and Learning

Agency, which refers to the level of control a student has over actions in a learning environment, is an important factor that leads to engagement and learning benefits (Bandura 1989; Zimmerman 2008). Research has started to provide evidence that agency can enhance motivation, interest, and attitudes, resulting in positive learning outcomes. Studies of agency have been conducted with various types of educational environments, such as multimedia learning and educational games, and with students of different ages. We review some of these approaches in this section; they are summarized in Table 1.

Table 1 Overview of discussed studies

Calvert et al. (2005) report on a study conducted with preschool children in the context of a computer-based storybook. They compared four conditions: 1) full adult control, in which an adult read the story and controlled the mouse while the child listened; 2) joint control, with the adult reading the story and the child controlling the mouse; 3) child control; and 4) no exposure. The authors reported that children who had control were more attentive and involved in learning than those who were guided by adults; however, they found no difference in the children’s memory of the content.

Cordova and Lepper (1996) demonstrated that giving elementary school students control over instructionally irrelevant aspects of an educational game resulted in higher motivation and interest, and led to better learning outcomes on a subsequent math test.

Several studies in the area of multimedia learning demonstrated the interactivity principle (Mayer et al. 2003): giving students control over the pace and order of instruction reduces cognitive load and increases transfer performance. In one such study conducted with university students, Tabbers and de Koeijer (2010) had a no-control condition, in which students watched a slideshow of 16 slides, each shown for 13 s with accompanying narration. Students in the other condition were given full control over the pace and order of the instructional material: they could stop or replay the slides, decide whether to listen to the narration, and navigate freely between the slides. The results showed that giving students control over these interactive features led to higher transfer scores, but at the same time increased learning time.

Rowe et al. (2011) investigated the relationship between learning and engagement in Crystal Island, a game-based learning environment for microbiology, in which students explore an island where an epidemic has recently spread among a team of scientists. Crystal Island promotes a strong sense of agency, as students decide how to obtain the information necessary to solve the problem by interacting with game characters and other game objects. The authors conducted a study with middle-school students and found that increased engagement was associated with better learning outcomes and problem solving. In particular, students who performed better in the game also performed better on the post-test, and these students were more successful at gathering information during play.

Sawyer et al. (2017) explored variations in student agency in Crystal Island, using three conditions. In the High Agency condition, students could freely explore the environment. The Low Agency condition required students to visit a series of game locations in a prescribed order, where they had to complete specific problem-solving actions. Finally, the No Agency condition provided students with a video of an expert playing the game to model an ideal path for solving the problem scenario. The Low Agency students made more incorrect actions, but also achieved the highest learning gains. The authors attribute this result to more extensive engagement with the instructional materials.

Snow et al. (2015) conducted two studies with university students to investigate the effect of agency by analyzing students’ choice patterns within the game-based system iSTART-2. Students could choose various learning activities and could also personalize the environment, which gave them a sense of control. The results showed that higher-quality self-explanations and better performance within the games were closely related to a student’s ability to exercise controlled choice patterns, as opposed to disordered (i.e., random) choice patterns.

Nguyen et al. (2018) also compared Low/High Agency conditions to investigate whether limiting agency could lead to higher engagement and improved learning outcomes in a mathematics educational game, Decimal Point. The High Agency condition allowed students to choose how many games to play and in what sequence, while the Low Agency condition guided students to play the games in a prescribed order. Unlike the Sawyer et al. (2017) study, they found no increase in learning with Low Agency compared to High Agency. The authors attributed this result to the effects of indirect control or teacher pressure. Additionally, students in the High Agency condition did not exhibit much agency: the design of the game layout led more than half of the High Agency students to play the games in the same order as the students in the Low Agency condition.

However, research suggests that increasing student agency may not be beneficial for all students (Katz et al. 2006). Agency may have non-optimal effects on learning, such as increased learning time (Tabbers & de Koeijer 2010) or difficulties with selecting, organizing and integrating information (Mayer 2004; Kirschner et al. 2006).

Learning from Worked/Erroneous Examples

A worked example (WE) consists of a problem statement, its solution, and additional explanations, and therefore provides a high level of assistance to students. Cognitive Load Theory (CLT) states that unsupported problem solving produces a high level of cognitive load for novices because of unproductive search (Sweller et al. 1998), as the student needs to do a lot of reasoning while solving a problem without feedback. CLT distinguishes three types of load on working memory: intrinsic, germane, and extraneous. Intrinsic load depends on the complexity of the learning materials (the number of interacting information elements a task contains) and the learner’s level of prior domain knowledge; it is higher when a novice studies a more complicated problem. Intrinsic load can be managed by dividing the initial learning goal into a series of sub-goals that require fewer processing resources. Germane load arises from processing that is directly relevant to learning, such as self-explanation, a metacognitive process in which students explain the provided learning material to themselves (Chi et al. 1994; Renkl 1997). Extraneous load is imposed on working memory by information that does not contribute to learning. Extraneous and germane load both depend on the way the task is presented, but only germane load contributes to learning (Clark et al. 2011). To solve a problem, a learner must consider both the current problem description and the goal state, find the differences between them, and find problem-solving operators that reduce these differences. The many interacting elements involved in this process impose an extraneous load that interferes with learning. WEs may significantly relieve this load on working memory, allowing students to learn faster and solve more complex problems (Sweller et al. 1998; Sweller 1988).

Numerous studies have compared learning from WEs with learning from tutored problem solving (TPS) when the ITS has control over learning tasks (Schwonke et al. 2007; Schwonke et al. 2009; McLaren et al. 2008; Salden et al. 2010; McLaren and Isotani 2011). These studies showed that WEs result in shorter learning times, but commonly found no difference in knowledge gain compared to learning from TPS. Contrary to those results, Najar and Mitrovic (2014) conducted a study with SQL-Tutor (Mitrovic 2003; Mitrovic and Ohlsson 1999; Mitrovic 1998). They compared how students learn from a fixed set of problems presented as examples only (EO), tutored problems only (PO), or alternating examples and tutored problems (AEP). They found that students learned more in the PO and AEP conditions than in the EO condition; furthermore, presenting alternating isomorphic pairs of WE and TPS (AEP) produced the greatest learning. They also found that AEP significantly improved novices’ conceptual knowledge in comparison with the PO condition, while advanced students did not improve significantly in the EO condition. Najar et al. (2014) later compared an adaptive strategy to the alternating worked-example and problem-solving strategy (AEP). Similar to the Kalyuga and Sweller (2005) study, the adaptive strategy was based on a measure of cognitive efficiency, where performance (P) was calculated from the assistance the students received, and the students rated their mental effort (R) after solving each problem. The results showed that the adaptive condition led to better learning outcomes. Additionally, the adaptive condition resulted in shorter learning times for novices compared to their peers in the AEP condition, and advanced students in the adaptive condition learned more than their counterparts in the AEP condition.

In contrast to WEs, erroneous examples (ErrExs) present incorrect solutions and require students to find and fix the errors. Presenting students with erroneous examples may help them become better at evaluating problem solutions and improve their knowledge of correct concepts (van den Broek and Kendeou 2008; Stark et al. 2011) and procedures (Große and Renkl 2007), which, in turn, may help students learn the material at a deeper level. The presentation of ErrExs can vary in the kind and amount of feedback provided, and in the choice and sequencing of the learning activities (e.g. ErrExs provided in addition to problem solving, or to WEs). Researchers have started to investigate empirically the use of erroneous examples in order to better understand whether, when, and how erroneous examples make a difference to learning. Siegler (2002) demonstrated that learners who explained both correct and incorrect solutions during a brief tutoring session were more likely to learn and to think deeply about correct concepts applying to a range of problem types than their peers who explained only correct solutions. Siegler and Chen (2008) compared WEs to ErrExs for mathematical equality problems; children who studied and self-explained both correct and erroneous examples had better learning outcomes than those who received and self-explained only correct examples. Große and Renkl (2007) found learning benefits of ErrExs for students with a high level of prior knowledge, but not for novices.

Tsovaltzi et al. (2012) reported that 6th-grade students improved their metacognitive abilities after learning from erroneous examples of fractions with interactive help in an ITS. Erroneous examples with interactive help also improved 9th and 10th grade students’ problem-solving skills and conceptual knowledge. The combination of WEs and ErrExs was shown to improve both conceptual knowledge and procedural skills in algebra (Booth et al. 2013). In our previous study (Chen et al. 2016a), we investigated whether ErrExs in addition to WEs and tutored problem solving would lead to better learning. We compared students’ performance in two conditions: alternating worked examples and problem solving (AEP), and a fixed sequence of worked example/problem-solving pairs followed by erroneous example/problem-solving pairs (WPEP). The results showed that the addition of ErrExs improved learning on top of WEs and PS: when students were asked to explain why incorrect solutions were wrong, they engaged in deeper cognitive processing and were therefore better prepared for the concepts required in the next isomorphic problem.

The studies discussed above focused on the effect of presenting varying levels of assistance (worked examples, tutored problem solving or erroneous examples) to students. However, there is a lack of studies on the effect of agency when learning with ITSs. Students who attempt to self-regulate often face limitations in their own knowledge and skills, which can cause cognitive overload and decreased interest and persistence (Duffy and Azevedo 2015; Harley et al. 2015). Mitrovic and Martin (2003) investigated the effect of scaffolding and fading problem selection in SQL-Tutor. They found the faded problem-selection strategy effective: the system initially selected problems for the student and explained why particular problems were good choices, and over time released control over problem selection to the student. Azevedo et al. (2016) demonstrated that providing adaptive scaffolding and feedback during self-regulated learning produced better learning outcomes than providing no scaffolding and feedback.

The goal of our study is to compare the learning benefits of variations of agency at different levels of prior knowledge (lower, higher). To the best of our knowledge, there are no studies that compare the effectiveness of low- and high-agency in ITSs.

Experimental Design

Participants

The study was performed with volunteers from COSC265, a second-year database course at the University of Canterbury. Before the study, the students had learned about SQL in lectures and had had one lab session. There were 67 volunteers who signed the consent form, but 27 participants were excluded because they did not complete all phases of the study. Of the remaining 40 students, 11 were female; 25 were in the age range 18–20, 8 in the age range 21–23, and the remaining 7 were aged between 24 and 29. The majority (75%) were NZ Europeans, two participants were British, and the others were Asian.

Pre/Post Tests

At the beginning of the session, the students took an online pre-test. The pre-test had eleven questions. Questions 1 to 6 measured conceptual knowledge and were multiple-choice or true/false questions (for a maximum of 6 marks). Questions 7–9 focused on procedural knowledge: question 7 was a multiple-choice question (1 mark), question 8 was a true/false question (1 mark), while question 9 required the student to write an SQL query for a given problem (4 marks). The last two questions presented incorrect solutions to two problems and required students to correct them, thus measuring debugging knowledge (6 marks). The maximum mark was 18. Students received a post-test of similar complexity and length to the pre-test after completing all learning activities. The pre- and post-tests are given in the Appendix.

Cronbach’s alpha is .386 for the pre-test and .214 for the post-test. Low values of Cronbach’s alpha are not unusual for knowledge tests that cover a range of different aspects (Taber 2018). There are additional reasons for such low values. Our pre/post-tests needed to be short, as the duration of the whole session was 100 min; we therefore generated a small set of questions to get an understanding of the student’s domain knowledge. Cronbach’s alpha tends to increase with the number of questions, and our tests were short (11 questions each). There is also no redundancy in the tests, as each question covers different SQL concepts.
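For readers who wish to reproduce the reliability analysis, a minimal sketch of the standard Cronbach’s alpha computation over a students-by-questions matrix of item marks is shown below; the simulated marks are purely illustrative and are not our data.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (students x questions) matrix of item marks."""
    k = scores.shape[1]                               # number of questions
    item_variances = scores.var(axis=0, ddof=1)       # variance of each question
    total_variance = scores.sum(axis=1).var(ddof=1)   # variance of total test scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Simulated marks for 5 students on 11 questions, only to show the call:
rng = np.random.default_rng(0)
marks = rng.integers(0, 5, size=(5, 11)).astype(float)
print(round(cronbach_alpha(marks), 3))
```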

The participants were labeled as LPK students if their pre-test score was less than the Split score (S), defined in Eq. 1. In the equation, M represents the median pre-test score (67%) from our previous study (Chen et al. 2017), while Xn represents the pre-test score of student n. Sn represents the Split score after student n completed the pre-test. Please note that the value of S changes dynamically as students complete the pre-test.

$$ S_n = \frac{S_{n-1} + X_n}{2} \qquad \left(S_0 = M,\; n \ge 1\right) $$
(1)
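A minimal sketch of the classification rule in Eq. 1, assuming students are processed in the order they completed the pre-test; the scores in the usage line are hypothetical.

```python
def classify_students(pretest_scores, median_prev_study=67.0):
    """Label students LPK/HPK using the running Split score of Eq. 1.

    pretest_scores: pre-test scores (%) in the order students finished the test.
    S_0 = M (the median from the previous study); S_n = (S_{n-1} + X_n) / 2.
    Note that comparing X_n with S_n is equivalent to comparing it with S_{n-1},
    so the order of updating and comparing does not change the labels.
    """
    labels = []
    split = median_prev_study                # S_0 = M
    for score in pretest_scores:
        split = (split + score) / 2.0        # S_n, updated as each pre-test comes in
        labels.append("LPK" if score < split else "HPK")
    return labels

# Hypothetical pre-test scores, in completion order:
print(classify_students([50.0, 80.0, 61.1, 72.2]))  # ['LPK', 'HPK', 'LPK', 'HPK']
```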

Materials

The study was conducted in the context of SQL-Tutor, a mature constraint-based ITS for teaching SQL (Structured Query Language) (Mitrovic 2003; Mitrovic and Ohlsson 1999; Mitrovic 1998). We developed three modes of SQL-Tutor, corresponding to WE, ErrEx and PS. Figure 1 shows a screenshot of the problem-solving mode used in the study. The left pane shows the structure of the database schema, which the student can explore to obtain additional information about tables and their attributes, as well as to see the data stored in the database. The middle pane is the problem-solving space. The right pane displays feedback on the student’s solution once s/he submits it. SQL-Tutor supports six levels of feedback (Mitrovic and Martin 2000). Simple (positive/negative) feedback, the lowest level of assistance, specifies whether the solution is correct or not. Error Flag feedback indicates the part of the solution that is incorrect (as shown in Fig. 1). The Hint level addresses a specific error and states the domain principle violated by the student’s solution. Partial Solution provides the correct version of the clause in which the student made a mistake. The other two feedback levels are List All Errors, which provides Hint-level feedback messages for all mistakes, and Complete Solution, which provides the full solution. The default level for the first submission is Simple feedback, unless overridden by the student; afterwards the feedback level is automatically raised to the Hint level, but the student can request any feedback level at the time of submitting the solution.

Fig. 1
figure 1

Problem-solving mode of SQL-tutor

The interface of the WE mode is illustrated in Fig. 2. An example problem with its solution and explanation is presented in the center pane. A student clicks the “Continue” button to confirm that s/he has studied the example. The ErrEx mode is illustrated in Fig. 3. An incorrect solution is provided, and the student’s task is to analyse the solution and find and correct the error(s). The student can submit the solution to be checked by SQL-Tutor multiple times, as in the problem-solving mode. In the example illustrated in Fig. 3, the student has marked the WHERE clauses as being incorrect and has entered answers that s/he believes are correct.

Fig. 2
figure 2

Worked example mode of SQL-tutor

Fig. 3
figure 3

Erroneous example mode of SQL-tutor

Erroneous solutions presented as ErrExs were selected from the set of incorrect solutions submitted by the participants of the Najar and Mitrovic (2013) study, which used the same set of problems as our study. We analyzed the 465 submissions from the problem-only condition of that study to the ten problems corresponding to the erroneous examples in our study. There were on average 5.59 submissions per problem (sd = 2.19). We identified the most frequent misconception (or the top two misconceptions) that students had about the relevant domain concepts, and the erroneous examples include errors that address these misconceptions.

After completing each learning activity (WE, PS or ErrEx), students were additionally asked to rate their mental effort (R) and to answer a self-explanation (SE) prompt. Figure 4 illustrates the mental-effort rating bar (lowest: yellow, highest: red). Research has shown that WEs improve conceptual knowledge more than procedural knowledge, whereas problem solving results in higher levels of procedural knowledge (Schwonke et al. 2009; Kim et al. 2009); for that reason, different types of self-explanation prompts should be provided. Najar and Mitrovic (2013) therefore designed conceptual-focused SE (C-SE) prompts and procedural-focused SE (P-SE) prompts to complement learning with WEs and PS. C-SE prompts require the student to answer questions about relevant domain concepts after PS, while P-SE prompts require explanations of solution steps after WEs. A C-SE prompt is presented after a problem is solved, to help the student reflect on the concepts covered in the problem just completed (e.g. What does DISTINCT in general do?). P-SE prompts, on the other hand, are provided after WEs to help learners focus on problem-solving approaches (e.g. How can you specify a string constant?). C-SE and P-SE prompts were used in the previous study (Najar and Mitrovic 2013) to increase learning. To keep our experimental design consistent with that study, participants received C-SE prompts after problems and P-SE prompts after WEs, so that both conceptual and procedural knowledge were supported. Since erroneous examples contain both correct and incorrect steps and require students to fix the incorrect steps, thus sharing properties of both problems and WEs, we provided P-SE and C-SE prompts alternately after ErrExs. Figure 4 also illustrates a C-SE prompt, located at the bottom right. The student answered the prompt incorrectly; in response, the system indicated the correct option and provided feedback on the option the student selected. Figure 5 shows a similar example, but with positive feedback on the student’s answer to a P-SE prompt. Students can attempt each SE prompt only once.

Fig. 4
figure 4

Mental effort rating and conceptual self-explanation

Fig. 5
figure 5

Procedural self-explanation

Procedure

The study was conducted in a single, 100-min-long session. Figure 6 illustrates the design of the study. Once participants completed the online pre-test, they were classified as Low Prior Knowledge (LPK) or High Prior Knowledge (HPK) students, based on their pre-test scores. They were then randomly assigned to one of two instructional conditions: (1) the Low-Agency condition, in which preparation tasks were selected adaptively (WE or ErrEx for LPK students, and ErrEx or PS for HPK students), or (2) the High-Agency condition, in which students selected preparation tasks (WE, ErrEx, PS or skip) themselves. The participants worked on 20 tasks, organized into ten isomorphic pairs and sorted by increasing complexity. Even-numbered tasks were problems to solve. Odd-numbered tasks were preparatory tasks, presented either as WEs, ErrExs (with one or two errors), or problems to solve. The first preparatory task was treated differently from the others because the student models were still empty; for that reason, we used the pre-test sub-scores to determine its type, as sketched below. If the conceptual score on the pre-test was lower than the procedural and debugging scores, the first preparation task was presented as a worked example. If the student’s procedural score was the lowest of the three, s/he received a problem as the first task. If the lowest score was on the debugging questions, the first task was presented as an ErrEx.

Fig. 6
figure 6

Study design
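The rule for the first preparatory task can be summarized in a few lines. The sketch below assumes the three pre-test sub-scores are available as raw marks or percentages; the tie-breaking order is our assumption, as the paper does not specify one.

```python
def first_preparation_task(conceptual: float, procedural: float, debugging: float) -> str:
    """Choose the first preparatory task from the three pre-test sub-scores.

    Lowest conceptual score -> worked example; lowest procedural score -> problem;
    lowest debugging score -> erroneous example. The tie-breaking order below
    (conceptual, then procedural, then debugging) is an assumption.
    """
    sub_scores = {"WE": conceptual, "PS": procedural, "ErrEx": debugging}
    return min(sub_scores, key=sub_scores.get)

print(first_preparation_task(conceptual=3, procedural=5, debugging=4))  # WE
```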

Adaptive Strategy

The adaptive strategy uses Cognitive Efficiency (CE) to decide what the preparation task should be, based on the student’s problem-solving performance on the previous problem. CE is computed as the quotient between the problem-solving score on the most recent problem (P) and the self-reported mental effort score (R), CE = P ÷ R, as originally proposed in (Kalyuga and Sweller 2005). Both scores had the same range, 0 (lowest) to 9 (highest). The participants were asked to report the effort and answer the SE prompt after each task they completed (Fig. 4).

In constraint-based tutors, domain knowledge is represented as a set of constraints (Mitrovic 2003; Ohlsson 1994). Each constraint has two conditions: the relevance condition and the satisfaction condition. When the student’s solution is matched against a constraint, the satisfaction condition is checked only if the relevance condition is met. A relevant constraint can therefore be either violated (when the satisfaction condition is not met) or satisfied. A solution is incorrect if it violates one or more constraints, so the solution can be scored based on the violated and satisfied constraints. SQL-Tutor covers six key concepts, represented by the SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY clauses. Each concept is scored according to how many constraints are violated for that concept. The student’s score for a clause is calculated using Eq. 2, in which Cv is the number of violated constraints and Cr the number of relevant constraints. When a solution does not violate any constraints for a clause, its score C is 1.

$$ C = 1 - \frac{C_v}{C_r} $$
(2)

However, Eq. 2 does not produce accurate scores when several violated constraints stem from the same mistake. For instance, if a solution misses one attribute in the FROM clause, several constraints will be violated, and Eq. 2 results in a large penalty. To deal with this situation, we investigated Eq. 3 instead.

$$ C = \begin{cases} \log_{1/C_r}\left(\dfrac{C_v}{C_r}\right), & 0 < C_v < C_r \\[4pt] 1, & C_v = 0 \end{cases} $$
(3)

We compared the scores produced by Eq. 3 with the scores assigned by a human marker for the problem-solving question from the pre-test (Question 9). The mean score the marker gave to the 58 solutions was .77 (sd = .303), while Eq. 3 produced scores with a mean of .84 (sd = .26). The correlation between the manual scores and the scores produced by Eq. 3 was significant and high (r = .864, p < .001). However, a student’s incorrect solution may not violate all relevant constraints. For example, one solution for Question 9 violated 5 out of 10 relevant constraints; the human marker allocated 0 marks to it, while Eq. 3 produced a score of .301. For solutions with a higher number of relevant constraints, the difference between the manual and automatically calculated scores was larger. To handle this situation, we used Eq. 4, in which C is 0 when the number of violated constraints equals the number of relevant constraints, as in Eq. 2. The scores produced by Eq. 4 had a mean of .808 (sd = .282), and the correlation with the manual scores was stronger (r = .921, p < .001).

$$ C = \begin{cases} \log_{1/C_r}\left(\dfrac{C_v}{0.5\,C_r}\right), & 0 < C_v < C_r \\[4pt] 1, & C_v = 0 \\[2pt] 0, & C_v = C_r \end{cases} $$
(4)
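The three clause-scoring formulas can be implemented directly. The sketch below reproduces the paper’s example of a solution that violated 5 of 10 relevant constraints (scored 0 by the human marker), to show how the three equations differ.

```python
import math

def clause_score_eq2(c_v: int, c_r: int) -> float:
    """Eq. 2: one minus the proportion of violated constraints for a clause."""
    return 1 - c_v / c_r

def clause_score_eq3(c_v: int, c_r: int) -> float:
    """Eq. 3: logarithmic clause score (log base 1/Cr of Cv/Cr)."""
    if c_v == 0:
        return 1.0
    return math.log(c_v / c_r, 1 / c_r)

def clause_score_eq4(c_v: int, c_r: int) -> float:
    """Eq. 4: the clause score used by the adaptive strategy."""
    if c_v == 0:
        return 1.0
    if c_v == c_r:
        return 0.0
    return math.log(c_v / (0.5 * c_r), 1 / c_r)

# The paper's example: a solution violating 5 of 10 relevant constraints,
# which the human marker scored as 0.
print(clause_score_eq2(5, 10))            # 0.5
print(round(clause_score_eq3(5, 10), 3))  # 0.301
print(clause_score_eq4(5, 10))            # 0.0
```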

Equation 5 calculates the solution score P as the weighted sum of the scores for all clauses the student specified (with a maximum of six clauses). Note that the clause score is zero, and Eq. 4 is not applied, if the clause is empty. The weight of a clause (Wi) is calculated on the basis of the ideal solution for the problem: Ct is the number of constraints relevant to the ideal solution, and Wi is the quotient of the number of constraints relevant to that clause (Cci) and Ct, as shown in Eq. 6.

$$ P = \sum_{i=1}^{n} W_i\, C_i $$
(5)
$$ W_i = \frac{C_{ci}}{C_t} $$
(6)

The maximum value for P when using Eq. 5 is 1 (when the student’s solution is correct). Since the maximum value of R is 9, we need to have the same maximum value for performance, which gives us the final Eq. 7:

$$ P = 9 \sum_{i=1}^{6} W_i\, C_i $$
(7)
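A sketch of how the clause scores could be combined into the solution score P (Eqs. 5–7) and the cognitive efficiency quotient CE = P/R defined earlier; the clause scores, constraint counts and effort rating in the usage lines are hypothetical.

```python
def solution_score(clauses, total_relevant_constraints):
    """Eqs. 5-7: P = 9 * sum_i W_i * C_i over the clauses the student specified.

    clauses: list of (clause_score, constraints_relevant_to_clause) pairs;
    empty clauses score 0 and are simply omitted from the list.
    total_relevant_constraints: Ct, the constraints relevant to the ideal solution.
    """
    p = 0.0
    for c_i, c_ci in clauses:
        w_i = c_ci / total_relevant_constraints  # Eq. 6: clause weight
        p += w_i * c_i                           # Eq. 5: weighted clause score
    return 9 * p                                 # Eq. 7: scale P to the 0-9 range of R

def cognitive_efficiency(p: float, r: float) -> float:
    """CE = P / R (Kalyuga and Sweller 2005); assumes a non-zero effort rating R."""
    return p / r

# Hypothetical solution: three clauses, 12 constraints relevant to the ideal solution,
# and a self-reported mental effort of R = 6.
clauses = [(1.0, 4), (0.6, 5), (1.0, 3)]
p = solution_score(clauses, total_relevant_constraints=12)
print(round(p, 2), round(cognitive_efficiency(p, r=6), 2))  # 7.5 1.25
```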

The CE score is computed after the student provides the mental effort rating. Figure 7 shows the relationship between CE and preparation tasks, while Figs. 8 and 9 illustrate how the preparation task (i.e. the first element of a pair of learning activities) is selected, based on CE and the student’s level of prior knowledge. There were two types of erroneous examples: ErrExs with one error (1-error ErrEx) or two errors (2-error ErrEx). For HPK students, CE higher than 1 indicated very high problem-solving performance, and the preparation task was skipped. CE between 0.75 and 1 indicated relatively good performance on the previous problem, and the preparation task chosen was a problem to be solved. An HPK student received a 2-error ErrEx before the next problem if CE was between 0.5 and 0.75, and a 1-error ErrEx if CE was lower than 0.5. For LPK students, if CE was higher than 0.5, the preparation task was a 2-error ErrEx; if CE was between 0.25 and 0.5, they received a 1-error ErrEx; and a worked example was provided if CE was below 0.25. This selection logic is sketched in code below.

Fig. 7
figure 7

Relationship between CE and preparation tasks

Fig. 8
figure 8

Adaptive selection of learning activities for LPK students

Fig. 9
figure 9

Adaptive selection of learning activities for HPK students
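Putting the thresholds of Figs. 7–9 together, the adaptive selection can be sketched as follows; the handling of exact boundary values (e.g. CE equal to 0.75) is our assumption, as the text describes the intervals informally.

```python
def select_preparation_task(ce: float, high_prior_knowledge: bool) -> str:
    """Adaptive selection of the next preparation task (Figs. 7-9).

    Thresholds follow the description in the text; the treatment of exact
    boundary values (e.g. CE == 0.75) is an assumption.
    """
    if high_prior_knowledge:
        if ce > 1.0:
            return "skip"            # very high performance: go straight to the next problem
        if ce > 0.75:
            return "PS"              # relatively good performance: another problem to solve
        if ce > 0.5:
            return "2-error ErrEx"
        return "1-error ErrEx"
    else:                            # low prior knowledge
        if ce > 0.5:
            return "2-error ErrEx"
        if ce > 0.25:
            return "1-error ErrEx"
        return "WE"

print(select_preparation_task(1.2, high_prior_knowledge=True))   # skip
print(select_preparation_task(0.2, high_prior_knowledge=False))  # WE
```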

Fig. 10
figure 10

Self-selection prompt

Learning Activity Selection in the High-Agency Condition

The High-Agency condition allowed students to select preparation tasks on their own, as illustrated in Fig. 10.

Results

Our study was conducted at a time when the participants had assessments due in other courses they were taking. Since participation was voluntary, only 40 students completed all phases of the study. Such a high attrition rate necessitated further investigation. We compared the incoming knowledge (i.e. the pre-test scores) of the participants who completed the study with that of those who abandoned it, in order to identify whether the two groups were comparable or whether it was the weaker students who did not complete the study. When comparing the pre-test scores (Table 2), we found no significant differences between the students who completed the study and those who abandoned it. As mentioned above, the pre-/post-tests consisted of conceptual, procedural, and debugging questions; there were also no significant differences in the scores for these three question types. Therefore, the 40 remaining participants had the same level of background knowledge as the other participants.

Table 2 Pre-test scores (%) for participants who completed/abandoned the study

Research Question 1: Do the Low- and High-Agency Conditions Differ on Learning Outcomes?

There were 20 participants in the Low-Agency condition. We removed an outlier from the High-Agency condition, leaving 19 participants. Table 3 presents the test scores for the participants in the two conditions. We developed a repeated-measures mixed-effects model, with the knowledge score as the within-subject factor with two levels (pre- and post-test score) and the group as the between-subject factor. There was no significant interaction between test scores and group, F(1,37) = .002, p = .963. There was a significant difference between the pre- and post-test scores, F(1,37) = 55.56, p < .001, partial η² = .661, but no significant difference between groups, F(1,37) = 1968.506, p = .094, partial η² = .074. Therefore, the participants in both groups significantly improved their knowledge from the pre- to the post-test.

Table 3 Statistics for the two conditions

We calculated the effect size (Cohen’s d), interpreting d ≥ 0.8 as a large effect, d ≥ 0.5 as a medium effect, and d ≥ 0.2 as a small effect (Cohen 1988). The effect size for the post-test (d = 0.47) was close to medium. In the Low-Agency condition, the pre-test and post-test scores were significantly positively correlated. On average, the participants spent 94 min interacting with the learning tasks, and there was no significant difference in interaction time between the two conditions.
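For reference, a minimal sketch of Cohen’s d with a pooled standard deviation for two independent groups; the group means and standard deviations below are hypothetical and are not the values from Table 3.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Hypothetical post-test means/SDs (%) for groups of 20 and 19 students:
print(round(cohens_d(75.0, 14.0, 20, 68.0, 16.0, 19), 2))
```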

As explained earlier, preparation tasks in the Low-Agency condition were selected depending on the Cognitive Efficiency (CE) score on the previous problem and the student’s prior knowledge. Therefore, HPK students in the Low-Agency condition could receive PS or a 2-error/1-error ErrEx as the preparation task, or skip straight to the next PS, while LPK students in the Low-Agency condition could receive a 2-error/1-error ErrEx or a WE. The students in the High-Agency condition could select any type of learning activity as the preparation task, or choose to skip the preparation task entirely and move on to the next PS. CE scores were calculated in both conditions after each problem was solved. Table 4 reports the CE scores and the numbers of activities of different types completed in the two conditions. There was no significant difference between the two conditions in the CE scores. On average, the students completed 18 learning activities, ten of which were problems to be solved.

Table 4 Student performance in the two conditions

Table 4 also reports the number of preparatory tasks the students in the two conditions completed. The Low-Agency group received significantly fewer problems (p = .001) and WEs (p < .001), but more ErrExs (p < .001) than the High-Agency group. The participants in the High-Agency group selected approximately the same number of preparation tasks of different types (i.e. problems, ErrExs, WEs and skips), while for the Low-Agency group the adaptive strategy selected the activities according to the CE scores.

Research Question 2: Are Learning Outcomes Different for Students with Low/High Prior Knowledge?

Once a student submitted the pre-test, SQL-Tutor immediately classified him/her as HPK or LPK, as described previously. To check whether Eq. 1 identified HPK/LPK students correctly, we additionally used a median split on the pre-test scores to classify the students. The median split resulted in 22 LPK and 18 HPK students, the same classification as that produced by Eq. 1.

Table 5 presents the test scores and the normalized learning gains for the LPK/HPK students in the two conditions. We constructed a generalized mixed model with the normalized learning gain as the response variable, and the group (i.e. agency) and the level of the student’s prior knowledge (i.e. LPK or HPK) as the between-subject factors. The interaction between group and level was not significant, F(1,35) = 4.062, p = .052, partial η² = .104. There was a significant main effect of level, F(1,35) = 5.87, p = .021, η² = .144. Therefore, the normalized gain of LPK students was significantly higher than that of HPK students.

Table 5 Detailed test scores for LPK/HPK students

Table 6 presents the CE scores and information about the activities the students performed in the two groups. We developed a general linear model with the student level (LPK or HPK) and the group (i.e. agency) as fixed factors. There was no significant interaction between group and level for CE, but there was a significant main effect of level, F(1,35) = 5.386, p = .026, partial η² = .133. For the total number of learning activities completed, there was no significant interaction between group and level, but there was a significant main effect of level, F(1,35) = 6.337, p = .017, partial η² = .153. For the number of problems received as preparatory activities, there was no significant interaction between group and level, but there was a significant main effect of group, F(1,35) = 9.997, p = .003, partial η² = .222. For ErrExs, there was a significant interaction between group and level, F(1,35) = 6.68, p = .014, partial η² = .16. For the number of WEs, there was no significant interaction between group and level, but there was a significant main effect of group, F(1,35) = 11.225, p = .002, partial η² = .013. For the number of skips, there was no significant interaction between group and level, but there was a significant main effect of level, F(1,35) = 6.337, p = .017, partial η² = .072.

Table 6 CE and the number of activities for LPK/HPK students

In the High-Agency condition, students selected the preparation tasks on their own. There was no significant difference between LPK and HPK students on the post-test scores (Table 5). Surprisingly, the CE scores of HPK and LPK students in the High-Agency condition were approximately the same, and they completed the same number of activities (Table 6). To investigate this finding further, we analyzed the task-selection ‘step size’ and self-assessment accuracy of LPK/HPK students in the High-Agency condition, based on Cognitive Efficiency and the students’ task selections. Figure 11 presents the relationship between the student’s selection (High-Agency) and the system’s selection (Low-Agency), from which a recommended ‘step size’ for task selection can be inferred (e.g., if a student selected a WE as the preparation task while the system would have selected PS, the step size is +3). A positive step size means a recommendation to select a more challenging preparation task, a step size of 0 means the student selected the same preparation task as the system would have, and a negative step size means a recommendation to select a simpler preparation task.

Fig. 11
figure 11

Step Size of Preparation Task Selection
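The step-size computation can be sketched as follows. Only the WE-to-PS gap of +3 is given in the text, so the full ordinal coding of task challenge (including where “skip” sits) is our assumption.

```python
# Ordinal coding of task challenge; only the WE (0) to PS (3) gap of +3 is given in
# the text, so the positions of the ErrEx variants and "skip" are assumptions.
CHALLENGE = {"WE": 0, "1-error ErrEx": 1, "2-error ErrEx": 2, "PS": 3, "skip": 4}

def step_size(student_choice: str, system_choice: str) -> int:
    """Step size = challenge of the system's selection minus the student's selection.

    Positive: the adaptive strategy would have recommended a more challenging task;
    zero: the student chose what the system would have chosen;
    negative: the system would have recommended a simpler task.
    """
    return CHALLENGE[system_choice] - CHALLENGE[student_choice]

print(step_size("WE", "PS"))  # +3, the example from the text
```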

The mean step size for the High-Agency students is 0.49 (sd = 1.28): 1.32 (sd = 1.14) for the HPK students and −0.07 (sd = 1.08) for the LPK students. The difference is significant (U = 21, p = .038). The selections of the LPK students were close to those the adaptive strategy would have made, which explains the significant improvement of LPK students from pre- to post-test.

Discussion and Conclusions

Some previous studies found that increased student agency resulted in better learning outcomes (Snow et al. 2015; Rowe et al. 2011), while Sawyer et al. (2017) found that their Low-Agency condition led to higher learning gains. In our study, the participants in both the Low- and High-Agency conditions solved a fixed sequence of 10 problems, each preceded by a preparatory task. The adaptive strategy used in the Low-Agency condition provided a WE or an ErrEx as the preparatory task to students with lower prior knowledge; for students with higher prior knowledge, the preparatory task was skipped if they had demonstrated high performance on previous problems, or they received an ErrEx or a problem to solve. We compared this Low-Agency condition to the High-Agency condition, in which students selected preparatory learning activities on their own.

The students in both groups improved significantly from the pre-test to the post-test. Even though the Low-Agency students had higher mean post-test and CE scores than the High-Agency students, the differences were not significant; therefore Hypothesis 1 was not confirmed.

We were also interested in whether Low and High Agency had differential effects for students with different prior knowledge. The LPK students had significantly higher learning gains than the HPK students. The HPK students improved significantly from pre- to post-test only in the Low-Agency condition. Unlike other studies (e.g. Zimmerman 2008; Mitrovic 2001), in which advanced students performed better when given freedom and control over their actions, we did not find any significant improvement for HPK students in the High-Agency condition. On the contrary, HPK students in the Low-Agency condition had higher post-test scores than their counterparts in the High-Agency condition, with a large effect size (d = .95). Therefore Hypothesis 2a was rejected. There was no significant difference in learning gains between the LPK students in the two conditions; therefore, Hypothesis 2b was also not confirmed. The Low-Agency condition was beneficial for both LPK and HPK students.

To determine why LPK and HPK students performed similarly on the post-test in the High-Agency condition, we used the ‘step size’ to infer whether students selected harder or simpler preparation tasks than the adaptive strategy used in the Low-Agency group would have. The results revealed that HPK students selected significantly more challenging learning activities than LPK students, whereas the selections made by LPK students were similar to the system’s selections. These findings suggest that the adaptive strategy in the Low-Agency condition was efficient in selecting learning activities for both LPK and HPK students.

One limitation of the presented study is the small sample size. To achieve a power of 0.8 with an effect size of d = 0.47 (i.e. the effect size on the post-test scores for the two conditions), 138 participants would be needed (69 per condition). Our study was conducted in an introductory database course at the University of Canterbury, which normally has about 200 enrolled students. The timing of the study coincided with assignments and lab tests in other courses the participants were taking; therefore, many students did not attend the scheduled labs, and some participants did not complete the study.

Another limitation of the study is the difference in the types of preparatory activities the participants worked on in the two conditions. We found significant differences in the types of activities the High-Agency students selected compared to the Low-Agency group: in the Low-Agency group, the adaptive strategy selected a high number of erroneous examples, while in the High-Agency group the participants worked roughly equally on all types of learning activities. Different types of preparatory activities affect learning differently; therefore, the observed effects cannot be attributed solely to the difference in agency.

Several exciting research questions remain. We need to better understand the role of prior knowledge in learning from examples. All participants in our studies were familiar with SQL, because they had learned about it in lectures before participating. Even though our adaptive strategy is beneficial for students with different levels of prior knowledge, the results may be different for students who are new to the domain of SQL querying; it would be interesting to investigate the learning effect of using examples with such students.

Our adaptive strategy selects learning activities based on the student’s cognitive efficiency score on previous problems. The performance score is computed from the student’s first submission of a problem. However, students may simply ask for feedback by initially submitting an empty solution. Therefore, in future work, the performance scores could be calculated more precisely by taking into account the time spent on the problem as well as the feedback requested during problem solving. Additionally, as mentioned above, the constraint-based SQL-Tutor models students by comparing their solutions to ideal solutions provided by the teacher. A violated constraint represents an error, which translates to incomplete or incorrect knowledge. Our adaptive strategy is based on the numbers of violated and relevant constraints, but it does not consider how well the student knows each constraint. One future direction is to further enhance the adaptive strategy so that the calculation of performance takes into account the complete student model rather than only the violated/satisfied constraints from the most recent problem.

We proposed a High-Agency strategy that allowed students to select learning activities on their own. Like Mitrovic and Martin (2003), we found that LPK students who selected learning activities themselves performed as well as LPK students who received learning activities adaptively. HPK students chose more challenging learning activities when they received no guidance on activity selection; they may therefore not have been able to identify gaps or misconceptions in their knowledge that would have helped them select appropriate learning support on their own. Furthermore, students who attempt to self-regulate often face limitations in their own knowledge and skills, which can cause cognitive overload and decreased interest and persistence (Duffy and Azevedo 2015; Harley et al. 2015). Azevedo et al. (2016) demonstrated that deploying adaptive scaffolding and feedback in self-regulated learning produced better learning outcomes than no scaffolding and feedback. Therefore, using adaptive scaffolding or feedback to guide students in the High-Agency condition would be an interesting topic for future research, particularly for HPK students.