Investigating the Impact of Backward Strategy Learning in a Logic Tutor: Aiding Subgoal Learning towards Improved Problem Solving

Learning to derive subgoals reduces the gap between experts and students and makes students prepared for future problem solving. Researchers have explored subgoal labeled instructional materials with explanations in traditional problem solving and within tutoring systems to help novices learn to subgoal. However, only a little research is found on problem-solving strategies in relationship with subgoal learning. Also, these strategies are under-explored within computer-based tutors and learning environments. Backward problem-solving strategy is closely related to the process of subgoaling, where problem solving iteratively refines the goal into a new subgoal to reduce difficulty. In this paper, we explore a training strategy for backward strategy learning within an intelligent logic tutor that teaches logic proof construction. The training session involved backward worked examples (BWE) and problem-solving (BPS) to help students learn backward strategy towards improving their subgoaling and problem-solving skills. To evaluate the training strategy, we analyzed students' 1) experience with and engagement in learning backward strategy, 2) performance, and 3) proof construction approaches in new problems that they solved independently without tutor help after each level of training and in post-test. Our results showed that, when new problems were given to solve without any tutor help, students who were trained with both BWE and BPS outperformed students who received none of the treatment or only BWE during training. Additionally, students trained with both BWE and BPS derived subgoals during proof construction with significantly higher efficiency than the other two groups.

Introduction
Attaining the skill to define subgoals is an important component of learning that leads to efficient problem-solving. Defining subgoals, or 'subgoaling', in problem-solving refers to the process of decomposing a problem into smaller and easier sub-problems where each sub-problem contributes to the overall goal [1], or the process of refining the overall goal such that the refined goal (i.e., subgoal) eliminates or reduces the difficulty in achieving the original goal directly [2][3][4]. Experts produce subgoals during problem-solving more efficiently and easily, and attaining the skill to generate subgoals can induce expert-like behavior in novices [1,5]. Thus, researchers from different educational domains (mathematics, statistics, probability, geometry, programming, etc.) have explored varied methods to include subgoaling during problem-solving, such as subgoal-labeled examples [1,6], expert explanations for subgoals, and asking students to write explanations for given subgoals [7,8]. These studies suggest that subgoal-infused tutoring methods helped to improve novice performance, using test scores to measure learning. However, these score-based evaluations do not provide enough insight into how these methods impacted students' subgoaling skills.
Additionally, existing research has scarcely investigated problem-solving strategies as a way to improve students' subgoal generation skills. Only a few studies have compared expert and novice problem-solving strategies (for example, [9,10]) or investigated subgoal generation strategies in general [11]. The two most common problem-solving strategies found in the literature are forward and backward chaining. Forward chaining consists of starting from the givens in the problem description and working towards the goal at each step. Backward chaining consists of starting from the goal and refining the goal at each step until the refined goal can be justified by the givens. In other words, backward chaining can be seen as refining the goal to form subgoals at each backward step. Research on human cognitive processes suggests that students try to think backwards more when they need to refine the goal [9,10]. However, they find carrying out a backward step to refine the given goal (i.e., subgoaling) difficult during problem-solving [12][13][14]. These are the few studies we know of that investigate the difficulties students face when applying backward strategy during problem-solving.
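To make the contrast between the two strategies concrete, the following is a minimal sketch over simple "if premises then conclusion" rules. The rule format, the function names, and the toy rule set are assumptions made only for illustration; they do not reflect how any tutor discussed in this paper represents proofs.

```python
def forward_chain(givens, rules, goal):
    """Start from the givens and derive new facts until the goal appears."""
    known = set(givens)
    changed = True
    while changed and goal not in known:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return goal in known


def backward_chain(givens, rules, goal, pending=frozenset()):
    """Start from the goal and refine it into subgoals until the givens justify it."""
    if goal in givens:
        return True
    if goal in pending:  # this goal is already being refined: avoid circular reasoning
        return False
    for premises, conclusion in rules:
        # Each premise of a rule that concludes the goal becomes a new subgoal.
        if conclusion == goal and all(
            backward_chain(givens, rules, p, pending | {goal}) for p in premises
        ):
            return True
    return False


# Hypothetical toy problem: from A and B, derive D.
givens = {"A", "B"}
rules = [(("A", "B"), "C"), (("C",), "D")]
print(forward_chain(givens, rules, "D"), backward_chain(givens, rules, "D"))
```

In the backward version, the recursive call on each premise is the point at which a subgoal is formed, which is the step the cited work reports as difficult for novices.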
Unlike prior studies, the main purpose of this study is to explore backward (BW) strategy learning as a medium to aid subgoal learning to improve students' problem-solving skills. To serve this purpose, we designed a training session within an intelligent logic tutor, Deep Thought, where we infused backward worked examples for demonstration of the strategy and backward problem solving for practice. Then, we investigated the impact and efficacy of our training strategy based on: 1) students' experience during the training; 2) students' score-based performance in new problems after training; and 3) student approaches to proof construction. We identify efficient subgoal derivation using a graph mining approach called Approach Maps [15]. The key finding of our evaluation suggests that, although backward strategy learning can be difficult for students, it can improve students' subgoaling and problem-solving skills. The main contributions of this study are: 1) an efficient training strategy for backward strategy learning to improve subgoaling and problem-solving skills, which can be easily adapted for tutors from other structured problem-solving domains; 2) demonstration of a graph mining based approach analysis that helped us answer how the training session impacted students' subgoaling skill; and 3) important insights on students' problem-solving approach obtained from the evaluation results that could be helpful in problem and training session design within automated tutors to improve students' experience and skills.

Related Work
Importance of Subgoals in Learning and Problem Solving: Existing literature claims that providing subgoals reduces students' cognitive load and helps them perform better [16,17]. Here, cognitive load refers to the load on a student's working memory during learning through problem-solving [18]. Thus, researchers from varying domains (including mathematics, natural, and computer sciences) explored subgoal-based instruction during traditional problem solving, and within tutors/learning environments, to improve student performance. Margulieux et al. [16] showed that subgoal-labeled materials reduced students' cognitive load and helped them to be better at programming problem-solving in Android App Inventor [19]. Here, a subgoal label refers to a name applied to a set of steps in a problem solution that segments the overall solution, reducing difficulty [1]. Margulieux and Catrambone [20] also explored subgoal-labeled instructional text coupled with subgoal-labeled worked examples as instructional support within Android App Inventor. They found that students receiving the support outperformed students who did not receive it. Catrambone and Holyoak [21] found evidence that providing subgoals during problem-solving can also be helpful in the transfer of problem-solving skills in domains like algebra and probability. Morrison et al. [22] conducted an experiment where they compared giving subgoal labels against asking students to generate subgoals while they solved programming puzzles, called Parson's problems, that require ordering given pieces of code. They found that students who received subgoal labels performed better than those who had to generate subgoals in low cognitive load Parson's problem post-assessments.
Zhi et al. [23] proposed a data-driven algorithm for subgoal extraction using prior student programming problem attempts. Marwan et al. [24] used this technique to extract subgoals, and presented those along with programming tasks within a block-based programming environment, iSnap. They found evidence of better performance, higher task completion rate, and less idle time when subgoals were presented in the system. Additionally, Shabrina et al. [25] found evidence that when subgoals and subgoal completion-based feedback are presented in iSnap, students closely followed the subgoals and tried to achieve them, which shaped their approach and interaction with the environment. Cody and Mostafavi [26] provided subgoals during logic proof construction within an intelligent logic proof tutor. Contrary to research that found positive results with subgoals, they observed that students who received subgoals skipped more problems and had a significantly higher dropout rate.
Aiding Subgoal Learning or Subgoaling: Although subgoals can reduce excessive cognitive load during problem-solving and thus improve students' performance, they can hinder the learning process and leave students unprepared for solving future problems [27]. Thus, researchers also explored methods that might help students to learn subgoaling so that they can form subgoals themselves for a new problem. Existing studies showed that subgoal labels that are not context-specific but more abstract are most effective in fueling transfer and helping students to learn subgoaling [6,28]. Catrambone [1,28,29] showed that worked examples with abstract subgoal labels for groups of steps helped students to learn subgoaling better, and these students were able to successfully transfer the skill to problems that follow a different procedure than the one they used during training. Morrison et al. [30] explored two instructional methods in introductory programming tasks: 1) subgoal labels given with the task, and 2) requiring students to generate their own subgoals. Their hypothesis that the first group would perform better in posttests was only partially supported by the results of statistical analysis. In a recent study, Margulieux and Catrambone [7,8] showed that students performed better in posttests when they were presented with subgoals and asked to write explanations for the subgoals than when they generated their own subgoals.
The main takeaway from existing research on helping students to learn subgoaling is that abstract or context-free subgoals aid students better in learning the subgoaling procedure. Also, students learn better when they generate explanations for subgoals themselves than when they are given explanations. However, requiring students to generate self-explanations during training led to signs of student struggle, as measured by time spent.
Problem-Solving Strategies and Learning to Subgoal: In a backward strategy, or backward chaining, problem-solving starts from the goal, and in each step the goal is refined to a new subgoal until the initial problem state is reached. Backward strategy is often compared with means-ends analysis [31], which involves carrying out steps to reduce the difference between the givens and the goal of a problem. Existing literature claims experts can form subgoals while working in the forward direction due to their high prior knowledge in a domain [9,10,32,33]. Also, they may switch between forward and backward strategies during problem-solving [9]. Prior work has shown that knowing how and when to use each problem-solving strategy is a sign of preparation for future learning [34,35]. However, due to low prior knowledge, novices tend to use backward strategy to figure out substeps of a problem [10]. Matsuda et al. [14] explored both forward and backward strategy in a geometry theorem proving tutor and observed that students who learned forward strategy performed better than those who learned backward strategy. They concluded that using backward strategy efficiently is hard for students, as they face difficulty in coming up with unjustified statements (subgoals) that are to be proven next.
In this study, unlike prior research, instead of exploring instructional methods, we aimed to aid subgoal learning by infusing a training that induces students to learn backward strategy through demonstration (using backward worked examples (BWE)) and practice (using backward problem solving (BPS)) within Deep Thought. The training phase was long, involving 20 logic proof construction problems that should provide students with enough time to master or adjust to the strategy. We evaluated our training procedure based on students' experience/responses to the training, and test score-based performance. Additionally, unlike existing studies, we investigated efficiency in subgoal derivation while students solved a new problem using approach map analysis.

Deep Thought, The Logic Proof Tutor
We conducted our study using an intelligent logic tutor, Deep Thought (DT). In DT [Figure 1], students are given logic proof construction problems where the premises and the conclusion to be proved are given as visual nodes. A list of logic rules is provided, from which students can select rules to apply on premises to derive new ones to reach the conclusion. During the training levels, the tutor also provides on-demand next-step hints and proactive hints, called assertions, that appear when the system predicts that students need help [36]. DT is organized into 7 levels: one pretest level with 4 problems, 5 training levels with 4 problems in each level, and one post-test level with 6 problems. Each problem is either of type Worked Example (WE), where the tutor constructs the proof [Figure 2a], or Problem-Solving (PS), where students are required to construct the proof [Figure 2b]. The tutor does not offer any hint or support in the last problem of each level or in the post-test problems. Within DT, logic proofs can be constructed using both forward (FW) [Figure 2b] and backward (BW) [Figure 3b] strategies.

Deployment and Data Collection
We deployed DT with our three training treatments in a Discrete Mathematics course for computer science majors offered at a public research university in the United States. The students used DT for a take-home assignment. Each student participating in the course was assigned to one of the three conditions after they completed the pretest level. The assignment algorithm distributes students equally across the treatment groups while ensuring that the pretest scores of the three groups come from the same distribution. At the end of the experiment, 168 students had completed all 7 levels of the tutor, with 59 students coming from group C, 55 from T1, and 54 from T2. While students worked in DT, our system collected all information required to replay and reconstruct their proof construction attempts for all problems. The collected data includes all student interactions with the interface (click, selection/deselection of rules/nodes, etc.), and proof steps (derivation/deletion record of nodes with associated predecessors and rules, direction of derivation [FW/BW], spent time, etc.). For each PS problem they completed, students were assigned a score that is a function of accuracy and time taken to construct the proof. Note that each logic proof problem given in DT can have multiple solutions, where the shortest proof is considered to be the optimal one. Thus, shorter proofs with fewer correct and incorrect rule applications and efficient proof construction in less time received higher problem scores (max score = 100). Note that this score function was devised as a measurement of learning for research purposes only; it considers both efficiency in proof construction and optimality of the constructed proofs, and has been used in prior research [34,35,37,38]. However, students' course grades were assigned based only on completion of the problems given within DT, so that they were not impacted by the experiment.

Research Questions
The main goal of this study was to aid students' subgoaling skills by introducing and having them learn and practice BW strategy while they construct logic proofs. Our hypothesis was that 'Learning BW strategy will improve student performance and make them better prepared for new problem solving by improving their subgoaling skill'. However, prior research [14] states that learning and engaging in BW strategy causes struggle for students. We additionally investigate students' training experience and their response to it. Thus, our investigation on the efficacy and impact of backward strategy learning training focused on the following three research questions:
• RQ1 (Students' Experience and Response): How does the backward strategy training impact students' experience in DT, and how do they respond to the training?
• RQ2 (Impact on Performance): How does learning backward strategy impact students' performance in new problems?
• RQ3 (Impact on Subgoaling Skills): How does backward strategy learning impact students' subgoaling skills?
In the subsequent sections, we describe the statistical and graphical approach analyses that we conducted to address each of our research questions, along with the corresponding results. Throughout this study, we used Kruskal-Wallis tests to find significant differences (p < 0.05) across the training groups and performed post hoc pairwise Mann-Whitney U tests with Bonferroni correction (corrected p < 0.016) to find an ordering of the groups. Note: To report the results of statistical analyses, we show means as a measure of central tendency, and p-values from pairwise post hoc tests (denoted P_MW) as evidence while comparing the three training groups. For details, see the supplementary materials.
We found significant differences in the averages of time-related metrics in the training problems. Both step time and overall problem time were significantly higher for T2 than for C and T1: for problem time, P_MW(T2 > C) = 0.006 and P_MW(T2 > T1) < 0.0001; for step time, P_MW(T2 > C) = 0.004 and P_MW(T2 > T1) = 0.0001. This trend in the average step and problem time of T2 students during training was due to BPS problems [see the BPS metrics in Table 1]. This implies that carrying out BW steps was difficult for students and required more time during training. Interestingly, T1 and T2 had significantly fewer steps than C (P_MW < 0.0001 and P_MW = 0.001, respectively). These statistics suggest that exposure to BW strategy possibly pushed T1 and T2 students towards shorter solution attempts (i.e., thinking/working BW potentially encouraged students to take better steps to reduce the distance between the goal and the given premises).
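The post hoc procedure used throughout the analysis (pairwise Mann-Whitney U comparisons judged against a Bonferroni-corrected threshold) can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the helper names and the sample values are hypothetical, and only the U statistic is computed here, since obtaining a p-value would normally be left to a statistics library.

```python
from itertools import combinations


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (xi, yj) with xi > yj, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


groups = ["C", "T1", "T2"]
pairs = list(combinations(groups, 2))          # three pairwise comparisons
threshold = bonferroni_threshold(0.05, len(pairs))
print(pairs, round(threshold, 4))

# Hypothetical problem times (minutes) for two groups, for illustration only.
times_c = [4.0, 5.5, 6.0]
times_t2 = [7.0, 8.5, 6.0]
print(mann_whitney_u(times_t2, times_c))
```

With three groups there are three pairwise comparisons, so the corrected threshold is 0.05 / 3, which is approximately 0.0167 and corresponds to the "corrected p < 0.016" criterion stated above.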
Additionally, we observed that T2 students required significantly more sessions than C and T1 students (P_MW < 0.0001 in both cases) to complete training problems. Here, each new login, or an edit separated by a long time period from the previous edit, is considered a new session. T2 students also had a significantly higher restart count (where students electively start a problem over again) than T1 (P_MW = 0.0006) and a marginally higher one than C (P_MW = 0.02). These statistics suggest T2 students struggled during training due to BPS problems, since BPS problems are restrictive by design, requiring students to

Student Engagement in Backward (BW) Strategy
As a measure of students' independent engagement in BW strategy, we used the backward (BW) action count, which is representative of student intention or attempts to work backwards. We calculated the counts across the three training groups for the 4th problem of each training level (Levels 2-6) and the 6 post-test problems in Level 7, since no training treatment or tutor help is given in these problems. We report mean BW action counts for each problem across the groups, and p-values when significant differences were found, in Table 2.
As shown in Table 2, T2 students engaged in many BW actions in the early phases of training (problems 2.4-4.4). However, in the later phases of training, and in the post-test problems, they possibly became more deliberate in carrying out a BW action, as indicated by the reduced count (∼3 to ∼11). Our later approach analyses (Section 6.1) confirmed that in the early phases of training students used BW strategy inefficiently, with too many BW actions. However, with time, T2 possibly adapted to the new skill and was able to use it efficiently (fewer but correct BW steps) to refine goals to subgoals, leading to better performance.
From our analysis of training phase metrics and students' BW strategy engagement, it is evident that BPS problems caused T2 students a significant amount of struggle (they needed more time, more sessions, and more restarts). However, when given new problems, this group voluntarily engaged in BW actions even though using the strategy was not a requirement. On the other hand, group T1 did not seem to face many struggles, but in terms of BW strategy usage, they behaved similarly to the control group C. The statistics described above suggest that BWE alone may not be motivating or educational enough for students to attempt to derive propositions in the backward direction. Thus, we conclude that, to successfully motivate students to engage in BW strategy, both examples (BWE) and practice (BPS) are necessary. Additionally, a long training period might be necessary to allow students sufficient time to adapt and become efficient in using the strategy.

RQ2: Impact on Performance
In this section, we investigate students' performance in relationship with their exposure to BW strategy through BWE (T1), both BWE and BPS (T2), or none (no-exposure control group C). We calculated students' problem scores across the three training groups over the training and the post-test periods. Again, we focus on the problems where no training treatment/tutor help was given (training PS problems 2.4-6.4, and post-test problems 7.1-7.6) and students solved them independently with the option of using either forward (FW) or backward (BW) strategy. Additionally, we calculated problem-solving time and step count for each problem to investigate the source of higher/lower scores. We report average metric values and significant differences (p-values from Mann-Whitney U tests) across the training groups in Table 4 (Problem Scores) and Table 3 (Problem Time and Step Count).
Problem Scores: As reported in Table 4, there were no significant differences in problem scores across the groups in the pretest. However, in the earlier phases of training, T2 received significantly lower scores (in problems 2.4 and 4.4), or lower scores on average (in problem 3.4), than the other two groups [Table 4, column 2-4; row 2-4]. As the training progressed, T2 outperformed both C and T1 (in problem 5.4), or performed at least as well as them (in problem 6.4) [Table 4, column 2-4; row 5-6]. In the post-test [Table 4, column 2-4; row 7-12], T2 consistently outperformed the other training groups in all problems (higher average in 7.1-7.2, and significantly higher in 7.3-7.6). The problem scores suggest T2 became better at problem-solving over the period of training. On the other hand, T1 mostly performed similarly to C (except for 3.4-5.4, where its averages were higher but not significantly so), and did not show consistent signs of improvement.
Recall that, in problems 2.4-4.4, T2 students were observed to engage in too many BW actions [Table 2, column 4; row 2-4]. Note that the solution of each problem in DT is 5-15 steps long, and too many backward actions indicate that students explored unnecessary propositions/actions. However, in the later levels, they engaged in fewer BW actions (possibly only the correct ones) and also received higher scores. This trend suggests that T2 students improved their efficiency over time in using the BW strategy.
Problem Time: In the earlier phases of training, T2 students needed more problem-solving time than in the later levels [Table 3]. This could possibly be due to deriving unnecessary/incorrect steps that were not part of the proof, or due to requiring more time to derive each step. However, as students progressed in DT, problem-solving time for T2 decreased on average (from 5.4-7.2, shown in Table 3). For problems 7.3-7.6 [Table 4, column 2-4; row 9-12], T2 took significantly less time than T1 and C while constructing proofs. Given the similar pattern of problem-solving time and scores, we concluded that learning BW strategy helped T2 students to identify the correct proof construction approach in less time, improving their scores.
Step Count: Logic proof problems within DT can have multiple solutions with different lengths. However, possibly due to having the same level of prior knowledge (measured during the pretest), for most problems students were observed to construct similar proofs with similar lengths (details are discussed in Section 6.1 using Approach Maps). As shown in Table 3, column 6-9, in most of the training and post-test problems, no significant differences were found in step counts across the three groups. However, we observed that, in problems 7.3 and 7.5, T2 had significantly fewer steps than C and T1 [Table 3, column 6-9; row 9 & 11]. Note that step count can differ due to the adoption of different solution approaches, or due to different numbers of unnecessary/incorrect proposition derivations. Our later approach analysis showed that, for problem 7.3, the shortest student solution was 8 steps long, and 81% of T2 students (44 out of 54 students) adopted the shortest 8-step approach, whereas the percentages for C and T1 were only 54% (32 out of 59 students) and 49% (27 out of 55 students), respectively. A similar pattern was observed for problem 7.5, where 65% of T2 students adopted the shortest 6-step solution, whereas the percentages for C and T1 were only 35% and 41%, respectively. These results suggest that BW strategy has the potential to drive students toward shorter solutions. However, this trait could be dependent on the specific problem a student is working on, since this pattern was only observed in two post-test problems.
The results of our score-based performance analysis suggest that the combination of BWE and BPS improved students' problem-solving skills and helped them to perform better in the post-test. T2 students obtained higher scores by constructing post-test proofs faster, or by constructing shorter post-test proofs. On the other hand, T1 students, who only received BWE, behaved and performed mostly like the control group C, who were not introduced to the backward strategy at all. In the next section, we investigate student solution approaches in more detail to identify the source of students' improved problem-solving skills as demonstrated by test scores.

RQ3: Impact on Subgoal Learning
6.1 Approach Map Analysis
To investigate T2 students' higher performance (as reported in Section 5) in relationship with subgoaling, we generated graphical representations of students' proof construction attempts using Approach Maps [15]. We identified expert subgoals in those solution approaches and analyzed each proposition derived using statistical tests to identify the instances where one training group was more efficient than another. From our analyses of BW action counts and performance, we identified four scenarios: 1) poor performance of T2 co-occurring with a lot of BW actions (2.4-4.4); 2) better performance of T2 with no significant differences in BW actions (problem 5.4); and significantly higher performance of T2 co-occurring with comparatively more BW actions, either 3) due to less problem-solving time (7.3-7.6), or 4) due to fewer steps (7.3 and 7.5). In the subsequent subsections, we first describe the construction method of approach maps and then analyze approach maps of representative problems for each of the scenarios mentioned above.

Approach Map Generation Method
An approach map, proposed by Eagle et al. [15], is a graphical representation of students' problem-solving approaches. The steps to construct approach maps are briefly described below:
Step 1 (Construct Interaction Networks from Students' Action Logs): An interaction network [39] for a problem is essentially a graph consisting of nodes and edges representing all students' problem-solving states and actions. For DT problems, a state is the set of all propositions (both justified and backward-derived unjustified ones) a student has at any point in the construction of a proof, and an action is the addition or deletion of a proposition through the application of a logic rule. Note that the propositions in a state are lexicographically ordered and the order of their derivation is ignored, since considering the order of derivation could increase the number of states exponentially, and no complex computation would be feasible on the interaction network.
As students progress in the construction of a proof, they move from state to state via actions. For example, if at state S0 = (¬(K ∧ M), J ⇒ (K ∧ L), L ⇒ M) DeMorgan's rule is applied on ¬(K ∧ M), the new state will be S1 = (¬(K ∧ M), ¬K ∨ ¬M, J ⇒ (K ∧ L), L ⇒ M). An incorrect rule application can result in the previous state and the next state being the same. The tuple (current state, action, next state) is called an interaction. So, a student's solution attempt for a logic proof problem is a directed graph of interactions. Our code implementation generates the interaction network for a problem by conjoining all the interactions from student attempts. There is a single start node in the interaction network containing the given premises. There can be multiple end states (since there can be multiple solutions for the same problem), where each end state contains the justified goal statement. Additionally, to facilitate statistical analyses on the network, the interaction network includes data on the frequency of node and edge visits, time spent on/before each interaction, and step counts before each interaction across the three training groups.
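The state and interaction definitions above can be illustrated with a small sketch. It is not the implementation used in this study: the function names, the ASCII rendering of propositions, and the log format (one list of interaction tuples per attempt) are assumptions made for illustration, and a frozenset stands in for the lexicographically ordered proposition set because both ignore derivation order.

```python
from collections import Counter


def apply_action(state, added=(), removed=()):
    """Return the next state after adding and/or deleting propositions."""
    return frozenset((set(state) - set(removed)) | set(added))


def build_interaction_network(attempts):
    """Merge all attempts into node-visit and edge-visit frequency counters.

    attempts: list of attempts, each a list of (state, action, next_state) tuples.
    """
    node_visits, edge_visits = Counter(), Counter()
    for attempt in attempts:
        if attempt:
            node_visits[attempt[0][0]] += 1          # the shared start state
        for state, action, next_state in attempt:
            edge_visits[(state, action, next_state)] += 1
            node_visits[next_state] += 1
    return node_visits, edge_visits


# The example from the text, with propositions written in ASCII.
s0 = frozenset({"~(K&M)", "J=>(K&L)", "L=>M"})
s1 = apply_action(s0, added=["~K|~M"])

# Two hypothetical attempts: the second includes an incorrect rule application,
# which leaves the state unchanged and therefore forms a self-loop.
attempts = [
    [(s0, "DeMorgan", s1)],
    [(s0, "DeMorgan", s1), (s1, "MP (incorrect)", s1)],
]
node_visits, edge_visits = build_interaction_network(attempts)
print(edge_visits[(s0, "DeMorgan", s1)], node_visits[s1])
```

Because states are sets, two students who derive the same propositions in a different order end up in the same node, which is what keeps the number of states manageable.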
Step 2 (Girvan-Newman Clustering Algorithm): Students, due to not having expert-like prior knowledge/skills, require exploration, leading to derivation/deletion of unnecessary/incorrect propositions along with correct propositions throughout a proof construction attempt. These derivations/deletions form visible clusters/regions in the interaction network, where the major outcomes of these regions are the proposition(s) contributing to the final proof. Also, different approaches to solving the same problem can result in different regions. To identify these regions, the approach map technique applies the Girvan-Newman community clustering algorithm [40] on interaction networks. The clustering algorithm takes as input an interaction network with the start/end nodes and self-loops (edges originating and ending at the same node) removed. Additionally, edge weights are assigned based on the cumulative visit frequency of the corresponding interaction. At each iteration of the algorithm, the edge with the highest edge betweenness is removed from the network. The edge betweenness [41] of an edge is calculated by computing the shortest paths between all pairs of nodes and counting the number of shortest paths that go through that edge. Then, the connectivity of the resulting graph is calculated using the modularity score [42]. Each connected component of the resulting graph is marked as a region. This process is continued until there is no edge left to be removed. The output of the algorithm is the graph with the highest modularity score, and the clusters/regions are the connected components within that graph.
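The clustering loop described in Step 2 can be sketched in a few dozen lines. The version below is a simplified illustration under stated assumptions: the graph is treated as undirected and unweighted (the visit-frequency edge weights mentioned above are ignored), self-loops are assumed to be already removed, and all function names are hypothetical rather than taken from the study's code.

```python
from collections import deque


def components(adj):
    """Connected components of an undirected graph given as {node: set(neighbors)}."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def edge_betweenness(adj):
    """Shortest-path edge betweenness (each node pair is counted from both ends)."""
    score = {}
    for s in adj:
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:                                   # BFS counting shortest paths
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {u: 0.0 for u in order}
        for w in reversed(order):                      # accumulate path shares per edge
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((u, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[u] += share
    return score


def modularity(original, comps):
    """Modularity of a partition, measured on the original (uncut) graph."""
    m = sum(len(nbrs) for nbrs in original.values()) / 2
    q = 0.0
    for comp in comps:
        internal = sum(1 for u in comp for v in original[u] if v in comp) / 2
        degree = sum(len(original[u]) for u in comp)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


def girvan_newman(adj):
    """Remove highest-betweenness edges one by one; keep the best-modularity partition."""
    original = {u: set(vs) for u, vs in adj.items()}
    work = {u: set(vs) for u, vs in adj.items()}
    best = components(work)
    best_q = modularity(original, best)
    while any(work.values()):
        scores = edge_betweenness(work)
        u, v = max(scores, key=scores.get)
        work[u].discard(v)
        work[v].discard(u)
        comps = components(work)
        q = modularity(original, comps)
        if q > best_q:
            best_q, best = q, comps
    return best, best_q


# Hypothetical network: two triangles joined by a single bridge edge (3-4).
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
regions, q = girvan_newman(graph)
print(sorted(sorted(r) for r in regions), round(q, 3))
```

In the toy graph the bridge carries every shortest path between the two triangles, so it is removed first and the two triangles are returned as the regions.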
Step 3 (Approach Map Generation from Clustered Network): The clustered interaction network for any logic proof problem is a fairly large graph in which student approaches to solving the problem cannot be detected visually. Thus, we simplified the clustered networks into approach maps using the method adopted by Eagle et al. [39]. First, we added the start and end nodes back to the clustered interaction network and then applied the following steps: 1) Represent each region with a single node, labeled with the proposition(s) that have the highest number of incoming and outgoing edges from and to other regions, and with all propositions derived to generate the latter from the former. This step filters out unnecessary propositions derived by the students and keeps only the ones contributing to the proof; 2) Combine parallel edges and actions between regions into a single edge with a composite action label; and 3) Keep only unique paths between the start and goal nodes. These three steps convert a clustered interaction network into a pseudo-graph called an approach map, where the start node is connected to the goal node via region nodes. A path from the start node to a goal node represents a solution approach, where the propositions contributing to the solution can be visually identified from the labels of the region nodes in between.
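Steps 2 and 3 of this simplification (merging parallel edges between regions and listing the unique start-to-goal paths) can be sketched as follows. This is a hedged illustration only: the function names, the edge layout `(src, action, dst, visits)`, and the `region_of` mapping are assumptions for the example, and the region-labeling rule of step 1 is not implemented.

```python
from collections import defaultdict

def collapse_to_regions(edges, region_of):
    """Merge node-level edges into region-level edges.

    edges: iterable of (src, action, dst, visits) tuples (hypothetical layout).
    region_of: mapping node -> region label, e.g. "Start", "R1", "Goal".
    Returns {(src_region, dst_region): (composite_action_label, total_visits)}.
    """
    merged = defaultdict(lambda: {"actions": set(), "visits": 0})
    for src, action, dst, visits in edges:
        r_src, r_dst = region_of[src], region_of[dst]
        if r_src == r_dst:
            continue  # edges inside a region are hidden by the region node
        entry = merged[(r_src, r_dst)]
        entry["actions"].add(action)
        entry["visits"] += visits
    return {
        pair: (" / ".join(sorted(info["actions"])), info["visits"])
        for pair, info in merged.items()
    }

def unique_paths(region_edges, start="Start", goal="Goal"):
    """List every cycle-free path from start to goal over the region-level edges."""
    successors = defaultdict(list)
    for src, dst in region_edges:
        successors[src].append(dst)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            paths.append(path)
            continue
        for nxt in successors[path[-1]]:
            if nxt not in path:
                stack.append(path + [nxt])
    return sorted(paths)

# Toy example: two node-level edges enter R1 and are merged into one edge.
edges = [
    ("s", "MP", "a", 5), ("s", "Simp", "b", 2),
    ("a", "DeM", "b", 3), ("b", "MT", "g", 6), ("a", "MT", "g", 1),
]
region_of = {"s": "Start", "a": "R1", "b": "R1", "g": "Goal"}
region_edges = collapse_to_regions(edges, region_of)
approaches = unique_paths(region_edges)
```

Each returned path corresponds to one approach (A1, A2, etc.) in the approach map.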
Approach Map Presentation: In the approach maps presented in this paper [for example, refer to Figure 5], we show only the most common student solution approaches and use them to discuss differences found across the training groups. The start node contains the given premises, and the goal node contains the statement to justify. Region nodes are labeled R1, R2, etc. Each path from Start to Goal represents an approach (labeled A1, A2, etc.). Each edge is labeled with the applied rule(s) and the visit frequency across the training groups as [n(C), n(T1), n(T2)]. Edge thickness and color are based on visit frequency (frequent edges are thicker and colored blue; non-frequent edges are colored black and of unit thickness). Expert-identified subgoal propositions are colored blue. Bold-faced propositions (blue/black) indicate significant differences in derivation across groups. Regions attached to multicolor edge(s) indicate that the last propositions of those regions were sometimes derived backwards by students. BW derivation counts are also attached in such cases. Next, we present approach maps for representative problems and discuss different scenarios.

Scenario 1 (Poor Performance of T2 Co-occurring with Many BW Actions): Prob. 2.4
Problem 2.4 asks to derive [...]. In the approach map of this problem [Figure 5a], we show the most common three-step solution approach (adopted by 96% of the students), labelled A1 [Start → R1(¬(A ⇒ ¬C)) → R2(B, A ⇒ J) → Goal] in the figure. Note that in this approach, A ⇒ J was identified as a subgoal by experts [marked blue in Figure 5a].
The statistics above, together with the BW action count presented in Section 4.2, suggest that T2 students explicitly attempted to define subgoals using BW strategy just after the first level of training. However, only a few T2 students (14 students) were successful in deriving the correct subgoal (A ⇒ J). In this phase, they were still adapting to the BW skill and struggled (needed more time) to identify and derive correct propositions in the backward direction. In the instances of failed BW derivation attempts, students were observed to derive more unnecessary propositions (in both FW and BW directions), which increased time and step count and decreased their scores. On the other hand, when students were successful in deriving BW steps, the unnecessary proposition count decreased, but they still needed more time.

Scenario 2 (Improved Performance of T2 with No Significant Differences in BW Actions): Prob. 5.4
Problem 5.4 asks to derive ¬J from the premises ¬(K ∧ M), J ⇒ (K ∧ L), and L ⇒ M. From the approach map for this problem [Figure 5b], we identified 6 solution approaches (labelled A1-A6). Among these approaches, A2 [...]
These results suggest that, although most T2 students did not explicitly derive BW steps in this problem, they were efficient in deriving propositions identified as subgoals, possibly through BW thinking, which helped them outline the entire proof in less time with fewer unnecessary steps. Also, the few students who explicitly derived BW steps were more successful (fewer unnecessary propositions and less time) than they were in the previous problem (Prob. 2.4), where they needed more time to work backwards. We concluded that at this phase (5th level of training), T2 students, who received both BWE and BPS, were better adapted to using BW strategy (explicitly/implicitly) for subgoaling. However, in this problem, although T1 received higher avg. scores than C [Table 3], we found no significant evidence supporting improvement of group T1.
In this problem, T2 students engaged in BW derivations comparatively more than C and T1, and they did so efficiently. They also continued to show improved subgoaling behavior. In addition to fewer unnecessary steps and less time, we observed T2 students to be more driven toward the shortest solution in this problem. On the other hand, the other expert subgoal for A2 is D ⇒ A (from region R2). Seven (7) T2 students explicitly derived this proposition backwards. Overall, when adopting approach A2, T2 students discovered subgoal A ⇒ C with significantly fewer unnecessary proposition derivations than C [mean unnecessary proposition count before deriving A ⇒ C for C and T2 = 14.90 and 11.25; P_MW(T2 < C) = 0.001].
These statistics show that T2 students not only identified complex subgoals early in their solution attempts (with fewer unnecessary propositions and less time), but were also able to figure out a plan to derive those subgoals (as indicated by early derivation of prerequisite propositions and quicker consecutive steps). Possibly, having the BW skill motivated BW thinking that helped students identify subgoals and an outline of the solution, which overall decreased the time required to solve the problem. However, explicit BW derivations were observed in only 19 out of 54 (∼35%) T2 students.

Discussion
In this study, we explored BWE and BPS within an intelligent logic tutoring system to help students adapt to backward strategy use, with the aim of improving their subgoaling and problem-solving skills. Our results showed the effectiveness of our training method and revealed important insights into how backward strategy learning impacts students' competence in problem solving. We summarize the major findings below.
RQ1 (Training Struggle and Increased BW Strategy Usage): Our results showed that BPS problems caused students to struggle during training (students needed more time, sessions, and restarts). Note that prior studies claim that students usually find backward derivations difficult [14]. Also, experts mostly switch between strategies during problem solving, whereas BPS problems required students to construct entire solutions backwards, making the training highly complex from a cognitive point of view. However, prior studies claim that challenging and complex activities increase motivation and induce students to engage deeply and pay more attention, which helps them find intrinsic patterns/connections among different parts of a problem and eventually learn better [43][44][45]. Conforming to this claim, our later findings showed that T2 students voluntarily engaged in backward derivations while solving new problems and eventually outperformed the other groups. On the other hand, T1 students, who received an easier training involving only BWEs, behaved and performed like the control group.
RQ2 (Improved Problem Solving Achieved Over Time): The results from our performance analyses showed that the combination of demonstration (BWE) and practice (BPS) helped students adapt to backward strategy and improved their problem-solving performance (higher scores, decreased problem-solving time, and decreased step counts). However, improved performance was not observed immediately after students were exposed to the BW strategy. In the earlier phases of training (2.4-4.4), we observed that T2 spent significantly more time while solving simple problems, leading to lower scores. As training progressed, T2 became increasingly efficient in problem solving and outperformed C and T1. Recall that BWE/BPSs were given to students mostly during the first half of training [Figure 4]. However, T2 students continued to improve throughout the later phases of training and the posttests. This pattern suggests that BWE+BPS training may require allowing students enough time to adapt to and become adept with the BW strategy before they can successfully integrate it into their problem-solving approach.
RQ3 (Improved Subgoaling Skill): Our approach map analyses revealed that T2 students derived expert-identified subgoal propositions more efficiently (with less time and fewer unnecessary derivations) than the other two groups. On the other hand, T1 did not show any significant, consistent evidence of improved subgoaling, possibly due to superficial exposure to BW strategy through BWEs only. Our analyses also showed that, although T2 students engaged in BW derivations more than C and T1, not all T2 students derived explicit BW steps. However, the overall improved subgoaling behavior of T2 students hints at implicit BW strategy use, where students formed subgoals using BW thinking but generated nodes only in the forward direction while working in the DT system, much like experts often do.

Statements and Declarations
• Funding: The work was supported by NSF grant 2013502.
• Conflict of interest/Competing interests: The authors have no relevant financial or non-financial interests to disclose.
• Authors' contributions: All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Preya Shabrina. The first draft of the manuscript was written by Preya Shabrina and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Scenario 3 (Improved Performance of T2 (Less Time and Steps) with Fewer Effective BW Steps): Prob. 7.3
Problem 7.3 asks to derive ¬H from the premises ¬(K ∧ E), A ⇒ E, and H ⇒ (K ∧ A). The approach map for this problem [Figure 5a] shows the three most

Table 1
Training Phase Metrics Values (avg.) across C, T1, and T2 (for T2, metric values are shown separately for BPS and PS).
To measure students' experience with and response to the training treatments, we calculated step time (avg. time taken to derive a single proposition), problem time, step count, and restart/session counts. We did not find any differences across the groups during the pretest. However, during the training phase, we observed significant differences in the metrics when we performed Kruskal-Wallis tests with subsequent contrast analyses (pairwise Mann-Whitney U tests with Bonferroni correction). Table 1 shows the training phase metric values for PS/BPS problems. Recall that T1 and T2 were both given BWE. Groups C and T1 received only PS to solve themselves, while T2 additionally received BPS.
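The contrast analysis step (pairwise Mann-Whitney U tests with Bonferroni correction) can be sketched as below. This is a simplified illustration under stated assumptions, not the analysis code used in the study: the p-value uses a normal approximation without tie or continuity correction, the omnibus Kruskal-Wallis test is not included, the function names are hypothetical, and the sample values are made up for the example.

```python
import math
from itertools import combinations

def rank_with_ties(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U statistic for x and a two-sided p-value (normal approximation, an assumption)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, min(p, 1.0)

def pairwise_bonferroni(groups):
    """Run all pairwise tests and multiply each p-value by the number of pairs."""
    pairs = list(combinations(groups, 2))
    results = {}
    for a, b in pairs:
        u, p = mann_whitney_u(groups[a], groups[b])
        results[(a, b)] = (u, min(1.0, p * len(pairs)))
    return results

# Made-up problem times (minutes) for the three groups, for illustration only.
times = {"C": [10, 12, 11, 14], "T1": [11, 13, 12, 15], "T2": [18, 20, 17, 22]}
contrasts = pairwise_bonferroni(times)
```

With three groups there are three pairwise contrasts, so each raw p-value is multiplied by 3 and capped at 1.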

Table 2
Engagement in Backward Actions across the Three Training Groups while Solving Pretest Problems, Training PS Problems (fourth Problem of Levels 2-6), and Posttest PS Problems. Interpreting the p-val column: read T2 > T1 (<0.0001) as T2 has a significantly higher value than T1 with p < 0.0001. T2 > T1 (0.0001) means p = 0.0001 for the hypothesis.
2 >C (0.001) construct proofs entirely in backward direction and indirectly encouraging the need for taking the best action at each step.

As shown in Table 2, T2 students carried out significantly more BW actions than groups C and T1 in most of the problems under consideration [2.4, 3.4, 4.4, 5.4, 6.4, and 4 out of the 6 posttest problems: 7.3-7.6]. The group that received backward examples, T1, behaved similarly to C in terms of lower explicit usage of BW strategy. Also, in the earlier phases of training (2.4-4.4), T2 students took too many BW actions (∼31 to ∼59 actions) [Table 2, column 4; rows 2-4].

Table 3
Total Time and Step Count for pretest, training (2(4)-6(4)), and post-test (7(1)-7(6)) PS Problems across the Three Training Groups.
To investigate the source of higher/lower scores, we first analyzed problem-solving times across the three training groups. Problem-solving time showed a pattern similar to that of problem scores. In 2.4 and 4.4, T2 took significantly more time than C and T1 [Table 3, column 2-4; row 2-