The Dynamics Between Self-Regulated Learning and Learning Outcomes: An Exploratory Approach and Implications

An important competence students need is Self-Regulated Learning (SRL), which is especially relevant for “learning to learn” (European Union, 2019). Self-regulating learners use cognitive activities (reading, practising, elaborating) to study a topic, and use metacognitive activities (planning, monitoring, evaluation) to actively monitor and control their learning and to motivate themselves to reach learning goals (Schunk & Greene, 2018). Generally, SRL has been associated with improved learning (Schunk & Greene, 2018), but diverse effects have been found for different outcomes of learning (Bannert, 2006). Often, SRL is associated with improved deep knowledge, which can be measured with transfer tests (Bannert et al., 2009; Bannert, Sonnenberg, et al., 2015) and essays (e.g., Greene & Azevedo, 2007, 2009), but its impact on surface knowledge is less clear. It is important to recognise that learning outcomes differ in how they assess knowledge and its representation in mental structures (Frey et al., 2017). Therefore, we systematically investigated the relations between particular SRL activities during learning, as measured with think-aloud protocols, and different learning outcomes in an exploratory study. We characterised four learning outcomes using two dimensions of the mental representation of knowledge (de Jong & Ferguson-Hessler, 1996).

Self-regulated learning

SRL is depicted as a goal-oriented process, in which learners actively steer their learning (Winne & Hadwin, 1998). Models of SRL consist of the so-called CAMM activities, which are Cognition, Affect, Metacognition, and Motivation (Azevedo et al., 2018). SRL research often investigates cognition and metacognition (e.g., Winne, 2018). Cognition refers to the mental action of representing and processing information (Sternberg, 1981), and includes low-level information processing during learning, such as reading and repeating information, as well as high-level information processing, such as elaboration and organisation of the information processed (e.g., Bannert et al., 2014). Metacognition is the knowledge learners have of their own cognitive processes (Flavell, 1979) and includes activities such as planning, orientation, monitoring, and evaluation (e.g., Veenman, 2013).

A commonly used theory of SRL is the COPES model (Winne, 2018a), which describes four loosely sequenced phases (Winne & Hadwin, 1998). The first phase entails the definition of the task. It includes the metacognitive act of orientation, which is searching the learning environment and activating existing prior knowledge to identify conditions relevant to the task. The second phase focuses on setting goals and creating plans. Via metacognitive planning, the learner constructs goals to work on the task and plans an approach to pursue those goals. In the third phase, the task is executed using cognitive tactics and strategies. A range of cognitive activities is used to perform the task and construct new knowledge and skills. Low-level cognitive activities (reading, repeating, and processing information) serve the goal of processing and understanding provided information, and high-level cognitive activities (elaboration and organisation) help to deepen new understanding (King, 2002; Molenaar & Chiu, 2017; Volet et al., 2009). In addition, the self-regulated learner engages in the metacognitive acts of monitoring and control (Winne, 2018a). This means monitoring which information is relevant for learning and monitoring (changes in) progress towards set goals. Control can be exerted by changing the enactment of tactics and strategies. In phase four, learners reflect on their general approach and make changes for future learning (Winne & Hadwin, 1998). Thus, SRL is a multifaceted construct in which metacognitive and high- and low-level cognitive activities play a role in empowering learning.

Learning outcomes

SRL has been associated mostly with learning outcomes that can be categorised as deep, such as transfer test and essay scores, whereas the role of SRL in surface knowledge is less clear (Bannert et al., 2009; Greene & Azevedo, 2007). Learning outcomes measured in education research often vary along two dimensions: the structure and the level of the knowledge assessed (de Jong & Ferguson-Hessler, 1996). Structure is a dimension that ranges from unconnected, independent concepts to interconnected network structures of multiple concepts (Reif & Heller, 1982). In text comprehension research, this dimension has been introduced as coherence (McCarthy & McNamara, 2021). Independent concepts are pieces of information, such as a definition of artificial intelligence, that are not organised around fundamental concepts and thus are not connected to other relevant concepts within the domain (Alexander, 1992). Towards the other end of the dimension, we find highly connected knowledge structures. This means that the learner has organised pieces of information into a network of concepts with meaningful relations (Alexander, 1992). An example in artificial intelligence is organising supervised and unsupervised machine learning around the fundamental concept of machine learning. The relation of “supervised” with “machine learning” can be specified as “dependence on human supervision (by means of labelling data)”. The relation of “unsupervised” with “machine learning” would then be “independence of human supervision”.

The other important dimension that should be considered more in SRL research is the level of knowledge, which ranges from surface to deep knowledge (de Jong & Ferguson-Hessler, 1996). A surface-level representation comprises concrete pieces of knowledge, whereas a deep-level representation encompasses understanding underlying concepts (Glaser, 1991). When presented with a problem about pulleys, the surface-level representation is made up of “a pulley”, whereas a deep-level representation also includes “conservation of angular momentum” (Chi et al., 1981). Deep knowledge is assumed to enable inferencing and making analogies, which allows the transfer to new situations (Glaser, 1991).

Structure and level of knowledge can each be measured towards either end of their dimensions. Surface knowledge measures that differ in terms of knowledge structure are a domain test and a concept map. Unconnected, independent concepts are often assessed using multiple-choice questions addressing specific concepts and procedures within a domain (Bannert et al., 2009). A well-known method that has the potential to reveal the global organisation of a learner’s knowledge network is a concept map (Lehmann et al., 2020; Thurn et al., 2020). A concept map visualises the interrelations between concepts within a domain, which may resemble how these concepts are organised in the mind (Hilbert & Renkl, 2008). Deep knowledge measures that differ in terms of knowledge structure are a transfer test and an essay. A common measure of deep knowledge is a far transfer test (Bannert et al., 2009). Far transfer makes use of a new situation to which previously constructed knowledge can be applied. Therefore, it requires students to learn with a deep conceptual understanding (Lin & Lehman, 1999). Furthermore, when self-explaining relevant concepts, students also need to structure their knowledge. Such a mental model that contains deep, connected knowledge can be assessed using an essay test (Greene & Azevedo, 2007, 2009).

Table 1 Learning Outcomes Operationalised as Surface vs. Deep Knowledge and Independent vs. Connected Concepts

To sum up, learning outcomes can be characterised along two dimensions—the structure and the level of knowledge. We have placed our operationalisation of different learning outcomes in Table 1. Although the table might suggest a dichotomy between surface and deep knowledge, and independent and connected concepts, we consider them as dimensions. Thus, our learning outcomes are located on two continuums. For instance, the domain test is operationalised as more surface than deep and more independent than connected.

SRL and learning outcomes

Next, we are interested in the relation between different SRL activities and the aforementioned learning outcomes. As indicated, we focused on low and high cognitive activities, and metacognitive activities.

Low cognitive activities

Low cognition refers to cognitive processes involved in understanding given information (King, 2002). Low cognitive activities often occur in the phase where learners execute the task per se (phase 3 in the COPES model; Winne, 2018a), such as when learners acquire and consolidate an initial knowledge base (Frey et al., 2017). They are, therefore, assumed to be relevant for all learning outcomes. Low cognition includes reading, repeating, and processing, and has also been referred to as surface strategy use (Dinsmore, 2017). Molenaar & Chiu (2017) studied children from Grades 4 to 6 who learned in triads. Triads who showed more low cognition (an aggregate of low cognitive activities) had more new words in their essay than triads with less low cognition. Their findings indicated that low cognition helped build a common foundation of knowledge. In a study with university students, the low cognitive activities of reading and repeating were more common in students with less knowledge than in students with more knowledge, suggesting the relevance of low cognitive activities for acquiring knowledge (Bannert et al., 2014). In this case, reading can be considered a beneficial process to gain knowledge (Frey et al., 2017), while repeating can be unproductive (Bannert et al., 2014). Another activity deals with students’ interactions with their products on a low cognitive level, called processing (Molenaar & Chiu, 2017), but the association of this specific low cognitive activity with learning outcomes has not been investigated yet.

High cognitive activities

In contrast, high cognitive activities, aimed at (re)organising previously acquired knowledge (King, 2002), may contribute to more connection among knowledge concepts. High cognition has also been referred to as deep strategy use (Dinsmore, 2017). For instance, knowledge organisation can be fostered when students combine the purposes and consequences of machine learning while making notes. The study by Molenaar & Chiu (2017) also showed that triads with more high cognition had a higher essay quality, measured as originality, than triads with less high cognition. Previous studies have also demonstrated that the process of organising selected information in a coherent structure aids acquisition of domain-specific knowledge (e.g., Cook & Mayer, 1988).

Metacognitive activities

Finally, metacognitive activities can help to deepen understanding (Bannert et al., 2009). Metacognitive activities consist of orientation, planning, regulation of cognitive activities, monitoring the execution of planned actions, and evaluation of the outcome of task processing (Efklides, 2008; Veenman & Elshout, 1999). When studying students’ learning processes, Lin & Lehman (1999) found an association between the metacognitive activities of planning and monitoring and the creation of deep knowledge, assessed with a transfer test in college students. They explained that planning activates prior knowledge to integrate new knowledge. Monitoring helps by tracking knowledge development and identifying potential gaps and misconceptions that can be resolved. The association between metacognitive activities and deep knowledge has often been replicated (e.g., Bannert, 2006; Bannert et al., 2009).

Prior and metacognitive knowledge

Although the focus of this paper was on associations between the frequency of SRL activities during learning and learning outcomes, we acknowledge the relation between a learner’s resources and SRL activities during learning. Learners with sufficient resources for dealing with the demand of self-regulation seem to be able to successfully engage in SRL and cope with the task requirements (Seufert, 2019). Two such resources are prior domain knowledge and metacognitive knowledge (Seufert, 2019), which are also two of the internal conditions in the COPES model (Winne, 2018a). According to the expertise reversal effect (Kalyuga, 2007), learners with more prior domain knowledge perceive tasks as less intrinsically loading. They will be able to compensate for additional extraneously loading aspects imposed by the task. For example, in a study that used think aloud to capture SRL, prior domain knowledge was positively associated with metacognitive activities (planning and monitoring), using diverse strategies, and elaboration, suggesting more effective learning (Moos & Azevedo, 2008). Note that more prior knowledge does not mean more regulation; instead, these results suggest a different and more effective way of regulation. The next step would be to identify how these regulation activities are associated with posttest measures.

Whereas prior domain knowledge constitutes knowledge about a specific topic of study, metacognitive knowledge is knowledge about different cognitive approaches and their usefulness in other contexts (Händel et al., 2014). According to the COPES model, internal conditions, including metacognitive knowledge, are elements that the learner perceives could affect work on the task (Winne, 2018a). While metacognitive knowledge has been associated with academic achievement in reading and mathematics (Neuenhaus et al., 2011) and models have formulated how metacognitive knowledge can affect SRL (Efklides, 2008), we are unaware of studies explicitly addressing the association between metacognitive knowledge and SRL during learning. Previous research does provide two clues. Firstly, sufficient resources (Seufert, 2019) seem to help students consider relationships among content, which is a central element in SRL (Moos & Azevedo, 2008). Secondly, timed tasks are often used in SRL research (e.g., Bannert et al., 2014; Moos & Azevedo, 2008; Deekens et al., 2018), limiting the number of activities that can be performed and imposing an extraneous load on the learner. Indeed, increased time pressure has been associated with increased extraneous cognitive load (Barrouillet et al., 2007) and less effective cognitive activities (e.g., Sidi et al., 2017). Thus, having sufficient resources can help learners deal with time pressure. More specifically, metacognitive knowledge might help learners choose and perform effective cognitive activities, such as high cognition, resulting in better learning outcomes, at the expense of less effective cognitive activities, such as low cognition.

The Present Study and Hypotheses

As described so far, SRL activities during learning have been associated with beneficial learning outcomes, but different effects have been found depending on how learning outcomes are conceptualised. However, a systematic analysis of how different SRL activities during learning relate to different learning outcomes has yet to be conducted. There have been few attempts, to our knowledge, to contrast different learning outcomes. Research by Deekens and colleagues (2018) is a notable exception. In two studies, they used think aloud to measure SRL during learning and analysed associations between SRL frequencies and pretest and posttest measures. In both studies, the SRL activities labelled as monitoring (part of metacognition) were positively associated with high cognition, which in turn was positively associated with posttest performance on both a declarative (domain test) and a conceptual knowledge (essay) measure. Low cognitive activities were negatively associated with essay quality (Deekens et al., 2018). Although surface knowledge of independent concepts (domain test) and deep knowledge of connected concepts (essay) were assessed, they were not contrasted with surface knowledge of connected concepts and deep knowledge of independent concepts. In addition, it would be useful to determine which low cognitive activities contribute in what way to learning, because some low cognitive activities might be needed when prior knowledge is low (Dinsmore & Alexander, 2016), implying an effect of prior knowledge on posttest learning outcomes mediated by SRL. Therefore, the present study investigated the association between SRL activities (metacognitive, high cognitive, and low cognitive) and learning outcomes (deep vs. surface knowledge, and independent vs. connected concepts), see Fig. 1. This study was exploratory, due to the small number of participants.

Fig. 1. Our Conceptual Model with Hypothesised Associations

In a pre-/post-test design, students performed a 45-minute learning task, during which they had to read about three topics and write an essay about the future of education, see Fig. 2. We recorded think aloud during learning and coded it to identify SRL activities. Our analyses identified the associations between SRL (low and high cognition, and metacognition) and learning outcomes (domain test, concept maps, transfer test, and essay). We also controlled for metacognitive knowledge, as assessed at the pretest.

The present study aimed to identify the associations between SRL activities, measured as frequencies of think-aloud codes, and learning outcomes, measured as prior knowledge at pretest and learning products at posttest. Our hypotheses concern the association of the frequency of SRL activities during learning with particular learning outcomes, see Fig. 1. Hypothesis 1: We hypothesised that low cognitive activities would be associated with all learning outcomes: Reading can contribute to domain knowledge (H1a) (Frey et al., 2017), while repeating might be unproductive for all learning outcomes (H1b) (Bannert et al., 2014). Also, we explored the role of processing because its relation to learning outcomes has not been investigated yet (H1c). Hypothesis 2: We hypothesised that high cognitive activities would be associated with measures of connected knowledge (Cook & Mayer, 1988). Hypothesis 3: We hypothesised that metacognitive activities, both analysis and monitor, would be associated with measures of deeper knowledge (Lin & Lehman, 1999). In addition to our main hypotheses, we expected prior knowledge to be negatively related to low cognitive activities and positively related to high cognitive and metacognitive activities (Moos & Azevedo, 2008). We expected metacognitive knowledge, as assessed on the pretest, to be associated with cognitive activities (Händel et al., 2014) and transfer scores (Bannert et al., 2014; Bannert & Reimann, 2012).

Fig. 2. A Graphical Overview of the Present Study. (Note. The essay was written during the learning session and was, therefore, regarded as a posttest measure.)

Methods

Participants

University students (n = 46) with an average age of 21 years (SD = 3 years) participated in the present study. Two participants were removed from the analyses because of problems with their think-aloud recordings. Thus, analyses were conducted with 44 participants (34 female and 10 male). Thirty-nine participants were enrolled in a bachelor’s degree program and five in a master’s program. They were from a wide range of degree programs, of which psychology (11) and communication science (6) were the most common. We informed participants about the present study, and they were given the opportunity to ask questions, after which they gave active consent to collect data. Our research lab’s ethical committee approved the present research.

Procedure

Students started with the pretest, which consisted of demographic questions, a domain knowledge test, three concept maps (one per topic), and a metacognitive knowledge questionnaire. Next was the learning session. Students’ task was to write an essay about the future of education using the provided informative texts. They were given 45 min to read the texts and write the essay. We recorded think aloud during their learning session. The final part was the posttest. Students completed three knowledge tasks: the same domain knowledge test and concept maps, and a transfer test.

Materials

Apparatus

The learning environment was presented to participants on a separate monitor (23 inch; 1920 × 1080 pixels) connected to a laptop running Windows 10 with default settings. Students used a keyboard and mouse, which were connected to the laptop. The learning environment was created for this study’s purpose. It ran on a local PHP server and was presented via a web browser.

Fig. 3. The Digital Learning Environment

Learning environment

The learning environment consisted of three panels, see Fig. 3. The left-hand panel was used to present the menu, a search function, links to the instruction and rubric, a button to change the essay mode (small, medium, or large essay), and a count of the number of words in the essay. The middle panel presented informative texts (of which six pages also had a picture) and the essay. The right-hand panel was used to interact with four tools: a planner, timer, highlighter, and note-taker.

Regarding the texts, there were instructions, a rubric, and informative texts about three different topics. The instructions were on the landing page. Students were instructed to write an essay of 300 to 400 words in 45 min. The essay should incorporate information from the three learning topics: artificial intelligence (AI), differentiation, and scaffolding. Furthermore, the essay should offer a vision of the future of education based on these topics. The instruction explicitly stated that 45 min is a short time and that participants could skip texts and start writing the essay if they wanted. Most of the text was relevant for the essay, but some parts were not. Students had access to the essay rubric, which contained the details of the essay assessment criteria based on the learning instruction and goals. The informative texts addressed: for AI, what AI is, how AI works, and four common forms of AI (Van Wetering et al., 2019); for differentiation, what differentiation is and how to apply it in the classroom (Deunk et al., 2018); and for scaffolding, the roots of scaffolding, what scaffolding is, and how to apply it (Reiser & Tabak, 2014). These three topics were chosen because of their potential relevance for education in 2035. To elicit decisions from the students about what to read, we added irrelevant texts: for AI, the history of AI (Russell et al., 2010); for differentiation, standards for teacher education (Darling-Hammond, 2017); and for scaffolding, cognitive apprenticeship (Collins & Kapur, 2014). All texts were presented in Dutch. The original English texts were translated by the author(s) of the current paper, and the translations were discussed and finalised in collaboration with experts in the respective fields.

To draft the essay, the learning environment provided a text field where students could type. The number of words in the essay was automatically counted and displayed. The essay area had three modes: a read mode (small essay area), a hybrid reading-writing mode (medium essay area, as in Fig. 3), which filled half of the page, and a write mode (large essay area), which filled almost the entire page. The default mode of the essay area was the read mode, and students could change the mode via the essay size button.

With respect to the tools, students could use a search function, planner, timer, highlighter, and note-taker. The search function could be used to type in a search term, and matching results from the informative texts were presented. A student could click on a result to go to that page or cancel the search. The planner showed a timeline of 45 min with six blocks below it. Each block represented an activity: orientation, processing AI, processing differentiation, processing scaffolding, essay writing, and free choice. These blocks could be dragged to the timeline to create a plan. The free choice block could be used to type in an activity of the student’s choice. The timer displayed the remaining time, but only when students clicked on it; the displayed time disappeared after two seconds. The highlighter and note-taker functioned similarly. Students started by selecting text. They could then choose to highlight the selection or take a note. For both highlights and notes, selecting associated tags was optional. The final step was to save the highlight or, for notes, to enter the note text and a note title and save the note.

Coding SRL in think aloud protocols

Students were instructed and trained to think aloud. There was a short training beforehand in which the experimenter demonstrated how to think aloud, and students could practise. When a participant fell silent, the experimenter reminded them to continue thinking aloud. During the learning session, utterances were recorded. Utterances were categorised as SRL activities using a coding scheme. Our coding scheme was based on previously developed coding schemes (Bannert, 2007; Molenaar et al., 2011). We coded metacognitive, high and low cognitive, motivational (activities indicating positive or negative appraisal), procedural (activities concerning the procedure of learning), and non-codable utterances (such as murmuring), see Table 2. Only metacognitive and cognitive activities were used for analyses, because the other categories had a low frequency and were not relevant for our research questions. Two trained raters coded the utterances. Before coding, segments were created based on sound detection. The length of these segments was changed in rare cases where sound detection failed (when a participant spoke very softly or did not speak but there was another sound, such as moving the keyboard). Coding was done within the ELAN software (ELAN, 2020). We then calculated a modified kappa, which takes potential differences in segment length into account (Holle & Rein, 2015). Inter-rater reliability was substantial (Munoz & Bangdiwala, 1997), κ = 0.53–0.65 (κmax = 0.81–0.82). Due to low frequencies and in line with the theoretical framework described in the introduction, we merged subcategories into four categories, comparable to Engelmann & Bannert (2019). We merged orientation and planning into analysis; monitoring and evaluation into monitor; rereading, superficial repetition, and superficial writing down into repeat (note that superficial repetition and superficial writing down capture behaviour that includes literal repetition of information from the text, whereas rereading captures reading the same words again); and elaboration and organisation into high cognition.
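For illustration, agreement between two coders can be computed in R; the sketch below uses the irr package to calculate a plain Cohen's kappa on two raters' codes for already-aligned segments. It does not reproduce the segment-length adjustment of the modified kappa (Holle & Rein, 2015), and the example codes are hypothetical.

```r
# Minimal sketch: unweighted Cohen's kappa for two raters on aligned segments.
# This is a simplification; the modified kappa of Holle & Rein (2015) additionally
# accounts for differences in segment length.
library(irr)

# Hypothetical data: one SRL code per aligned segment, per rater
rater1 <- c("first_reading", "repeat", "monitor", "high_cognition", "first_reading", "analysis")
rater2 <- c("first_reading", "repeat", "monitor", "processing",     "first_reading", "analysis")

kappa2(data.frame(rater1, rater2))  # prints kappa, z, and p-value
```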

Table 2 Main and Subcategories for Coding Think Aloud, Including Descriptions and Examples

Knowledge tests

Domain test

Domain knowledge was assessed at pretest and posttest. A domain test was developed to assess surface knowledge of independent concepts. Students answered 30 multiple-choice questions in total (ten per topic), each with four response options. Questions addressed conceptual knowledge. An example was: “What is unsupervised machine learning?” with the options, A: “You teach the algorithm what the relationships are between data labelled by humans”, B (correct answer): “You ask the algorithm to cluster data itself by finding patterns in a dataset”, C: “You teach the algorithm what the data are, which are needed to perform a task”, and D: “You ask the algorithm to choose its own data and use it to perform a task”. One point was scored for each correct answer and zero points for incorrect answers. The maximum score was 30. Reliability was acceptable (Kline, 2000), α = 0.60, λ2 = 0.65, ωt = 0.68, at pretest and α = 0.59, λ2 = 0.64, ωt = 0.66, at posttest.
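A minimal sketch in R of how such a test could be scored and its internal consistency estimated, assuming a 44 × 30 matrix of dichotomously scored answers; the placeholder data and the use of the psych package are illustrative, not the authors' actual pipeline.

```r
# Minimal sketch: sum scores and Cronbach's alpha for a 0/1-scored domain test.
# (The paper additionally reports Guttman's lambda-2 and omega total.)
library(psych)

set.seed(1)
items <- matrix(rbinom(44 * 30, 1, 0.6), nrow = 44, ncol = 30)  # placeholder 0/1 data

domain_score <- rowSums(items)  # 0-30 points per student
alpha(items)                    # Cronbach's alpha (reported as .60 at pretest, .59 at posttest)
```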

Concept maps

Concept maps were assessed at pretest and posttest. Three concept maps were used to assess surface knowledge of connected concepts. There was one concept map per topic. Three experts created a concept map per topic based on the informative texts. Experts were free to choose the concepts they included. One of the authors made a synthesis that resulted in three expert concept maps with 15 concepts per topic. Concepts that appeared in only one of the three experts’ maps were omitted from the final expert concept map. Students’ task was to organise these 15 concepts in such a way that the concept map helps to explain the topic and to connect concepts that they thought were related. Thus, students were given a fixed set of 15 concepts per map. Concept maps were scored by comparing the links between concepts and the width of the concept map to the expert concept map, as proposed by Pirnay-Dummer et al. (2010). This means that two similarity indices were calculated per concept map, which indicated a close similarity to the expert map when they were close to 1 and a large deviation when they were close to 0. The first similarity index was the number of correct links. A link was counted as correct when two concepts were connected that were also connected in the expert map. The second index was the path length: the length of the longest path in a concept map was compared to that in the expert map, reflecting a student’s range of connected knowledge (Pirnay-Dummer et al., 2010). For parsimony, we wanted to create one concept map score for the pretest and one for the posttest to be used in subsequent analyses. To verify that the two indices per topic for three topics composed a measure of connected surface knowledge, separately for pretest and posttest, we conducted a confirmatory factor analysis (CFA). We specified a pretest component, a posttest component, and a path between them. Furthermore, we added covariances between the correct-link scores of the same topic at pretest and posttest. The model showed that the number of correct links and the path length of differentiation on the pretest did not load on the pretest concept map component, which might be explained by the students’ low prior knowledge of this topic. We excluded these from the final model, which had a good fit, χ2(32) = 35.359, p = .313, CFI = 0.979, RMSEA = 0.049, 90% CI [0.000, 0.127], SRMR = 0.078. Thus, the data supported a general concept map score on the pretest for AI and scaffolding, and a general concept map score on the posttest for AI, differentiation, and scaffolding. Therefore, we calculated an average of four indices at the pretest and six indices at the posttest to analyse concept maps.
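The CFA described above could be specified in lavaan roughly as follows; the variable names (correct links cl_, path lengths pl_, per topic and measurement occasion) and the data frame d are illustrative and do not correspond to the authors' actual data set.

```r
# Hedged sketch of the concept map CFA: a pretest and a posttest component,
# a path between them, and covariances between correct-link scores of the
# same topic across time. Differentiation indices at pretest are omitted,
# as in the final model reported above.
library(lavaan)

cfa_model <- '
  pre_cmap  =~ cl_ai_pre + pl_ai_pre + cl_sc_pre + pl_sc_pre
  post_cmap =~ cl_ai_post + pl_ai_post + cl_di_post + pl_di_post + cl_sc_post + pl_sc_post
  post_cmap ~ pre_cmap
  cl_ai_pre ~~ cl_ai_post
  cl_sc_pre ~~ cl_sc_post
'

fit_cfa <- cfa(cfa_model, data = d)
fitMeasures(fit_cfa, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))
```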

Metacognitive knowledge

Metacognitive knowledge was assessed at the pretest using the MESH (Bannert et al., 2015), based on a previous questionnaire (Händel et al., 2013). The participants were asked to read seven learning-related scenarios and to rate five to six statements per scenario. The statements referred to strategies that varied in their degree of effectiveness for the given situation. The participants filled out the questionnaire by ticking one option per statement on a six-point Likert scale ranging from “not useful” to “very useful”. Responses were scored by comparing how statements within a scenario were rated. One point was given whenever one of the 43 key comparisons was in line with the expert rating. A comparison was in line when one statement was preferred over the other in the same way as in the expert rating. Thus, a high score indicates high metacognitive knowledge. One statement was missing in the present study, resulting in 41 key comparisons in total and, thus, a maximum score of 41. Reliability was good (Kline, 2000), α = 0.92, λ2 = 0.92.
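A minimal sketch in R of the key-comparison scoring logic, with hypothetical data structures; how ties between ratings are handled is an assumption, as it is not specified above.

```r
# Score the MESH by checking, for each key comparison, whether the participant
# ordered the two statements in the same direction as the experts.
# 'ratings' is a named vector of 1-6 ratings; 'key' lists each statement pair
# with the expert-preferred statement first. Ties score 0 points here (assumption).
score_mesh <- function(ratings, key) {
  points <- mapply(function(preferred, other) {
    as.integer(ratings[preferred] > ratings[other])
  }, key$preferred, key$other)
  sum(points)  # maximum = number of key comparisons (41 in the present study)
}

# Illustrative use with two key comparisons
ratings <- c(s1 = 5, s2 = 2, s3 = 4)
key <- data.frame(preferred = c("s1", "s3"), other = c("s2", "s2"))
score_mesh(ratings, key)  # 2
```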

Transfer test

Transfer of knowledge was assessed at the posttest. A transfer test was developed to assess deep knowledge of independent concepts. Ten questions addressed the transfer of AI to the medical domain. There were four answer options. An example was: “Which is an example of how artificial intelligence has been used in hospitals?” with the options, A: “Sophisticated tube transport system to transport patient records within the hospital”, B: “Using robot vacuums for cleaning hospital floors to minimise infection”, C: “Advanced wireless communication system which minimises disruption to equipment”, and D (correct answer): “Using sophisticated algorithms to diagnose diseases”. When the correct answer was chosen, one point was scored. Zero points were given for incorrect answers. Four questions were removed to achieve acceptable reliability. Thus, the maximum score was 6. Reliability was acceptable (Kline, 2000), α = 0.60, λ2 = 0.62.

Essay

The essay was written during the learning session and scored by two independent raters using a coding scheme aligned with the rubric. The essay was scored to assess deep knowledge of connected concepts. The coding scheme described five categories: (1) explanation of three topics (maximum of 9 points; 3 points per topic), (2) connection of the topics to the future of education (maximum of 6 points; 2 points per topic), (3) suggestions of how the topics can be used in the future (maximum of 3 points; 1 point per topic), (4) originality, which was a scaled inverse of a copy score obtained using WCopyfind (maximum of 3 points), and (5) the number of words: 250–450 words resulted in 3 points, 200–249 or 451–500 words in 2 points, 150–199 or 501–550 in 1 point, and any number below 150 or above 550 in 0 points. Thus, the maximum score was 24 points. Two raters were trained to score the essays. They were instructed in the use of the coding scheme and discussed any discrepancies during training to reach agreement. Inter-rater reliability was almost perfect (Munoz & Bangdiwala, 1997), Fleiss-Cohen κ = 0.89.
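As an illustration, the word-count category of the rubric maps directly onto code; the other categories required rater judgement and are not reproduced here. A minimal sketch in R:

```r
# Points for the word-count category of the essay rubric, as described above.
word_count_points <- function(n_words) {
  if (n_words >= 250 && n_words <= 450) return(3)
  if ((n_words >= 200 && n_words <= 249) || (n_words >= 451 && n_words <= 500)) return(2)
  if ((n_words >= 150 && n_words <= 199) || (n_words >= 501 && n_words <= 550)) return(1)
  0  # fewer than 150 or more than 550 words
}

word_count_points(320)  # 3
word_count_points(520)  # 1
```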

Table 3 Descriptive Statistics (Median with 25th and 75th percentile)

Data analysis

Data used for analyses were: number of correct answers on the metacognitive knowledge test (1 in Tables 3 and 4), number of correct answers on the knowledge test on pretest (2) and posttest (12), mean of the similarity and path length index of the concept maps at pretest (3) and posttest (13), frequencies of SRL activities: first reading (4), repeat (5), processing (6), high cognition (7), analysis (8), and monitor (9), number of points on the essay based on a coding scheme (10), and number of correct answers on the transfer test (11).

First, we tested whether learning occurred using Wilcoxon signed-rank tests. Second, as a first investigation of the associations and in preparation for the structural equation modelling (SEM), we calculated correlations using Spearman rank coefficients, because not all data were normally distributed. Third, we conducted SEM using the lavaan package (Rosseel, 2012) in R (R Core Team, 2020). The Maximum Likelihood (ML) estimator with Huber-White robust statistics was used for the estimation of the model and path coefficients. Not all variables had a normal distribution, see Table 3. Therefore, we first modelled different distributions of the SRL activities (frequencies of think aloud), as Greene et al. (2011) proposed. This analysis showed that regular Ordinary Least Squares (OLS), BIC = 6358.29, outperformed the other models with our data: Poisson, negative binomial, and their zero-inflated variants, BICs > 7700. We continued with an OLS model, but we requested robust statistics to deal with deviations from normality.
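A hedged sketch in R of these analysis steps, using an illustrative data frame d with hypothetical column names (domain_pre, domain_post, repeat_freq, and so on); the distribution comparison follows the logic described above rather than the authors' exact code.

```r
# Sketch of the preparatory analyses: learning gains, Spearman correlations,
# and a BIC comparison of candidate distributions for one SRL frequency.
library(MASS)   # glm.nb
library(pscl)   # zeroinfl

# 1) Learning gains (Wilcoxon signed-rank test)
wilcox.test(d$domain_post, d$domain_pre, paired = TRUE)

# 2) Spearman rank correlations
cor(d[, c("domain_pre", "first_reading", "repeat_freq", "essay")], method = "spearman")

# 3) Candidate distributions for a count variable, compared via BIC
m_ols  <- lm(repeat_freq ~ domain_pre, data = d)
m_pois <- glm(repeat_freq ~ domain_pre, family = poisson, data = d)
m_nb   <- glm.nb(repeat_freq ~ domain_pre, data = d)
m_zip  <- zeroinfl(repeat_freq ~ domain_pre, dist = "poisson", data = d)
m_zinb <- zeroinfl(repeat_freq ~ domain_pre, dist = "negbin", data = d)
AIC(m_ols, m_pois, m_nb, m_zip, m_zinb, k = log(nrow(d)))  # k = log(n) gives BIC
```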

The type of model we built was a so-called structural model without latent variables. Variables entered the model as follows: we ordered them to reflect our pre-/post-test design, with pretest measures as independent variables, SRL activities as mediators, and posttest measures as dependent variables (see Fig. 1). Note that essay quality was regarded as a posttest measure, because it was the product of the learning process (and most of the time, it was also the final action performed during the learning session). We first tested a hypothesis model with paths specified based on our hypotheses. Then, we respecified the model to improve model fit. Model fit is acceptable when CFI and TLI are close to 0.95, RMSEA is close to 0.06, and SRMR is close to 0.08 (Hu & Bentler, 1999). Respecification was done by trimming the model, removing paths with p-values above 0.10. Then, paths were added based on modification indices and their consistency with the correlations, in keeping with the exploratory nature of the present study. We stopped when the Chi-square test indicated a good model fit (p > .05).
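A minimal lavaan sketch of such a structural model without latent variables, ordered as described above (pretest measures, then SRL activities as mediators, then posttest measures); the paths and variable names are illustrative and do not reproduce the exact final model.

```r
# Sketch of a structural (path) model; estimator "MLR" gives ML estimates with
# Huber-White robust standard errors.
library(lavaan)

model <- '
  # SRL activities regressed on pretest measures (mediators)
  processing    ~ domain_pre
  first_reading ~ metacog_knowledge
  # posttest measures regressed on SRL activities and pretest measures
  domain_post ~ repeat_freq + processing + domain_pre
  essay       ~ processing + first_reading + high_cognition
  transfer    ~ analysis + metacog_knowledge + cmap_pre
  cmap_post   ~ cmap_pre
'

fit <- sem(model, data = d, estimator = "MLR")
summary(fit, fit.measures = TRUE, standardized = TRUE)
```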

In the final model, mediation effects (indirect effects), direct effects, and total effects were tested. Mediation effects were tested by multiplying the paths involved in the mediation. An example was the indirect effect of metacognitive knowledge at pretest, via first reading, on essay quality. The presence of direct effects in the final model depended on the respecification step, during which empirical tests were used to trim the model. This meant that in two cases in the present analyses, mediation was tested in the absence of a direct effect. Note that a significant direct effect is not a prerequisite for significant mediation (Hayes, 2009). The total effect was the sum of the direct and indirect effects and was, thus, only calculated when both were present.
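In lavaan, such products of paths can be tested with defined parameters; the following is a minimal sketch, continuing the illustrative variable names used above (the direct path is kept here only to illustrate the total effect).

```r
# Label the paths involved in the mediation and define indirect and total effects.
library(lavaan)

mediation_model <- '
  first_reading ~ a * metacog_knowledge
  essay         ~ b * first_reading + c * metacog_knowledge

  indirect := a * b      # mediation effect via first reading
  total    := c + a * b  # total effect (direct plus indirect)
'

fit_med <- sem(mediation_model, data = d, estimator = "MLR")
parameterEstimates(fit_med)  # includes estimates and p-values for the defined effects
```

In practice, the significance of the defined effects would be judged from the robust (or bootstrapped) standard errors that lavaan reports.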

Results

Before addressing the research questions, we tested whether learning occurred. Results showed that it did: domain test scores were higher at posttest than at pretest, p < .001, r = .51 (a large effect), as were concept map scores, p = .003, r = .31 (a medium effect; Cohen, 1992). The descriptive statistics are depicted in Table 3, and correlations are depicted in Table 4. The median score on the domain test was 17 correct at pretest and 21 correct at posttest. The concept map score increased from 0.49 to 0.54. The frequencies of the SRL activities showed that first reading was the most frequent (274 times) and processing the least frequent (40 times).

The correlations showed associations between the SRL activities. The low cognitive activities (first reading, repeat, and processing) and high cognition were intercorrelated. The metacognitive activities (analysis and monitor) were correlated with the other SRL activities to a lesser extent. Furthermore, learning outcomes were correlated with prior knowledge, except for the essay score. The essay was correlated with low and high cognitive activities. The transfer test showed two marginal correlations with metacognitive activities: analysis, p = .071, and monitor, p = .054. Domain knowledge correlated negatively with repeat, p = .071. Concept maps did not correlate with SRL activities.

Table 4 Spearman’s Rank Correlations of Learning Outcomes and SRL Activities

In line with our pretest-posttest design, we constructed a model with pretest measures predicting SRL activities (as mediators) and both pretest measures and SRL activities as predictors of posttest measures. The model constructed on the basis of our hypotheses had a poor fit. We respecified the model based on the correlations and modification indices. After respecification, the final model had an acceptable fit (Hu & Bentler, 1999), see Fig. 4. Only the SRMR indicated a poorer fit, suggesting that more variables might be associated. This could be due to associations among the SRL activities and/or to variables being dropped from the SEM model because of their low statistical detectability when multiple independent variables are included. Therefore, fit was considered acceptable. All specified paths can be found in Fig. 4 and Appendix A. No additional paths were specified.

We found first reading to be negatively associated with essay quality and not associated with other learning outcomes, in contrast to hypothesis H1a, see Table 5. We found a negative association of repeat with domain knowledge, in line with hypothesis H1b. Processing was negatively associated with domain knowledge and positively with essay quality. We did not have explicit expectations about processing and explored its associations (H1c). Concept maps and transfer were not associated with low cognitive activities. We found that high cognition was positively associated with essay quality, but there was no association with concept maps. This partially supported our hypothesis that high cognition would be associated with measures of connected concepts (H2). We found analysis to be positively associated with transfer, but no associations between metacognitive activities and essay quality were found. This partially supported our hypothesis that metacognition would be associated with measures of deep knowledge (H3).

Table 5 Hypotheses and Whether Findings Support Them

Regarding the effect of prior knowledge, we found, as expected, associations of domain knowledge with low cognition, namely processing, and of metacognitive knowledge with first reading. In addition to the hypothesised associations, we found autoregressive effects of domain knowledge and concept maps. Furthermore, domain knowledge and concept maps were positively related at pretest, and domain knowledge and transfer test were positively related at posttest as revealed by correlated errors.

Finally, mediation effects were found. Domain knowledge (pretest) was associated via processing with domain knowledge (posttest), p = .015. This means that higher scores on domain knowledge (pretest) were associated with higher scores on processing, and higher scores on processing were associated with lower scores on domain knowledge (posttest). Metacognitive knowledge (pretest) was associated via first reading with essay quality, p = .048. This means that higher scores on metacognitive knowledge (pretest) were associated with lower scores on first reading, and higher scores on first reading were associated with lower scores on essay quality (posttest).

Fig. 4. Model of Metacognitive Knowledge, Knowledge Measures at Pretest and Posttest, and SRL Activities. (Note. Model fit was acceptable: χ2(61) = 74.26, p = .119, CFI = 0.92, TLI = 0.90, RMSEA = 0.069, 90% CI [< 0.001, 0.118], SRMR = 0.148. Surface knowledge has a grey background and deep knowledge a white background. Connected concepts are in italics and independent concepts are not. Orange represents low cognitive activities and purple metacognitive activities. Dotted lines represent paths with a p-value larger than 0.05. See Appendix A for all path coefficients.)

Discussion

The present study investigated the association between the frequency of SRL activities during learning (low and high cognitive, and metacognitive activities) and learning outcomes (deep vs. surface knowledge, and independent vs. connected concepts). We want to stress the exploratory nature of our study; results should, therefore, be interpreted with caution. Our hypotheses were partially confirmed. Low cognitive activities were related to some, but not all, learning outcomes. High cognitive activities were associated with one of two connected concepts measures: essay quality. Metacognitive activities were associated with one of two measures of deep knowledge, namely transfer. Note that we used frequencies of SRL activities in our analyses, which limits drawing conclusions about sequential or temporal aspects, but it does allow conclusions about the overall occurrence of specific learning activities in relation to learning outcomes.

Low cognitive activities

Low cognitive activities were assumed to be associated with all learning measures, with positive associations of reading and negative associations of repeat, while we explored the associations of processing. Reading was not positively associated with learning outcomes; rather, there was a negative association. Repeat was negatively associated with domain knowledge. Processing showed a negative association with domain knowledge and a positive association with essay quality. Thus, we found partial support for our first hypothesis.

Regarding domain knowledge, the effect of repeat is in line with previous findings (Bannert et al., 2014; Moos & Azevedo, 2008). The explanation is that when learners have difficulties understanding, they tend to repeat the information. Hence, repeat actions can be an indicator of problems with creating an initial knowledge base. In addition, repeat might also indicate that learners successfully identified low comprehension via metacognitive monitoring (Kim, 2017), but failed to remedy it, because they did not have an effective cognitive approach available to replace the repeating approach. With respect to processing, the negative association with domain knowledge is harder to explain. When learners interacted with their essay on a low level (rereading or copying information without elaboration or translation of information), we coded it as processing. This means that learners cannot engage in knowledge construction while engaging in processing, which can explain a negative relation between processing and domain knowledge at posttest. In the present study, we used a timed task, as is common in SRL research (e.g., Moos & Azevedo, 2008; Deekens et al., 2018).

In line with this reasoning, we found that essay quality was positively associated with processing. Processing reflects interacting with, and thus also creating, one’s products, such as the essay. It has been shown that time spent on the essay positively relates to essay quality (Guo et al., 2018). This might explain our result: as interacting with the essay likely leads to adding more elements to the essay and/or revising it, the quality of the essay improves as well. With reference to first reading, we found a negative association with essay quality. This might be explained in a similar way: when engaging in reading, it is not possible to engage in writing, leading to a negative association between reading and essay quality. Furthermore, increased time pressure has been associated with less effective cognitive activities (e.g., Sidi et al., 2017). In our study, some reading appeared to be needed to write an essay, but prolonged reading seemed to be a less effective cognitive activity.

In sum, the current study found that low cognition appeared to be relevant for two out of four learning outcomes. However, we expected low cognitive activities to be associated with all learning outcomes, because they may lead to an emerging understanding of the topics (Frey et al., 2017). Our results did not show an association between low cognition and concept maps or transfer. For both learning outcomes, low cognition may still have been relevant. For instance, first reading had a high frequency, suggesting that most students did read. Therefore, reading may still have been important, but we could not assess its effect due to the overall relatively high frequency of reading. Another explanation is that low cognition was not needed to score better on concept maps and transfer. There was a medium-sized learning gain on the concept maps, whereas we found a large-sized effect on domain knowledge. Thus, the learning gain in concept maps could have been larger, and had this been the case, the influence of SRL activities could have been larger as well. This reasoning can also be applied to domain knowledge. Perhaps learning gains can be larger in other learning settings, such as a whole course instead of a single session. The transfer test assessed far transfer. Therefore, information from the text could not directly be translated to the transfer test (Lin & Lehman, 1999). Our results showed that far transfer was more strongly associated with controlled processing of the information via the metacognitive act of analysis. Thus, although a knowledge base might be needed, which can be created via reading, we found the association of transfer with metacognitive activities to be more prominent. This explanation aligns with the complex interplay of SRL activities and the information acted upon (Winne, 2018b).

High cognitive activities

High cognitive activities were positively associated with essay quality (connected concepts/deep knowledge). This result is in line with previous studies that revealed the influence of high cognition on essay quality in individual university students (Deekens et al., 2018) and in groups of primary school students (Molenaar & Chiu, 2017). High cognition serves to (re-)organise and (re-)structure information (King, 2002). These activities are relevant to creating an essay. A knowledge base should be provided in an essay, but inferences should also be made. In our case, students were asked to write their vision for the future of education.

Thus, high cognition appeared to be relevant for one out of two measures of structured knowledge. We did not find an association of high cognitive activities with concept maps. This might have been due to the small learning gain in concept maps. Alternatively, an explanation might be that the participants were requested to create the concept maps themselves based on the texts they read, but without these texts being available during concept mapping, and without additional instruction on how to use the text to create concept maps. Thus, the participants might not have been adequately equipped to create concept maps. Studies that have revealed beneficial effects of creating concept maps on learning also include instruction on creating concept maps (Schroeder et al., 2018).

Metacognitive activities

Metacognitive activities were positively associated with transfer (independent concepts/deep knowledge); specifically, analysis had a positive effect. In addition, we found a positive effect of metacognitive knowledge on transfer, and an indirect effect of metacognitive knowledge via first reading on essay quality. Taken together, these results support our hypothesis that metacognitive activities help in constructing deep knowledge. The association of metacognitive activities with transfer (Bannert, 2006; Bannert et al., 2009; Lin & Lehman, 1999) and essay quality (Greene & Azevedo, 2009) has been found before. This indicates that metacognitive activities help in creating deep knowledge. It is possible that students use orientation and planning to identify knowledge gaps or confusion and plan activities to resolve them (Winne, 2020), which would help integrate incoming knowledge into existing knowledge, creating deep knowledge.

Metacognitive knowledge

Metacognitive knowledge was indirectly and positively related to essay quality via first reading. Higher scores on metacognitive knowledge at pretest were associated with lower frequencies of first reading, and lower frequencies of first reading were associated with higher essay scores. Metacognitive knowledge has been associated with academic achievement (Neuenhaus et al., 2011), which agrees with our finding that metacognitive knowledge is positively associated with essay quality. Our results add that the effect of metacognitive knowledge on learning outcomes is mediated by the learning process, more specifically by first reading in our study. Thus, by affecting frequencies of learning behaviour, metacognitive knowledge seems to foster construction of deep knowledge, as reflected in the essay. This finding aligns well with another finding, namely the direct association of metacognitive knowledge with our other measure of deep knowledge: the transfer test.

Prior knowledge

Domain knowledge on the pretest was positively associated with processing (low cognition) and monitor (metacognition). There were no other effects of prior knowledge on SRL. The finding of prior domain knowledge in relation to monitoring aligns with a previous study in which students with higher prior domain knowledge planned and monitored more, and took notes, summarised, and memorised less (Moos & Azevedo, 2008). The effect of monitoring can be explained by monitoring of comprehension, such as identifying that the information has been studied before, or by being able to monitor the relevance of the content (Moos & Azevedo, 2008). In a different study, students with higher prior knowledge moved their eye gaze more frequently from the text content area to the note-taking area than students with lower prior knowledge (Taub & Azevedo, 2019), which might be related to our processing measure. Thus, prior knowledge might aid in using text to create products, such as notes or essays.

Regarding the effects of prior knowledge on posttest scores, we found, as expected, autoregressive effects between pretest and posttest scores of domain knowledge and concept maps. We also found an association between pretest concept maps and transfer. This suggests that having an organised knowledge structure of the relevant topics helps transfer this knowledge to a different context. Transferring knowledge to a contextually dissimilar problem was found to be fostered by a reason-justification treatment, which appeared to help students highlight underlying structures and principles (Lin & Lehman, 1999). This result also speaks to the debate about the extent to which deep knowledge is organised (connected) and to which organised (connected) knowledge is deep (de Jong & Ferguson-Hessler, 1996). This study adds that organised and deep knowledge measured at the same point in time were not associated (see posttest), and that organised knowledge before a learning session about the respective topics was associated with a measure of deep knowledge after the learning session. It can, therefore, be speculated that connectedness is a prerequisite for deep knowledge. However, more research is needed to disentangle the types of knowledge and study their interplay during learning, especially considering that there was no effect of concept maps at pretest (surface knowledge of connected concepts) on essay quality (deep knowledge of connected concepts).

Finally, we found two mediation effects. Metacognitive knowledge was indirectly related to essay quality via first reading. This result was explained by the use of metacognitive knowledge to create deep knowledge, as reflected in the essay. The second mediation effect was of domain knowledge at pretest via processing to domain knowledge at posttest. In other words, learners with higher domain knowledge at pretest tended to show more processing and learners with a high amount of processing tended to have lower domain knowledge scores at posttest. Learners with high domain knowledge scores at pretest had a knowledge base to be able to translate their knowledge to other contexts (Frey et al., 2017), in our case, into an essay by means of processing. In contrast, learners with low or average domain knowledge may have engaged less in processing, because they were unable to apply their knowledge when writing the essay. In turn, engagement with processing means not being engaged with other activities, which may have been more beneficial for acquiring domain knowledge.

Underlying mechanisms of SRL in relation to learning outcomes

Overall, our results suggested a trade-off between different SRL activities. A student could perform only one activity at a time, thus excluding other activities. How students perceived and interpreted the learning goal may have affected which SRL activities were enacted and, in turn, what was learnt. We found that processing was positively associated with essay quality, but negatively with domain knowledge. First reading was negatively associated with essay quality. These results can be explained by students weighing the costs and gains of possible choices, as in expectancy-value theory (Eccles & Wigfield, 2002), which has been proposed to play an important role in SRL (Winne, 1995). The costs-and-gains analysis in the present study concerned different learning goals (gains): understanding the fundamentals of AI, scaffolding, and differentiation versus creating a vision of the future of education. The learner could control their learning via their SRL activities (costs): reading for domain knowledge versus processing to create an essay. Furthermore, this process of controlling learning appeared to be, at least partly, informed by metacognitive knowledge, because we found an indirect effect of metacognitive knowledge via first reading on essay quality. The main goal of the present learning assignment was to write an essay. The mediation effect shows that learners with higher metacognitive knowledge tended to read less, indicating a beneficial control strategy (Winne, 1996), because they tended to have higher essay scores, which might be due to having more time to write the essay. This effect of metacognitive knowledge, together with the effect of prior knowledge, on essay scores supports the theoretical notion that learners require sufficient resources to successfully engage in SRL (Seufert, 2019). The role of prior knowledge has already been supported by previous research (e.g., Moos & Azevedo, 2008). We add that metacognitive knowledge is also a learner characteristic related to successful SRL.

Limitations and Suggestions

The current study had a relatively small sample size, and given this sample size, the number of variables in the model is high. The present analysis should, therefore, be regarded as exploratory. We nevertheless consider our results meaningful for three reasons. First, we had a relatively high number of data points for the SRL codes. Second, we built the model cautiously. Third, our exploration was grounded in theory: we had specific hypotheses about the relations, which we carefully tested and which were mostly supported. Future studies should further investigate the proposed associations between SRL activities and learning outcomes. In a similar vein, the reliability of some measures, although still acceptable, could have been better. These factors call for caution when interpreting the results.

Another limitation is that we did not take into account the information that was the object of the SRL activities. It might have been that students incorrectly comprehended information or wrongly assumed that they understood it. To code such qualities of think-aloud data, coding schemes should be extended and adapted to specific contexts, because the information provided affects the potential interactions with it. This points to another limitation, namely the context of the present study. Although we assume that SRL activities can be relevant for learning regardless of the domain, it has also been shown that SRL interacts with the task’s context (Winne, 2018a). One important part of SRL is that the learner takes the task characteristics into account during learning, for example, when planning the activities to be performed during the task. In our scenario, learners were limited in time and had to choose which activities to perform. Such a decision can affect learning and learning outcomes (Winne, 1995), as we found a trade-off between reading to build a knowledge base and processing to write the essay. In addition, temporal aspects of SRL can be taken into account (Järvelä & Bannert, 2021; Molenaar & Järvelä, 2014). It might be that SRL frequencies vary over time, as has been found previously: more successful students display an equal number of cognitive acts as less successful students in the middle part of a learning session, but fewer cognitive acts in the first and final parts of a learning session (Paans et al., 2019). Such variation might, then, be associated with different learning outcomes. Finally, using a pretest might have affected learners’ engagement during the study. However, all participants completed the pretest, and the learners’ task was to write an essay given the information provided. Therefore, we expected little to no effect of cueing the students.

Conclusions

Our findings show that frequencies of SRL during learning were associated with learning outcomes, that these associations depend on the particular SRL activity and learning outcome, and that a learner’s resources were associated with SRL and learning outcomes. Low cognition might have fostered the construction of knowledge assessed in most learning outcomes. In contrast, high cognition might have been more helpful in creating more organised knowledge structures, and metacognition might have helped construct deeper knowledge. Therefore, it is important to consider the connectedness and deepness of knowledge when designing and evaluating a learning task, for example, developing tools that foster planning to create deeper knowledge. Furthermore, it is important for learners to adapt their SRL to meet the current goals, because a particular SRL activity is more beneficial for constructing one type of knowledge than another.