Introduction

Self-regulated learning (SRL) describes a goal-oriented learning process that enables students to achieve learning goals by using cognitive, metacognitive, and motivational learning strategies in an adaptive way (Zimmerman, 2000). Although college students seem to possess high declarative SRL strategy knowledge (i.e., knowledge about beneficial and adverse strategies; Dresel et al., 2015), they often show deficiencies in the spontaneous application of SRL strategies during learning processes (Foerst et al., 2017; Peverly et al., 2003). Investigating conditional strategy knowledge (i.e., knowledge about which strategy is most useful for a specific learning situation; Paris et al., 1983) could lead to new insights into why students are not able to put their declarative knowledge into action. In order to investigate this gap between declarative SRL strategy knowledge and its application in more depth (and aid the design and evaluation of effective interventions), it is necessary to develop instruments that cover conditional SRL strategy knowledge for the whole learning process and that can provide sufficiently valid and reliable SRL assessments. In past research, assessment methods have been developed that measure SRL as a stable competence or a dynamic event based on the respective underlying theory (for an overview see Wirth & Leutner, 2008; Rovers et al., 2019). As most assessment methods (such as questionnaires) have put a quantitative lens on SRL (i.e. the more, the better; measurement of frequency or agreement), to date, methods that qualitatively assess SRL (i.e. the more adequate, the better; measurement of fit between strategies and learning situations) have been underrepresented (Wirth & Leutner, 2008; see Sect. 1.3 on SRL assessment methods). Strategy knowledge tests (SKTs) represent one promising approach to measuring knowledge about the adequacy of different SRL strategies for a given learning problem; furthermore, SKTs can be used economically with large samples of students. Nevertheless, as yet, there is no available test for conditional SRL strategy knowledge for college students that combines cognitive, metacognitive, and motivational components (Boekaerts, 1999) within a process model viewpoint (Zimmerman, 2000). Therefore, based on four distinct studies, the development and validation of such a new scenario-based conditional SRL strategy knowledge test is described in this manuscript.

Self-regulated learning

In general, self-regulated learning (SRL) is defined as “processes whereby learners personally activate and sustain cognitions, affects, and behaviours that are systematically oriented towards the attainment of personal goals” (Zimmerman, 2011, p. 1). According to Boekaerts’ (1999) model, SRL is described as a stable competence that is composed of three hierarchical layers that differ in their regulatory focus. The inner layer refers to the regulation of information processing and comprises knowledge about and the effective use of cognitive learning strategies, such as the organisation and elaboration of the learning material (cognitive component). The middle layer concerns the regulation of the learning process and encompasses the application and control of learning strategies through metacognitive strategies such as planning, monitoring, and reflecting (metacognitive component). The outer layer is about the regulation of the self and encompasses motivational processes such as goal setting and motivational beliefs such as academic self-efficacy (motivational component). Although this model is very prominent, it does not consider the dynamic character of SRL. Correspondingly, process models such as the original model of Zimmerman (2000), the updated model of Zimmerman and Moylan (2009), and the expanded process model of Usher and Schunk (2018) describe the cyclical nature of learning processes and comprise a planning phase before the learning starts, a performance phase that covers the learning itself, and a reflection phase that follows after the learning has ended. Each reflection phase is seen as influential for the next planning phase, which is why SRL can be seen as a cycle of the described learning phases. All the phases of Zimmerman’s model (2000) as well as of the expanded process model by Usher and Schunk (2018) comprise several subcomponents that are relevant to each specific phase and that represent the aforementioned cognitive (e.g. elaborative learning strategies), metacognitive (e.g. self-monitoring), and motivational components (e.g. self-efficacy beliefs). Although there are several prominent SRL models (Panadero, 2017), the component model of Boekaerts (1999) and process models such as the model of Zimmerman (2000) have proven useful for designing interventions, especially for college student populations (Panadero, 2017). We therefore used these two models as the theoretical basis for the development of our strategy knowledge test, which should help diagnose SRL knowledge gaps in college students that can then be addressed by suitable interventions.

Although SRL is highly relevant for all kinds of learners, this is especially the case for college and university students (Bembenutty, 2011; Dresel et al., 2015; Theobald, 2021). College represents a highly autonomous learning environment that is not as structured as high school (Cohen, 2012; Park et al., 2012) and can lead to learning struggles and emotional problems, such as feelings of isolation (Wei et al., 2005). External regulation is substantially lower in college than in school, as learners have to engage strongly in learning processes that happen outside the classroom and without direct instruction from teachers, parents, or peers (Wolters & Brady, 2021). Additionally, college learning is often time-consuming and challenging due to its complexity and extent (Zusho, 2017). Although the systematic review of Asikainen and Gijbels (2017) did not find clear evidence that college students develop deep learning during college, initial longitudinal studies on the transition from high school to college supported such a developmental shift, as students showed increases in deep and self-regulated learning after transitioning to college (Coertjens et al., 2017). In accordance with these findings on the relevance of SRL for college learning, small to moderate positive correlations (based on Cohen’s (1988) effect size classification) between SRL and academic achievement have repeatedly been shown in college samples (e.g., 0.23 < r < 0.31, Dörrenbächer-Ulrich et al., 2021; 0.21 < r < 0.35, Kitsantas et al., 2008; see Richardson et al., 2012, for a meta-analysis). Besides this, SRL is related to emotional outcomes; college students belonging to a competent SRL profile showed significantly lower test anxiety scores than less optimal self-regulators (Ning & Downing, 2015). Moreover, moderate-to-high negative relationships between SRL and stress and depression (-0.54 < r < -0.35, Park et al., 2012), and moderate-to-high positive relationships between SRL and academic self-concept (0.37 < r < 0.67, Ommundsen et al., 2005) have been found.

Conditional SRL strategy knowledge

In the context of SRL competences, performance in learning and studying tasks is seen as enacted SRL knowledge (Blömeke et al., 2015; Wirth & Leutner, 2008). Therefore, when conceptualizing SRL as a competence, SRL strategy usage can be considered as enacted knowledge about SRL strategies. While declarative strategy knowledge concerns knowledge of what a strategy is, procedural strategy knowledge comprises knowledge of how a strategy should be used. Conditional strategy knowledge refers to knowledge of the conditions (when and why) under which a specific strategy is useful, i.e., which strategy is useful in which situation (Paris et al., 1983). Correspondingly, Karlen (2016) posited that conditional SRL strategy knowledge acts as a necessary prerequisite for the usage of SRL strategies. Concerning declarative and procedural SRL strategy knowledge scores in college students, rather high mean values, but low actual use of these strategies, have been reported (Foerst et al., 2017). With regard to conditional strategy knowledge, studies on its assumed relationship to the use of SRL strategies are rare: Karlen (2017) reported significant but moderate correlations between the results of a conditional SRL strategy knowledge test and an SRL questionnaire in high school students (0.29 < r < 0.32), and a study by Dörrenbächer-Ulrich et al. (2021) found low, non-significant correlations between conditional SRL strategy knowledge and self-reported usage of SRL strategies (0.05 < r < 0.22). In line with this, college students have shown problems using SRL strategies (Peverly et al., 2003).

In order to understand the missing or low relationship between strategy knowledge and strategy usage, it is helpful to consider Hasselhorn’s (1996) conceptualization of strategy maturity. Hasselhorn described different deficiencies that can result in not using a strategy although knowledge about it exists; the concept of mediation deficiency holds that learners are not able to use a specific strategy even if they have been directly taught or instructed to use it, while production deficiency is present if strategies are used following instruction, but are not autonomously implemented in the behavioural repertoire. In addition, Hasselhorn assumed a usage deficiency when learners can clearly use a strategy, but its use costs considerable cognitive capacity, and they are not capable of recognizing situations in which this strategy would be most effective. Foerst et al. (2017) found evidence that corresponding production deficiencies are predominant in college students with regard to SRL strategies; in their study, learners scored rather high in a strategy knowledge test but were not capable of autonomously using the strategies.

It seems necessary to investigate the gap between declarative SRL strategy knowledge and SRL usage in more depth; correspondingly, there is a need for instruments that provide sufficiently valid and reliable SRL assessments. One important aspect of SRL knowledge that is focused on in this study is the development of an instrument to assess conditional SRL strategy knowledge. Strategy knowledge tests measure conditional knowledge about the adequacy of different SRL strategies with regard to a given learning problem. They are promising because they can be used as standardized and economical instruments with large student samples. Moreover, the results of such a test can be a sound basis to design adaptive interventions to foster strategy knowledge and strategy application. To the best of our knowledge, there is no conditional SRL strategy knowledge test available to date that assesses knowledge on cognitive, metacognitive, and motivational SRL components (as in Boekaerts’, 1999, model) within a process model viewpoint (e.g. Zimmerman, 2000) in college students. As research has provided several methods for measuring SRL with different foci, in the next section, we provide an overview of the different assessment methods, with a specific focus on strategy knowledge tests.

Assessment of SRL

As SRL comprises several components and can be described using different theoretical models, there are many assessment methods for measuring SRL (Rovers et al., 2019). On the one hand, SRL can be conceptualized as a competence that is related to general and stable self-regulated learning behaviours. On the other hand, SRL can be seen as time- and situation-dependent and as showing dynamic changes (Cleary & Callan, 2018). In line with this, SRL assessment methods can be categorized as offline or online (Wirth & Leutner, 2008). Offline methods assess learners’ more general competences to self-regulate their learning (e.g. self-reported usage of goal setting and planning strategies) and are therefore independent of specific learning situations. In contrast, online methods refer to a specific learning task and assess actual strategy use while performing a task in a given situation. Wirth and Leutner (2008) additionally distinguished between quantitative standards (“maximum view”; performance increases with more strategies used) and qualitative standards (“optimum view”; performance increases with a better fit between the strategies used and the actual task). Questionnaires are popular offline methods, whereas microanalysis is a frequently used online method; both are described in more detail below.

Self-report questionnaires are often used to assess SRL and can be categorized as offline assessments that rely on quantitative standards. Students indicate their general learning behaviour irrespective of a specific learning situation and rate the frequency of their strategic learning behaviour while answering statements like “Before I start learning, I develop a time plan for the process” (Dörrenbächer & Perels, 2016a). Questionnaires are highly economical as they can be used to examine large samples in a standardized way and when only a short testing time is available. In their review of SRL assessment methods, Roth and colleagues (2016) concluded that, in general, SRL questionnaires show satisfactory evidence concerning the factorial structure, and acceptable-to-good evidence concerning reliability and convergent, discriminant, and predictive validity (Roth et al., 2016). With regard to the most frequently used SRL questionnaire in their review, the “Motivated Strategies for Learning Questionnaire” (MSLQ; Pintrich et al., 1991), Cronbach’s alpha values between 0.52 and 0.93 were reported. Furthermore, evidence indicated the MSLQ’s convergent validity (e.g. a relationship between the MSLQ and academic delay of gratification; Bembenutty & Karabenick, 1998) as well as discriminant validity (e.g., in part, lower correlations between dissimilar facets than between similar facets of different SRL questionnaires; Muis et al., 2007). With regard to predictive validity, positive relationships of moderate size between MSLQ values and academic performance have been found (e.g. Kitsantas et al., 2008; Pintrich et al., 1993).

However, despite these satisfying psychometric characteristics, SRL questionnaires have also been criticized (Rovers et al., 2019); as students mostly indicate their general learning tendencies, it is usually unclear which situations they use as the basis for this generalization. Answering items referring to past behaviour also can lead to retention problems (Winne & Perry, 2000) and self-report bias. Moreover, the competence perspective on SRL is less suitable for assessing changes in SRL during intervention studies. Additionally, Artelt (2000) reported that questionnaires mix up strategy knowledge and actual strategy usage, and although questionnaires predicted academic achievement to a certain extent (Rovers et al., 2019), the amount of explained variance was mostly low-to-moderate (Veenman & Spaans, 2005).

A recently introduced online method for capturing SRL processes is microanalytic assessment, which can be seen as qualitative measurement in the aforementioned sense (Wirth & Leutner, 2008). SRL processes are assessed during an actual learning situation by asking students to answer open questions about the planning, performance, and reflection phases of learning while doing the actual learning task (Callan & Cleary, 2018). Microanalytic assessment is, therefore, a structured and theory-driven thinking-aloud method that is not retrospective and, as Cleary et al. (2012) state, shows reduced self-report bias due to its contextualization. The assessment items are mostly open-ended and are scored through coding by independent raters. Several studies have shown that microanalytic assessment predicted academic achievement; for example, Artino and colleagues (2014) found moderate correlations between the microanalytic assessment of strategic planning and medical students’ course grade, GPA (grade point average), and subject-specific examination results (0.29 < r < 0.40). In line with this, Callan and Cleary (2018) reported moderate correlations between microanalytic metacognitive monitoring assessments and maths performance (0.25 < r < 0.36). In their review, Cleary et al. (2012) summarized the evidence for microanalysis from former studies; interrater agreement was mostly strong (Kappa coefficients of around 0.90 in most studies), while Cronbach’s alpha coefficients were often not reported due to the use of single-item measures. Furthermore, they reported that achievement groups differed reliably in their microanalysis results and were the first to provide evidence on predicting academic achievement by microanalysis. In addition, Cleary et al. (2012) underlined that microanalysis showed construct validity, as the correlation patterns they found were in line with the theoretically assumed cyclical feedback model (Zimmerman, 2000). Nevertheless, to date, the use of microanalysis has been sparse in SRL research. This might be because the scoring of open answers is very time-consuming and requires a sound, theoretically grounded coding scheme.

A relatively new approach to measuring SRL competences is to use SKTs. These instruments are characterized as offline measures with a qualitative standard because, in the sense of conditional strategy knowledge, they focus on knowledge about the fit of specific SRL strategies to distinct learning or studying situations. It is assumed that learners can only apply SRL strategies successfully when these strategies are appropriate for successful learning in specific learning situations (Grassinger, 2011). Moreover, SKTs are economical as they can be used time-efficiently, are suitable for large samples, are standardized, and do not rely on postdictions of behaviour. Instruments that aim to assess SRL strategy knowledge in different populations are available (e.g. Händel et al., 2013: lower secondary school students; Maag Merki et al., 2013: upper secondary school and college students) and refer mostly to subject-specific strategies (e.g. mathematics; Lingel et al., 2014), important general competencies (e.g. reading; Schlagmüller & Schneider, 2007), or only one component of SRL (metacognition; Karlen, 2017). In SKTs for SRL, participants are introduced to learning scenario vignettes that are specific to the population (e.g. reading a textbook chapter as preparation for a course in college). They then rate the usefulness of several SRL learning strategies with regard to the specific scenario; importantly, these strategies represent different levels of usefulness based on theoretical foundations and expert ratings (Händel et al., 2013). The scores for strategy knowledge are computed by comparing learners’ ratings with expert ratings (Steuer et al., 2019; see the Methods section for more detailed information on scoring). As several studies have used this scenario-based measurement approach successfully, we followed this approach in the present study as well. In the few former studies on the psychometric evidence of SKTs, appropriate item-total correlations (e.g., Karlen, 2017: 0.21 < rit < 0.60; Maag Merki et al., 2013: 0.42 < rit < 0.56) and at least acceptable Cronbach’s alpha values (e.g., Karlen, 2017: α = 0.77; Maag Merki et al., 2013) were found. Concerning convergent validity, two studies reported moderate correlations with self-reported SRL strategy use (e.g., Karlen, 2017: 0.29 < r < 0.32; Maag Merki et al., 2013: r = 0.26). With regard to predictive validity, findings were somewhat inconsistent, ranging from low (Maag Merki et al., 2013: r = 0.11) to moderate (Karlen, 2017: r = 0.37) correlations between strategy knowledge and academic achievement. In contrast to SRL questionnaires, SKTs refer to (conditional) knowledge about strategies without mixing this knowledge with strategy usage, leading to a more unambiguous measurement of qualitative SRL competences in the sense of knowledge.

Aim of the present study and the process of instrument validation

The main aim of the present study was to describe the development and validation process for a new Strategy Knowledge Test for Self-Regulated Learning (SKT-SRL) for college students that comprises three important components – cognition, metacognition, and motivation (see Boekaerts, 1999) – within a process approach to learning that comprises planning, performance, and reflection (see Zimmerman, 2000). Based on previous conditional SRL strategy knowledge tests (e.g., Händel et al., 2013), we designed a test that comprised seven learning scenarios, each presented with three useful and three less useful strategies for this scenario. Learning strategy knowledge was then indicated by comparisons of students’ ratings of these useful and less useful strategies in order to assess whether the students differentiated between useful and less useful strategies for the specific learning scenarios. After presenting the development of the SKT-SRL using an expert rating approach (Study 1), we present the results of three studies that used the instrument. In Study 2, a pilot study, we inspected important psychometric characteristics (e.g. item difficulties, reliability evidence in terms of Cronbach’s α) of the first version of the instrument and the first scoring scheme. Moreover, we looked at the relationships between the SKT-SRL knowledge scores, the scores from an SRL questionnaire (Dörrenbächer & Perels, 2016a), and academic achievement. The questionnaire was developed for college students and captures self-reported SRL strategy usage in the sense of Zimmerman’s model (2000). It has been used in previous studies (Dörrenbächer & Perels, 2016a) and has shown good evidence concerning reliability as well as criterion validity with regard to academic achievement. In Study 3 (Validation Study 1), we again analysed the important psychometric characteristics and looked at the relationships between the (adapted) SKT-SRL knowledge scores and those of the aforementioned SRL questionnaire, an SRL microanalysis, and the academic achievement measure. In Study 4 (Validation Study 2), we covered two measurement points and tested the factorial structure and the test–retest reliability of the new instrument. Moreover, we examined how the SKT-SRL was related to the aforementioned SRL questionnaire, the indicator of academic achievement, and additional study-related constructs that are important for academic success, such as test anxiety, academic self-efficacy, academic self-concept, and well-being.

Based on previous studies, we hypothesized that we would find at least acceptable-to-good reliability evidence (Cronbach’s α). Concerning the test–retest reliability, we expected to find at least moderate correlations. Regarding convergent validity, we expected moderate correlations between the SKT-SRL scores and the SRL questionnaire scores (as this questionnaire is also categorized as an offline method) and low correlations between the SKT-SRL and the SRL microanalysis (as this is categorized as an online method). Concerning academic achievement scores, we expected to find low-to-moderate correlations with the SKT-SRL scores. With regard to the correlations with other constructs important for academic success, we hypothesized that students with higher strategy knowledge scores would show more favourable scores for test anxiety, academic self-concept, and well-being.

Study 1: scale construction and expert ratings

The main aim of Study 1 was to construct the new instrument and use an expert rating approach to investigate the content validity of the newly constructed scenarios and the corresponding strategies. The expert rating approach was applied on the basis of previous studies on scenario-based strategy knowledge tests (e.g., Händel et al., 2013; Maag Merki et al., 2013; Steuer et al., 2019). In these studies, expert ratings were used to generate standards that form a basis for estimating the quality and correspondence of learners’ responses. Accordingly, Wirth and Leutner (2008) described expert ratings concerning the fit of strategies to specific scenarios as a common method for validating the appropriateness of the selected scenarios and strategies. We then used these expert ratings to decide whether to revise the scenarios and their corresponding strategies. We developed the new SKT-SRL based on existing theoretical models and previous findings on SKTs. For the theoretical base, we relied on Zimmerman’s (2000) process model, which is well-known and widely used in SRL research. Moreover, we considered the three central SRL components: cognition, metacognition, and motivation (e.g. Boekaerts, 1999).

In line with Händel et al. (2013), we developed seven learning scenarios that covered different learning problems with regard to cognition, metacognition, and motivation. In order to develop suitable scenarios representing the typical everyday learning scenarios of college students, we considered learning situations suggested by Dresel et al. (2015). Within an interview approach, these authors identified learning situations, such as preparation for an oral or written examination or an oral presentation, that are relevant and demanding with regard to the use of SRL strategies. We chose one metacognitive and one motivational scenario for each of the three SRL phases (planning, performance, reflection), as well as one cognitive scenario for the performance phase (because cognitive strategies are not considered to be central to the planning and reflection phases – see Zimmerman, 2000 – cognitive scenarios were not included for these phases). This resulted in seven scenarios in total. To select more and less useful strategies for each scenario, we relied on the suggestions of Engelschalk et al. (2015) and Schwinger et al. (2007) for the motivational component (either an expectancy or a value motivational problem, depending on the scenario) and on established self-report items from Dörrenbächer and Perels (2016a) for the cognitive and metacognitive components. For each scenario, we selected and constructed three useful and three less useful strategies. Because we wanted to assess conditional strategy knowledge (which strategy is suitable for the specific learning and studying situation of the scenario), we also presented general SRL strategies that were not useful for the specific scenario presented (e.g. a metacognitive strategy such as time planning was not seen as useful for a motivational learning problem). Regarding the strategies, we chose wordings that were not suggestive of whether the strategy was useful or not (see Table 10 in the Appendix for all the scenarios and strategies [the German version used in the pilot and validation studies can be found in Table 11 in the Appendix]). While answering the SKT-SRL, test participants were expected to rate the appropriateness of the respective strategy for the specific learning scenario. In addition to constructing the first version of the instrument, the main goal of Study 1 was to inspect expert ratings regarding the newly constructed scenarios and strategies. Specifically, we asked experts to assign the scenarios to one of the three SRL phases (planning, performance, reflection) and components (cognitive, metacognitive, motivational). Furthermore, the experts were asked to rate the usefulness of the different strategies for each scenario.

Methods

Expert sample

In order to obtain data concerning the suitability of the scenarios and the selected strategies, we contacted several experts by email and asked them to participate in the expert rating. The experts were researchers from the field of educational psychology and educational science who had a specific focus on SRL. Therefore, we assumed that these experts had elaborated knowledge on SRL components, phases, and strategies and could judge the usefulness of a strategy for a specific situation. Because this study was conducted in Germany and the scenarios and strategies were worded in German, we contacted German-speaking researchers. Seventeen researchers from Germany, Austria, and Switzerland took part in the online expert rating stage. Most were professors in educational psychology or educational science; the others were post-doctoral researchers or PhD candidates in the field of SRL research.

Procedure

In the first step, we presented the newly developed scenarios without the strategies to the experts and asked them to assign the scenarios to both the SRL phases and SRL components. In the second step, using open-ended questions, we asked the experts to suggest helpful SRL strategies representative of the learning problem described in each scenario. In the third step, experts were asked to rank six strategies per scenario (as suggested by Händel et al., 2013; Maag Merki et al., 2013) from the most useful strategy to the least useful strategy. In the fourth step, we asked the experts to rate the usefulness of each of these strategies for each scenario on a four-point Likert scale ranging from 1 = not useful at all to 4 = totally useful. Participation in the study was voluntary and anonymous, as we did not obtain any personal data from the experts.

Results

In the first step, the experts mostly assigned the scenarios to both the intended SRL phase and the SRL component (Table 1). Therefore, we judged all the scenarios as suitable.

Table 1 Frequencies of expert assignment of scenarios to SRL phases and SRL components

Regarding the learning problems described in each scenario, we analysed the SRL strategies suggested by the experts in the second step. Interestingly, almost all the strategies suggested by the experts represented the SRL component targeted by the respective scenario (e.g. for the planning phase, all the experts suggested that motivational strategies are helpful for solving motivational problems), and the specific strategies suggested by the experts were mostly in accordance with the useful strategies created by us (e.g. for the metacognitive scenario in the planning phase, all the experts suggested that time planning and/or goal setting were helpful strategies). Furthermore, at least half of the expert-suggested strategies were the same as our a priori selections for the learning scenarios, except for the metacognitive scenario in the reflection phase, where experts remarked that the problem described in the scenario was not entirely clear. As a consequence, we rephrased and improved this scenario.

Concerning the rank order for the strategies, which was created by the experts in the third step, we analysed the rank that was most frequently given by the experts. The three strategies that we classified a priori as useful were always among the top three expert strategies while the three less useful a priori strategies were always ranked in the bottom three expert strategies. Regarding expert ratings of the usefulness of the strategies suggested by us in the fourth step, all of the strategies classified a priori as useful had a mean rating higher than three, while the strategies classified a priori as less useful mostly had mean ratings lower than two (see Table 10 in the Appendix for the usefulness ratings).
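For transparency, the sketch below illustrates how such expert rank and rating data can be summarized per scenario (modal rank per strategy and mean usefulness rating per strategy). The data layout (an experts × strategies matrix of ranks and of usefulness ratings) and all names are illustrative assumptions, not the analysis scripts actually used in Study 1.

```python
import numpy as np
from collections import Counter

# Illustrative sketch only: summarizing expert rankings (Step 3) and
# usefulness ratings (Step 4) for one scenario.
# Assumed layout: ranks and ratings are experts x 6-strategies arrays.

def modal_rank(ranks):
    """Most frequently assigned rank per strategy across experts."""
    return [Counter(col).most_common(1)[0][0] for col in np.asarray(ranks).T]

def mean_usefulness(ratings):
    """Mean expert usefulness rating (1-4) per strategy; a priori useful
    strategies were expected to score above 3, less useful ones below 2."""
    return np.asarray(ratings, dtype=float).mean(axis=0)
```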

Conclusion and consequences

To summarize, the expert ratings showed that the scenarios were suitable for the different phases and components of SRL. Moreover, the strategies selected by us fitted well with the strategies suggested by the experts. Only the metacognitive scenario for the reflection phase had to be rephrased and improved. Finally, the experts’ ratings and rankings of the strategies underlined that the selected strategies represented both more useful and less useful strategies for each scenario. Therefore, the instrument was considered appropriate for a first analysis of its psychometric properties and for an examination of its relationship with another SRL instrument.

Study 2: pilot study

The first aim of the pilot study was to inspect the scoring method for the first version of the instrument as well as its central psychometric characteristics, such as item difficulties and evidence regarding reliability. We analysed the seven scenario subscales and summed them into three component subscales (a metacognition subscale and a motivation subscale, each with three scenarios, and a cognitive subscale, which corresponded to the one cognitive performance scenario) and an overall scale that contained all the scenarios/subscales. Relying on the classification suggested by DeVellis (2017, p. 145) concerning Cronbach’s α as reliability evidence (α < 0.60: unacceptable, 0.60 < α < 0.65: undesirable, 0.65 < α < 0.70: minimally acceptable, 0.70 < α < 0.80: respectable, 0.80 < α < 0.90: very good, α > 0.90: consider whether the scale should be shortened) and based on previous studies, we hypothesized that we would find at least respectable-to-very good reliability evidence.

The second aim focused on the relationships between the SKT-SRL scores (subscales and overall scale) and an SRL questionnaire assessing self-reported SRL strategy usage (to examine convergent validity), and academic achievement. Based on previous research regarding the assessment of SRL with different methodological approaches as well as the relationship between conditional strategy knowledge and strategy usage, we expected moderate correlations between both instruments (with higher correlations for convergent subscales [e.g. between the motivational subscale of the SKT-SRL and the motivational subscale of the SRL questionnaire] than for divergent subscales [e.g. between the motivational and metacognitive subscales of the SKT-SRL]). Furthermore, we expected to find low-to-moderate correlations between the SKT-SRL scores and the achievement indicator.

Methods

Sample and procedure

The sample consisted of N = 143 teacher education students from a medium-sized German university (for more information see Table 2), who attended a weekly course lecture on educational sciences for teacher education students. Participation in the study was voluntary and the students answered the questionnaire during course time after giving their informed consent. Data was completely anonymized.

Table 2 Sample characteristics for the three studies

Instruments

Strategy knowledge test

Concerning conditional strategy knowledge, the students were asked to rate the six SKT-SRL strategies per scenario on a four-point Likert scale ranging from 1 = not useful at all to 4 = totally useful. Participants were explicitly instructed to rate the usefulness of each strategy for solving the learning/studying problem described in the scenario, independently of their own behaviour when studying. In order to prevent gender artefacts, we presented the characters in the scenario situations as female for the female participants and male for the male participants. To analyse the student ratings, we calculated difference scores by subtracting the rating for each less useful strategy from the rating for each useful strategy; this resulted in nine comparison scores per scenario (as described for other conditional strategy knowledge tests, e.g., Maag Merki et al., 2013). Difference scores ranging from –3 to 0 were scored with 0 points (as the less useful strategy was rated higher than or equal to the useful strategy), whereas difference scores ranging from 1 to 3 were scored with 1 point (as the less useful strategy was rated lower than the useful strategy). The maximum attainable score was 9 points per scenario and 63 points (9 points per scenario × 7 scenarios) for the total scale.
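To make this scoring rule concrete, the following minimal sketch implements it. The rating layout (a 7 × 6 matrix per student, with the three a priori useful strategies listed first within each scenario) and all function and variable names are our own illustrative assumptions and not part of the published test materials.

```python
import numpy as np

# Sketch of the pilot scoring (Study 2); ratings take values 1-4.
# Assumed shape: (n_students, 7 scenarios, 6 strategies), where within each
# scenario columns 0-2 are the a priori useful strategies and 3-5 the less
# useful ones.

def score_scenario_pilot(scenario_ratings):
    """Return the 0-9 knowledge score for one scenario (one student)."""
    useful = scenario_ratings[:3]
    less_useful = scenario_ratings[3:]
    score = 0
    for u in useful:
        for l in less_useful:      # 3 x 3 = 9 pairwise comparisons
            if u - l >= 1:         # useful strategy rated strictly higher -> 1 point
                score += 1
    return score

def score_test_pilot(ratings):
    """Total score (0-63) across the seven scenarios for each student."""
    return np.array([[score_scenario_pilot(s) for s in student]
                     for student in ratings]).sum(axis=1)
```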

Self-regulated learning questionnaire

To examine the convergent validity of the newly developed SKT-SRL test, we administered a self-report SRL questionnaire on SRL strategy usage previously used in several studies (Dörrenbächer & Perels, 2016a). The questionnaire comprised 53 items representing three subscales: a cognitive subscale (7 items), a metacognitive subscale (19 items), and a motivational subscale (27 items). Participants answered on a four-point Likert scale ranging from 1 = total disagreement to 4 = total agreement (see Table 4 for the descriptive statistics and Cronbach’s α values).

Academic achievement

As an indicator of scholastic achievement, we used German high school graduation certificate GPA data (Abiturnote); this ranged from 1 (the best grade) to 4 (the worst grade).

Data analysis

To gain a first insight into the psychometric properties of the newly developed instrument, we calculated means and standard deviations and inspected possible ceiling effects of the scales. Following Terwee et al. (2007, p. 37), we inferred ceiling effects if at least 15% of the students achieved the highest possible score on a scale. Furthermore, we analysed item difficulties as well as scale reliability estimates (Cronbach’s alpha). We ran all these analyses for the seven scenario subscales, the component subscales for cognition, metacognition, and motivation, as well as the total scale. As the cognitive subscale consisted of only one scenario for the performance phase, the results for the cognitive scenario subscale were identical to those for the cognitive component subscale. Regarding validity evidence, we examined the correlations between the total scale and component subscale SKT-SRL scores and the SRL questionnaire and academic achievement scores.
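As an illustration of these item analyses, the sketch below computes item difficulties, Cronbach’s alpha, and the 15% ceiling criterion for one subscale. The assumed input is a students × comparisons score matrix produced by the scoring described above; this is a simplified sketch, not the analysis syntax actually used.

```python
import numpy as np

# Illustrative item analyses for one subscale; items are the binary (0/1)
# comparison scores of the pilot scoring.

def item_difficulty(scores, max_item_score=1):
    """Mean proportion of the maximum score attained per item (p-values)."""
    return scores.mean(axis=0) / max_item_score

def cronbach_alpha(scores):
    """Cronbach's alpha from summed item variances and total-score variance."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def ceiling_effect(scores, max_total):
    """True if at least 15% of students reach the maximum scale score
    (criterion of Terwee et al., 2007)."""
    return (scores.sum(axis=1) == max_total).mean() >= 0.15
```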

Results

The descriptive statistics of the SKT-SRL scales are depicted in Tables 3 and 4. In order to investigate ceiling effects, we examined the percentage of participants attaining the maximum score. Except for the motivational scenario for the performance phase, we found considerable exceedance of the 15% criterion stated by Terwee et al. (2007). Item difficulties covered a substantial range, with some items being difficult and others being very easy (0.19 < p < 0.97). Regarding the reliability evidence, following the classification of DeVellis (2017), Cronbach’s alpha values were minimally acceptable for two scenario subscales (the cognitive scenario of the performance phase and the motivational scenario of the reflection phase) and respectable for the remaining five scenario subscales (Table 3). Concerning the component subscale reliabilities (Table 4), the cognitive and motivational subscales showed minimally acceptable Cronbach’s alpha values, while the metacognitive subscale had a very good value and the whole scale had a respectable value.

Table 3 Study 2 (Pilot Study): Means (M) and standard deviations (SD), item difficulties, internal consistencies, and percentage of participants with maximum scores for the scenario subscales
Table 4 Study 2 (Pilot Study): Correlations between the SKT-SRL, the SRL questionnaire, and GPA

The SKT-SRL subscale correlations between the cognitive, metacognitive, and motivational subscales (averaged across scenarios) were low-to-moderate (see Table 4). The overall (total) SKT-SRL and SRL questionnaire scores correlated substantially (r = 0.32) while the correlations between the convergent subscales (referring to the same component; 0.18 < r < 0.33) were numerically higher than the correlations between the divergent subscales (referring to differing components; 0.03 < r < 0.21). We found no significant correlations between the SKT-SRL scores and academic achievement as measured using high school GPA.

Conclusion and consequences

Taken together, these initial results regarding the psychometric characteristics of the SKT-SRL were somewhat promising. On the one hand, the item difficulties showed a reasonable variability and corresponding standard deviations. On the other hand, ceiling effects were found for most of the subscales, which indicated that the items (based on the developed scoring mechanism) were too easy for our student sample. In order to reduce the ceiling effects and lower the item difficulty scores, we modified the scoring mechanism (see Validation Study 1 below). With regard to the initial reliability evidence, we found acceptable-to-very good Cronbach’s alphas for the majority of the subscales. Concerning the initial convergent validity evidence, we found moderate relationships between the SKT-SRL and the SRL questionnaire (and therefore between conditional SRL strategy knowledge and self-reported strategy usage), replicating results from previous studies on conditional SRL knowledge tests. However, we found no significant correlations between the SKT-SRL and high school GPA. To conclude, these results were partly promising; however, further modification and improvement of the SKT-SRL scoring mechanism and further evidence concerning reliability and validity was required. Accordingly, we designed Study 3 (the first validation study).

Study 3: validation study 1

The aim of the first validation study was to investigate the improved SKT-SRL scoring mechanism, especially with regard to the ceiling effects found in the pilot study, reliability evidence, and validity evidence based on relationships with another SRL questionnaire (a quantitative offline measure, according to Wirth & Leutner, 2008) and SRL microanalysis (a qualitative online measure). Therefore, we used two assessment approaches to capture SRL strategy usage. Although SRL questionnaires have been criticized in the past (e.g., Rovers et al., 2019), we used this approach for determining convergent validity as it is the most widely applied approach for measuring SRL (Roth et al., 2016). We expected to find moderate correlations between the SKT-SRL and SRL questionnaire scores (as both instruments are categorized as offline methods, we assumed higher interrelations for the convergent subscales; see Study 2: Pilot Study) and low correlations between the SKT-SRL and SRL microanalysis (categorized as an online method) scores. Furthermore, we investigated the relationships between the SKT-SRL and scholastic achievement; based on previous research, we expected to find low-to-moderate correlations.

Methods

Sample and procedure

The sample consisted of N = 99 teacher education and psychology students from a medium-sized German university (for more information see Table 2). The students were recruited via information listed on the Faculty homepage. Participation in the study was voluntary, although participation in empirical research projects was required as partial fulfilment of the study programme. Students signed an informed consent form before participating in the research and data was completely anonymized.

Due to the COVID-19 pandemic, we conducted the study online. Participants received a link to the survey programme. After clicking the link, the survey began and students first provided demographic information (age, gender, semester, study subject, GPA). Students then worked on a non-fictional text about a relatively unfamiliar sports discipline before answering a corresponding knowledge test; microanalytic questions were embedded in this task. Following this, the participants completed the SKT-SRL and then the SRL questionnaire.

Instruments

Strategy knowledge test

Students completed the new SKT-SRL as described for the pilot study (Study 2). Based on the results of the pilot study, we adapted the scoring mechanism for the SKT-SRL. If participants rated the less useful strategy as equally useful or more useful than the useful strategy (no difference or negative difference), 0 points were given. If participants rated the useful strategy as 1 point higher than the less useful strategy (difference of 1), 1 point was given. If participants rated the useful strategy as 2 or 3 points higher than the less useful strategy (difference of 2 or 3), 2 points were given. Using this new scoring scheme, we aimed to better represent differing SRL knowledge levels and reduce the ceiling effects found in the pilot study. We therefore still geared our strategy knowledge test scoring mechanism to that of previous studies (e.g. Händel et al., 2013; Karlen, 2017), while obtaining a more fine-grained rating. For each scenario subscale, the maximum attainable score was 18 (2 points per comparison × 9 comparisons). The resulting maximum attainable total score was 126 (2 points per comparison × 63 comparisons) for the overall SKT-SRL scale, 54 for the metacognitive and motivational strategy scales (2 × 27 comparisons), and 18 for the cognitive strategy scale (2 × 9 comparisons).
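A minimal sketch of this adapted 0/1/2 scoring is given below; as in the pilot sketch above, the data layout and all names are illustrative assumptions only.

```python
# Sketch of the adapted scoring used from Validation Study 1 onward.

def score_comparison_adapted(useful_rating, less_useful_rating):
    """Map the difference between a useful and a less useful strategy rating
    (each 1-4) onto 0, 1, or 2 points."""
    diff = useful_rating - less_useful_rating
    if diff <= 0:
        return 0      # less useful strategy rated equal or higher
    elif diff == 1:
        return 1      # useful strategy rated one point higher
    else:
        return 2      # useful strategy rated two or three points higher

def score_scenario_adapted(scenario_ratings):
    """0-18 per scenario: nine comparisons, each worth up to 2 points."""
    useful, less_useful = scenario_ratings[:3], scenario_ratings[3:]
    return sum(score_comparison_adapted(u, l)
               for u in useful for l in less_useful)
```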

Self-regulated learning questionnaire

We used the same SRL questionnaire as in the pilot study to gain information on self-reported strategy usage (Dörrenbächer & Perels, 2016a); it showed very good reliabilities for the metacognitive and motivational subscales and the overall scale (see Table 6). For the cognitive subscale, Cronbach’s α showed an unacceptable value of 0.57. However, Roth et al. (2016) concluded in their review that it is not uncommon for SRL questionnaires to show low-to-moderate reliability evidence. We therefore considered our results sufficient for further analyses.

Microanalytic assessment

To obtain information on SRL strategy usage from an online measurement, we used microanalytic assessment. Students had to read a text about “Sepak Takraw”, an Asian ball sport. First, participants had to obtain an overview of the text and the task within 90 s. They then answered microanalytic questions on the planning phase (e.g. “Do you have a plan for working on the task? If yes, please explain.”). The participants were then given three minutes to read the text in more depth. They then answered microanalytic questions on the performance phase (e.g. “Which strategies do you use to learn the content of the text?”). After an additional three minutes of learning the content of the text, students completed a knowledge test on the text and then answered microanalytic questions on the reflection phase (e.g. “Would you change something if you could work on the text again?”). The assessment therefore included open-ended items on cognitive, metacognitive, and motivational SRL processes. Answers were independently coded by two trained raters based on a theory-based coding scheme (see Appendix Table 12 for an example coding scheme for a motivational question in the planning phase). To analyse the interrater reliability of the coded data, we inspected intraclass correlations (ICC(2,1); two-way random effects, single measure, absolute agreement). The results indicated high mean interrater agreement (according to Koo & Li, 2016) for the microanalytic component subscale questions (cognitive subscale, two questions, ICCmean = 0.78; metacognitive subscale, eight questions, ICCmean = 0.87; motivational subscale, four questions, ICCmean = 0.85). To further analyse the coded data, we first created a mean score from the ratings of both raters and then a summed score for each subscale.
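For illustration, the following sketch computes ICC(2,1) (two-way random effects, absolute agreement, single measure) from scratch for a subjects × raters matrix of codes. The assumed data layout is ours, and the function is a simplified sketch rather than the analysis code actually used.

```python
import numpy as np

# Illustrative ICC(2,1) for one microanalytic question: x is an
# n_subjects x k_raters matrix of coded values.

def icc_2_1(x):
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)            # per-subject means
    col_means = x.mean(axis=0)            # per-rater means
    # Mean squares from the two-way ANOVA decomposition
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # raters
    sse = ((x - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```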

Academic achievement

As in the pilot study, we used German high school graduation certificate GPA data (Abiturnote) to indicate scholastic achievement, ranging from 1 (the best grade) to 4 (the worst grade).

Data analysis

As in the pilot study, we calculated the means, standard deviations, and percentage of participants attaining maximum scale scores to identify possible ceiling effects. Furthermore, we analysed item difficulties as well as scale reliabilities in the form of Cronbach’s alphas. We ran analyses for all seven scenario subscales, the component subscales for cognition, metacognition, and motivation, as well as the overall scale. Concerning convergent validity, we calculated the correlations between the overall and component subscale SKT-SRL scores, the overall and component subscale SRL questionnaire scores, the overall and component subscale SRL microanalysis scores, and academic achievement. The characteristics of the subscales and total scales are depicted in Tables 5 and 6.

Table 5 Means (M), standard deviations (SD), item difficulties, and Cronbach’s alphas for the scenario subscales of the SKT-SRL: Validation Study 1 (upper line) and Validation Study 2 (lower line)
Table 6 Validation Study 1: Correlations, means (M) and standard deviations (SD) between the SKT-SRL, SRL questionnaire, SRL microanalysis, and GPA

Results

As seen in Table 5, the item difficulties varied substantially, with some items being classed as difficult and others being classed as easy (0.18 < p < 0.95). Concerning the percentage of participants attaining maximum scale scores, a slight exceedance of the 15% criterion (Terwee et al., 2007) was evident for the metacognitive scales for the planning and performance phases. Regarding reliability evidence, Cronbach’s alpha values were respectable-to-very good for all the scenario subscales, the component subscales, and the total scale.

The cognitive, metacognitive, and motivational subscales of the SKT-SRL correlated at least moderately with each other (see Table 6). As hypothesized, we found moderate convergent correlations between the metacognitive and motivational subscales of the SKT-SRL and the SRL questionnaire, but not for the cognitive subscale. The divergent correlations were also partly of moderate size. In contrast to our hypotheses, the analyses resulted in low and non-significant correlations of the SKT-SRL with the microanalysis subscales (with the exception of a moderate correlation between the SKT-SRL motivational subscale and the microanalysis metacognitive subscale). Furthermore, the SKT-SRL scores did not correlate significantly with high school GPA.

Conclusion and consequences

The first validation study was aimed at investigating the improved SKT-SRL scoring mechanism with regard to ceiling effects, reliability, and convergent validity (based on associations between the SKT-SRL, another SRL questionnaire, and SRL microanalyses). We also investigated the relationship between the SKT-SRL and academic achievement. Although ceiling effects were still present for the metacognitive scenarios in the planning and performance phases, the percentage of participants with maximum scale scores only slightly exceeded the 15% criterion (Terwee et al., 2007). Compared to the pilot study, the ceiling effects were substantially reduced and were only present in two scales (compared to six scales in the pilot study). This underlines the adequacy of the adapted SKT-SRL scoring mechanism. With regard to reliability evidence, the analyses revealed respectable-to-very good Cronbach’s alphas for all the scenario and component subscales as well as the overall scale. Moderate relationships between the SKT-SRL component subscales and the SRL questionnaire component subscales indicated some level of convergent validity for the SKT-SRL, replicating results from the pilot study. Nevertheless, the low reliability of the cognitive questionnaire subscale has to be taken into account, as it might have influenced the results. Future studies should aim at using instruments with at least acceptable reliability evidence for all subscales. In addition, we found no significant correlations between the SKT-SRL and the SRL microanalyses, which is in accordance with previous studies on the relationship between offline and online SRL measurements (Foerst et al., 2017). The relationship between conditional SRL strategy knowledge and SRL strategy usage therefore seems to depend on the assessment method used to capture strategy usage. Both sets of results indicate partially favourable validity of the SKT-SRL for assessing conditional SRL strategy knowledge. However, the correlation between the SKT-SRL and academic achievement as measured by GPA did not differ significantly from zero. To conclude, further research is needed regarding the psychometric characteristics of the SKT-SRL. Specifically, further evidence on its test–retest reliability, factorial structure, and relationship to other relevant constructs could be insightful.

Study 4: validation study 2

The aim of the second validation study was to replicate the psychometric properties reported in Validation Study 1 and investigate the correlations between the SKT-SRL, an SRL questionnaire capturing self-reported strategy usage, and high school GPA. Regarding the relationships between the SKT-SRL and the SRL questionnaire, we again expected to find moderate correlations (with higher correlations between the same components than between different components). We expected to find low-to-moderate correlations between the SKT-SRL scales and the achievement indicator. Moreover, this second validation study also investigated the factorial structure of the newly developed instrument using confirmatory factor analysis. In addition, we aimed to analyse the test–retest reliability of the SKT-SRL component scores over the course of ten weeks; for this, we assumed at least moderate test–retest correlations, because conditional strategy knowledge is seen as an aptitude and therefore a rather stable competence (Wirth & Leutner, 2008).

To gain a first insight into the relationships between the newly developed SKT-SRL and other relevant variables, we examined correlations between the SKT-SRL scores and different constructs that are highly relevant to college students and that previous research has shown to be related to SRL in general (please note that this was mostly in terms of strategy use as assessed by self-report questionnaires and not in terms of conditional SRL strategy knowledge). Based on the findings of previous research, we chose three motivational constructs: perceived usefulness of SRL strategies, academic self-efficacy, and academic self-concept. Perceived usefulness of SRL strategies seems to be an important factor that influences the usage of SRL strategies (e.g. see Rosário et al., 2012, 2013) and is assumed to go along with higher motivation for using SRL strategies. Perceived usefulness could help to explain the predominant production deficiency effect (relatively high strategy knowledge, but less autonomous strategy application; Cerezo et al., 2019; Foerst et al., 2017) in college students. Students with higher academic self-efficacy beliefs also report more intense usage of SRL strategies (Bai et al., 2021; Bernacki et al., 2015). As the application of SRL strategies tends to go along with knowledge of SRL strategies, we expected to find moderate-to-high relationships between conditional SRL strategy knowledge and academic self-efficacy. The final motivational construct examined here was academic self-concept, which is defined as the cognitive representation of one’s own competences with regard to academic performance situations (Marsh & Martin, 2011) and is highly related to the use of SRL strategies (Ommundsen et al., 2005). Therefore, we also expected to find a positive correlation between academic self-concept and conditional SRL strategy knowledge.

Additionally, we considered two affective constructs: well-being and test anxiety. Well-being is a central marker of mental health and helps individuals cope with stress and work productively (WHO, 2016). Several studies have shown positive correlations between well-being and adaptive academic functioning, including the usage of SRL strategies (Davis & Hadwin, 2021; Grunschel et al., 2016; Howell, 2009), and negative correlations between the application of SRL strategies and depression rates in college students (Van Nguyen et al., 2015). Therefore, we expected to find a positive correlation between conditional SRL strategy knowledge and well-being. In addition, previous research found negative relationships between self-regulated learning and test anxiety (Kesici et al., 2011; Ning & Downing, 2015; Rodarte-Luna & Sherry, 2008). Students who regulate their learning process more effectively tend to be more successful in their endeavours and, in turn, experience lower levels of test anxiety. Although research indicated negative relationships between SRL and test anxiety, the effect sizes were rather small; therefore, we expected only a weak relationship. Furthermore, we included study satisfaction as a subjective outcome variable. Based on previously reported moderate positive correlations between SRL strategy usage and study satisfaction (Spörer & Brunstein, 2005), we also expected to find a moderate positive correlation between conditional SRL strategy knowledge and study satisfaction.

Methods

Sample and procedure

The sample consisted of N = 207 teacher education students from a medium-sized German university (for more information see Table 2). The students attended a lecture on educational sciences for teacher education students and decided during the lecture whether to take part in the study. Participation in the study was voluntary, although participation in empirical research projects was required as partial fulfilment of the course. The data was pseudonymized by using codes and students had to sign an informed consent form in advance of participating in the research. Ten weeks later, participants from the lecture were asked to complete all the study tasks for a second time. In total, N = 105 teacher education students (see Table 2) took part at both measurement points (Time 1: t1 and Time 2: t2). Dropout analyses comparing both samples showed no significant differences regarding age (F(1, 206) = 1.26, p = 0.26), but students who participated at both measurement points (t1 and t2) were, on average, in a lower semester (F(1, 206) = 7.37, p < 0.01) and had a better high school GPA (F(1, 205) = 10.15, p < 0.01) than students who only took part at the first measurement point (t1).

The participants scanned a quick response (QR) code with their smartphones and worked on the instruments within an online survey tool. After providing their demographic information and their high school GPA as an indicator of academic achievement, they completed the SKT-SRL and the SRL questionnaire and rated the SRL strategies for their usefulness. The participants then completed questionnaires on academic self-efficacy, academic self-concept, well-being, test anxiety, and study satisfaction (as well as additional variables unrelated to this study).

Instruments

Strategy knowledge test

We applied the same version of the SKT-SRL as was used in Validation Study 1 (see Sect. 4.1.2).

Self-regulated learning questionnaire

To measure self-reported strategy usage, we used the same questionnaire as in the pilot study and Validation Study 1, but excluded the items on academic self-efficacy from the motivational subscale because we used a distinct questionnaire to measure this construct (see below). Therefore, the motivational scale had five items fewer than in the previous studies. Reliability estimates (Cronbach’s alphas) were respectable-to-very good for the subscales and the overall scale (see Table 8).

Usefulness of self-regulated learning strategies

Each item of the SRL questionnaire that described a distinct strategy was followed by a usefulness statement: “I think the application of this strategy is useful for my learning” (Dignath & Fischer, 2024). Students were asked to rate the usefulness of the strategy for their own learning on a four-point Likert scale (1 = totally not true, 4 = totally true). They were also asked to rate the general usefulness of each strategy. In sum, this resulted in 33 usefulness ratings (α = 0.89), as not all the SRL questionnaire items described specific strategies; some assessed general concepts (e.g., “I enjoy learning.”).

Academic self-efficacy

We assessed academic self-efficacy using the Study-Specific Self-Efficacy Scale (Schiefele & Moschner, 1997); students answered the 10 items on self-efficacy beliefs concerning tests and academic achievement (e.g., “Even if a test is hard, I know what to do to pass the test.”) on a four-point Likert scale (1 = total disagreement, 4 = total agreement; α = 0.87).

Academic self-concept

We measured self-concept using four items (e.g. “I am a good student.”) used in previous studies (e.g. Rost & Sparfeldt, 2002). The items were rated on a five-point Likert scale (1 = total disagreement, 5 = total agreement; α = 0.72).

Well-being

Well-being was measured using the WHO-5 Well Being Index (Topp et al., 2015) with five items (e.g. “Regarding the last two weeks, I felt calm and relaxed.”) that were rated on a six-point Likert scale (1 = never, 6 = the whole time; α = 0.88).

Test anxiety

Test anxiety was assessed with 10 items (Hodapp et al., 2011) – five items related to worry (e.g. “I’m concerned about my performance.”; α = 0.83) and five items related to emotionality (e.g. “I feel anxious.”; α = 0.91) – on a five-point Likert scale (1 = never, 5 = very often).

Study satisfaction

Study satisfaction was measured using a scale (Westermann et al., 2018) comprising nine items (e.g., “In general, I am satisfied with my current studies.”) that were answered on a four-point Likert scale (1 = total disagreement, 4 = total agreement; α = 0.84).

Academic achievement

As in the previous two studies, we used German high school graduation certificate GPA data (Abiturnote) as an indicator of scholastic achievement, ranging from 1 (the best grade) to 4 (the worst grade).

Data analysis

As in the first validation study, we inspected the descriptive statistics and psychometric properties of the new SKT-SRL. We ran all the analyses for the seven scenario subscales as well as for the component subscales on cognition, metacognition, and motivation, in addition to the overall scale. Convergent validity was examined by calculating the correlations between the SKT-SRL component subscales and the SRL questionnaire subscales as well as the corresponding whole-scale (total) scores. Moreover, we examined the SRL–academic achievement correlations. We analysed the factorial structure of the SKT-SRL using the data from the first measurement point (t1), running a confirmatory factor analysis (CFA) with the scenario subscales as first-order latent variables, the cognition, metacognition, and motivation components as second-order latent variables, and SRL strategy knowledge as the third-order latent variable. Mplus 8 statistical software (Muthén & Muthén, 2012) with “type = general” and the robust maximum likelihood estimator (MLR) was used for the CFA calculations. To estimate the test–retest reliability of the scores, we examined the correlations of the component subscales between both measurement points. Lastly, we analysed the correlations between the SKT-SRL scores and the other included variables.
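To make the correlational part of these analyses concrete, the following Python sketch shows one way the convergent validity and test–retest correlations could be computed; all file and column names are hypothetical and serve only to illustrate the procedure.

import pandas as pd

t1 = pd.read_csv("skt_srl_t1.csv")  # assumed: one row per student, subscale scores at t1
t2 = pd.read_csv("skt_srl_t2.csv")  # assumed: same students at t2

components = ["cognition", "metacognition", "motivation", "total"]

# Convergent validity: correlations between the SKT-SRL component subscales
# and the corresponding SRL questionnaire subscales at t1.
for comp in components:
    r = t1[f"skt_{comp}"].corr(t1[f"srlq_{comp}"])  # Pearson correlation
    print(f"SKT {comp} x SRL questionnaire {comp}: r = {r:.2f}")

# Test–retest reliability: correlation of each subscale across t1 and t2,
# with the data sets merged on the pseudonymization code.
merged = t1.merge(t2, on="code", suffixes=("_t1", "_t2"))
for comp in components:
    rtt = merged[f"skt_{comp}_t1"].corr(merged[f"skt_{comp}_t2"])
    print(f"test-retest {comp}: rtt = {rtt:.2f}")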

Results

Descriptive statistics for the SKT-SRL scenario and the component subscales are depicted in Tables 7 and 8. Inspection of the item difficulties revealed substantial variability, with some items being considerably more difficult than others (0.15 < p < 0.98) (Table 5). Following the 15% criterion by Terwee et al. (2007), a ceiling effect was present only for the metacognitive scenario for the planning phase. Regarding the reliability evidence, Cronbach’s alphas were respectable-to-very good for the scenario subscales, the component subscales, and the total scale score (Table 7).
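As an illustration of how such descriptive and reliability indices can be obtained, the following Python sketch computes item difficulties, the 15% ceiling criterion (Terwee et al., 2007), and Cronbach’s alpha for one subscale; the item matrix and the item-name prefix are hypothetical assumptions for the example.

import pandas as pd

items = pd.read_csv("skt_srl_items_t1.csv")  # assumed: one column per scored item

# Item difficulty as the mean item score relative to the maximum score
# (for dichotomously scored items this equals the proportion solved).
difficulty = items.mean() / items.max()

# Ceiling effect: more than 15% of students reach the highest (here: highest
# observed) subscale score.
subscale = items.filter(like="meta_plan")  # assumed item-name prefix for one scenario
subscale_total = subscale.sum(axis=1)
ceiling_effect = (subscale_total == subscale_total.max()).mean() > 0.15

# Cronbach's alpha for the subscale: alpha = k/(k-1) * (1 - sum(item variances) / total variance).
k = subscale.shape[1]
alpha = k / (k - 1) * (1 - subscale.var(ddof=1).sum() / subscale_total.var(ddof=1))

print(difficulty.round(2), ceiling_effect, round(alpha, 2))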

Table 7 Validation Study 2: Correlations, means (M), standard deviation (SD), and internal consistency of the SKT-SRL subscales, SRL questionnaire subscales, and academic achievement (GPA)
Table 8 Validation Study 2: Correlations between the SKT-SRL component subscales for both measurement points (Time 1 and Time 2)

In order to examine the factor structure of the newly developed SKT-SRL, a model with seven first-order factors (one for each scenario), three second-order factors (cognition, metacognition, and motivation), and one third-order factor (conditional SRL strategy knowledge) was tested using CFA. Each first-order factor was built from the nine manifest scores that resulted from the pairwise comparisons. Because the pairwise comparison scores were not independent (each useless strategy rating was subtracted from each useful strategy rating), we allowed the residuals of scores that shared a rating to correlate. The loading of the first indicator per factor was fixed to 1. The first model estimation produced a warning concerning the first-order factor for the metacognitive performance scenario, which showed one factor loading greater than 1 and one negative residual variance, indicating a Heywood case. To address this, we constrained the loading of this scenario on the second-order metacognition factor to 1 and its residual variance to 0.01. As these constraints affected the third-order factor (general conditional SRL strategy knowledge), we also had to fix the loading of the second-order motivational factor to 1 and the residual variances of the motivational and metacognitive factors to 0.01. The results of the CFA then indicated an acceptable model fit: χ2 (1758) = 2347.06, p < 0.01, χ2/df = 1.34, RMSEA = 0.040 [0.036 – 0.044], SRMR = 0.083, CFI = 0.945, TLI = 0.939. The standardized loadings of the first-order factors were 0.62 (planning), 0.95 (performance), and 0.30 (reflection) on the metacognitive second-order factor, and 0.43 (planning), 0.80 (performance), and 0.99 (reflection) on the motivational second-order factor. The standardized loadings of the second-order factors and of the first-order cognition factor on the third-order SKT factor were 0.79 (cognition), 0.95 (metacognition), and 0.96 (motivation) (see Fig. 1 in the Appendix). Concerning test–retest reliability, the correlations between both measurement points for the three SKT-SRL component subscales and the overall score ranged from rtt = 0.52 to rtt = 0.62 (see Table 8).
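The pairwise-comparison scoring underlying these manifest indicators can be sketched in Python as follows; the assumption that each scenario contains three expert-endorsed (useful) and three non-endorsed (useless) strategies, yielding nine difference scores, is made here for illustration only.

import itertools
import numpy as np

def pairwise_scores(useful_ratings, useless_ratings):
    # Subtract every useless strategy rating from every useful strategy rating.
    return np.array([u - v for u, v in itertools.product(useful_ratings, useless_ratings)])

# One (hypothetical) student's usefulness ratings for a single scenario.
useful = [4, 3, 4]   # ratings of the strategies classified as useful
useless = [2, 1, 2]  # ratings of the strategies classified as useless

scores = pairwise_scores(useful, useless)
print(scores)         # nine difference scores, one per useful/useless pair
print(scores.mean())  # could be aggregated into a scenario subscale score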

Regarding the interrelations between the cognitive, metacognitive, and motivational subscales of the SKT-SRL, the corresponding correlations were of moderate-to-high magnitude (Table 7). Concerning the SRL questionnaire measuring self-reported strategy usage, and in contrast to our hypotheses, we did not find consistently higher correlations for the convergent scales. For example, the SKT-SRL motivational subscale showed a numerically higher correlation with the metacognitive subscale of the SRL questionnaire (r = 0.37) than with the motivational subscale of the SRL questionnaire (r = 0.30).

Concerning the relationships between the new SKT-SRL and the three motivational variables (Table 9), we found moderate correlations of the overall SKT-SRL with perceived usefulness of SRL strategies and academic self-concept as well as a small correlation between the overall SKT-SRL and academic self-efficacy. With regard to the SKT-SRL scores and the two affective variables (well-being and test anxiety), only the correlations between the SKT-SRL scores and the worry component of test anxiety differed statistically significantly from zero. In contrast, the correlations between the SKT-SRL scores and emotionality as well as well-being did not differ significantly from zero. The SKT-SRL scores correlated positively with study satisfaction. Regarding the achievement relationships, the metacognitive subscale and the overall SKT-SRL scale correlated negatively with high school GPA (the GPA for the German high school graduation certificate is inversely coded; see Table 7).

Table 9 Validation Study 2: Correlations, means (M) and standard deviations (SD) for the SKT-SRL and other study-relevant constructs

Conclusions and consequences

Besides aiming to replicate the psychometric characteristics of the SKT-SRL demonstrated in the pilot study, to provide convergent validity evidence, and to examine the SKT-SRL–achievement relationship, Validation Study 2 aimed to investigate the factorial structure and test–retest reliability of the SKT-SRL. Moreover, we examined the relationships between the SKT-SRL scores and three motivational constructs (SRL usefulness, academic self-efficacy, academic self-concept), two affective constructs (well-being, test anxiety), and a subjective outcome variable (study satisfaction). With regard to the ceiling effects, reliability evidence, and convergent validity evidence, the results of Validation Study 1 were partially replicated. The CFA results confirmed the (correspondingly modified) factorial structure of the SKT-SRL. Concerning the metacognitive factor, the performance phase scenario showed the highest factor loading. Seemingly, students’ conditional metacognitive strategy knowledge was mainly driven by conditional knowledge about metacognitive strategies during the actual learning process (e.g. self-observation). While the loading for the planning scenario was also moderately high, the loading for the reflection scenario was low. This suggests that conditional strategy knowledge about metacognitive strategies for the phase after actual learning has ended contributes less to overall conditional metacognitive strategy knowledge. One reason for this could be that conditional metacognitive strategy knowledge for the reflection phase was not well established in this sample; this interpretation corresponds to the rather low mean scores for this phase (see Table 5).

The pattern of loadings on the motivational factor suggests that conditional motivational strategy knowledge is mainly driven by conditional motivational strategy knowledge for the performance and reflection phases, while conditional motivational strategy knowledge for the planning phase seems to be less important. The high loadings of both the metacognitive and motivational second-order factors on the overall SKT-SRL factor indicate that these aspects are more central to general conditional SRL strategy knowledge than cognitive strategy knowledge. Nevertheless, the results of the specified CFA model should be interpreted with caution, because we had to impose the aforementioned constraints. Future studies should, therefore, conduct further CFAs on the SKT-SRL with another (larger and even more heterogeneous) sample to identify whether, for example, sampling fluctuations caused the Heywood case in the initially specified model. Besides that, the test–retest correlations indicated moderate reliability for the SKT-SRL scores. Overall, these results indicate mostly good psychometric characteristics of the test.

Concerning the relationships between the SKT-SRL scores and academic achievement, we found small but significant negative correlations with GPA (which is reverse coded in Germany) for the metacognitive subscale as well as the overall scale. This was in contrast to Validation Study 1, where we did not find any significant relationships with academic achievement. One explanation might be that these differing results were caused by sample differences, as the Validation Study 1 sample was, on average, in an earlier semester and included psychology students besides teacher education students; however, we cannot conclusively determine how these differences affected the results. Nevertheless, the relationship found in Validation Study 2 was low. In addition, the relationships of the SKT-SRL scores with self-reported SRL strategy usage were of moderate size at most. These results call for further research into the mechanisms underlying how conditional strategy knowledge is actually translated into strategy usage during the learning process and how conditional strategy knowledge influences academic achievement.

Regarding the relationships between the SKT-SRL and other study-relevant factors, we found low-to-moderate positive correlations with the motivational constructs usefulness of SRL strategies, academic self-efficacy, and academic self-concept, and with the worry component of test anxiety, but no significant correlations with the emotionality component of test anxiety or with well-being. Concerning emotional study-relevant constructs, it therefore seems that conditional SRL strategy knowledge is significantly related to the cognitive, but not the affective, facets of such constructs. Interestingly, we found a significant positive correlation between the SKT-SRL and the worry component of test anxiety, indicating that students with more conditional SRL strategy knowledge worry more about their future performance than students with less strategy knowledge. It could be hypothesized that students who know more (compared to students who know less) about an ideal self-regulated learning process are more likely to worry that they will not perform adequately with regard to this standard. Moreover, future studies should investigate how this finding is related to achievement differences. For the subjective outcome variable study satisfaction, we found moderate positive correlations. These results give first insight into the relevance of conditional SRL strategy knowledge, beyond SRL strategy usage, for study-relevant factors. Future studies should investigate in more depth why motivational but not emotional constructs are related to conditional SRL strategy knowledge and how these constructs interact with each other.

General discussion

The aim of the sequence of four studies presented here was to develop a scenario-based Strategy Knowledge Test for Self-Regulated Learning (SKT-SRL) for college students to measure conditional knowledge with regard to relevant SRL components (cognition, metacognition, motivation; Boekaerts, 1999) and SRL phases (planning, performing, reflecting; Zimmerman, 2000). Currently, conditional strategy knowledge tests for SRL are scarce, and those that are available tend to focus on only one component (for example, metacognition) and therefore do not depict the whole construct (e.g. Karlen, 2017). The SKT-SRL was developed based on theoretical considerations and the results of previous studies on conditional learning strategy knowledge. The SKT-SRL scenarios were based on a selection of scenarios from a former study (Dresel et al., 2015). In Study 1, we validated the scenarios as well as the strategy selection using an expert rating approach, which indicated a good fit of the selected strategies and scenarios for assessing SRL. Based on this procedure, we assumed good content validity of the SKT-SRL. Across three further studies, we investigated the reliability and validity of the instrument. The studies indicated respectable-to-very good internal consistencies (Cronbach’s alphas) and satisfactory item difficulties (Studies 2 to 4), moderate test–retest reliability (Study 4), moderate-to-high subscale interrelations (construct validity; Studies 2 to 4), and factorial validity (Study 4). With regard to the relationships between the SKT-SRL and other SRL instruments (convergent validity), the results showed (as expected) moderate correlations with the SRL questionnaire (Studies 2 to 4) and low correlations with SRL microanalysis (Study 3). Moreover, we gained first insight into the criterion validity of the instrument, finding moderate correlations with study-relevant constructs (Study 4). Across Studies 2 to 4, only some significant correlations between academic achievement and the SKT-SRL emerged. The following sections provide an in-depth discussion of the results and limitations of our multi-study approach.

Summary of studies and results

With regard to the psychometric properties of the newly developed SKT-SRL, the pilot and validation studies revealed at least partially acceptable results. We detected ceiling effects in the pilot study and therefore adapted the scoring mechanism to generate higher variance. In doing so, the ceiling effects in the validation studies were substantially reduced. Regarding item difficulty, using the adapted scoring method, the mean difficulties for the scenario subscales ranged from p = 0.41 to p = 0.77 in both validation studies. This indicated satisfactory subscale difficulties and showed that some scales were more difficult for the students in our samples (e.g. metacognitive reflection phase, p = 0.41/0.44) than others (e.g. metacognitive planning phase, p = 0.75/0.77). Concerning the reliability evidence for the subscales, both validation studies resulted in acceptably high Cronbach’s alpha values above α = 0.77 (max. α = 0.89). The test–retest reliabilities of the subscales ranged from rtt = 0.52 to rtt = 0.62 and were thus of moderate size. This may be due to the fact that knowledge is, by definition, a changeable rather than a stable construct. Beyond a true change in strategy knowledge, the students might have thought about SRL strategies after having been confronted with the test and, therefore, selected different answers at the second measurement point. Such a reactivity effect has been found in SRL learning diary studies (e.g., Dörrenbächer & Perels, 2016a). Future research should investigate such changes in conditional SRL knowledge scores in more depth, as to date, no studies have examined this important aspect.

As SRL is often theoretically understood as a competence containing cognitive, metacognitive, and motivational components, we expected to find moderate relationships between the (sub-)scales of the SKT-SRL, providing evidence for construct validity. Indeed, both validation studies using the adapted coding scheme resulted in moderate-to-high correlations (0.32 < r < 0.56). These results show that the single components are related, but they also underline that the components are to some extent discriminable from one another. This means that, for example, persons with high conditional metacognitive strategy knowledge often, but not always, also show high conditional cognitive and motivational strategy knowledge. These moderate correlations are in line with the results of an earlier study using a different (but not yet validated) conditional SRL strategy knowledge test (Dörrenbächer-Ulrich et al., 2021).

Concerning factorial validity, the CFA revealed a good fit for a model with seven latent first-order factors (scenarios), three latent second-order factors (components) and one latent third-order factor (general conditional strategy knowledge). Nevertheless, these results should be treated with caution as we had to modify the model specification of the CFA (see Validation Study 2 for details); therefore, future research is needed that replicates the CFA with another sample. In general, the factorial structure of conditional SRL strategy knowledge has rarely been previously investigated (but see Maag Merki et al., 2013), and while the factorial structure of SRL in general has been analysed more frequently using self-report questionnaires (Roth et al., 2016), these results cannot be easily transferred to SRL strategy knowledge. Moreover, SRL experts are not in complete agreement on what the components of SRL are (Panadero, 2017) and whether SRL is made up of components (Boekaerts, 1999), phases (Zimmerman, 2000), or both (Pintrich, 2000). Based on theoretical assumptions, there seem to be dependencies between SRL strategies that belong to different phases or components (e.g. if all components are present in all three phases, strategies belonging to the reflection phase of Zimmerman’s, 2000, model should by definition be related to strategies in the subsequent planning phase). Future research on the SKT-SRL could further develop the scenarios (e.g. three scenarios per component per phase) and specify a CFA model relying on the combination of SRL components and SRL phases. In doing so, researchers could gain further insight into the strengths and weaknesses of learners’ conditional SRL strategy knowledge and develop corresponding adaptive training programmes.

In the pilot study and both validation studies, we investigated the relationships between the newly developed SKT-SRL and SRL measures that capture strategy usage, namely an SRL questionnaire and SRL microanalysis. As strategy knowledge tests and questionnaires are both seen as offline measures (according to Wirth & Leutner, 2008), we expected to find moderate correlations (in line with previous studies, e.g., Dörrenbächer-Ulrich et al., 2021). In contrast, microanalysis is seen as an online measure that follows a qualitative standard. Therefore, as microanalysis represents a different assessment level, we expected to find rather low correlations between the SKT-SRL and SRL microanalysis. While (with one exception) the correlations between the SKT-SRL and SRL microanalysis were non-significant in our studies, we found significant correlations between the SKT-SRL and the SRL questionnaire in most instances (with the size of the correlation mostly varying around r = 0.30, indicating a moderate relationship). This pattern of results was in line with our expectations and previous studies on the multimethod assessment of SRL (e.g., Dörrenbächer-Ulrich et al., 2021), and especially with studies on the relationship between conditional strategy knowledge and self-reported strategy usage: while the analyses of Händel et al. (2013) resulted in low-to-moderate relationships in fifth graders, Maag Merki et al. (2013) and Karlen (2017) found moderate relationships in upper secondary and college students. Nevertheless, those studies looked only at metacognitive conditional strategy knowledge and not at the remaining SRL components or the more comprehensive SRL competence.

Foerst et al. (2017) analysed college students’ declarative and procedural SRL knowledge and examined the discrepancy with self-reported strategy use. These authors showed that knowledge was significantly higher than usage for all SRL components. Based on these results, we might conclude that SRL strategy knowledge is a necessary prerequisite for SRL strategy usage (Karlen, 2016), and that in many cases this discrepancy could indicate a production deficiency (Hasselhorn, 1996): sufficient knowledge about strategies does not necessarily lead to automatic and effective SRL strategy application. Additionally, one might hypothesize a usage deficiency, meaning that students could use the strategies with high effort but might not recognize the situations in which the strategies would be helpful. As we found moderate correlations between the SKT-SRL and the usefulness of SRL strategies (Study 4), it is reasonable to hypothesize that perceived usefulness plays a mediating role in the relationship between SRL strategy knowledge and actual usage. Only if students rate the strategies as helpful for their own learning will they tend to use them in an actual learning situation (Rosário et al., 2013). However, these assumptions require more in-depth analyses in future research.

To assess the criterion validity of the newly developed SKT-SRL, we investigated the relationship between conditional SRL strategy knowledge and academic achievement. We took the GPA from the German high school graduation certificate as the achievement indicator and found only two small but significant negative correlations with GPA (which is reverse coded in Germany), namely for the SKT-SRL metacognition subscale and the SKT-SRL overall score in Validation Study 2. For conditional metacognitive strategy knowledge, this is in line with Karlen (2017), who reported a moderate positive correlation with performance in a writing task, and Händel et al. (2013), who detected moderate positive correlations with (recoded) school grades in German and maths. Nevertheless, it remains unclear why we did not find significant correlations between GPA and conditional cognitive and motivational SRL strategy knowledge, or why we obtained the mentioned significant correlations only in Validation Study 2. It should be kept in mind that the German high school graduation certificate GPA is a very general achievement marker that combines the grades of several examinations across several school subjects. Moreover, this GPA gives a retrospective account of academic achievement, which might not correspond with actual college performance, although previous studies have shown that the German high school graduation certificate is a good predictor of college performance for German students (e.g. Trapmann et al., 2007). Besides this, the aforementioned production deficiencies (Hasselhorn, 1996) could be one further reason why high SRL strategy knowledge alone does not result in higher grades or better achievements. Future research should investigate these ideas in more detail and analyse whether other factors (e.g. motivation for strategy usage, performance level) moderate the size of the relationship between the SKT-SRL scores and college achievement.

Regarding the relationship of the SKT-SRL scores to other study-relevant constructs, we observed moderate correlations between the SKT-SRL and the three motivational constructs examined here: SRL usefulness, academic self-efficacy, and academic self-concept. Students with higher SKT-SRL scores rate the usefulness of SRL strategies as higher, report stronger beliefs that they can cope with demanding situations during their studies, and show better achievements. Moreover, they report higher academic self-concepts. It seems that students with high conditional SRL strategy knowledge also have an adaptive motivational profile. In line with this, we found that students with higher SKT-SRL scores also have higher study satisfaction. Nevertheless, such students also worry more about their future performance, which could impede their achievements. In contrast to these significant relationships, no significant relationships for the two affective constructs – well-being and the emotionality component of test anxiety – were present. It could be that strategy knowledge, as a cognitive construct, shows stronger relationships with self-referring beliefs and weaker relationships with more affective constructs such as those inspected in this study. This should also be investigated in more depth in future studies.

Limitations and future research

Although the present study has provided initial evidence for the reliability and validity of the newly developed SKT-SRL instrument, there are several shortcomings that should be addressed in future studies. First, the sample sizes were relatively small and should be enlarged in future research. Conducting a study with a larger sample would also allow for factorial modelling with two measurement points to investigate measurement invariance and analyse and compare different factor models.

Second, we did not include instruments to investigate the discriminant validity of the SKT-SRL; it would have been interesting to include a knowledge test on an unrelated construct (for example, mathematical knowledge) to address this issue. In this context, it is somewhat equivocal what to expect regarding convergent validity with other SRL instruments, as SRL strategy knowledge tests, SRL self-report questionnaires, and SRL microanalysis should by definition not be highly related (see Wirth & Leutner’s, 2008, categorization). Future studies could use another conditional SRL knowledge test (or at least a conditional metacognitive knowledge test) and inspect not just the convergent validities of the scores but also investigate the pattern of the results in more detail. Specifically, we would expect higher correlations between theoretically similar constructs and assessment approaches and lower correlations between theoretically different constructs and assessment approaches. Moreover, it would be interesting to include qualitative instruments (in the sense of non-numerical data, e.g., interviews, observational data concerning a specific learning task) and to investigate how these kinds of data are related to the SKT-SRL scores.

Third, as already discussed, we obtained only some significant relationships between the SKT-SRL scores and academic achievement. Our findings are not in line with previous research (Maag Merki et al., 2013) and seem contradictory, as it is assumed that SRL strategy knowledge is a prerequisite for SRL strategy usage and that the usage of SRL strategies helps to improve performance. Consequently, a more detailed analysis of why the relationships between conditional SRL strategy knowledge and academic performance were sometimes very low is warranted, as SRL strategy knowledge seems theoretically to be a necessary prerequisite for high academic achievement. Using qualitative measures to assess how students perceived the scenarios and to gather information on the specific reasons why they rated each strategy as useful or not could be helpful when investigating this research gap. Moreover, future studies should analyse whether the test results differ when specific contents (e.g., a specific course topic) or specific contexts (e.g., a lecture vs. self-study) are brought into focus, as one might expect context-specific SKT scores to show higher validity evidence.

Lastly, one might ask whether the student in the learning scenario should be of the same gender as the student who gives the strategy usefulness ratings. As the learning scenarios are not intended to capture behavioural intentions but to indicate knowledge about strategy usefulness, the gender of the protagonist should be irrelevant. It could even be hypothesized that a protagonist of the opposite gender would help to differentiate knowledge from intended or actual strategic learning behaviour. This could be tested empirically by deliberately manipulating the gender of the students in the learning scenarios. Moreover, it could be analysed whether SKT scores differ if the test used “I”-statements (“In the given situation, I would…”). These results could help to investigate the knowledge–usage gap.

To conclude, based on the results of our multi-study approach, the newly developed SKT-SRL can be used in future research to investigate its relationship to other classes of SRL assessment (based on the classification by Wirth & Leutner, 2008) in more depth. Previous studies on the multimethod assessment of SRL have shown only low-to-moderate relationships between different classes and measures of SRL (e.g. Dörrenbächer-Ulrich et al., 2021). It would be helpful to create an overall model that integrates these distinct assessment approaches and to investigate how and why they (and the specific SRL components) are related to each other and which specific factors influence the size of the corresponding correlations. In this context, using different assessment methods for different SRL components seems promising (Dörrenbächer-Ulrich et al., 2021). Besides that, a more person-oriented approach could be helpful when studying conditional SRL strategy knowledge. As it can be assumed that each individual learner has preferences for certain learning strategies (Dörrenbächer & Perels, 2016b), a test in which students rate the usefulness of their favourite strategies might illuminate the construct of strategy knowledge from a more qualitative perspective.

Additionally, future research should take a closer look at the aforementioned production or usage deficiency (Foerst et al., 2017; Hasselhorn, 1996) and examine how to overcome the gap between knowledge and usage, thereby transferring knowledge into actions and higher academic performance. Studies that investigate the congruence between SKT-SRL scores and actual behaviour in comparable learning situations, or at least between SKT-SRL scores and other instruments (questionnaires, microanalysis) that use the same scenarios and therefore show higher congruence between the methods, would be especially fruitful. In this context, it would moreover be of interest to design SRL interventions that tackle SRL strategy knowledge deficits and to investigate whether such interventions can improve the ability to learn in a self-regulated way in terms of SRL strategy use. This could be undertaken in the context of adaptive interventions that focus only on knowledge dimensions where students show deficiencies. Moreover, interventions using utility value approaches (Hulleman & Harackiewicz, 2021) could be one option for increasing the value of SRL strategy usage for college students and, therefore, for helping to transform knowledge into usage. Lastly, future research could investigate the transferability of the newly developed test to other student populations such as secondary school students (as the scenarios are relevant to their everyday learning as well) or even younger students by adapting the scenarios, an endeavour the authors are currently undertaking (David et al., 2024). All things considered, the newly developed SKT-SRL for college students seems to be a promising assessment method in the context of SRL that could stimulate further research on the relationship between conditional SRL strategy knowledge and usage and help foster college students’ SRL in practice.