When reviewing the literature, we found that, in terms of granularity, a general distinction can be made between studies that measure and compare the use of specific self-regulatory strategies, such as highlighting or note creation, for at least one of the measures (fine grained), and studies measuring a global degree of self-regulatory strategy use, using total scores or scales of self-regulatory activity (coarse grained). Making this distinction led to different conclusions in terms of calibration, as described further below. We will first describe the studies comparing specific strategies, followed by a description of the studies comparing students’ global level of self-regulation. Tables 2 and 3 provide an overview of the findings of this review, separated by method of comparison.
Table 2 Schematic overview of studies comparing specific self-regulatory strategies, with + indicating high calibration, − indicating low calibration, and +/− indicating mixed results

Table 3 Schematic overview of studies comparing global strategy use, with + indicating high calibration, − indicating low calibration, and +/− indicating mixed results

Comparison of specific strategies
Ten studies were retrieved that compared self-reports with online measurements of students’ use of specific SRL strategies. These studies can be clustered according to the form of online measurement used. We will discuss seven studies using trace data (four focusing on specific learning strategies, and three using goal theory as a starting point), one study using think-aloud protocols, one study using eye movements, and one study using online forms of self-report.
One of the first studies since 2000 to compare offline self-report data with an online measure was conducted by Winne and Jamieson-Noel (2002). These researchers used traces of students’ behavior in a software program called PrepMate as an online measure to study calibration in terms of students’ achievement (alignment between students’ predictions of achievement and their actual achievement) and self-reports of study tactics (alignment between self-reports and traces of study tactics). Students studied a chapter on lightning formation, with achievement being measured using six items addressing all levels of Bloom’s taxonomy (Bloom et al. 1956). Questions were worth either five or ten points. After answering a question, students were asked how many of these points they would give themselves, based on the answer they had provided. The self-report questionnaire asked students in how many of the seven paragraphs of the text they had used the respective study strategies (for a full list of strategies, see Winne and Jamieson-Noel 2002). Two items on planning were measured dichotomously and scored as no planning = 0 or planning = 7. Calibration in study tactics was measured by comparing students’ responses on the questionnaire to their behavior in PrepMate, that is, by comparing the number of paragraphs in which students reported using specific study strategies with the number of paragraphs in which they were traced to have used these strategies in PrepMate. Despite a consistent general tendency toward overconfidence, students were quite well calibrated in terms of their achievement, with a median calibration of r = .88 (although considerable variability among items was found). More importantly, however, there was a higher degree of bias and lower calibration in students’ reporting of their use of study tactics, with a median calibration of r = .34. The lowest calibration was found for students’ reports of setting objectives and planning a method for the learning task. Furthermore, calibration of study tactics was not related to achievement, while prior knowledge and calibration of achievement were. In other words, the degree to which students were able to accurately report their use of study tactics was not related to achievement, but students with higher achievement scores were better able to predict their achievement than lower-achieving students. This indicates that these two forms of calibration tap into different constructs. Prior knowledge was not related to either form of calibration.
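To make this calibration metric concrete, the following minimal sketch (with entirely hypothetical data and illustrative variable names) computes a within-student calibration coefficient as the Pearson correlation between reported and traced paragraph counts across tactics, together with a bias score; the original authors’ exact computation may have differed.

```python
# Minimal sketch of calibration of study-tactic reports, assuming calibration
# is operationalized as a within-student Pearson correlation between
# self-reported and traced paragraph counts (hypothetical data).
import numpy as np
from scipy.stats import pearsonr

# For one student: per-tactic counts of paragraphs (0-7) in which the tactic
# was (a) self-reported and (b) actually traced in the study software.
reported = np.array([7, 5, 3, 6, 2, 4])  # questionnaire responses
traced   = np.array([4, 5, 1, 2, 2, 1])  # counts recovered from log files

calibration, _ = pearsonr(reported, traced)  # accuracy of the report
bias = np.mean(reported - traced)            # positive value = overreporting

print(f"calibration r = {calibration:.2f}, bias = {bias:.2f}")
```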
In a follow-up analysis, Jamieson-Noel and Winne (2003) again found significant differences between students’ self-reports of their study tactics and traces of their actual studying behavior. To investigate the predictive value of traces and self-reports for achievement, separate regression analyses were run for both measurement types. Interestingly, when measures of overall SRL intensity were constructed by averaging the trace scores and the responses to the self-report items, respectively, self-reported SRL intensity (i.e., perceived effort spent in applying study tactics) significantly predicted achievement (explaining 16% of the variance in achievement), while no contribution was found for traces. After clustering strategies to reflect the phases in Winne and Hadwin’s (1998) model of SRL (planning, learning, reviewing, and monitoring), traces again did not predict students’ achievement. For self-reported strategies, the monitoring phase did emerge as a significant predictor of achievement, explaining 23% of the variance. It is, however, important to note that there was no trace for the monitoring phase, making it impossible for this phase to emerge as a traced predictor. When examining individual tactics, amount of note taking (operationalized as the number of paragraphs in which a student created at least one note) was the only significant predictor among the trace data (explaining 23% of the variance in achievement). In the analysis of self-reports, the significant predictors were reviewing text and reviewing pictures. In a final analysis, the authors entered both the traces and the self-report items into one blocked regression analysis. In this analysis, the trace for amount of note taking remained a significant predictor of achievement, as did the self-report items for reviewing text and reviewing pictures, explaining 17% and 26% of the variance in achievement, respectively. Principal component analyses also indicated that traces reveal different forms of SRL than self-reports, with trace data indicating a more active way of studying.
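The logic of such a blocked regression can be illustrated as follows. This is a sketch with simulated data and illustrative predictor names (trace_notes, sr_review_text, sr_review_pics), not a reproduction of the original analysis.

```python
# Sketch of the blocked-regression logic: predict achievement first from a
# traced tactic, then add self-report scores as a second block and compare
# the explained variance. All data and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "trace_notes":    rng.integers(0, 8, n),  # traced note-taking count
    "sr_review_text": rng.integers(0, 8, n),  # self-reported text review
    "sr_review_pics": rng.integers(0, 8, n),  # self-reported picture review
})
df["achievement"] = (0.8 * df["trace_notes"] + 0.5 * df["sr_review_text"]
                     + rng.normal(0, 2, n))

# Block 1: trace predictor only
m1 = sm.OLS(df["achievement"], sm.add_constant(df[["trace_notes"]])).fit()
# Block 2: trace predictor plus self-report predictors
X2 = sm.add_constant(df[["trace_notes", "sr_review_text", "sr_review_pics"]])
m2 = sm.OLS(df["achievement"], X2).fit()

print(f"R2 traces only: {m1.rsquared:.2f}; "
      f"R2 traces + self-reports: {m2.rsquared:.2f}")
```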
Another study that analyzed students’ online traces was conducted by Hadwin et al. (2007), who used a similar software program called gStudy to compare eight students’ self-reports of self-regulated learning strategies on the MSLQ to their actual use of specific self-regulatory strategies as measured by traces. Students studied a chapter in an introductory educational psychology course, which would later be tested on a final exam (no information is given about the content of this exam or students’ achievement on it). The authors clustered students into High, Medium, and Low self-regulators based on their MSLQ responses, and then tried to identify similarities within clusters in terms of traced study activities. Few similarities were found between students within the same clusters (even the most highly calibrated students showed good calibration on only 40% of studying activities), indicating low calibration between students’ self-reports and their actual use of self-regulated learning strategies.
The fourth study using online traces was conducted by Hadwin et al. (2004), who clustered eight students into the categories of High, Average, Low, and Improved performers on the basis of their progression in test performance from pretest to posttest. A software program called CoNoteS2 was used to collect traces of students’ studying activities while they studied three chapters on sex differences in the context of an instructional psychology course. These trace data were compared with weekly self-report reflections that students wrote about their studying tactics. Achievement was measured by students’ recall at three levels (unistructural, multistructural, and relational), thereby essentially covering text recall and comprehension. High performers were found to be better calibrated than Low performers. However, studying activities as identified by traces could not independently explain students’ performance developments, indicating that additional measures are needed to arrive at a complete picture.
Zhou and Winne (2012) investigated calibration of a different aspect of SRL, comparing specific achievement goals as measured by self-reports versus trace data. Self-report data were collected with the Achievement Goal Questionnaire (AGQ; Elliot and McGregor 2001). Trace data were collected in gStudy (Winne et al. 2006), in which participants studied an article about hypnosis and were presented with a predefined set of hyperlinks and tags related to each of the four goal orientations (e.g., “I want to learn more about this” as an indicator of a mastery-approach goal). Goal orientations were inferred by counting the number of hyperlinks students clicked and the number of tags they used. Achievement was operationalized as text recall and text comprehension. For all goal orientations, there were significant differences between students’ self-reported goal orientations and the traces collected in gStudy, with effect sizes ranging between d = 1.39 and d = 3.94. A significant correlation with posttest achievement was found for traced goal orientations (correlation coefficients ranging between rτ = .17 and rτ = .23), but not for self-reports.
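As an illustration of the two statistics involved here, the sketch below computes a paired standardized mean difference (Cohen’s d) between self-reported and traced goal-orientation scores, and a Kendall rank correlation between traces and posttest performance. All numbers are hypothetical, and rescaling the trace counts to the questionnaire metric is an assumption made for the example.

```python
# Sketch: (1) paired Cohen's d for the self-report vs. trace discrepancy on
# one goal orientation; (2) Kendall's tau between traced scores and posttest
# performance. Hypothetical data throughout.
import numpy as np
from scipy.stats import kendalltau

self_report = np.array([5.2, 4.8, 6.1, 3.9, 5.5, 4.4])  # AGQ scale scores
traced      = np.array([1.0, 2.0, 3.0, 0.0, 2.0, 1.0])  # link/tag counts, rescaled
posttest    = np.array([12, 15, 18, 9, 16, 11])          # recall/comprehension

diff = self_report - traced
d = diff.mean() / diff.std(ddof=1)        # paired standardized mean difference
tau, p = kendalltau(traced, posttest)     # rank correlation with achievement

print(f"d = {d:.2f}, Kendall tau = {tau:.2f} (p = {p:.3f})")
```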
Also focusing on goal theory, Adesope et al. (2015) investigated whether achievement goals could influence the use of learning strategies, and whether these learning strategies could in turn influence students’ online learning behavior. The authors used the Goal Orientation Questionnaire (GOQ; Nesbit et al. 2008) to measure students’ goal orientation and the MSLQ to measure learning strategies. Students’ learning behavior was measured while they studied an electronic chapter in gStudy (Winne et al. 2006). Although the trace data were used in addition to the self-report questionnaire rather than the two measures being explicitly compared, it is interesting to note that the questionnaire subscales predicted learning behavior. Specifically, effort regulation and task value, as measured by the MSLQ, showed a positive predictive relationship with the number of notes and tags created in gStudy, as well as with study duration and the total number of actions completed in gStudy. Furthermore, except for rehearsal, the learning strategies measured by the MSLQ correlated positively with learning behavior: elaboration with study duration, the total number of actions, and the total number of notes and tags created; organization with the total number of actions and the total number of notes and tags created; and metacognitive self-regulation with the total number of actions and the number of tags created. Correlation coefficients ranged between r = .21 and r = .42. This predictive relationship between self-reported learning strategies and students’ actual behavior indicates that the MSLQ does in fact tap into an important construct, and that students might be relatively successful in reporting their use of these strategies, or the importance they assign to them.
Finally, Bernacki et al. (2012) used a trace methodology to examine possible relationships between students’ achievement goals, strategy use, and comprehension performance. Although they did not make an explicit comparison between traces and self-reports in this study, a comparison was made with earlier studies that answered the same research questions using self-report questionnaires. Students used nStudy to study texts on human development and ADHD, with achievement being operationalized as text comprehension. Goal orientation was measured with the Achievement Goals Questionnaire-Revised (Elliot and Murayama 2008). Trace data replicated only a portion of the relationships between goal orientations and learning strategies previously reported in self-report studies. Specifically, performance-approach goals did not predict any learning strategies, while mastery goals predicted strategies associated with organization and elaboration (specifically note taking and information seeking) and marginally predicted metacognitive monitoring (specifically monitoring of progress), with effect sizes ranging between .13 and 2.75, leaving a general pathway from mastery goals to strategies. Performance-avoidance orientation showed a negative relationship with note taking and information-seeking behavior, with effect sizes of −1.34 and −.31, respectively. These results indicate incongruence between self-reports and trace data for goal orientations, calling into question the validity of self-reports for the measurement of this metacognitive construct. Situation-model comprehension (but not text-based comprehension) was predicted by traces of highlighting and progress evaluation, with effect sizes of .05 and .06, respectively.
Self-reports of SRL have also been compared with think-aloud protocols. De Backer et al. (2012) used the prospective Metacognitive Awareness Inventory (MAI; Schraw and Dennison 1994) and a think-aloud protocol to investigate the effect of a reciprocal peer-tutoring intervention on students’ metacognitive knowledge and strategy use. Students worked on authentic assignments in the context of instructional sciences, requiring critical thinking, problem solving, negotiating, and decision making. The questionnaire data and think-aloud protocols showed diverging results. While MAI scores revealed no difference in metacognitive knowledge and regulation between pretest and posttest, think-aloud data showed an increase in the frequency of use of metacognitive skills, with effect sizes ranging between d = .45 and d = 3.12, as well as an increase in the variation of metacognitive skills.
Furthermore, we found one study that used eye movements as the online measure in the comparison with offline self-reports. Susac et al. (2014) used eye-tracking data to study students’ strategies when rearranging algebraic equations. The eye-tracking data were compared with a self-report questionnaire in which students indicated which strategies they had used during the task. Results indicated incongruence between students’ self-reports and the eye-tracking data: eye-tracking scan paths revealed several strategies that students did not report in the questionnaire. For example, among the 15 students who indicated that they never checked the provided answers, 51.5% of trials in fact showed a return of eye movements to the answers. In other words, students’ metacognitive calibration appeared to be limited, although considerable individual variability was found. Participants who showed higher accuracy in their metacognitive judgments were more successful in efficient equation solving than students with lower metacognitive accuracy. Furthermore, the eye-tracking data provided a more reliable prediction of equation difficulty than students’ self-reported difficulty rankings. Finally, these eye-tracking measures predicted students’ performance in terms of inverse efficiency, operationalized as the ratio between response time and accuracy. Less efficient students showed a higher number of returns from answers back to equations than highly efficient students, a result the authors explained by suggesting that highly efficient students had better insight into where they should look, thereby requiring fewer returns. However, the authors did not compare this result to the questionnaire data.
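As an illustration, inverse efficiency can be computed as follows (the numbers are hypothetical):

```python
# Inverse efficiency score: mean response time divided by proportion correct,
# so lower values indicate more efficient performance. Hypothetical numbers.
def inverse_efficiency(mean_rt_ms: float, prop_correct: float) -> float:
    if prop_correct <= 0:
        raise ValueError("proportion correct must be positive")
    return mean_rt_ms / prop_correct

# A slower but accurate student vs. a faster but error-prone student:
print(inverse_efficiency(4200, 0.95))  # ~4421 ms per correct response
print(inverse_efficiency(3000, 0.60))  # 5000 ms per correct response
```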
Finally, one study compared an offline self-report questionnaire to an online form of self-report. Cleary et al. (2015) compared students’ responses to the MSLQ with their responses to self-report micro-analytic questions about exam preparation, delivered to the students by the examiner. The relationship between students’ MSLQ scores and their responses to the micro-analytic strategy questions was not significant. Furthermore, the micro-analytic strategy questions were a better predictor of students’ academic performance than the MSLQ: there were no significant correlations between exam scores and MSLQ scales, while the weighted micro-analytic strategy measure significantly predicted students’ grade on the final exam, with a correlation coefficient of r = .29.
Overall, studies that focus on the use of specific strategies when comparing self-report questionnaires with behavioral measures indicate low calibration between the two forms of measurement (Adesope et al. 2015; Bernacki et al. 2012; Cleary et al. 2015; De Backer et al. 2012; Hadwin et al. 2004, 2007; Jamieson-Noel and Winne 2003; Susac et al. 2014; Winne and Jamieson-Noel 2002; Zhou and Winne 2012). Traces tend to have a higher predictive value in terms of achievement than self-reports.
Comparison of global use of self-regulatory strategies
In contrast to the ten studies comparing different types of measurement for specific self-regulatory strategies, four other studies have focused on a global measure of self-regulation, using total or subscale scores that aggregate different self-regulatory strategies. Three of these studies focused on problem solving, while one used an electronic portfolio system.
Cooper et al. (2008) developed a multi-method instrument to measure students’ metacognition in chemistry problem solving across time. To do so, they compared students’ answers on the prospective self-report Metacognitive Activities Inventory (MCA-I; Cooper and Sandi-Urena 2009) to their study strategies in an online problem-solving environment called IMMEX, in which students work on ill-defined chemistry problems while their problem-solving activities are recorded. For example, the number of relevant pieces of information considered before trying to solve a problem is used as an indicator of planning. The researchers found convergence between the self-report instrument and students’ behavior in the online environment, in the sense that students who used more metacognitive strategies in the online environment also had higher scores on the questionnaire than students who used fewer metacognitive strategies. Furthermore, students’ problem-solving performance correlated significantly with both their strategy use in the online environment and their scores on the self-report questionnaire.
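To illustrate how such a behavioral indicator can be derived from log data, the sketch below counts the information items a student views before a first solution attempt; the event names and log format are hypothetical rather than IMMEX’s actual schema.

```python
# Sketch of deriving a planning indicator from a problem-solving event log:
# count the information items opened before the first solution attempt.
# The log format and event names are hypothetical.
from typing import List, Tuple

Event = Tuple[str, str]  # (event_type, payload)

log: List[Event] = [
    ("view_info", "melting_point"),
    ("view_info", "solubility"),
    ("view_info", "spectrum"),
    ("attempt_solution", "compound_A"),
    ("view_info", "density"),
    ("attempt_solution", "compound_B"),
]

def planning_indicator(events: List[Event]) -> int:
    """Number of information pieces consulted before the first attempt."""
    count = 0
    for event_type, _ in events:
        if event_type == "attempt_solution":
            break
        if event_type == "view_info":
            count += 1
    return count

print(planning_indicator(log))  # 3
```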
In a later study on problem solving, Sandi-Urena et al. (2011) used the MCA-I and the IMMEX environment to assess the effects of a cooperative intervention on students’ metacognitive awareness and strategy use. In this study, the intervention led to a decrease in self-reported metacognitive strategy use as measured by the MCA-I (interpreted by the authors as an increase in metacognitive awareness), with an effect size of d = .10 for the difference between the two groups at posttest, but no changes were observed in the actual use of metacognitive strategies in the IMMEX environment. Regardless of the direction of the results and their interpretation (a decrease in metacognitive strategies versus an increase in metacognitive awareness), the inconsistency between the self-report questionnaire and the use of metacognitive strategies in the IMMEX environment points to an incongruence between students’ self-reports and the trace data. As an explanation for this incongruence, the authors propose that the MCA-I might put a greater emphasis on reflection rather than on metacognitive skill application. However, we propose it could also be due to a greater sensitivity of the MCA-I to changes from pretest to posttest, or to a lower validity of this instrument, with students reporting socially desirable answers as a result of having been exposed to the intervention. Interestingly though, the intervention did lead to an increase in students’ problem-solving ability, suggesting that the change in MCA-I scores might have tapped into an actual change in students’ strategies.
In a third study on problem solving, Wang (2015) used a multi-method approach to investigate the general and task-specific aspects of metacognition in different topics in chemistry problem solving (molecular polarity and thermodynamics). Self-reported metacognitive skill was measured with the Inventory of Metacognitive Self-Regulation (IMSR; Howard et al. 2000), and concurrent metacognitive skill was measured using a think-aloud protocol. Furthermore, confidence judgments and calibration accuracy values were obtained. Results indicated a significant association between self-report questionnaire scores and concurrent metacognitive skill use as measured by the think-aloud protocol (r = .36 for the thermodynamics task, and r = .49 for the molecular polarity task). For the task on molecular polarity, both the self-report questionnaire and the think-aloud protocols correlated significantly with performance (r = .39 and r = .55, respectively). For the thermodynamics task, only the think-aloud protocols correlated significantly with performance (r = .40). The author concludes that the self-report questionnaire assesses a context-independent, general, and common aspect of metacognition, while the think-aloud methodology assesses context-specific metacognition.
Finally, Nguyen and Ikeda (2015) developed and evaluated an electronic portfolio system to support SRL in students in the context of two university courses on ICT topics. They used the MSLQ to measure self-reported SRL strategies and examined traces in the ePortfolio environment to assess students’ actual use of strategies. Results indicated differences from pretest to posttest and between experimental groups in MSLQ scores, congruent with the overall increases in SRL strategy use observed in the trace data, which can be interpreted as calibration between self-reported and actual study strategies.
Taken together, these studies (Cooper et al. 2008; Nguyen and Ikeda 2015; Sandi-Urena et al. 2011; Wang 2015) indicate that when studies examine the global level of self-regulation, students are relatively well able to report on their use of self-regulatory strategies. This contrasts with the results from the studies comparing specific self-regulatory strategies, where low calibration was found between the two types of measurement. Self-reports of global self-regulation also appear to have unique value in predicting academic achievement: they can predict achievement over and above the predictive value of the trace data used in these studies. These differential results indicate that different types of measurement (self-report versus online measures) are appropriate for different types of research questions or interventions, a point further elaborated upon in the Discussion.