Introduction

While traditional, face-to-face education still serves most students, online forms of education are growing rapidly (Allen and Seaman 2014). Massive Open Online Courses (MOOCs) are an example of these new forms of education. In most cases, these courses are free of charge and open to all; often no prior knowledge is required. MOOCs offer many opportunities. For example, they provide access to education for those in locations where high-quality education is not available (Owston 1997; Walsh 2009). MOOCs also provide opportunities for professional development (e.g. employees can enrol in courses relevant to their careers). The rise of online education is, however, not without its challenges. As MOOCs are often open not only in access, but also in location, time, and pace of completion, they allow students to study when and where they prefer. Students attending a MOOC are thus granted more autonomy than students attending a traditional course. This requires MOOC students to take control of their own learning process (Garrison 2003) and to engage more, and differently, in strategies to regulate their study behaviour (Dillon and Greene 2003; Hartley and Bendixen 2001; Littlejohn et al. 2016). Students must actively plan their work, set goals, and monitor their comprehension and the time they spend on learning. Together, these activities can be defined as self-regulated learning (SRL).

Self-regulated learners are described as learners who are active participants in their learning process (Zimmerman 1986). Self-regulated learners are not only metacognitively and behaviourally active during the process of learning (performance phase), but also before (preparatory phase) and after the learning task (appraisal phase) (Puustinen and Pulkkinen 2001). SRL encompasses task strategies—the cognitive processes learners engage in—and the activities to regulate these cognitive processes (Winne and Hadwin 1998). An overview of the activities belonging to each of the three phases can be found in Fig. 1. This overview is adapted from a review of theoretical models of SRL conducted by Puustinen and Pulkkinen (2001). The overview presents the commonalities found in the review between theoretical models of SRL. Where general terms (e.g. control) were used by Puustinen and Pulkkinen (2001), the overview was complemented with the specific processes mentioned in the individual models (Pintrich 2000; Winne and Hadwin 1998; Zimmerman 2002).

Fig. 1 Overview of SRL activities categorized into three phases

Before starting a task (Fig. 1, preparatory phase), self-regulated learners define the task at hand, set goals for themselves, and construct a plan for how to carry out the task (Puustinen and Pulkkinen 2001). In traditional education, task definition and goal setting are generally carried out by the lecturer, for example by setting course goals and informing students of the aim of the lecture. In MOOCs, however, learning goals may be set less strictly. First, due to the openness in time found in MOOCs, students can decide for themselves when they want to study which parts of the course (Deal III 2002). Second, in MOOCs there is often no clear boundary between taking a course and not taking a course; students have autonomy over which parts of the course they want to master (Mackness et al. 2010). Third, course objectives are often not specific or clearly communicated in MOOCs (Margaryan et al. 2015). This requires additional goal setting and planning from students enrolled in MOOCs compared to students in traditional education.

Self-regulated learners are also actively engaged during the learning task (Fig. 1, performance phase). Activities students are involved in include environment and time management, task strategies to master the task content, comprehension monitoring, and help seeking (Pintrich 2000; Puustinen and Pulkkinen 2001; Winne and Hadwin 1998). Furthermore, self-regulated students keep their motivation up to par (Pintrich 2000; Winne and Hadwin 1998; Zimmerman 2002). While students in traditional education also need to engage in these activities, the activities are more important in MOOCs, which afford greater student autonomy (Garrison 2003). The openness in time and place makes students solely responsible for their time and environment management (Williams and Hellman 2004). Furthermore, students often do not have regular contact with fellow students in a MOOC; work is in most cases done individually (Toven-Lindsey et al. 2015). Without collaboration, there is also a lack of peer support, making it harder for students to stay motivated (Bank et al. 1990; Nicpon et al. 2006).

After finishing the task (Fig. 1, appraisal phase), self-regulating students reflect on their performance by comparing their achievements to the goals they set (Zimmerman 2002). Based on this evaluation, students adapt their study strategies in the—sometimes very near—future (Pintrich 2000; Winne and Hadwin 1998). Overall, the increase in student autonomy in a MOOC is what makes MOOCs accessible to larger groups of students than traditional courses. However, this increased autonomy makes self-regulation a necessity in MOOCs (Chung 2015; Dillon and Greene 2003; Garrison 2003; Hartley and Bendixen 2001; Littlejohn et al. 2016; Williams and Hellman 2004).

Measuring SRL

Previous studies have shown the importance of SRL for achievement in traditional education (Pintrich and de Groot 1990; Winters et al. 2008; Zimmerman and Martinez-Pons 1986). As student autonomy is greater in MOOCs than in traditional courses (Garrison 2003), it is likely that SRL is even more important for achievement in MOOCs. In order to study the importance of SRL and the relationship between SRL and achievement in MOOCs, an instrument is needed to measure students’ SRL in MOOCs. Existing questionnaires, however, are not fit for this purpose as they have not been validated for use in online education (including MOOCs). Furthermore, they do not measure the full range of SRL activities. In this paper, therefore, a self-regulated online learning questionnaire (SOL-Q) will be developed and validated in the context of MOOCs.

Several questionnaires are available to measure SRL. These include the Motivated Strategies for Learning Questionnaire (MSLQ; Pintrich et al. 1991), the Online Self-regulated Learning Questionnaire (OSLQ; Barnard et al. 2009), the Metacognitive Awareness Inventory (MAI; Schraw and Dennison 1994), and the Learning Strategies questionnaire (LS; Warr and Downing 2000). When comparing the aspects of SRL measured by the different questionnaires, as is done in Table 1, it becomes clear that the only aspect of SRL present in all four questionnaires is task strategies. Furthermore, it becomes clear that while all questionnaires measure some aspects of SRL, none of these questionnaires measure all aspects of SRL presented in Fig. 1. The MSLQ, for instance, which is the most widely used questionnaire in SRL research (Duncan and McKeachie 2005), covers a range of scales from the performance phase, but does not measure self-regulatory behaviour in the preparatory and appraisal phases. The MAI is the only questionnaire that includes scales from all three phases. The MAI, however, does not include time and environment management which are critical aspects of SRL in MOOCs due to the openness in time and place. The absence of an instrument that provides a comprehensive measurement of SRL is a first indication that there is a need for the development of a new SRL questionnaire.

Table 1 Overview of questionnaire scales

Another issue concerning the existing questionnaires is that their validity in online settings has not been established. Measures developed for traditional classrooms must be validated for use in online settings (Tallent-Runnels et al. 2006). The MSLQ, the MAI, and the LS were developed to measure SRL in traditional face-to-face education. A recent study has shown that the MSLQ could not be validated in an asynchronous online learning environment (Cho and Summers 2012). Additionally, the validity of the MAI and the LS in online settings has not yet been tested. The OSLQ is the exception, as it was specifically designed for use in online learning. This questionnaire is nevertheless limited in the aspects of SRL that it measures, as can be seen in Table 1. As the validity of the existing questionnaires in an online setting (with the exception of the OSLQ) has not been established, this provides a second indication that a SRL questionnaire suitable for online education, in this study for MOOCs, needs to be developed.

In conclusion, while all four questionnaires measure some aspects of SRL, no questionnaire is by itself suited and validated to measure all aspects of SRL in MOOCs, a form of online education. There is, however, a need for such a questionnaire, as SRL appears to be even more important for success in MOOCs than in traditional education. In the present study, a questionnaire to measure self-regulation in MOOCs will therefore be developed and validated. The questionnaire consists of items from the above-mentioned questionnaires (i.e. MSLQ, OSLQ, MAI, LS). After administering this questionnaire in a MOOC, exploratory factor analysis will be conducted. Next, confirmatory factor analysis will be conducted on a second dataset collected in a different MOOC. With the confirmatory factor analysis, the model fit of the factors found in the exploratory analysis will be compared to the model fit of the factors originally specified in the questionnaire.

Questionnaire development

The questionnaire to measure self-regulation in MOOCs was developed by combining items from the discussed questionnaires (MSLQ, OSLQ, MAI, and LS) into a single questionnaire that covered the whole range of SRL activities as stated in Table 1. The items in the questionnaires were categorized as belonging to one of the three phases and to one of the activities within these phases.

When items within a scale were highly similar across questionnaires, only one of the overlapping items was retained. For instance, overlap existed between the scale time and study environment in the MSLQ and the scales environment structuring and time management in the OSLQ. Therefore, only some of the items in these scales were retained. Furthermore, the phrase "in this online course" was added to all items to define the focus of the questionnaire, thereby informing students of the context to which the questions related. For example, the item "I think about what I really need to learn before I begin a task" from the MAI was changed into "I think about what I really need to learn before I begin a task in this online course". In some items the phrase "in this class" was already present. In those cases, "in this class" was replaced with "in this online course".

The final questionnaire contained 53 items divided over eleven scales: task definition, goal setting, strategic planning (preparatory phase), environmental structuring, time management, task strategies, help seeking, comprehension monitoring, motivation control, effort regulation (performance phase), and strategy regulation (appraisal phase). An overview of these scales and the number of items in each scale can be found in Fig. 2. The origin of the questionnaire items can be seen in Table 1. All items are answered on a 7-point Likert scale, ranging from "not at all true for me" (= 1) to "very true for me" (= 7). This is in line with the answering format of the MSLQ, the questionnaire from which most items were obtained. The MAI, the OSLQ, and the LS employ a 5-point Likert scale.

Fig. 2 Overview of the scales in the theoretical model

Exploratory factor analysis

Method

MOOC

The data for the exploratory factor analysis (EFA) were obtained from a MOOC on Marine Litter. This MOOC was offered by the United Nations Environment Programme (UNEP) and the Open University of the Netherlands (OUNL). The MOOC ran from October 2015 until December 2015 and lasted eight weeks. A total of 6452 students registered for the MOOC; their participation was voluntary. Each week consisted of two blocks on related topics. Each block consisted of 30 min of video, 1 h of studying background materials, and 30 min of tasks or assignments. Each week thus had a study load of 2 × 2 h. The MOOC was open in terms of costs, programme, and time. The pace of the MOOC was, however, fixed, as the start and end dates were set.

Participants

Complete questionnaire data were gathered from 162 students (mean age = 38.2; 49 male). The sample included 92 different nationalities. These students responded voluntarily to the invitation to fill out the questionnaire.

Procedure

Students in the MOOC on Marine Litter were sent an invitation by email to fill out the SRL questionnaire. This invitation was sent in week 6 of the course to make sure students could reflect on their actual self-regulation behaviours, and not on their planned behaviour as would be the case when sending out the questionnaire at the start of the course. Before answering the questions, informed consent was obtained from all individual participants included in the study. All 53 items were then presented in random order. Filling out the questionnaire took 5–10 min. Students received no compensation for their participation. The procedures followed in this study, including those for the data collection and storage, were approved by the local ethics committee.

Analysis

An EFA was conducted to explore the factor structure of the questionnaire. The most commonly used methods to determine the number of factors to extract are the Kaiser criterion, which retains factors with an eigenvalue >1, and the examination of the scree plot for discontinuities. However, these methods often result in an inaccurate number of factors to retain, as the Kaiser criterion is known to overfactor and the examination of the scree plot is highly subjective (Zwick and Velicer 1986). In their comparison of methods for factor retention, Zwick and Velicer (1986) found parallel analysis to be the most accurate procedure. In parallel analysis, random data matrices are created with the same sample size and the same number of variables as the gathered data. Factors are then extracted from each random data matrix and the resulting eigenvalues are averaged over all randomly created matrices. The final step is to compare these average eigenvalues with the eigenvalues found when extracting factors from the gathered data. The number of factors present in the gathered data is equal to the number of factors for which the eigenvalues from the gathered data exceed the average eigenvalues from the random data (Hayton et al. 2004). The underlying rationale of parallel analysis is that components underlying real data should have higher eigenvalues than components underlying random data (Schmitt 2011). As parallel analysis is the most accurate procedure to determine the number of factors to retain, it was used as input for the number of factors to retain in the EFA.
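To make the procedure concrete, the following is a minimal sketch of permutation-based parallel analysis in Python. It is an illustration only, not the program used in this study; the array name `data` (participants × items) is hypothetical, and the sketch uses principal-component eigenvalues of the correlation matrix.

```python
import numpy as np

def parallel_analysis(data, n_iterations=2000, seed=0):
    """Return the number of factors to retain via parallel analysis.

    Compares the eigenvalues of the observed correlation matrix with
    the average eigenvalues of correlation matrices computed from
    random matrices of the same size, created here by permuting each
    column of the raw data (O'Connor 2000).
    """
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape

    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Average eigenvalues over the random (permuted) data matrices.
    random_sum = np.zeros(n_vars)
    for _ in range(n_iterations):
        permuted = np.column_stack(
            [rng.permutation(data[:, j]) for j in range(n_vars)]
        )
        random_sum += np.linalg.eigvalsh(
            np.corrcoef(permuted, rowvar=False)
        )[::-1]
    random_mean = random_sum / n_iterations

    # Retain leading factors whose observed eigenvalue exceeds the random average.
    n_factors = 0
    while n_factors < n_vars and observed[n_factors] > random_mean[n_factors]:
        n_factors += 1
    return n_factors
```

Permuting columns rather than sampling from a normal distribution preserves the (non-normal) marginal distributions of the items, which matches the approach reported in the Results section below.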

Results

To filter the data for outliers, data were removed from participants for whom the standard deviation of their answers across all items was below 1, that is, participants who gave (nearly) the same answer to every item. Data from 154 participants remained for analysis. Data from reverse-phrased items were then recoded.
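A minimal sketch of this preprocessing step, assuming the responses are stored in a pandas DataFrame `df` with one row per participant and one column per item, and that `reverse_items` lists the reverse-phrased columns (both names hypothetical):

```python
import pandas as pd

# Filter outliers: drop participants whose answers have a standard
# deviation below 1 across all items (near-identical responding).
df = df[df.std(axis=1) >= 1]

# Recode reverse-phrased items on the 7-point scale (1 <-> 7, 2 <-> 6, ...).
df[reverse_items] = 8 - df[reverse_items]
```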

Parallel analysis

Parallel analysis (n = 2000) was conducted to determine the number of factors present in the data (O’Connor 2000). Random data matrices were created by permuting the raw data, as the data were not normally distributed. Five factors were found to be present.

Factor analysis

A factor analysis was conducted using principal axis factoring with oblique rotation, with the factor structure specified to have five factors. The resulting distribution of items over the five factors was difficult to interpret, mostly because the items belonging to the scale task strategies scattered over all five factors. The eight items belonging to task strategies were therefore removed from the dataset.

A new parallel analysis (n = 2000) again indicated the existence of five factors in the gathered data, which now consisted of 45 items. Principal axis factoring with oblique rotation was repeated to determine the distribution of items across factors. The resulting model explained 46.58 % of the variance in the data. The pattern matrix was inspected to identify items that did not fit the factor structure. Two types of items were removed: first, items for which the second-highest factor loading was above .32 (Tabachnick and Fidell 2001); second, items with a factor loading above .32 on two or more factors for which the difference between the highest and the second-highest factor loading was below .15. The resulting division of items over factors is in line with the results from the structure matrix. The pattern and structure matrices can be found in ‘Appendix 1’. The remaining items were used by two researchers to interpret and label the five factors. The resulting factors are: metacognitive skills, help seeking, time management, persistence, and environmental structuring. An overview of the factors, their reliability, and the number of items in each factor can be found in Fig. 3. The original scales (top) as well as the scales emerging from the EFA (bottom) are displayed in this figure according to the three phases of self-regulation. The arrows indicate how items ‘moved’ from the original scales into the scales resulting from the EFA. The reliability of the scales obtained from the EFA ranged between α = .68 and α = .91.
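For illustration, a sketch of this extraction and item-removal step using the Python factor_analyzer package follows; this is an assumption for exposition, not necessarily the software used in the study, and the array `data` (participants × items) is hypothetical.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

# Principal axis factoring with an oblique (oblimin) rotation,
# extracting the five factors indicated by the parallel analysis.
fa = FactorAnalyzer(n_factors=5, method='principal', rotation='oblimin')
fa.fit(data)
pattern = np.abs(fa.loadings_)  # pattern matrix loadings, items x factors

retained = []
for i, row in enumerate(pattern):
    first, second = np.sort(row)[::-1][:2]
    # Remove items whose second-highest loading is above .32
    # (Tabachnick and Fidell 2001), and items loading above .32 on two
    # or more factors whose two highest loadings differ by less than .15.
    cross_loading = second > .32 or (
        (row > .32).sum() >= 2 and (first - second) < .15
    )
    if not cross_loading:
        retained.append(i)
```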

Fig. 3 Overview of the scales in the theoretical model and in the exploratory model

Discussion

The EFA resulted in a factor model different from the theoretically specified model. In the theoretical model eleven scales were specified, while only five were found with the EFA (see Fig. 3). These five scales are labelled metacognitive skills, environmental structuring, time management, help seeking, and persistence. The models are similar with respect to the scales environmental structuring, time management, and help seeking. They differ in three important ways: the removal of task strategies, the large scale metacognitive skills, and the creation of the persistence scale to account for effort regulation and motivation control.

The scale task strategies was present in the theoretical model, but the items belonging to this scale were removed from the analysis to create the exploratory model. As mentioned in the results section, the items belonging to the task strategies scale scattered over all factors, which made it impossible to interpret the resulting factor structure. After this scale was removed, a different factor structure emerged; the other items were now also grouped differently. From a theoretical point of view, the removal of task strategies from the questionnaire fits the distinction between the execution of learning activities (task strategies) and the regulation of these learning activities (e.g. strategic planning). This can be compared to the distinction often made between cognition and metacognition (Mayer 1998; Van Leeuwen 2015; Vermunt and Verloop 1999).

Second, items belonging to five different scales in the theoretical model were combined into one large scale in the exploratory model: metacognitive skills. Not only did items belonging to the same phase of self-regulation (task definition, goal setting, and strategic planning) cluster; items from the two other phases (comprehension monitoring and strategy regulation) were also incorporated. Students engaged to a similar extent in the different phases of metacognitive activities; there were no students who engaged in, for example, task definition but not in comprehension monitoring. While these are theoretically distinct constructs, it was found that when students engage in metacognitive activities, they do so in all phases.

The third important difference between the theoretical and the exploratory model is the clustering of items belonging to motivation control and effort regulation into a single scale, persistence. While motivation and effort are different constructs and the items came from different questionnaires, their merger into a single scale can be understood when inspecting the items. For instance, the item “When I begin to lose interest for this online course, I push myself even further” comes from motivation control. The comparable item “Even when materials in this online course are dull and uninteresting, I manage to keep working until I finish” comes from effort regulation. With such similar items, it is likely that the scales could not be distinguished, leading to their merger into the single scale persistence.

Thus, the EFA yielded a model that differed from the theoretical model in significant ways. In the next step, a confirmatory factor analysis will be performed on a different data sample to compare different models. The model fit of four factor models will be compared: (1) the theoretical model with the scale task strategies, (2) the theoretical model without the scale task strategies, (3) the exploratory model, and (4) an exploratory-theoretical model. This exploratory-theoretical model is created to combine the valuable empirical insights gathered from the EFA while acknowledging the phases of SRL explicitly mentioned in all models of SRL (Puustinen and Pulkkinen 2001). The exploratory-theoretical model uses the exploratory model as a base. The theoretical perspective is then incorporated by splitting the large scale metacognitive skills into three scales, in line with the three phases of SRL: the preparatory, the performance, and the appraisal phase. In Fig. 4 the exploratory-theoretical model is presented in relation to the exploratory model. The items from task definition, goal setting, and strategic planning are placed in the scale metacognitive preparatory; the items from comprehension monitoring are placed in the scale metacognitive performance; and the items from strategy regulation are placed in the scale metacognitive appraisal. This adaptation strengthens the link between the model and theory on SRL. A side effect is that it also makes the distribution of the number of items over scales more even.

Fig. 4 Overview of the scales in the exploratory model and in the exploratory-theoretical model

A comparison of the theoretical model with the exploratory model also showed that three items had moved to a different scale in the EFA. For instance, an item that originally belonged to task definition was placed in the scale environmental structuring in the exploratory model. These three items were placed back in their theoretical scales in the exploratory-theoretical model.

Confirmatory factor analysis

Method

MOOC

The data for the confirmatory factor analysis (CFA) were gathered in the Dutch MOOC “The adolescent brain”. This MOOC was offered by the Open University of the Netherlands on the Emma European MOOC platform. The MOOC ran from April 2016 until June 2016 and lasted seven weeks. Approximately 1000 students registered for the MOOC; their participation was voluntary. The study load of each week was approximately 4 h, excluding additional reading materials. Each week consisted of several video lectures, each linked to an assignment.

Participants

Complete data were gathered from 159 students, who filled out the questionnaire as a voluntary assignment at the end of the third week of the course. Due to technical difficulties, the demographic data of these students were unfortunately lost. The demographics of the participants in the pre-course survey are likely to be similar (mean age = 44.1; 18.6 % male). As the course was taught in Dutch, there was less diversity in nationalities than in the first dataset: participants with 12 different nationalities took part in the pre-course survey.

Questionnaire

The questionnaire administered in this study was identical to the questionnaire described in the section Questionnaire Development, except that participants could choose between the original English version and a translated Dutch version. To create the Dutch version, two native Dutch-speaking researchers (the first and second authors of this paper) independently translated the questionnaire. Differences in their translations were resolved by discussion.

Procedure

Videos and assignments for each week were posted on the MOOC website. The last assignment of week 3 was the invitation to fill out the SRL questionnaire. Before answering the questions, informed consent was obtained from all participants included in the study. All 53 items were then presented in random order. Filling out the questionnaire took 5–10 min. Students received no compensation for their participation. The procedures followed in this second study were also approved by the local ethics committee.

Analysis

CFA was conducted with SPSS AMOS. Four models were analysed. The first was the theoretical model including task strategies (53 items, 11 scales). The second was the theoretical model without task strategies (45 items, 10 scales). The third was the exploratory model (36 items, 5 scales). The fourth, the exploratory-theoretical model, was constructed based on the outcomes of the EFA as well as the original theoretical model; it is thus a combination of the exploratory and theoretical models (see Fig. 4). The four models were compared based on the χ2, NC (normed chi-square), RMSEA (root mean square error of approximation), AIC (Akaike information criterion), and CFI (comparative fit index) scores (Hooper et al. 2008; Kline 2005).
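As an illustration of the same analysis outside AMOS, the sketch below specifies a CFA in Python with the semopy package. This is an assumption for exposition only (the study itself used SPSS AMOS), and the item names are placeholders; in the real model each scale would list its retained questionnaire items.

```python
import semopy

# Lavaan-style measurement model for the five-scale exploratory model;
# item names are hypothetical placeholders.
desc = """
metacognitive_skills =~ item1 + item2 + item3
environmental_structuring =~ item4 + item5
time_management =~ item6 + item7
help_seeking =~ item8 + item9
persistence =~ item10 + item11
"""

model = semopy.Model(desc)
model.fit(df)  # df: DataFrame with one column per questionnaire item

# Fit statistics, including chi-square, RMSEA, CFI, and AIC.
stats = semopy.calc_stats(model)
print(stats.T)
```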

Results

To filter the data for outliers, data were again removed from participants for whom the standard deviation of their answers was below 1. Data from 153 participants remained for analysis. Data from reverse-phrased items were then recoded.

An overview of the model fit statistics of the different models can be found in Table 2. The χ2, NC, and the RMSEA are absolute fit indices, whereas the AIC and the CFI are relative fit indices (Schreiber et al. 2006). The χ2, NC, and RMSEA are therefore not used to compare the fit of the different models, but they provide an indication of the quality of the models tested. The χ2 test indicates the difference between the observed and expected covariance matrices; smaller values therefore indicate better model fit (Gatignon 2010). The test should be non-significant for model acceptance, which it is not for any of the models tested in this study. Chi-square is, however, highly dependent on sample size (Kline 2005). Therefore, the normed chi-square (NC), for which chi-square is divided by the degrees of freedom, is often considered instead. Smaller values are better, and values of 2.0–3.0 are considered to indicate reasonable fit (Kline 2005; Tabachnick and Fidell 2001). All four models have NC values below 2.0, indicating acceptable fit. The RMSEA reflects the difference between the population covariance matrix and the hypothesized model. Smaller values indicate better model fit; a value smaller than .08 is acceptable (Gatignon 2010). The exploratory model and the exploratory-theoretical model thus show acceptable fit, while the theoretical models border on acceptable fit. The RMSEA, however, often falsely indicates poor model fit with small samples (Kenny et al. 2015). Taken together, the absolute fit indices indicate that none of the models in itself provides a good fit to the data. The fit of the four models may, however, still be compared.
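For reference, the two derived absolute indices can be computed directly from the χ2 statistic; a minimal sketch using the standard formulas (with n the sample size):

```python
import math

def normed_chi_square(chi2, df):
    # NC: chi-square divided by its degrees of freedom;
    # values of 2.0-3.0 or below indicate reasonable fit (Kline 2005).
    return chi2 / df

def rmsea(chi2, df, n):
    # One common formulation of the RMSEA:
    # sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    return math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))
```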

Table 2 Model fit statistics CFA

The comparative fit indices CFI and AIC are used to determine which model best fits the data. The CFI compares the fit of the tested model to the fit of the independence model, in which all latent variables are uncorrelated (Hooper et al. 2008). This statistic ranges between 0 and 1.0, and higher values indicate better model fit. A CFI value ≥.95 indicates good fit; none of the models meets this criterion. The CFI is, however, used here to determine which model best fits the data, and the exploratory model performs better than the exploratory-theoretical model, which in turn performs better than both theoretical models. The AIC scores do not have a criterion value, but smaller values indicate better fit (Schreiber et al. 2006). These scores also indicate that the exploratory model shows the best fit, followed by the exploratory-theoretical model.
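The CFI can likewise be expressed in terms of the χ2 values of the tested model and of the independence (null) model; a sketch of the standard formula:

```python
def cfi(chi2_model, df_model, chi2_null, df_null):
    # CFI = 1 - max(chi2_m - df_m, 0) / max(chi2_0 - df_0, chi2_m - df_m, 0);
    # values closer to 1.0 indicate better fit relative to the null model.
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_null - df_null, chi2_model - df_model, 0.0)
    return 1.0 - num / den if den > 0 else 1.0
```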

The reliability of the scales (see Table 3) provides further information to compare the different models. Most scales show good to reasonable reliabilities; strategy regulation/metacognitive appraisal (.493) is the only exception. When this scale is combined with the metacognitive scales from the preparatory and performance phases into one metacognitive skills scale (the exploratory model), the reliability of the scale increases drastically (.902). Based on the scale reliabilities, the exploratory model shows the best fit, followed by the exploratory-theoretical model and the theoretical models.
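The scale reliabilities are reported as α values; a minimal sketch of Cronbach's alpha for a single scale, assuming `scale` is a hypothetical participants × items NumPy array holding that scale's items:

```python
import numpy as np

def cronbach_alpha(scale):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of sum score)
    k = scale.shape[1]
    item_variances = scale.var(axis=0, ddof=1).sum()
    total_variance = scale.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)
```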

Table 3 Reliability of scales calculated from dataset 2

General discussion

A questionnaire to measure self-regulated learning in fully online courses was developed: the SOL-Q. This questionnaire was tested in the context of MOOCs by conducting an exploratory factor analysis (EFA) and a confirmatory factor analysis (CFA) on two separate datasets collected in two different MOOCs. The EFA resulted in a different factor model (the exploratory model) than the model that was theoretically specified beforehand (the theoretical model). The three major differences were the removal of the scale task strategies, the merger of effort regulation and motivation control into a single scale persistence, and the merger of the separate metacognitive scales into a single scale metacognitive skills. Based on the results of the CFA it was concluded that the exploratory model provided better fit than the theoretical models, both with and without task strategies. A fourth model was also tested, the exploratory-theoretical model, which incorporated the theoretical separation of metacognitive skills into three separate phases. The exploratory model also provided better fit than this exploratory-theoretical model. Based on the results of the CFA, it can be concluded that while none of the models provides absolute fit, the exploratory model clearly provides the best fit (see ‘Appendix 2’ for the SOL-Q based on the exploratory model). This conclusion is based on the comparative fit statistics, AIC and CFI, and the scale reliabilities.

When interpreting the results, some caution is warranted given the relatively small sample size and the high complexity of the models. The NC values are, however, all acceptable, and the RMSEA values are acceptable for the exploratory and the exploratory-theoretical models and border on acceptance for the two theoretical models. The results thus provide enough evidence to draw two important conclusions.

First, evidence was found both in the EFA and in the CFA that task strategies are different from the other aspects of SRL. In the EFA, the items belonging to the scale task strategies scattered over all factors. This indicates that the items did not form a coherent scale. The results of the CFA further confirmed this finding. Both the absolute and the comparative fit statistics show clearly better fit for the theoretical model without task strategies compared to the theoretical model with task strategies. Based on the present study, it is therefore not advisable to include task strategies as a separate scale in a SRL questionnaire because it could jeopardize the validity of the instrument. As indicated in the discussion of the EFA, the distinction between the execution of learning activities (task strategies) and the regulation of learning activities can be defended from a theoretical point of view as well, as it is in line with the distinction between cognition and metacognition (Mayer 1998; Van Leeuwen 2015; Vermunt and Verloop 1999). The execution and the regulation of learning activities are, however, closely intertwined. It is therefore advised to measure both when studying SRL, but with different instruments.

Based on the EFA and CFA results, it can further be concluded that metacognitive skills form a single factor when measuring SRL. Neither the theoretical separation into five scales (task definition, goal setting, strategic planning, comprehension monitoring, and strategy regulation), nor the separation into three phases (preparatory, performance, appraisal) could be replicated in the analyses. Students thus do not differ in their engagement with the different metacognitive activities. For example, students who set goals also monitor their comprehension, and students who do not set goals do not monitor their comprehension either. Methodologically, the inclusion of separate scales for metacognitive skills thus does not add discriminatory power to the questionnaire to differentiate between types of students.

The finding that metacognitive activities cannot be measured in three separate phases does not imply that these three phases do not exist in SRL. It is still likely that students engage in different metacognitive activities during the preparation, performance, and appraisal of learning tasks. However, the results of our study indicate that students perform evenly on metacognitive activities across these phases. Students who, for instance, struggle with strategic planning are thus likely to also struggle with monitoring their comprehension and with strategy regulation. Several studies have tried to support students’ SRL by supporting a SRL activity within one particular phase (e.g. Taminiau et al. 2013; van den Boom et al. 2004). These interventions were found to be less effective than expected. Our findings provide a possible explanation: students who struggle with self-regulation need support in all three phases of SRL, so support for a single SRL activity may not have the desired effect. We thus suggest that instructional design aimed at supporting self-regulation in open online education should do so in all three phases. This also implies that an important direction for research is to examine to what extent metacognitive skills are transferable from one phase to the next.

Another direction for future research is to examine the transferability of the developed questionnaire to contexts other than MOOCs. The SOL-Q was developed for fully online courses with a focus on individual learning activities, and is thus transferable to similar settings. Besides this type of education, the spectrum of online education also includes, for example, education with a focus on collaborative learning (i.e., CSCL; Stahl et al. 2006) and combinations of online and face-to-face activities (i.e., blended learning; Staker and Horn 2012). SRL in these forms of education may involve more aspects than are measured with the SOL-Q. Collaboration, for example, also requires regulation of group processes (Hadwin et al. 2011). Blended learning may require specific regulatory activities related to the transition between face-to-face and online education (Staker and Horn 2012). With the inclusion of additional scales, we hypothesize that the SOL-Q can be extended to measure SRL in these other types of online education as well.

Our goal for now has been to develop a questionnaire suitable for MOOCs. To conclude, the questionnaire that showed the best results after the EFA and CFA, the SOL-Q, consists of five scales: metacognitive skills, environmental structuring, help seeking, time management, and persistence (‘Appendix 2’). The present study not only provides an instrument that can be used and further refined in future research, but also indicates theoretical and practical implications concerning SRL. Theoretically, the role of task strategies and the temporal aspects of metacognitive activities were discussed. Practically, the results of this paper provide indications for the support of SRL in online education. As SRL is increasingly important in settings of open online education, valid measurement and adequate support of SRL are of vital importance. With the present study, steps have been taken to contribute to these goals.