Designing an Online Self-assessment for Informed Study Decisions: The User Perspective

This paper presents the results of a study, carried out as part of the design-based development of an online self-assessment for prospective students in higher online education. The self-assessment consists of a set of tests, predictive of completion, and is meant to improve informed decision making prior to enrolment. The rationale is that better decision making will help to address the ongoing concern of non-completion in higher online education. A prototypical design of the self-assessment was created based on an extensive literature review and correlational research aimed at investigating validity evidence concerning the predictive value of the tests. The present study focused on investigating validity evidence regarding the content of the self-assessment (including the feedback it provides) from a user perspective. Results from a survey among prospective students (N = 66) indicated that predictive validity and content validity of the self-assessment are somewhat at odds: only three of the five tests included in the current prototype were considered relevant by prospective students. Moreover, students rated eleven additionally suggested tests, currently not included, as relevant to their study decision. Expectations regarding the feedback to be provided in connection with the tests include an explanation of the measurement and advice for further preparation. A comparison of the obtained scores to a reference group (i.e., other test-takers or successful students) is not expected. Implications for further development and evaluation of the self-assessment are discussed.


Introduction
The number of students not completing a course or study program in higher online education remains problematic, despite a range of initiatives to decrease non-completion rates [30,34,35,37]. It is in the interest of both students and educational institutions to keep non-completion at a minimum [37]. One way to address this problem is by taking action prior to student enrolment, ensuring that the study expectations of prospective students are realistic [27,37]. Adequate, personalized information has been shown to help prospective students make informed study decisions [9,16] and, by extension, reduce non-completion [15,39]. A self-assessment (SA) can provide such information [25,26].
The current study contributes to the development of such a SA at an open online university. This SA will be available online for prospective students and inform them about the match between their characteristics (knowledge, skills, and attitudes) on the one hand, and what appears to be conducive to (read: predictive of) completion in higher online education on the other hand. The aim of the SA is not to select, but to provide feedback for action, so that prospective students can make a well-considered study choice [9,15,16], based on realistic expectations [27]. By following up on feedback suggestions (e.g., for remedial materials) they can start better prepared. However, as Broos and colleagues [3, p. 3] have argued: "…advice may contribute to the study success of some students, but for others, it may be more beneficial to stimulate the exploration of other (study) pathways. It may prevent (…) losing an entire year of study when faster reorientation is possible". Nonetheless, the SA will be offered as an optional and (in accordance with the open access policy of the institution) non-selective tool to visitors of the institutional website.
A first prototypical design of the SA (i.e., its constituent tests) was created, based on two prior studies: an extensive literature review and subsequent correlational research [6,7]. Both studies were carried out to collect evidence concerning the predictive value of constituent tests regarding completion. However, the predictive value is only one of the five sources of validity evidence, as identified in the Standards for Educational and Psychological Testing [4,5,31]. Another important source of validity evidence is the content of the SA [31], which is the main concern of the present investigation.
There are various reasons to investigate content validity, in addition to the predictive value of the constituent tests. The most important one is that, although previous research may have indicated that a certain test (variable) is a relevant predictor of completion, this does not necessarily mean that users perceive it as useful in the context of their study decision. When it is not perceived as useful, it becomes less likely that prospective students complete the test(s) and use the information they can gain from it [14]. The previous argument applies not only to each separate test but also to the overarching SA, i.e., whether the SA is perceived as a useful, coherent and balanced set of tests. Second, validity evidence based on the content of a test is not limited to the content of the actual test but includes the feedback provided in relation to obtained scores. Regarding this feedback, several design questions remain unanswered.
In short, the general research question addressed in this paper is: 'What are user expectations regarding the tests included in a SA prior to enrolment, including the feedback provided on obtained test scores?' The next sections will provide some theoretical background regarding the SA and the feedback design, before elaborating on the more specific research questions and the methods used.

Figure 1 shows the domain model underlying the SA and its feedback [38]. The figure illustrates that users attain a score on a predictor (i.e., a test, like basic mathematical skills, or a single indicator, like the number of hours occupied in employment). A predictor included in the SA represents either a dispositional characteristic (i.e., pertaining to the student, like discipline) or a situational characteristic (i.e., pertaining to the student's life circumstances, e.g., social support) [7]. The score a user attains on a test falls within a particular score range (labeled, e.g., unfavorable, sufficient, or favorable odds for completion). The exact score ranges (their cut-off points) of the current SA depend on parameters, which are set in the predictive model [7]. For this paper, it suffices to understand that feedback is designed in relation to the score ranges, rather than particular scores. With respect to the exact constituent content elements of the feedback (apart from the obvious score, cf. Sect. 1.2), the current study is designed to fill in the existing gaps, as indicated by the empty boxes in the lower right part of Fig. 1. These gaps will be discussed in more detail in Sect. 1.2.

Figure 2 shows the tests as presented to prospective students in the first prototypical design of the SA. Tests relating to dispositional variables are presented under the headers 'knowledge/skills' and 'attitude'. Situational variables are presented under the header 'profile information'. These headers were chosen, instead of research jargon, to align with the users' frame of reference.
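Since feedback is attached to score ranges rather than to particular scores, the mapping from an attained score to a range label can be sketched as follows. This is purely illustrative: the function name, cut-off values, and labels are hypothetical placeholders, as the actual cut-off points are parameters set in the predictive model [7].

```python
# Illustrative sketch: map an attained test score to a feedback score range.
# Cut-offs and labels are hypothetical; the real values come from the
# predictive model's parameters and are not reproduced here.
from bisect import bisect_right

def score_range(score,
                cutoffs=(50, 70),
                labels=("unfavorable", "sufficient", "favorable")):
    """Return the label of the score range the score falls into.

    Scores below cutoffs[0] map to labels[0], scores from cutoffs[0]
    up to (but not including) cutoffs[1] map to labels[1], and so on.
    """
    return labels[bisect_right(cutoffs, score)]
```

Feedback texts would then be authored per range label, not per individual score, which keeps the number of feedback variants small and independent of the scoring scale.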
The review study that was carried out to make this first selection of tests was inconclusive regarding a number of predictors and appeared biased towards a face-to-face educational context [6]. This means that, in addition to the tests validated in our previous research [6,7], other tests might be relevant as well. For instance, recent research, not available at the time of the first prototypical design of the self-assessment, has demonstrated that technological skills (e.g., computer skills and information literacy) might be relevant, especially in the context of higher online education [19]. Furthermore, it has been argued that measures of actual behavior should be considered next to self-report measures, to enhance the validity of the SA [22,24]. Actual behavior might be measured, for instance, through a content sample test which involves studying course literature and/or watching video lectures, followed by a short exam. Such a content sample test has also been shown to predict first-year academic achievement [24]. All in all, these are sufficient reasons to collect further validity evidence on the content of the SA so far, and to do so from the perspective of prospective users: if they consider the tests to be useful, they are more likely to complete the SA and use the feedback to help them make an informed decision [14].

Feedback
Feedback during the transition to new educational contexts has been considered pivotal regarding student motivation, confidence, retention, and success [20,28]. Feedback on test scores in a study decision process can be designed in various ways [2,3,11,25,26]. However, with a view to transparency, it is evident that the attained score and an explanation of this score should be part of the feedback. Because the feedback provided on a score is connected to a particular score range (Fig. 1), it makes sense to provide and explain the score in this context, as the example presented in Fig. 3 illustrates. The attained score is visualized through an arrow in a bar. The bar represents the score ranges. Visualization of feedback data has several benefits, as evidenced by research in the field of learning analytics: clearly illustrating a point, personalization, and memorability of feedback information [33]. Furthermore, the visualization in a bar representing score ranges is in line with other SAs prior to enrolment [11,26].
Besides this basic information, additional feedback needs, previously (Sect. 1.1) referred to as gaps, are explored in this study. Current practices illustrate the broad variety of possibilities. For instance, the feedback provided in two Flemish self-assessment instruments entailed a comparison of the attained scores to the scores of a reference group consisting of other test-takers [2,3,11] or (successful) first-year students [2,3]. In an online SA used in Germany [25,26], the feedback was focused on assisting prospective students in interpreting their scores, independent of comparison to a reference group. Which approach is best does not become clear from the literature. For instance, social comparison theory suggests that in times of uncertainty, individuals evaluate their abilities by comparing themselves to others, to reduce that uncertainty [10]. However, others suggest that information on success or failure in comparison to peers might have an adverse impact on students' motivation and self-esteem [8,21].
Another possible feedback component is an indication of the odds for completion, as described by Fonteyne and Duyck [11]. In this case, odds are based on multiple test scores and visualized by a traffic light system. Though students appeared curious about the odds for completion, they also perceived them as quite confronting.
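To make the idea concrete, an odds indication of this kind could combine multiple test scores in a logistic model and translate the resulting probability into a traffic-light color. The sketch below is a generic illustration of that idea, not the model used by Fonteyne and Duyck [11]: the weights, intercept, and color cut-offs are hypothetical, and in practice they would have to be estimated per study program.

```python
# Illustrative sketch: combine several test scores into a completion
# probability (logistic model) and show it as a traffic light.
# Weights, intercept, and cut-offs are hypothetical placeholders.
from math import exp

def completion_probability(scores, weights, intercept=0.0):
    """Logistic combination of test scores into a probability in (0, 1)."""
    z = intercept + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + exp(-z))

def traffic_light(p, low=0.33, high=0.66):
    """Map a completion probability to a traffic-light color."""
    if p < low:
        return "red"
    if p < high:
        return "orange"
    return "green"
```

The dependence of the weights on program-specific predictive models is exactly what makes such an indication hard to offer in a context where students combine courses across programs (cf. Sect. on feedback content in the Discussion).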
Furthermore, regarding transparency and feedback for action [12], the feedback might contain a description of what was measured [25,26] and information for action including tips to improve or a reference to advisory services [2,3,25,26]. Regarding feedback for action, Broos and colleagues [2,3] have demonstrated that consultation of a feedback dashboard was related to academic achievement. However, a definite causal relationship with the received feedback (i.e., a change in students' beliefs and study behavior) could not be established. Broos and colleagues [3] conclude that dashboard usage may qualify as an early warning signal in itself.
Again, it is paramount that prospective students perceive the feedback as relevant since this will affect their intention to use it, and thereby ultimately, the effectivity of the SA [14]. The present study, therefore, investigates prospective students' expectations regarding the feedback provided in the SA.

Research Questions
In the present study, we aim to complement the evidence for (predictive) validity of the SA with validity evidence based on the content of the SA, as perceived by prospective users. To that end, we chose to perform a small-scale user study, addressing the following research questions: 1. Which tests do prospective students consider relevant in the study decision process? 2. To what extent do tests considered relevant by prospective students overlap with tests included in the current SA prototype? 3. What are prospective students' expectations regarding the feedback provided in relation to the tests?

Context
The SA is designed, developed, and evaluated in the context of the Open University of the Netherlands (OUNL), which provides mainly online education, occasionally combined with face-to-face meetings. Academic courses and full bachelor and master programs are provided in the following domains: law, management sciences, informatics, environmental sciences, cultural sciences, educational sciences, and psychology. The open-access policy of OUNL means that for all courses, except courses at master degree level, the only entry requirement is a minimum age of 18 years.

Design
The present study is part of a design-based research process that typically comprises iterative stages of analysis, design, development, and evaluation [17,32]. More specifically, this study is part of the design stage, reporting on a small-scale user study for further content validation of the SA. This study involves a survey design, examining prospective students' opinions [5].

Materials
Participants' views on the SA content were investigated via two questions. In the first question, a list of 17 tests, including those already incorporated in the prototypical design, was presented. Tests presented in addition were selected based on a consultation of the literature [e.g., 19,22,24] as well as experts in the field. Respondents were asked to rate the perceived usefulness of each test for their study decision on a 5-point Likert scale (completely useless (1), somewhat useless (2), neither useless nor useful (3), somewhat useful (4), and completely useful (5)).
In the second question, it was explained that the feedback on each test contains the obtained score and an explanation of this score. Participants were asked to indicate which of the following feedback elements they would expect in addition (multiple answers possible): an explanation of what was measured [25,26], their score compared to the score of successful students [3], their score compared to the score of other test-takers [2,11], an indication of their odds for completion [11], and advice on further preparation for (a) course(s) or study program, when relevant [2, 3, 25, 26].

Participants and Procedure
In total 73 prospective students were approached to participate and complete the online survey, resulting in 66 valid responses. Participants constituted a convenience sample [5] of prospective students who signed up for a 'Meet and Match' event for their study of interest, i.e., law or cultural sciences. We opted for this convenience sample, as it consists of prospective students with a serious interest in following a course or study program at the OUNL (as demonstrated by signing up to the Meet and Match event, for which a fee was charged).

Analysis
Survey data were analyzed in Jamovi 1.1.8.0 [29,36]. For the usefulness of the tests (research questions 1 and 2), both the mean (the standard measure of central tendency) and the mode were presented. As the measurement level of the data for the first two research questions was ordinal, we based our conclusions on the mode. A mode of 4 (somewhat useful) or 5 (completely useful) was considered indicative of perceived usefulness. In answering research question 3, frequencies were reported for each answer option (see Sect. 2.3).
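The mode-based decision rule can be sketched as follows. This is a minimal illustration of the rule, not the authors' actual Jamovi analysis; the function name is hypothetical, and ratings are the 1-5 Likert codes described above.

```python
# Illustrative sketch of the decision rule: a test counts as perceived
# useful when the mode of its 5-point Likert ratings is 4 or 5.
# multimode returns every value tied for the highest frequency, so a
# test also qualifies if a useful rating ties for the mode.
from statistics import multimode

def perceived_useful(ratings):
    """True if any mode of the ordinal ratings is 4 (somewhat useful)
    or 5 (completely useful)."""
    return any(m >= 4 for m in multimode(ratings))
```

Basing the rule on the mode rather than the mean respects the ordinal measurement level: it requires only that response categories are ordered, not that the distances between them are equal.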

Perceived Usefulness of Self-assessment Tests
The first two research questions were aimed at gaining insight into the perceived usefulness of tests. Table 1 provides an overview of prospective students' ratings of the tests. The scores (modes) are ranked from high to low. The tests that are included in the current prototype of the SA are indicated by a checkmark in the first column, to facilitate exploration of the overlap between 'ratings of usefulness' and 'currently included' (second research question).
A content sample test and tests on interests, learning strategies, motivation, academic self-efficacy, career perspectives, information literacy, intelligence, language skills, perseverance, prior knowledge, procrastination (discipline), study goals and intentions, and writing skills are considered useful (Mode ≥ 4). Not all currently included tests are considered useful by prospective students. Two tests (basic mathematical skills and social support) yielded a mode of 3.00, which was below our threshold. On the other hand, academic self-efficacy, study goals and intentions, and procrastination (discipline) were perceived as useful (Mode = 4.00).

Feedback Content
The third research question aimed at gaining insight into prospective students' expectations regarding the feedback provided in relation to the SA tests. Table 2 presents an overview of the potential feedback elements, ranked by the percentage of students that listed each element (high to low). Next to the obtained score and an explanation of this score (i.e., the minimal feedback), 78.8% of the prospective students expect an explanation of what was measured, and 78.8% expect advice on further preparation for (a) course(s) or study, when relevant. Furthermore, 75.8% of the students expect an indication of the chances of completing a course or study. Finally, a comparison with a reference group is not expected by prospective students, as becomes clear from the relatively low frequencies for both comparisons with scores of other test-takers (40.9%) and scores of successful students (39.4%).

Discussion
The present study aimed to collect evidence for the content validity of the SA by gaining insight into prospective students' opinions and expectations of a SA prior to enrolment and the feedback it provides.

Self-assessment Content
In terms of content validity, further evidence is obtained by the present study for three tests that were already included in the current SA: academic self-efficacy, study goals and intentions, and procrastination (discipline). In line with our previous studies [6,7], these tests appear useful for prospective students as well. Furthermore, the results of the present study show that prospective students find information on specific knowledge (i.e., prior knowledge), skills (i.e., language skills, information literacy, learning strategies, and writing skills), and experience (i.e., a content sample test) useful in the process of their study decision. Although such tests did not appear as relevant predictors of completion in our previous studies [6,7], it might be beneficial to (re)consider these as possible tests for the SA and to investigate them further (e.g., their predictive value in the current context). Especially since previous research has also stressed the relevance of, for instance, a content sample test (i.e., providing video lectures on a general academic topic, followed by a short exam) to support students in making well-informed study decisions [22,24]. Finally, our results show that two tests (i.e., basic mathematical skills and social support) which proved to be relevant for completion in the online higher education context in our previous studies [6,7] are not necessarily perceived as useful by prospective students. Part of this result (basic mathematical skills) is likely to be an artefact of the specific sample, i.e., prospective students interested in law or cultural sciences. However, bearing in mind that prospective students need to recognize the usefulness of the tests [6,7,14], this also means due attention should be paid to clarifying the relevance of tests included in the SA to prospective students.

Feedback Content
Regarding the content of the feedback, results show that potential users of the SA expect an explanation of what was measured, as well as advice on further preparation for a course or study program at the OUNL, when relevant. Prospective students do not expect a comparison of their score to the score of a reference group (i.e., other test-takers or successful students). Overall, these results are in line with evaluations of feedback in learning analytics dashboards (LADs). For instance, Jivet and colleagues [12] have shown that transparency (i.e., explanations of the scales used, and why these are relevant) and support for action (i.e., recommendations on how to change their study behavior) are important for students to make sense of a LAD aimed at self-regulated learning. Following these results, the feedback in the SA domain model (Fig. 1) is complemented with information on what was measured and why, and advice for further preparation for a course or study program in the current context. This information is presented under the headers 'Measurement' and 'Advice', respectively.
'Measurement' contains information on the test and the relevance of this test in relation to studying in online higher education [25,26]. Yang and Carless [40] have stated that introducing students to the purpose(s) of the feedback is important for feedback to be effective. 'Advice' provides information on potential future actions that prospective students may take to start better prepared [2,3,25,26]. In that regard, feedback literature has suggested that good feedback practices inform students about their active role in generating, processing, and using feedback [21].
Based on the results of the present study, we decided not to include a comparison of the attained score to a reference group in the current prototype of the feedback. Furthermore, an indication of the odds for completion is not included in the prototypical feedback, even though a majority of prospective students appears to expect this. Calculating an indication of the odds for completion requires predictive models capturing the combined effects of predictors for each program within a specific field [11]. In the current context, where students do not necessarily commit to a specific study program, but can also decide to enroll in a combination of courses from different study programs, including an indication of the odds for completion appears infeasible. Nevertheless, these results provide input for managing expectations regarding the self-assessment.

Limitations and Future Directions
Several limitations are noteworthy in regard to the present study, as they point out directions for future development and evaluation of the self-assessment and the feedback it provides. First, the present study involves a relatively small, convenience sample. Participants were interested in specific study domains (i.e., law or cultural sciences), which is likely to have had an impact on certain results (e.g., perceived usefulness of a basic mathematical test). Thus, it would be valuable to extend the current sample with results of prospective students in other fields. Nevertheless, small-scale user studies can be considered part of the rapid, low-cost and low-risk pilot tests, which are an increasingly important instrument in contemporary research, enabling adjustments and refinements in further iterations of the self-assessment and feedback [3].
Second, future development of the self-assessment and its feedback should take into account opinions of other stakeholders, most notably student advisors, as their work is affected by the SA when prospective students call on their help and advice as a follow-up on attained test results and feedback [2].
A third recommendation is to further investigate the extension of the content of the SA, by including measurements of actual behavior through a content sample test [22,24]. Interestingly, research has shown that a content sample test is not only predictive of academic achievement, but that the experience of the content and level of a study program also affects the predictive value of other tests. For instance, Niessen and colleagues [23] have demonstrated that scores on other tests (i.e., procrastination and study skills tests), taken after the first course (i.e., an introductory course), more strongly predict academic achievement than scores on the same tests taken prior to enrolment. As the SA is meant to be a generic, rather than a domain-specific instrument, we aim to develop a program-independent content sample test (e.g., on academic integrity) in the near future.
Finally, the prototypical feedback merits further investigation of, e.g., language and tone [1], the framing of the score (i.e., focus on what goes well vs. focus on points of improvement) [13], possible visualizations [1,33], and, last but not least, its impact, i.e., consequential validity [7].